4  Adverse Event Analysis

Objectives

This guide demonstrates:

  • StudyPlan-driven batch generation for production use
  • Three-step pipeline architecture for flexibility and extensibility
  • Standardized ARD structure for consistent data handling

4.1 Setup

First, import the required packages; the study plan is loaded in the next section.

import os
import sys
from pathlib import Path

import polars as pl
import rtflite as rtf
from rtflite import LibreOfficeConverter

# Add src to path for imports
sys.path.insert(0, 'src')

from csrlite import load_plan, study_plan_to_ae_summary
from csrlite.ae.ae_summary import ae_summary_ard, ae_summary_df, ae_summary_rtf, ae_summary

# PDF conversion is optional: continue without it when LibreOffice is missing
try:
    converter = LibreOfficeConverter()
except Exception:
    converter = None
    print("WARNING: LibreOffice not found. PDF conversion will be skipped.")

4.2 StudyPlan-Driven Workflow

For production environments, define all analyses in a YAML file following the Review-Oriented Development (ROD) philosophy.

4.2.1 Load Study Plan

The study plan contains population definitions, observation periods, parameters, and data source specifications:

# Load study plan from YAML
study_plan = load_plan("studies/xyz123/yaml/plan_xyz123.yaml")
study_plan.get_plan_df().filter(pl.col("analysis") == "ae_summary")
2026-02-03 15:25:53,900 - csrlite.common.plan - INFO - Successfully loaded dataset 'adsl' from 'studies/xyz123/yaml/../../../data/adsl.parquet'
2026-02-03 15:25:53,903 - csrlite.common.plan - INFO - Successfully loaded dataset 'adae' from 'studies/xyz123/yaml/../../../data/adae.parquet'
2026-02-03 15:25:53,904 - csrlite.common.plan - INFO - Successfully loaded dataset 'adie' from 'studies/xyz123/yaml/../../../data/adie.parquet'
2026-02-03 15:25:53,906 - csrlite.common.plan - INFO - Successfully loaded dataset 'adpd' from 'studies/xyz123/yaml/../../../data/adpd.parquet'
shape: (2, 5)
analysis      population  observation  parameter      group
str           str         str          str            str
"ae_summary"  "apat"      "week12"     "any;rel;ser"  "trt01a"
"ae_summary"  "apat"      "week24"     "any;rel;ser"  "trt01a"

4.2.2 Batch Generate All Outputs

The study_plan_to_ae_summary function automatically generates RTF outputs for all AE summary analyses defined in the plan:

output_files = study_plan_to_ae_summary(study_plan)
studies/xyz123/rtf/ae_summary_apat_week12_any_rel_ser.rtf
studies/xyz123/rtf/ae_summary_apat_week24_any_rel_ser.rtf
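
The two file names above appear to be assembled from the plan row: analysis, population, observation, and the parameter keywords with ";" replaced by "_". A minimal sketch of that naming rule, inferred only from the output shown here (the helper plan_row_to_filename is hypothetical and not part of csrlite):

```python
def plan_row_to_filename(analysis: str, population: str, observation: str, parameter: str) -> str:
    """Build an RTF file name from plan keywords (rule inferred from the output above)."""
    # "any;rel;ser" -> "any_rel_ser"
    parameter_part = "_".join(parameter.split(";"))
    return f"{analysis}_{population}_{observation}_{parameter_part}.rtf"


name = plan_row_to_filename("ae_summary", "apat", "week12", "any;rel;ser")
print(name)
```

Note that the group keyword (trt01a) does not appear in the generated names, so it is left out of the sketch.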

4.2.2.1 Week 12

4.2.2.2 Week 24

How it works:

  1. Reads the expanded plan DataFrame
  2. Filters for analysis == "ae_summary"
  3. For each row, extracts population/observation/parameter/group keywords
  4. Uses StudyPlanParser to convert keywords to DataFrames and filters
  5. Calls ae_summary() to generate each RTF file
  6. Returns list of generated file paths

When to use:

  • Production environments with multiple analyses
  • YAML-first workflow (specifications drive code)
  • Need reproducibility and traceability

5 Design Philosophy

5.1 Three-Step Pipeline Architecture

The AE summary analysis follows a three-step pipeline that separates concerns:

  1. ae_summary_ard: Generate Analysis Results Data (ARD)
    • Input: Raw datasets with filters
    • Output: Standardized long-format DataFrame with columns: __index__, __group__, __value__
    • Purpose: Data processing and statistical computation
  2. ae_summary_df: Transform to display format
    • Input: ARD (long format)
    • Output: Wide-format DataFrame (groups as columns)
    • Purpose: Reshape data for table layout
  3. ae_summary_rtf: Generate formatted output
    • Input: Display DataFrame
    • Output: RTFDocument object
    • Purpose: Apply formatting and styling

5.2 Why This Separation?

  • Testability: Each step can be tested independently
  • Reusability: ARD can be transformed to different output formats (CSV, Excel, HTML)
  • Extensibility: Easy to add new output formats without touching analysis logic
  • Debugging: Inspect intermediate data at each stage

5.3 ARD Data Structure

All *_ard functions return a standardized long-format DataFrame:

  • __index__: Row labels (e.g., “Any Adverse Events”, “Serious Adverse Events”)
  • __group__: Treatment groups (e.g., “Placebo”, “Treatment A”)
  • __value__: Formatted values (e.g., “12 (34.5%)”)

This structure enables consistent data handling across different analyses.

5.4 Function Wrapper

The ae_summary function wraps all three steps for convenience:

ae_summary = ae_summary_ard -> ae_summary_df -> ae_summary_rtf -> write to file

Extension Points

To extend functionality:

  • Add new statistics: Modify ae_summary_ard
  • Change table layout: Modify ae_summary_df
  • Add new output formats: Create new ae_summary_* function using ARD
  • Batch processing: Use study_plan_to_ae_summary pattern

5.5 Complete Pipeline

The ae_summary function provides a complete pipeline that executes all three steps and writes the RTF output to a file:

adsl = pl.read_parquet("data/adsl.parquet")
adae = pl.read_parquet("data/adae.parquet")

ae_summary(
    population=adsl,
    observation=adae,
    population_filter="SAFFL = 'Y'",
    observation_filter=None,
    id=("USUBJID", "Subject ID"),
    group=("TRT01A", "Treatment Group"),
    variables=[
        ("TRTEMFL = 'Y'", "Any Adverse Events"),
        ("AESER = 'Y'", "Serious Adverse Events")
    ],
    title=[
        "Analysis of Adverse Event Summary",
        "(Safety Analysis Population)"
    ],
    footnote=["Every participant is counted a single time for each applicable row and column."],
    source=["Source: ADSL and ADAE datasets"],
    output_file="studies/xyz123/rtf/ae_summary.rtf",
    total=True,
    missing_group="error"
)
studies/xyz123/rtf/ae_summary.rtf
'studies/xyz123/rtf/ae_summary.rtf'

5.6 Step-by-Step Pipeline

This section demonstrates each step of the pipeline individually, allowing you to inspect intermediate outputs and understand the data transformation at each stage.

5.6.1 Step 1: Generate Analysis Results Data (ARD)

The ae_summary_ard function processes raw data and generates standardized long-format output:

Key Parameters:

  • population_filter: SQL WHERE clause to subset subjects (e.g., "SAFFL = 'Y'" for safety population)
  • observation_filter: SQL WHERE clause to subset observations (can be None)
  • group: Tuple of (variable_name, label) for treatment grouping
  • variables: List of tuples [(filter, label)] defining which events to count

_ard = ae_summary_ard(
    population=adsl,
    observation=adae,
    population_filter="SAFFL = 'Y'",
    observation_filter=None,
    group=("TRT01A", "Treatment Group"),
    variables=[
        ("TRTEMFL = 'Y'", "Any Adverse Events"),
        ("AESER = 'Y'", "Serious Adverse Events")
    ],
    id=("USUBJID", "Subject ID"),
    total=True,
    missing_group="error"
)

_ard
shape: (16, 3)
__index__                     __group__               __value__
enum                          enum                    str
"Participants in population"  "Placebo"               "86"
"Participants in population"  "Xanomeline High Dose"  "84"
"Participants in population"  "Xanomeline Low Dose"   "84"
"Participants in population"  "Total"                 "254"
""                            "Placebo"               ""
…                             …                       …
"Any Adverse Events"          "Total"                 "218 ( 85.8)"
"Serious Adverse Events"      "Placebo"               "  0 (  0.0)"
"Serious Adverse Events"      "Xanomeline High Dose"  "  2 (  2.4)"
"Serious Adverse Events"      "Xanomeline Low Dose"   "  1 (  1.2)"
"Serious Adverse Events"      "Total"                 "  3 (  1.2)"

Output Structure: Long format with __index__, __group__, __value__ columns.
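
The __value__ strings in the output above look like fixed-width "n (pct)" cells: a count right-aligned in 3 characters and a percentage with one decimal in 5 characters. A small sketch that reproduces this layout, assuming that format and using the population count as denominator (format_n_pct is a hypothetical helper, not a csrlite function):

```python
def format_n_pct(n: int, denominator: int) -> str:
    """Format a count and its percentage of the denominator as a fixed-width string."""
    pct = 100.0 * n / denominator if denominator else 0.0
    return f"{n:3d} ({pct:5.1f})"


# Counts and denominators taken from the Total column shown above
print(format_n_pct(218, 254))
print(format_n_pct(3, 254))
```

Fixed-width strings keep the numbers visually aligned once the table is rendered in a monospaced or decimal-aligned column.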

5.6.2 Step 2: Transform to Display Format

The ae_summary_df function pivots the ARD to wide format where groups become columns:

_df = ae_summary_df(_ard)
_df
shape: (4, 5)
__index__                     Placebo        Xanomeline High Dose  Xanomeline Low Dose  Total
enum                          str            str                   str                  str
"Participants in population"  "86"           "84"                  "84"                 "254"
""                            ""             ""                    ""                   ""
"Any Adverse Events"          " 65 ( 75.6)"  " 76 ( 90.5)"         " 77 ( 91.7)"        "218 ( 85.8)"
"Serious Adverse Events"      "  0 (  0.0)"  "  2 (  2.4)"         "  1 (  1.2)"        "  3 (  1.2)"

Output Structure: Wide format with __index__ as row labels and treatment groups as columns.

5.6.3 Step 3: Generate RTF Output

The ae_summary_rtf function creates a formatted RTF document:

ae_summary_rtf(
    _df,
    title=[
        "Analysis of Adverse Event Summary",
        "(Safety Analysis Population)"
    ],
    footnote=["Every participant is counted a single time for each applicable row and column."],
    source=["Source: ADSL and ADAE datasets"],
    col_rel_width=[4, 2, 2, 2, 2]  # Optional: defaults to auto-calculated widths
).write_rtf("studies/xyz123/rtf/ae_summary_step.rtf")
studies/xyz123/rtf/ae_summary_step.rtf

Output: RTFDocument object that can be written to file using .write_rtf().

6 Getting Started for Developers

6.1 Which Approach to Use?

Use StudyPlan-driven workflow (study_plan_to_ae_summary) when:

  • Working in production with validated YAML specifications
  • Need to generate multiple analyses at once
  • Want YAML as single source of truth

Use manual workflow (ae_summary) when:

  • Developing new analyses or debugging
  • Need one-off custom analyses
  • Want direct control over parameters

Use step-by-step workflow (individual functions) when:

  • Adding new output formats (e.g., Excel, HTML)
  • Debugging data transformations
  • Building custom analysis pipelines

6.2 Common Enhancement Patterns

Add new statistics to ARD:

  1. Modify ae_summary_ard to add new rows to the long-format output (the three ARD columns stay fixed)
  2. Ensure all values are formatted as strings in the __value__ column
  3. Add the new labels to the __index__ categories in the correct order

Create new output format (e.g., Excel):

  1. Create new function ae_summary_xlsx(df, ...) that takes display DataFrame
  2. Apply Excel-specific formatting and styling
  3. ARD and display transformation remain unchanged

Batch process with custom logic:

  1. Follow study_plan_to_ae_summary pattern
  2. Loop through plan rows
  3. Use StudyPlanParser to extract filters and parameters
  4. Call appropriate analysis functions