4 Adverse Event Analysis

Objectives

This guide demonstrates:

- StudyPlan-driven batch generation for production use
- Three-step pipeline architecture for flexibility and extensibility
- Standardized ARD structure for consistent data handling

4.1 Setup

First, import the required packages. The study plan itself is loaded in the next section.

import os
import sys
from pathlib import Path
import polars as pl
import rtflite as rtf 

# Add src to path for imports
sys.path.insert(0, 'src')

from rtflite import LibreOfficeConverter
try:
    converter = LibreOfficeConverter()
except Exception:
    converter = None
    print("WARNING: LibreOffice not found. PDF conversion will be skipped.")

from csrlite import load_plan, study_plan_to_ae_summary
from csrlite.ae.ae_summary import ae_summary_ard, ae_summary_df, ae_summary_rtf, ae_summary

4.2 StudyPlan-Driven Workflow

For production environments, define all analyses in a YAML file following the Review-Oriented Development (ROD) philosophy.

4.2.1 Load Study Plan

The study plan contains population definitions, observation periods, parameters, and data source specifications:

# Load study plan from YAML
study_plan = load_plan("studies/xyz123/yaml/plan_xyz123.yaml")
study_plan.get_plan_df().filter(pl.col("analysis") == "ae_summary")

2026-02-03 15:25:53,900 - csrlite.common.plan - INFO - Successfully loaded dataset 'adsl' from 'studies/xyz123/yaml/../../../data/adsl.parquet'
2026-02-03 15:25:53,903 - csrlite.common.plan - INFO - Successfully loaded dataset 'adae' from 'studies/xyz123/yaml/../../../data/adae.parquet'
2026-02-03 15:25:53,904 - csrlite.common.plan - INFO - Successfully loaded dataset 'adie' from 'studies/xyz123/yaml/../../../data/adie.parquet'
2026-02-03 15:25:53,906 - csrlite.common.plan - INFO - Successfully loaded dataset 'adpd' from 'studies/xyz123/yaml/../../../data/adpd.parquet'

shape: (2, 5)

analysis	population	observation	parameter	group
str	str	str	str	str
"ae_summary"	"apat"	"week12"	"any;rel;ser"	"trt01a"
"ae_summary"	"apat"	"week24"	"any;rel;ser"	"trt01a"

4.2.2 Batch Generate All Outputs

The study_plan_to_ae_summary function automatically generates RTF outputs for all AE summary analyses defined in the plan:

output_files = study_plan_to_ae_summary(study_plan)

studies/xyz123/rtf/ae_summary_apat_week12_any_rel_ser.rtf
studies/xyz123/rtf/ae_summary_apat_week24_any_rel_ser.rtf

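4.2.2.1 Week 12

[Embedded PDF: pdf/ae_summary_apat_week12_any_rel_ser.pdf]

4.2.2.2 Week 24

[Embedded PDF: pdf/ae_summary_apat_week24_any_rel_ser.pdf]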

How it works:

1. Reads the expanded plan DataFrame
2. Filters for analysis == "ae_summary"
3. For each row, extracts population/observation/parameter/group keywords
4. Uses StudyPlanParser to convert keywords to DataFrames and filters
5. Calls ae_summary() to generate each RTF file
6. Returns list of generated file paths
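The steps above can be sketched in plain Python. This is not csrlite's implementation: `plan_rows`, `output_name`, and `batch_generate` are hypothetical stand-ins, the third plan row is invented to show the filter, and the file-naming rule is only inferred from the two paths printed earlier.

```python
# Hypothetical stand-in for the expanded plan DataFrame: one dict per plan row.
plan_rows = [
    {"analysis": "ae_summary", "population": "apat", "observation": "week12",
     "parameter": "any;rel;ser", "group": "trt01a"},
    {"analysis": "ae_summary", "population": "apat", "observation": "week24",
     "parameter": "any;rel;ser", "group": "trt01a"},
    {"analysis": "other_analysis", "population": "apat", "observation": "week12",
     "parameter": "any", "group": "trt01a"},
]


def output_name(row):
    """Build a file name from plan keywords (naming rule inferred, not confirmed)."""
    parameters = row["parameter"].replace(";", "_")
    return f"{row['analysis']}_{row['population']}_{row['observation']}_{parameters}.rtf"


def batch_generate(rows, analysis, output_dir, generate):
    """Call generate(row, path) for every matching plan row and collect the paths."""
    paths = []
    for row in rows:
        if row["analysis"] != analysis:
            continue  # skip rows that belong to other analyses
        path = f"{output_dir}/{output_name(row)}"
        generate(row, path)  # the real workflow would call ae_summary() here
        paths.append(path)
    return paths


calls = []  # records what the stand-in generator was asked to produce
output_files = batch_generate(
    plan_rows, "ae_summary", "studies/xyz123/rtf",
    generate=lambda row, path: calls.append((row["observation"], path)),
)
for path in output_files:
    print(path)
```

Passing the generator in as a callable keeps the loop independent of any one analysis, which is the property a custom batch function would likely want to preserve.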

When to use:

- Production environments with multiple analyses
- YAML-first workflow (specifications drive code)
- Need reproducibility and traceability

5 Design Philosophy

5.1 Three-Step Pipeline Architecture

The AE summary analysis follows a three-step pipeline that separates concerns:

1. ae_summary_ard: Generate Analysis Results Data (ARD)
   - Input: Raw datasets with filters
   - Output: Standardized long-format DataFrame with columns: __index__, __group__, __value__
   - Purpose: Data processing and statistical computation
2. ae_summary_df: Transform to display format
   - Input: ARD (long format)
   - Output: Wide-format DataFrame (groups as columns)
   - Purpose: Reshape data for table layout
3. ae_summary_rtf: Generate formatted output
   - Input: Display DataFrame
   - Output: RTFDocument object
   - Purpose: Apply formatting and styling

5.2 Why This Separation?

- Testability: Each step can be tested independently
- Reusability: ARD can be transformed to different output formats (CSV, Excel, HTML)
- Extensibility: Easy to add new output formats without touching analysis logic
- Debugging: Inspect intermediate data at each stage

5.3 ARD Data Structure

All *_ard functions return a standardized long-format DataFrame:

- __index__: Row labels (e.g., “Any Adverse Events”, “Serious Adverse Events”)
- __group__: Treatment groups (e.g., “Placebo”, “Treatment A”)
- __value__: Values already formatted as strings (e.g., “218 ( 85.8)” for a count and percentage)

This structure enables consistent data handling across different analyses.
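As a rough illustration of that contract, the sketch below models ARD rows as plain dictionaries and checks two properties the long format implies: every field is a string, and each (row label, group) pair appears once. `check_ard` is a hypothetical helper, not part of csrlite; the sample values are copied from the outputs shown in this guide.

```python
# ARD rows modelled as dicts keyed by the three standardized column names.
ard = [
    {"__index__": "Participants in population", "__group__": "Placebo", "__value__": "86"},
    {"__index__": "Participants in population", "__group__": "Total", "__value__": "254"},
    {"__index__": "Any Adverse Events", "__group__": "Placebo", "__value__": " 65 ( 75.6)"},
    {"__index__": "Any Adverse Events", "__group__": "Total", "__value__": "218 ( 85.8)"},
]

ARD_COLUMNS = ("__index__", "__group__", "__value__")


def check_ard(rows):
    """Return the number of cells, or raise ValueError if the contract is broken."""
    seen = set()
    for row in rows:
        if tuple(row) != ARD_COLUMNS:
            raise ValueError(f"unexpected columns: {list(row)}")
        if not all(isinstance(row[col], str) for col in ARD_COLUMNS):
            raise ValueError(f"non-string field in {row!r}")
        key = (row["__index__"], row["__group__"])
        if key in seen:
            raise ValueError(f"duplicate cell {key!r}")
        seen.add(key)
    return len(seen)


print(check_ard(ard))
```

A check like this is one way to test the ARD step on its own before any reshaping or formatting happens.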

5.4 Function Wrapper

The ae_summary function wraps all three steps for convenience:

ae_summary = ae_summary_ard -> ae_summary_df -> ae_summary_rtf -> write to file
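The chain above amounts to ordinary function composition. The sketch below shows that shape with stand-in steps; `run_pipeline`, `FakeDocument`, and the `fake_*` functions are hypothetical and only mimic the order of calls, not the real csrlite signatures.

```python
trace = []  # records the order in which the steps run


class FakeDocument:
    """Stand-in for a document object that exposes write_rtf()."""

    def __init__(self, table):
        self.table = table

    def write_rtf(self, path):
        trace.append("write")


def fake_ard():
    trace.append("ard")
    return [("Any Adverse Events", "Placebo", " 65 ( 75.6)")]


def fake_df(ard):
    trace.append("df")
    return {label: {group: value} for label, group, value in ard}


def fake_doc(table):
    trace.append("rtf")
    return FakeDocument(table)


def run_pipeline(make_ard, make_df, make_doc, output_file):
    """Run the three steps in order, write the result, and return the path."""
    ard = make_ard()
    table = make_df(ard)
    document = make_doc(table)
    document.write_rtf(output_file)
    return output_file


result = run_pipeline(fake_ard, fake_df, fake_doc, "ae_summary.rtf")
print(result, trace)
```

Because each step only consumes the previous step's output, any one of them can be swapped without touching the others.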

Extension Points

To extend functionality:

- Add new statistics: Modify ae_summary_ard
- Change table layout: Modify ae_summary_df
- Add new output formats: Create new ae_summary_* function using ARD
- Batch processing: Use study_plan_to_ae_summary pattern

5.5 Complete Pipeline

The ae_summary function provides a complete pipeline that executes all three steps and writes the RTF output to a file:

adsl = pl.read_parquet("data/adsl.parquet")
adae = pl.read_parquet("data/adae.parquet")

ae_summary(
    population=adsl,
    observation=adae,
    population_filter="SAFFL = 'Y'",
    observation_filter=None,
    id=("USUBJID", "Subject ID"),
    group=("TRT01A", "Treatment Group"),
    variables=[
        ("TRTEMFL = 'Y'", "Any Adverse Events"),
        ("AESER = 'Y'", "Serious Adverse Events")
    ],
    title=[
        "Analysis of Adverse Event Summary",
        "(Safety Analysis Population)"
    ],
    footnote=["Every participant is counted a single time for each applicable row and column."],
    source=["Source: ADSL and ADAE datasets"],
    output_file="studies/xyz123/rtf/ae_summary.rtf",
    total=True,
    missing_group="error"
)

studies/xyz123/rtf/ae_summary.rtf

'studies/xyz123/rtf/ae_summary.rtf'

5.6 Step-by-Step Pipeline

This section demonstrates each step of the pipeline individually, allowing you to inspect intermediate outputs and understand the data transformation at each stage.

5.6.1 Step 1: Generate Analysis Results Data (ARD)

The ae_summary_ard function processes raw data and generates standardized long-format output:

Key Parameters:

- population_filter: SQL WHERE clause to subset subjects (e.g., "SAFFL = 'Y'" for safety population)
- observation_filter: SQL WHERE clause to subset observations (can be None)
- group: Tuple of (variable_name, label) for treatment grouping
- variables: List of tuples [(filter, label)] defining which events to count
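To make the filter semantics concrete, here is a small sketch of what applying a SQL WHERE clause to subject rows means, with None keeping every row. How csrlite evaluates these strings is not shown in this guide; the sketch uses the standard-library sqlite3 module purely for illustration, and `filter_rows` and the subject IDs are hypothetical.

```python
import sqlite3


def filter_rows(rows, where=None):
    """Return the rows matching a SQL WHERE clause; None keeps every row."""
    if not rows:
        return []
    columns = list(rows[0])
    con = sqlite3.connect(":memory:")
    con.execute(f"CREATE TABLE t ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    con.executemany(
        f"INSERT INTO t VALUES ({placeholders})",
        [tuple(row[col] for col in columns) for row in rows],
    )
    query = "SELECT * FROM t" + (f" WHERE {where}" if where else "")
    result = [dict(zip(columns, record)) for record in con.execute(query)]
    con.close()
    return result


# Hypothetical subject-level rows with the flag used in this guide.
adsl_rows = [
    {"USUBJID": "S1", "SAFFL": "Y"},
    {"USUBJID": "S2", "SAFFL": "N"},
    {"USUBJID": "S3", "SAFFL": "Y"},
]

safety = filter_rows(adsl_rows, "SAFFL = 'Y'")
print(sorted(row["USUBJID"] for row in safety))
```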

_ard = ae_summary_ard(
    population=adsl,
    observation=adae,
    population_filter="SAFFL = 'Y'",
    observation_filter=None,
    group=("TRT01A", "Treatment Group"),
    variables=[
        ("TRTEMFL = 'Y'", "Any Adverse Events"),
        ("AESER = 'Y'", "Serious Adverse Events")
    ],
    id=("USUBJID", "Subject ID"),
    total=True,
    missing_group="error"
)

_ard

shape: (16, 3)

__index__	__group__	__value__
enum	enum	str
"Participants in population"	"Placebo"	"86"
"Participants in population"	"Xanomeline High Dose"	"84"
"Participants in population"	"Xanomeline Low Dose"	"84"
"Participants in population"	"Total"	"254"
""	"Placebo"	""
…	…	…
"Any Adverse Events"	"Total"	"218 ( 85.8)"
"Serious Adverse Events"	"Placebo"	" 0 ( 0.0)"
"Serious Adverse Events"	"Xanomeline High Dose"	" 2 ( 2.4)"
"Serious Adverse Events"	"Xanomeline Low Dose"	" 1 ( 1.2)"
"Serious Adverse Events"	"Total"	" 3 ( 1.2)"

Output Structure: Long format with __index__, __group__, __value__ columns.
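The count-and-percentage cells can be reproduced by hand: each participant is counted once per row, and the denominator is the group's population size. The sketch below is an assumption about the arithmetic, not csrlite's code; `format_count_pct` and `summarize` are hypothetical names, and the padding widths are guessed from the printed values (rendered output may collapse repeated spaces).

```python
from collections import Counter


def format_count_pct(n, denom):
    """Format a count and percentage, e.g. 65 of 86 -> ' 65 ( 75.6)'."""
    pct = 100 * n / denom if denom else 0.0
    return f"{n:3d} ({pct:5.1f})"


def summarize(population, event_subjects):
    """Count each subject once per group and add a Total column."""
    denominators = Counter(population.values())
    with_event = set(event_subjects) & set(population)  # de-duplicate subjects
    numerators = Counter(population[subject] for subject in with_event)
    cells = {
        group: format_count_pct(numerators[group], denominators[group])
        for group in sorted(denominators)
    }
    cells["Total"] = format_count_pct(len(with_event), len(population))
    return cells


# Hypothetical data: subject -> treatment group, plus one entry per event record.
population = {"S1": "Placebo", "S2": "Placebo", "S3": "Active", "S4": "Active"}
event_subjects = ["S1", "S1", "S3", "S4", "S4"]  # repeats are multiple events

print(summarize(population, event_subjects))
print(format_count_pct(65, 86), format_count_pct(218, 254))
```

The last line reproduces two of the cells shown in this guide from their counts and denominators (65 of 86 and 218 of 254).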

5.6.2 Step 2: Transform to Display Format

The ae_summary_df function pivots the ARD to wide format where groups become columns:

_df = ae_summary_df(_ard)
_df

shape: (4, 5)

__index__	Placebo	Xanomeline High Dose	Xanomeline Low Dose	Total
enum	str	str	str	str
"Participants in population"	"86"	"84"	"84"	"254"
""	""	""	""	""
"Any Adverse Events"	" 65 ( 75.6)"	" 76 ( 90.5)"	" 77 ( 91.7)"	"218 ( 85.8)"
"Serious Adverse Events"	" 0 ( 0.0)"	" 2 ( 2.4)"	" 1 ( 1.2)"	" 3 ( 1.2)"

Output Structure: Wide format with __index__ as row labels and treatment groups as columns.
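The reshaping itself is a long-to-wide pivot. The sketch below spells out that logic in plain Python so it can be read without any DataFrame API; `pivot_wide` is a hypothetical helper, whereas ae_summary_df operates on a polars DataFrame. The sample rows are taken from the ARD output shown earlier.

```python
def pivot_wide(ard):
    """Turn long ARD rows into one dict per row label, with groups as keys."""
    groups = []  # column order follows first appearance in the ARD
    rows = {}    # row label -> {group: value}
    for record in ard:
        group = record["__group__"]
        if group not in groups:
            groups.append(group)
        rows.setdefault(record["__index__"], {})[group] = record["__value__"]
    return [
        {"__index__": label, **{group: cells.get(group, "") for group in groups}}
        for label, cells in rows.items()
    ]


ard = [
    {"__index__": "Participants in population", "__group__": "Placebo", "__value__": "86"},
    {"__index__": "Participants in population", "__group__": "Xanomeline High Dose", "__value__": "84"},
    {"__index__": "Participants in population", "__group__": "Xanomeline Low Dose", "__value__": "84"},
    {"__index__": "Participants in population", "__group__": "Total", "__value__": "254"},
    {"__index__": "Serious Adverse Events", "__group__": "Placebo", "__value__": " 0 ( 0.0)"},
    {"__index__": "Serious Adverse Events", "__group__": "Xanomeline High Dose", "__value__": " 2 ( 2.4)"},
    {"__index__": "Serious Adverse Events", "__group__": "Xanomeline Low Dose", "__value__": " 1 ( 1.2)"},
    {"__index__": "Serious Adverse Events", "__group__": "Total", "__value__": " 3 ( 1.2)"},
]

wide = pivot_wide(ard)
for row in wide:
    print(row)
```

Keeping first-seen order for both rows and columns mirrors the way the display table above preserves the ARD ordering.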

5.6.3 Step 3: Generate RTF Output

The ae_summary_rtf function creates a formatted RTF document:

ae_summary_rtf(
    _df,
    title=[
        "Analysis of Adverse Event Summary",
        "(Safety Analysis Population)"
    ],
    footnote=["Every participant is counted a single time for each applicable row and column."],
    source=["Source: ADSL and ADAE datasets"],
    col_rel_width=[4, 2, 2, 2, 2]  # Optional: defaults to auto-calculated widths
).write_rtf("studies/xyz123/rtf/ae_summary_step.rtf")

studies/xyz123/rtf/ae_summary_step.rtf

Output: RTFDocument object that can be written to file using .write_rtf().

6 Getting Started for Developers

6.1 Which Approach to Use?

Use StudyPlan-driven workflow (study_plan_to_ae_summary) when:

- Working in production with validated YAML specifications
- Need to generate multiple analyses at once
- Want YAML as single source of truth

Use manual workflow (ae_summary) when:

- Developing new analyses or debugging
- Need one-off custom analyses
- Want direct control over parameters

Use step-by-step workflow (individual functions) when:

- Adding new output formats (e.g., Excel, HTML)
- Debugging data transformations
- Building custom analysis pipelines

6.2 Common Enhancement Patterns

Add new statistics to ARD:

1. Modify ae_summary_ard to add new rows to the long-format output (the three ARD columns stay fixed)
2. Ensure all values are formatted as strings in the __value__ column
3. Add the new row labels to the __index__ categories in the correct order

Create new output format (e.g., Excel):

1. Create new function ae_summary_xlsx(df, ...) that takes display DataFrame
2. Apply Excel-specific formatting and styling
3. ARD and display transformation remain unchanged
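The same pattern applies to any format. As a minimal sketch, the function below writes a display table to CSV using only the standard library; `ae_summary_csv` is a hypothetical name, the display table is modelled as a list of dicts rather than a polars DataFrame, and the "Category" header is an arbitrary choice for the sketch.

```python
import csv
import io


def ae_summary_csv(display_rows, handle, index_header=""):
    """Write display rows to an open text handle as CSV."""
    columns = list(display_rows[0])
    writer = csv.writer(handle, lineterminator="\n")
    # Replace the internal __index__ name with a reader-facing header.
    writer.writerow([index_header if col == "__index__" else col for col in columns])
    for row in display_rows:
        writer.writerow([row[col] for col in columns])


display_rows = [
    {"__index__": "Participants in population", "Placebo": "86", "Total": "254"},
    {"__index__": "Any Adverse Events", "Placebo": " 65 ( 75.6)", "Total": "218 ( 85.8)"},
]

buffer = io.StringIO()
ae_summary_csv(display_rows, buffer, index_header="Category")
print(buffer.getvalue())
```

Because the function only consumes the display table, the ARD and pivot steps stay untouched.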

Batch process with custom logic:

1. Follow study_plan_to_ae_summary pattern
2. Loop through plan rows
3. Use StudyPlanParser to extract filters and parameters
4. Call appropriate analysis functions
