import os
import sys
from pathlib import Path
import polars as pl
import rtflite as rtf
# Add src to path for imports
sys.path.insert(0, 'src')
from rtflite import LibreOfficeConverter
try:
    converter = LibreOfficeConverter()
except Exception:
    converter = None
    print("WARNING: LibreOffice not found. PDF conversion will be skipped.")
4 Adverse Event Analysis
This guide demonstrates:
- StudyPlan-driven batch generation for production use
- A three-step pipeline architecture for flexibility and extensibility
- A standardized ARD structure for consistent data handling
4.1 Setup
First, let’s import the required packages and load our study plan.
from csrlite import load_plan, study_plan_to_ae_summary
from csrlite.ae.ae_summary import ae_summary_ard, ae_summary_df, ae_summary_rtf, ae_summary
4.2 StudyPlan-Driven Workflow
For production environments, define all analyses in a YAML file following the Review-Oriented Development (ROD) philosophy.
4.2.1 Load Study Plan
The study plan contains population definitions, observation periods, parameters, and data source specifications:
# Load study plan from YAML
study_plan = load_plan("studies/xyz123/yaml/plan_xyz123.yaml")
study_plan.get_plan_df().filter(pl.col("analysis") == "ae_summary")
2026-02-03 15:25:53,900 - csrlite.common.plan - INFO - Successfully loaded dataset 'adsl' from 'studies/xyz123/yaml/../../../data/adsl.parquet'
2026-02-03 15:25:53,903 - csrlite.common.plan - INFO - Successfully loaded dataset 'adae' from 'studies/xyz123/yaml/../../../data/adae.parquet'
2026-02-03 15:25:53,904 - csrlite.common.plan - INFO - Successfully loaded dataset 'adie' from 'studies/xyz123/yaml/../../../data/adie.parquet'
2026-02-03 15:25:53,906 - csrlite.common.plan - INFO - Successfully loaded dataset 'adpd' from 'studies/xyz123/yaml/../../../data/adpd.parquet'
| analysis | population | observation | parameter | group |
|---|---|---|---|---|
| str | str | str | str | str |
| "ae_summary" | "apat" | "week12" | "any;rel;ser" | "trt01a" |
| "ae_summary" | "apat" | "week24" | "any;rel;ser" | "trt01a" |
4.2.2 Batch Generate All Outputs
The study_plan_to_ae_summary function automatically generates RTF outputs for all AE summary analyses defined in the plan:
output_files = study_plan_to_ae_summary(study_plan)
studies/xyz123/rtf/ae_summary_apat_week12_any_rel_ser.rtf
studies/xyz123/rtf/ae_summary_apat_week24_any_rel_ser.rtf
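The two file names above appear to combine the analysis name with the population, observation, and parameter fields of each plan row. The following is a minimal sketch of that naming scheme, inferred only from the outputs shown. `plan_row_to_path` and its `rtf_dir` argument are hypothetical and are not part of csrlite.

```python
from pathlib import PurePosixPath


def plan_row_to_path(row: dict, rtf_dir: str = "studies/xyz123/rtf") -> str:
    """Build an output path from one plan row (hypothetical helper)."""
    # "any;rel;ser" -> "any_rel_ser": semicolons separate parameters in the plan
    parameter = row["parameter"].replace(";", "_")
    name = f"{row['analysis']}_{row['population']}_{row['observation']}_{parameter}.rtf"
    return str(PurePosixPath(rtf_dir) / name)


# The two ae_summary rows from the plan table above
plan_rows = [
    {"analysis": "ae_summary", "population": "apat", "observation": "week12",
     "parameter": "any;rel;ser", "group": "trt01a"},
    {"analysis": "ae_summary", "population": "apat", "observation": "week24",
     "parameter": "any;rel;ser", "group": "trt01a"},
]

paths = [plan_row_to_path(row) for row in plan_rows]
for path in paths:
    print(path)
```

Under this assumption, each plan row maps to exactly one RTF file, which makes it easy to check that every planned analysis produced an output.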
4.2.2.1 Week 12
4.2.2.2 Week 24
5 Design Philosophy
5.1 Three-Step Pipeline Architecture
The AE summary analysis follows a three-step pipeline that separates concerns:
1. ae_summary_ard: Generate Analysis Results Data (ARD)
   - Input: Raw datasets with filters
   - Output: Standardized long-format DataFrame with columns: __index__, __group__, __value__
   - Purpose: Data processing and statistical computation
2. ae_summary_df: Transform to display format
   - Input: ARD (long format)
   - Output: Wide-format DataFrame (groups as columns)
   - Purpose: Reshape data for table layout
3. ae_summary_rtf: Generate formatted output
   - Input: Display DataFrame
   - Output: RTFDocument object
   - Purpose: Apply formatting and styling
5.2 Why This Separation?
- Testability: Each step can be tested independently
- Reusability: ARD can be transformed to different output formats (CSV, Excel, HTML)
- Extensibility: Easy to add new output formats without touching analysis logic
- Debugging: Inspect intermediate data at each stage
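As an illustration of the reusability point, the sketch below writes a wide display table to CSV using only the standard library. `display_rows_to_csv` is a hypothetical helper, not a csrlite function, and the table is passed as plain lists rather than a DataFrame to keep the example self-contained. The cell values are taken from the display table shown later in this guide.

```python
import csv
import io


def display_rows_to_csv(columns: list[str], rows: list[list[str]]) -> str:
    """Serialize a wide display table to CSV text (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)   # header: row-label column plus one column per group
    writer.writerows(rows)     # body: values are already formatted strings
    return buffer.getvalue()


columns = ["__index__", "Placebo", "Total"]
rows = [
    ["Participants in population", "86", "254"],
    ["Serious Adverse Events", " 0 ( 0.0)", " 3 ( 1.2)"],
]

csv_text = display_rows_to_csv(columns, rows)
print(csv_text)
```

Because the values arrive pre-formatted, a new writer like this only has to handle layout and never needs to recompute statistics.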
5.3 ARD Data Structure
All *_ard functions return a standardized long-format DataFrame:
- __index__: Row labels (e.g., “Any Adverse Events”, “Serious Adverse Events”)
- __group__: Treatment groups (e.g., “Placebo”, “Treatment A”)
- __value__: Formatted values (e.g., “12 (34.5%)”)
This structure enables consistent data handling across different analyses.
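A small check can make this contract explicit. The sketch below validates long-format records held as plain dictionaries. `check_ard_records` is a hypothetical helper for illustration; csrlite itself returns DataFrames, and the uniqueness rule here is an assumption about what a well-formed ARD should look like.

```python
ARD_COLUMNS = ("__index__", "__group__", "__value__")


def check_ard_records(records: list[dict]) -> None:
    """Raise ValueError if records do not follow the three-column long format."""
    seen = set()
    for position, record in enumerate(records):
        if set(record) != set(ARD_COLUMNS):
            raise ValueError(f"record {position}: expected columns {ARD_COLUMNS}")
        if not isinstance(record["__value__"], str):
            raise ValueError(f"record {position}: __value__ must be a string")
        # Assumption: one value per (row label, group) pair
        key = (record["__index__"], record["__group__"])
        if key in seen:
            raise ValueError(f"record {position}: duplicate cell {key}")
        seen.add(key)


records = [
    {"__index__": "Participants in population", "__group__": "Placebo", "__value__": "86"},
    {"__index__": "Participants in population", "__group__": "Total", "__value__": "254"},
]
check_ard_records(records)  # passes silently for well-formed records
```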
5.4 Function Wrapper
The ae_summary function wraps all three steps for convenience:
ae_summary = ae_summary_ard -> ae_summary_df -> ae_summary_rtf -> write to file
To extend functionality:
- Add new statistics: Modify ae_summary_ard
- Change table layout: Modify ae_summary_df
- Add new output formats: Create new ae_summary_* function using ARD
- Batch processing: Use study_plan_to_ae_summary pattern
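The wrapper idea can be sketched as plain function composition. The code below uses toy stand-in steps that only mimic the shape of each stage; `compose_pipeline` and the `toy_*` functions are hypothetical and do not reflect how ae_summary is implemented.

```python
from typing import Callable


def compose_pipeline(to_ard: Callable, to_display: Callable, to_output: Callable) -> Callable:
    """Chain three steps so each one only sees the previous step's result."""
    def run(*args, **kwargs):
        ard = to_ard(*args, **kwargs)
        display = to_display(ard)
        return to_output(display)
    return run


# Stand-in steps (hypothetical): long records -> wide mapping -> text
def toy_ard(counts: dict) -> list[tuple]:
    return [(label, group, str(n)) for (label, group), n in counts.items()]


def toy_display(ard: list[tuple]) -> dict:
    table: dict = {}
    for label, group, value in ard:
        table.setdefault(label, {})[group] = value
    return table


def toy_text(table: dict) -> str:
    return "\n".join(
        " | ".join([label, *(f"{group}={value}" for group, value in cells.items())])
        for label, cells in table.items()
    )


toy_summary = compose_pipeline(toy_ard, toy_display, toy_text)
counts = {("Serious Adverse Events", "Placebo"): 0, ("Serious Adverse Events", "Total"): 3}
result = toy_summary(counts)
print(result)
```

Swapping `toy_text` for another renderer changes the output format without touching the first two steps, which is the property the extension points above rely on.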
5.5 Complete Pipeline
The ae_summary function provides a complete pipeline that executes all three steps and writes the RTF output to a file:
adsl = pl.read_parquet("data/adsl.parquet")
adae = pl.read_parquet("data/adae.parquet")
ae_summary(
    population=adsl,
    observation=adae,
    population_filter="SAFFL = 'Y'",
    observation_filter=None,
    id=("USUBJID", "Subject ID"),
    group=("TRT01A", "Treatment Group"),
    variables=[
        ("TRTEMFL = 'Y'", "Any Adverse Events"),
        ("AESER = 'Y'", "Serious Adverse Events")
    ],
    title=[
        "Analysis of Adverse Event Summary",
        "(Safety Analysis Population)"
    ],
    footnote=["Every participant is counted a single time for each applicable row and column."],
    source=["Source: ADSL and ADAE datasets"],
    output_file="studies/xyz123/rtf/ae_summary.rtf",
    total=True,
    missing_group="error"
)
studies/xyz123/rtf/ae_summary.rtf
'studies/xyz123/rtf/ae_summary.rtf'
5.6 Step-by-Step Pipeline
This section demonstrates each step of the pipeline individually, allowing you to inspect intermediate outputs and understand the data transformation at each stage.
5.6.1 Step 1: Generate Analysis Results Data (ARD)
The ae_summary_ard function processes raw data and generates standardized long-format output:
Key Parameters:
- population_filter: SQL WHERE clause to subset subjects (e.g., "SAFFL = 'Y'" for safety population)
- observation_filter: SQL WHERE clause to subset observations (can be None)
- group: Tuple of (variable_name, label) for treatment grouping
- variables: List of tuples [(filter, label)] defining which events to count
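To make the filter semantics concrete, the sketch below applies the same WHERE-clause strings to a tiny made-up dataset using the standard-library sqlite3 module. This is only an illustration of the idea: how csrlite evaluates these filters internally is not shown in this guide, the subject records are invented, and `count_subjects` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE adsl (USUBJID TEXT, SAFFL TEXT, TRT01A TEXT)")
conn.execute("CREATE TABLE adae (USUBJID TEXT, TRTEMFL TEXT, AESER TEXT)")

# Invented toy data: S3 is outside the safety population
conn.executemany("INSERT INTO adsl VALUES (?, ?, ?)", [
    ("S1", "Y", "Placebo"), ("S2", "Y", "Placebo"), ("S3", "N", "Placebo"),
])
# S1 has two events; it must still be counted once per row of the table
conn.executemany("INSERT INTO adae VALUES (?, ?, ?)", [
    ("S1", "Y", "N"), ("S1", "Y", "Y"), ("S2", "Y", "N"), ("S3", "Y", "Y"),
])


def count_subjects(population_filter: str, variable_filter: str) -> int:
    """Count distinct subjects in the filtered population with a matching event."""
    # Filters are interpolated as trusted text; acceptable for a sketch only.
    query = f"""
        SELECT COUNT(DISTINCT adae.USUBJID)
        FROM adae
        JOIN (SELECT USUBJID FROM adsl WHERE {population_filter}) AS pop
          ON adae.USUBJID = pop.USUBJID
        WHERE {variable_filter}
    """
    return conn.execute(query).fetchone()[0]


any_ae = count_subjects("SAFFL = 'Y'", "TRTEMFL = 'Y'")
serious = count_subjects("SAFFL = 'Y'", "AESER = 'Y'")
print(any_ae, serious)
```

The population filter restricts who can be counted, while each entry in variables restricts which events qualify a subject for a given row.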
_ard = ae_summary_ard(
    population=adsl,
    observation=adae,
    population_filter="SAFFL = 'Y'",
    observation_filter=None,
    group=("TRT01A", "Treatment Group"),
    variables=[
        ("TRTEMFL = 'Y'", "Any Adverse Events"),
        ("AESER = 'Y'", "Serious Adverse Events")
    ],
    id=("USUBJID", "Subject ID"),
    total=True,
    missing_group="error"
)
_ard
| __index__ | __group__ | __value__ |
|---|---|---|
| enum | enum | str |
| "Participants in population" | "Placebo" | "86" |
| "Participants in population" | "Xanomeline High Dose" | "84" |
| "Participants in population" | "Xanomeline Low Dose" | "84" |
| "Participants in population" | "Total" | "254" |
| "" | "Placebo" | "" |
| … | … | … |
| "Any Adverse Events" | "Total" | "218 ( 85.8)" |
| "Serious Adverse Events" | "Placebo" | " 0 ( 0.0)" |
| "Serious Adverse Events" | "Xanomeline High Dose" | " 2 ( 2.4)" |
| "Serious Adverse Events" | "Xanomeline Low Dose" | " 1 ( 1.2)" |
| "Serious Adverse Events" | "Total" | " 3 ( 1.2)" |
Output Structure: Long format with __index__, __group__, __value__ columns.
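The __value__ strings pair a subject count with a percentage of the group's population. The sketch below reproduces that arithmetic with the counts from the tables in this guide. `format_n_pct` is a hypothetical helper, and the field widths are inferred from the Total row; padding in the other rows may have been collapsed when this page was rendered.

```python
def format_n_pct(n: int, total: int) -> str:
    """Format a count and its percentage of total as 'n (pct)'."""
    pct = 100 * n / total if total else 0.0
    # Widths (3 for the count, 5 for the percentage) are an assumption
    return f"{n:3d} ({pct:5.1f})"


# (subjects with any adverse event, participants in population) per group
any_ae_counts = {
    "Placebo": (65, 86),
    "Xanomeline High Dose": (76, 84),
    "Xanomeline Low Dose": (77, 84),
    "Total": (218, 254),
}

formatted = {group: format_n_pct(n, total) for group, (n, total) in any_ae_counts.items()}
for group, value in formatted.items():
    print(f"{group}: {value}")
```

For example, 218 of 254 participants is 85.8 percent, which matches the Total row for "Any Adverse Events".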
5.6.2 Step 2: Transform to Display Format
The ae_summary_df function pivots the ARD to wide format where groups become columns:
_df = ae_summary_df(_ard)
_df
| __index__ | Placebo | Xanomeline High Dose | Xanomeline Low Dose | Total |
|---|---|---|---|---|
| enum | str | str | str | str |
| "Participants in population" | "86" | "84" | "84" | "254" |
| "" | "" | "" | "" | "" |
| "Any Adverse Events" | " 65 ( 75.6)" | " 76 ( 90.5)" | " 77 ( 91.7)" | "218 ( 85.8)" |
| "Serious Adverse Events" | " 0 ( 0.0)" | " 2 ( 2.4)" | " 1 ( 1.2)" | " 3 ( 1.2)" |
Output Structure: Wide format with __index__ as row labels and treatment groups as columns.
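The long-to-wide reshaping can be illustrated without any DataFrame library. The sketch below pivots (index, group, value) tuples into a header and rows while keeping first-seen order for both. `pivot_ard` is a hypothetical helper and is not how ae_summary_df is implemented.

```python
def pivot_ard(records: list[tuple]) -> tuple[list[str], list[list[str]]]:
    """Pivot long (index, group, value) records into a header and wide rows."""
    groups: list[str] = []
    rows: dict[str, dict[str, str]] = {}
    for index, group, value in records:
        if group not in groups:
            groups.append(group)          # column order follows first appearance
        rows.setdefault(index, {})[group] = value
    header = ["__index__", *groups]
    # Missing cells become empty strings so every row has the same width
    body = [[index, *(cells.get(group, "") for group in groups)]
            for index, cells in rows.items()]
    return header, body


# A subset of the ARD shown in Step 1
records = [
    ("Participants in population", "Placebo", "86"),
    ("Participants in population", "Xanomeline High Dose", "84"),
    ("Participants in population", "Xanomeline Low Dose", "84"),
    ("Participants in population", "Total", "254"),
    ("Any Adverse Events", "Total", "218 ( 85.8)"),
]

header, body = pivot_ard(records)
print(header)
for row in body:
    print(row)
```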
5.6.3 Step 3: Generate RTF Output
The ae_summary_rtf function creates a formatted RTF document:
ae_summary_rtf(
    _df,
    title=[
        "Analysis of Adverse Event Summary",
        "(Safety Analysis Population)"
    ],
    footnote=["Every participant is counted a single time for each applicable row and column."],
    source=["Source: ADSL and ADAE datasets"],
    col_rel_width=[4, 2, 2, 2, 2]  # Optional: defaults to auto-calculated widths
).write_rtf("studies/xyz123/rtf/ae_summary_step.rtf")
studies/xyz123/rtf/ae_summary_step.rtf
6 Getting Started for Developers
6.1 Which Approach to Use?
Use StudyPlan-driven workflow (study_plan_to_ae_summary) when:
- Working in production with validated YAML specifications
- Need to generate multiple analyses at once
- Want YAML as single source of truth
Use manual workflow (ae_summary) when:
- Developing new analyses or debugging
- Need one-off custom analyses
- Want direct control over parameters
Use step-by-step workflow (individual functions) when:
- Adding new output formats (e.g., Excel, HTML)
- Debugging data transformations
- Building custom analysis pipelines
6.2 Common Enhancement Patterns
Add new statistics to ARD:
1. Modify ae_summary_ard to add new columns to the long-format output
2. Ensure all values are formatted as strings in the __value__ column
3. Add to __index__ categories in the correct order
Create new output format (e.g., Excel):
1. Create a new function ae_summary_xlsx(df, ...) that takes the display DataFrame
2. Apply Excel-specific formatting and styling
3. ARD and display transformation remain unchanged
Batch process with custom logic:
1. Follow the study_plan_to_ae_summary pattern
2. Loop through plan rows
3. Use StudyPlanParser to extract filters and parameters
4. Call the appropriate analysis functions
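The batch-processing steps can be sketched as a dispatch loop over plan rows. The code below uses plain dictionaries and a stand-in handler; `run_plan` and `toy_ae_summary` are hypothetical, and a real implementation would resolve filters through StudyPlanParser, whose API is not shown in this guide.

```python
from typing import Callable


def run_plan(plan_rows: list[dict], handlers: dict[str, Callable[[dict], str]]) -> list[str]:
    """Call the handler registered for each row's analysis and collect results."""
    outputs = []
    for row in plan_rows:
        handler = handlers.get(row["analysis"])
        if handler is None:
            continue  # no handler registered for this analysis type
        outputs.append(handler(row))
    return outputs


def toy_ae_summary(row: dict) -> str:
    """Stand-in for an analysis function; returns a description, not a file."""
    parameters = row["parameter"].split(";")  # "any;rel;ser" -> three parameters
    return f"{row['analysis']} / {row['observation']} / {len(parameters)} parameters"


plan_rows = [
    {"analysis": "ae_summary", "population": "apat", "observation": "week12",
     "parameter": "any;rel;ser", "group": "trt01a"},
    {"analysis": "ae_summary", "population": "apat", "observation": "week24",
     "parameter": "any;rel;ser", "group": "trt01a"},
]

outputs = run_plan(plan_rows, {"ae_summary": toy_ae_summary})
for line in outputs:
    print(line)
```

Registering handlers by analysis name keeps the loop unchanged when a new analysis type is added to the plan.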