Quick Start

Quick Start with Polars Eval Metrics

This guide will help you get up and running with the polars-eval-metrics package.

Setup

First, ensure the package is available in your Python environment, then import it:

```{python}
import polars as pl

from polars_eval_metrics import MetricDefine, MetricEvaluator
```


# Example Data 

Let's create an example dataset for a typical model evaluation task.

```{python}
from data_generator import generate_sample_data

df = generate_sample_data(n_subjects=3, n_visits=2, n_groups=2)
print(f"Data shape: {df.shape}")
df.head()
```

Single Metric (MAE)

Start with mean absolute error for a single estimate column. MetricDefine fetches a metric expression by name from the global registry and keeps any parameterisation you provide.

```{python}
metric = MetricDefine(name="mae")
metric
```

```{python}
evaluator = MetricEvaluator(
    df=df,
    metrics=metric,
    ground_truth="actual",
    estimates="model1",
)
result = evaluator.evaluate()
result
```

Polars equivalent

```{python}
df.select(
    (pl.col("model1") - pl.col("actual")).abs().mean().alias("mae")
)
```

```{python}
# Inspect the lazy plan that powers ARD collection
lazy_result = evaluator.evaluate(collect=False)
print(lazy_result.explain())
```

Evaluate by Group

Evaluating multiple metrics across estimates stratified by a treatment arm only requires adjusting the evaluator configuration.

```{python}
evaluator = MetricEvaluator(
    df=df,
    metrics=[MetricDefine(name="mae"), MetricDefine(name="rmse")],
    ground_truth="actual",
    estimates=["model1", "model2"],
    group_by=["treatment"],
)
group_result = evaluator.evaluate()
group_result
```

Polars equivalent

```{python}
df.group_by("treatment").agg([
    (pl.col("model1") - pl.col("actual")).abs().mean().alias("mae_model1"),
    (pl.col("model2") - pl.col("actual")).abs().mean().alias("mae_model2"),
    (pl.col("model1") - pl.col("actual")).pow(2).mean().sqrt().alias("rmse_model1"),
    (pl.col("model2") - pl.col("actual")).pow(2).mean().sqrt().alias("rmse_model2"),
]).sort("treatment")
```

Group + Subgroup Evaluation

Subgroup analysis is enabled by specifying subgroup_by. Use a temporary configuration block to show all subgroup rows without changing global Polars settings.

```{python}
with pl.Config(tbl_rows=-1):
    subgroup_result = MetricEvaluator(
        df=df,
        metrics=[MetricDefine(name="mae"), MetricDefine(name="rmse")],
        ground_truth="actual",
        estimates=["model1", "model2"],
        group_by=["treatment"],
        subgroup_by=["gender", "race"],
    ).evaluate()
subgroup_result
```

```{python}
subgroup_result.collect()
```

Polars equivalent

```{python}
gender_results = df.group_by(["treatment", "gender"]).agg([
    (pl.col("model1") - pl.col("actual")).abs().mean().alias("mae_model1"),
    (pl.col("model2") - pl.col("actual")).abs().mean().alias("mae_model2"),
    (pl.col("model1") - pl.col("actual")).pow(2).mean().sqrt().alias("rmse_model1"),
    (pl.col("model2") - pl.col("actual")).pow(2).mean().sqrt().alias("rmse_model2"),
]).sort(["treatment", "gender"])

race_results = df.group_by(["treatment", "race"]).agg([
    (pl.col("model1") - pl.col("actual")).abs().mean().alias("mae_model1"),
    (pl.col("model2") - pl.col("actual")).abs().mean().alias("mae_model2"),
    (pl.col("model1") - pl.col("actual")).pow(2).mean().sqrt().alias("rmse_model1"),
    (pl.col("model2") - pl.col("actual")).pow(2).mean().sqrt().alias("rmse_model2"),
]).sort(["treatment", "race"])

gender_results, race_results
```