# Quick Start with Polars Eval Metrics
This guide will help you get up and running with the polars-eval-metrics package.
# Setup
First, ensure the package is available in your Python environment.

Import the package:

```{python}
import polars as pl
from polars_eval_metrics import MetricDefine, MetricEvaluator
```
# Example Data
Let's create an example dataset for a typical model evaluation task.
```{python}
from data_generator import generate_sample_data

df = generate_sample_data(n_subjects=3, n_visits=2, n_groups=2)
print(f"Data shape: {df.shape}")
df.head()
```
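`data_generator` is a local helper module that this guide does not show. For readers without it, here is a minimal stand-in sketch in plain Python. Nearly everything in it is an assumption: the function name `make_sample_rows`, the `subject_id` and `visit_id` columns, the category labels, and the noise model are hypothetical. Only the column names `actual`, `model1`, `model2`, `treatment`, `gender`, and `race` are taken from the examples in this guide.

```python
import random


def make_sample_rows(n_subjects=3, n_visits=2, n_groups=2, seed=0):
    """Hypothetical stand-in for generate_sample_data (layout is assumed)."""
    rng = random.Random(seed)  # seeded so the rows are reproducible
    rows = []
    for group in range(n_groups):
        treatment = chr(ord("A") + group)  # "A", "B", ...
        for subject in range(n_subjects):
            subject_id = group * n_subjects + subject + 1
            # Demographics are fixed per subject, not per visit.
            gender = rng.choice(["F", "M"])
            race = rng.choice(["X", "Y", "Z"])
            for visit in range(1, n_visits + 1):
                actual = rng.gauss(50.0, 10.0)
                rows.append({
                    "subject_id": subject_id,
                    "visit_id": visit,
                    "treatment": treatment,
                    "gender": gender,
                    "race": race,
                    "actual": actual,
                    # model2 is deliberately noisier than model1.
                    "model1": actual + rng.gauss(0.0, 2.0),
                    "model2": actual + rng.gauss(0.0, 4.0),
                })
    return rows


rows = make_sample_rows()
print(len(rows))  # 12 = 2 groups x 3 subjects x 2 visits
```

Since `pl.DataFrame` accepts a list of row dictionaries, `pl.DataFrame(rows)` should give a frame with the columns the later examples refer to, provided the real generator uses the same names.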
# Single Metric (MAE)

Start with mean absolute error for a single estimate column. `MetricDefine` fetches a metric expression by name from the global registry and keeps any parameterisation you provide.
```{python}
metric = MetricDefine(name="mae")
metric
```

```{python}
evaluator = MetricEvaluator(
    df=df,
    metrics=metric,
    ground_truth="actual",
    estimates="model1",
)

result = evaluator.evaluate()
result
```
Note: Polars equivalent

```{python}
df.select(
    (pl.col("model1") - pl.col("actual")).abs().mean().alias("mae")
)
```
```{python}
# Inspect the lazy plan that powers ARD collection
lazy_result = evaluator.evaluate(collect=False)
print(lazy_result.explain())
```
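As a sanity check on what the `mae` metric measures, the same quantity can be computed by hand on a few values. This is a plain-Python sketch, not the package's implementation; the helper name `mean_absolute_error` and the toy numbers are illustrative, and the sketch assumes there are no missing values.

```python
def mean_absolute_error(actual, estimate):
    """Average of |estimate - actual| over paired observations."""
    if len(actual) != len(estimate):
        raise ValueError("actual and estimate must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    return sum(abs(e - a) for a, e in zip(actual, estimate)) / len(actual)


actual = [10.0, 12.0, 15.0, 11.0]
model1 = [11.0, 11.0, 17.0, 11.0]

# Absolute errors are 1, 1, 2, 0, so the mean is 4 / 4.
mae = mean_absolute_error(actual, model1)
print(mae)  # 1.0
```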
# Evaluate by Group

To evaluate several metrics for several estimates, stratified by treatment arm, only the evaluator configuration changes: pass lists to `metrics` and `estimates`, and set `group_by`.
```{python}
evaluator = MetricEvaluator(
    df=df,
    metrics=[MetricDefine(name="mae"), MetricDefine(name="rmse")],
    ground_truth="actual",
    estimates=["model1", "model2"],
    group_by=["treatment"],
)

group_result = evaluator.evaluate()
group_result
```
Note: Polars equivalent

```{python}
df.group_by("treatment").agg([
    (pl.col("model1") - pl.col("actual")).abs().mean().alias("mae_model1"),
    (pl.col("model2") - pl.col("actual")).abs().mean().alias("mae_model2"),
    (pl.col("model1") - pl.col("actual")).pow(2).mean().sqrt().alias("rmse_model1"),
    (pl.col("model2") - pl.col("actual")).pow(2).mean().sqrt().alias("rmse_model2"),
]).sort("treatment")
```
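The grouped aggregation above can also be traced by hand on a tiny table. The sketch below uses plain Python with made-up rows; the helper name `grouped_metrics` is hypothetical. The output keys mirror the `mae_model1`-style aliases of the Polars equivalent, not the layout of the evaluator's result.

```python
from collections import defaultdict
from math import sqrt

rows = [
    {"treatment": "A", "actual": 10.0, "model1": 11.0, "model2": 13.0},
    {"treatment": "A", "actual": 12.0, "model1": 11.0, "model2": 12.0},
    {"treatment": "B", "actual": 20.0, "model1": 20.0, "model2": 24.0},
    {"treatment": "B", "actual": 22.0, "model1": 24.0, "model2": 22.0},
]


def grouped_metrics(rows, group_key, truth, estimates):
    """Compute MAE and RMSE per group for each estimate column."""
    # errors[group][estimate] collects signed errors (estimate - truth).
    errors = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for est in estimates:
            errors[row[group_key]][est].append(row[est] - row[truth])

    result = {}
    for group in sorted(errors):
        result[group] = {}
        for est in estimates:
            errs = errors[group][est]
            n = len(errs)
            result[group][f"mae_{est}"] = sum(abs(e) for e in errs) / n
            result[group][f"rmse_{est}"] = sqrt(sum(e * e for e in errs) / n)
    return result


result = grouped_metrics(rows, "treatment", "actual", ["model1", "model2"])
for group, metrics in result.items():
    print(group, metrics)
```

In group A, `model2` has errors 3 and 0, giving an MAE of 1.5 but an RMSE of about 2.12. RMSE is never smaller than MAE on the same errors, and the gap widens when a few errors dominate, which is one reason to report both.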
# Group + Subgroup Evaluation

Subgroup analysis is enabled by specifying `subgroup_by`. Use a temporary configuration block to show all subgroup rows without changing global Polars settings.
```{python}
with pl.Config(tbl_rows=-1):
    subgroup_result = MetricEvaluator(
        df=df,
        metrics=[MetricDefine(name="mae"), MetricDefine(name="rmse")],
        ground_truth="actual",
        estimates=["model1", "model2"],
        group_by=["treatment"],
        subgroup_by=["gender", "race"],
    ).evaluate()

subgroup_result
subgroup_result.collect()
```
Note: Polars equivalent

```{python}
gender_results = df.group_by(["treatment", "gender"]).agg([
    (pl.col("model1") - pl.col("actual")).abs().mean().alias("mae_model1"),
    (pl.col("model2") - pl.col("actual")).abs().mean().alias("mae_model2"),
    (pl.col("model1") - pl.col("actual")).pow(2).mean().sqrt().alias("rmse_model1"),
    (pl.col("model2") - pl.col("actual")).pow(2).mean().sqrt().alias("rmse_model2"),
]).sort(["treatment", "gender"])

race_results = df.group_by(["treatment", "race"]).agg([
    (pl.col("model1") - pl.col("actual")).abs().mean().alias("mae_model1"),
    (pl.col("model2") - pl.col("actual")).abs().mean().alias("mae_model2"),
    (pl.col("model1") - pl.col("actual")).pow(2).mean().sqrt().alias("rmse_model1"),
    (pl.col("model2") - pl.col("actual")).pow(2).mean().sqrt().alias("rmse_model2"),
]).sort(["treatment", "race"])

gender_results, race_results
```
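Judging by the Polars equivalent above, each variable in `subgroup_by` appears to be evaluated on its own (treatment by gender, then treatment by race) rather than crossed into gender-by-race cells. The plain-Python sketch below illustrates that marginal pattern under this assumption. The helper name `subgroup_mae`, the toy rows, and the output keys `subgroup_name` and `subgroup_value` are hypothetical and may not match the evaluator's actual column names.

```python
rows = [
    {"treatment": "A", "gender": "F", "race": "X", "actual": 10.0, "model1": 11.0},
    {"treatment": "A", "gender": "M", "race": "X", "actual": 12.0, "model1": 14.0},
    {"treatment": "A", "gender": "F", "race": "Y", "actual": 15.0, "model1": 15.0},
    {"treatment": "B", "gender": "M", "race": "Y", "actual": 20.0, "model1": 23.0},
]


def subgroup_mae(rows, group_key, subgroup_keys, truth, estimate):
    """MAE per (group, subgroup level), one subgroup variable at a time."""
    out = []
    for key in subgroup_keys:
        # Bucket absolute errors by (group, level of the current variable).
        buckets = {}
        for row in rows:
            cell = (row[group_key], row[key])
            buckets.setdefault(cell, []).append(abs(row[estimate] - row[truth]))
        for (group, level), errs in sorted(buckets.items()):
            out.append({
                group_key: group,
                "subgroup_name": key,
                "subgroup_value": level,
                "mae": sum(errs) / len(errs),
            })
    return out


long_result = subgroup_mae(rows, "treatment", ["gender", "race"], "actual", "model1")
for record in long_result:
    print(record)
```

Every observation contributes once per subgroup variable, so the gender rows and the race rows each summarise the full dataset and should not be added together.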