import polars as pl
from polars_eval_metrics import MetricDefine, MetricEvaluator
from polars_eval_metrics.ard import ARD
from data_generator import generate_sample_data
pl.Config.set_tbl_rows(8)
Fixed-schema container for evaluation metrics
The Analysis Results Data (ARD) container holds every evaluation result produced by `MetricEvaluator`. Each ARD instance wraps a Polars `LazyFrame`, so results stay lazy until you explicitly collect them. The container standardises the schema for downstream processing, keeps metric metadata attached, and preserves display-friendly ordering through enum-backed columns. `metric`, `label`, and `estimate` use `Enum` types so tables respect the metric definition order instead of alphabetical sorting.

The minimum schema that every ARD provides is shown below. `ARD.schema` exposes these columns.
Column | Type | Description |
---|---|---|
groups | pl.Struct | Primary grouping keys (e.g. treatment, site). |
subgroups | pl.Struct | Subgroup breakdowns (e.g. gender, race). |
estimate | pl.Enum | Estimate identifier for the model or prediction column. |
metric | pl.Enum | Metric name registered in the metric registry. |
label | pl.Enum | Human-readable display label. |
stat | pl.Struct | Typed value container (see below). |
stat_fmt | pl.Utf8 | Default formatted presentation of the statistic. |
context | pl.Struct | Metadata describing how the metric was computed. |
warning | pl.List(pl.Utf8) | Captured warnings generated while evaluating the metric. |
error | pl.List(pl.Utf8) | Captured errors when evaluation falls back to placeholders. |
id | pl.Struct | Entity identifiers for within-subject / visit metrics. |
The `stat` struct splits the stored result across typed channels:

- `type`: value hint such as `"float"`, `"int"`, `"bool"`, `"string"`, or `"struct"`
- `value_float`, `value_int`, `value_bool`, `value_str`, `value_struct`: mutually exclusive slots for the actual statistic
- `format`: optional Python format string used when rendering

The rendered value is kept in `stat_fmt` so downstream code can use either the raw struct or the default text representation without recomputing formatting.

Any warnings or errors that occur while evaluating a metric are preserved in the `warning` and `error` list columns. When evaluation succeeds, both columns contain empty lists; if an expression fails, the ARD still returns a placeholder row with the captured diagnostic message so the failure is visible without breaking collection.
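The typed slots, the pre-rendered text, and the placeholder-on-failure behaviour can be sketched in pure Python. This is not the library's implementation: `Stat`, `make_stat`, `render`, and `evaluate_row` are hypothetical names, and the one-decimal default for floats is an assumption chosen to match the formatted values shown on this page.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

# Maps each type tag to the slot that carries the value
# (names follow the stat description above).
SLOTS = {
    "float": "value_float",
    "int": "value_int",
    "bool": "value_bool",
    "string": "value_str",
    "struct": "value_struct",
}


@dataclass
class Stat:
    """Hypothetical stand-in for the stat struct: at most one slot is filled."""

    type: str
    value_float: Optional[float] = None
    value_int: Optional[int] = None
    value_bool: Optional[bool] = None
    value_str: Optional[str] = None
    value_struct: Optional[dict] = None
    format: Optional[str] = None


def make_stat(value: Any, fmt: Optional[str] = None) -> Stat:
    """Route a Python value into the slot that matches its type."""
    if isinstance(value, bool):  # bool must be tested before int
        tag = "bool"
    elif isinstance(value, int):
        tag = "int"
    elif isinstance(value, float):
        tag = "float"
    elif isinstance(value, dict):
        tag = "struct"
    else:
        tag, value = "string", str(value)
    return Stat(type=tag, format=fmt, **{SLOTS[tag]: value})


def render(stat: Stat, default_float: str = "{:.1f}") -> Optional[str]:
    """Format once, so the text can be stored next to the raw value."""
    value = getattr(stat, SLOTS[stat.type])
    if value is None:
        return None
    if stat.format:
        return stat.format.format(value)
    return default_float.format(value) if stat.type == "float" else str(value)


def evaluate_row(metric: str, compute: Callable[[], Any]) -> dict:
    """Return a result row; on failure, keep a placeholder and record the error."""
    try:
        stat = make_stat(compute())
        errors = []
    except Exception as exc:
        stat = Stat(type="float")  # placeholder: every value slot stays None
        errors = [f"{type(exc).__name__}: {exc}"]
    return {"metric": metric, "stat": stat, "stat_fmt": render(stat),
            "warning": [], "error": errors}


ok = evaluate_row("mae", lambda: 1.114285714285714)
bad = evaluate_row("rmse", lambda: 1 / 0)
print(ok["stat_fmt"], ok["error"])
print(bad["stat_fmt"], bad["error"])
```

The failing row still comes back with the same keys as the successful one, which is the property that keeps a table of results collectable when one metric breaks.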
The evaluator returns a Polars `DataFrame` by default, keeping columns such as `label`, `metric`, `estimate`, and `value` ready for display. When you need the full ARD structure, reuse the lazy output with the `polars_eval_metrics.ard.ARD` helper.
# Sample data with two estimates, grouped by treatment and gender
raw = generate_sample_data(n_subjects=4, n_visits=2, n_groups=2)
metrics = [
    MetricDefine(name="mae", label="Mean Absolute Error"),
    MetricDefine(name="rmse", label="Root Mean Squared Error"),
]
evaluator = MetricEvaluator(
    df=raw,
    metrics=metrics,
    ground_truth="actual",
    estimates=["model1", "model2"],
)
result = evaluator.evaluate()
result
estimate | metric | label | value | metric_type | scope |
---|---|---|---|---|---|
enum | enum | enum | str | str | str |
"model1" | "mae" | "Mean Absolute Error" | "1.1" | "across_sample" | null |
"model2" | "mae" | "Mean Absolute Error" | "1.5" | "across_sample" | null |
"model1" | "rmse" | "Root Mean Squared Error" | "1.3" | "across_sample" | null |
"model2" | "rmse" | "Root Mean Squared Error" | "1.9" | "across_sample" | null |
Convert the evaluation output to an ARD object:
ARD(summary={'n_rows': 4, 'n_metrics': 2, 'n_estimates': 2, 'n_groups': 0, 'n_subgroups': 0, 'metrics': ['mae', 'rmse'], 'estimates': ['model1', 'model2']})
{'estimate': Enum(categories=['model1', 'model2']),
'value': Float64,
'metric': Enum(categories=['mae', 'rmse']),
'label': Enum(categories=['Mean Absolute Error', 'Root Mean Squared Error']),
'metric_type': String,
'scope': String,
'id': Null,
'groups': Null,
'subgroups': Null,
'stat': Struct({'type': String, 'value_float': Float64, 'value_int': Int64, 'value_bool': Boolean, 'value_str': String, 'value_struct': Null, 'format': String}),
'context': Struct({'metric_type': String, 'scope': String, 'label': String, 'estimate_label': String}),
'stat_fmt': String,
'warning': List(String),
'error': List(String)}
Collecting produces the backward-compatible table with struct columns:
id | groups | subgroups | estimate | metric | label | stat | stat_fmt | warning | error | context |
---|---|---|---|---|---|---|---|---|---|---|
null | null | null | enum | enum | enum | struct[7] | str | list[str] | list[str] | struct[4] |
null | null | null | "model1" | "mae" | "Mean Absolute Error" | {"float",1.114286,null,null,null,null,null} | "1.1" | [] | [] | {"across_sample",null,"Mean Absolute Error","model1"} |
null | null | null | "model2" | "mae" | "Mean Absolute Error" | {"float",1.471429,null,null,null,null,null} | "1.5" | [] | [] | {"across_sample",null,"Mean Absolute Error","model2"} |
null | null | null | "model2" | "rmse" | "Root Mean Squared Error" | {"float",1.913486,null,null,null,null,null} | "1.9" | [] | [] | {"across_sample",null,"Root Mean Squared Error","model2"} |
null | null | null | "model1" | "rmse" | "Root Mean Squared Error" | {"float",1.298351,null,null,null,null,null} | "1.3" | [] | [] | {"across_sample",null,"Root Mean Squared Error","model1"} |
The typed `stat` payload is preserved even though the `value` column is purely presentational:
{'type': 'float',
'value_float': 1.114285714285714,
'value_int': None,
'value_bool': None,
'value_str': None,
'value_struct': None,
'format': None}
Structured columns can be unnested to inspect individual components:
estimate | metric | value_float | format |
---|---|---|---|
enum | enum | f64 | str |
"model1" | "mae" | 1.114286 | null |
"model2" | "mae" | 1.471429 | null |
"model2" | "rmse" | 1.913486 | null |
"model1" | "rmse" | 1.298351 | null |
`ARD.get_stats()` offers a quick way to access raw values with optional metadata. Call it without arguments when you just want the canonical metric/value pairs:
metric | value |
---|---|
enum | f64 |
"mae" | 1.114286 |
"mae" | 1.471429 |
"rmse" | 1.913486 |
"rmse" | 1.298351 |
Pass `include_metadata=True` if you need to see the stored `stat` type tag and format hint alongside the value.
Empty structs are often produced when downstream code prefers explicit placeholders. The helper methods on `ARD` let you toggle between empty and null representations:

`with_empty_as_null()` collapses all-null structs and blank estimates to proper nulls so filters behave as expected:
id | groups | subgroups | estimate | metric | label | stat | stat_fmt | warning | error | context |
---|---|---|---|---|---|---|---|---|---|---|
null | null | null | enum | enum | enum | struct[7] | str | list[str] | list[str] | struct[4] |
null | null | null | "model2" | "mae" | "Mean Absolute Error" | {"float",1.471429,null,null,null,null,null} | "1.5" | [] | [] | {"across_sample",null,"Mean Absolute Error","model2"} |
null | null | null | "model1" | "mae" | "Mean Absolute Error" | {"float",1.114286,null,null,null,null,null} | "1.1" | [] | [] | {"across_sample",null,"Mean Absolute Error","model1"} |
null | null | null | "model1" | "rmse" | "Root Mean Squared Error" | {"float",1.298351,null,null,null,null,null} | "1.3" | [] | [] | {"across_sample",null,"Root Mean Squared Error","model1"} |
null | null | null | "model2" | "rmse" | "Root Mean Squared Error" | {"float",1.913486,null,null,null,null,null} | "1.9" | [] | [] | {"across_sample",null,"Root Mean Squared Error","model2"} |
`with_null_as_empty()` does the reverse, replacing nulls with empty structs so templating code can access the fields safely:
id | groups | subgroups | estimate | metric | label | stat | stat_fmt | warning | error | context |
---|---|---|---|---|---|---|---|---|---|---|
null | null | null | enum | enum | enum | struct[7] | str | list[str] | list[str] | struct[4] |
null | null | null | "model2" | "mae" | "Mean Absolute Error" | {"float",1.471429,null,null,null,null,null} | "1.5" | [] | [] | {"across_sample",null,"Mean Absolute Error","model2"} |
null | null | null | "model1" | "mae" | "Mean Absolute Error" | {"float",1.114286,null,null,null,null,null} | "1.1" | [] | [] | {"across_sample",null,"Mean Absolute Error","model1"} |
null | null | null | "model1" | "rmse" | "Root Mean Squared Error" | {"float",1.298351,null,null,null,null,null} | "1.3" | [] | [] | {"across_sample",null,"Root Mean Squared Error","model1"} |
null | null | null | "model2" | "rmse" | "Root Mean Squared Error" | {"float",1.913486,null,null,null,null,null} | "1.9" | [] | [] | {"across_sample",null,"Root Mean Squared Error","model2"} |
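Because the sample data has no groups, both tables above look the same. The difference between the two representations is easier to see on hand-written values. The pure-Python sketch below uses dicts in place of struct values; the function names and the `treatment`/`gender` keys are hypothetical and chosen only for the illustration.

```python
from typing import Optional

GROUP_FIELDS = ("treatment", "gender")  # hypothetical grouping keys


def empty_as_null(struct: Optional[dict]) -> Optional[dict]:
    """Collapse a struct whose fields are all None into a single None."""
    if struct is None or all(v is None for v in struct.values()):
        return None
    return struct


def null_as_empty(struct: Optional[dict], fields=GROUP_FIELDS) -> dict:
    """Replace None with a struct that has every field present but unset."""
    if struct is None:
        return {name: None for name in fields}
    return struct


groups = [
    {"treatment": None, "gender": None},  # "empty" struct
    {"treatment": "A", "gender": "F"},    # populated struct
    None,                                 # proper null
]

as_null = [empty_as_null(g) for g in groups]
as_empty = [null_as_empty(g) for g in groups]

# A null check now catches the empty struct as well as the real null.
missing = [g is None for g in as_null]

# Field access is now safe on every row.
treatments = [g["treatment"] for g in as_empty]

print(missing)
print(treatments)
```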
`to_long()` flattens the lazy frame into a DataFrame where grouping metadata is expanded and the formatted `stat` value is exposed as `value`:
estimate | value | metric | label | metric_type | scope | id | groups | subgroups | stat | context | stat_fmt | warning | error |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
enum | str | enum | enum | str | str | null | null | null | struct[7] | struct[4] | str | list[str] | list[str] |
"model2" | "1.5" | "mae" | "Mean Absolute Error" | "across_sample" | null | null | null | null | {"float",1.471429,null,null,null,null,null} | {"across_sample",null,"Mean Absolute Error","model2"} | "1.5" | [] | [] |
"model1" | "1.1" | "mae" | "Mean Absolute Error" | "across_sample" | null | null | null | null | {"float",1.114286,null,null,null,null,null} | {"across_sample",null,"Mean Absolute Error","model1"} | "1.1" | [] | [] |
"model2" | "1.9" | "rmse" | "Root Mean Squared Error" | "across_sample" | null | null | null | null | {"float",1.913486,null,null,null,null,null} | {"across_sample",null,"Root Mean Squared Error","model2"} | "1.9" | [] | [] |
"model1" | "1.3" | "rmse" | "Root Mean Squared Error" | "across_sample" | null | null | null | null | {"float",1.298351,null,null,null,null,null} | {"across_sample",null,"Root Mean Squared Error","model1"} | "1.3" | [] | [] |
`unnest()` is useful when you just need the struct columns expanded in place without the extra pivoting logic:
estimate | value | metric | label | metric_type | scope | id | groups | subgroups | stat | context | stat_fmt | warning | error |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
enum | f64 | enum | enum | str | str | null | null | null | struct[7] | struct[4] | str | list[str] | list[str] |
"model2" | 1.471429 | "mae" | "Mean Absolute Error" | "across_sample" | null | null | null | null | {"float",1.471429,null,null,null,null,null} | {"across_sample",null,"Mean Absolute Error","model2"} | "1.5" | [] | [] |
"model1" | 1.114286 | "mae" | "Mean Absolute Error" | "across_sample" | null | null | null | null | {"float",1.114286,null,null,null,null,null} | {"across_sample",null,"Mean Absolute Error","model1"} | "1.1" | [] | [] |
"model2" | 1.913486 | "rmse" | "Root Mean Squared Error" | "across_sample" | null | null | null | null | {"float",1.913486,null,null,null,null,null} | {"across_sample",null,"Root Mean Squared Error","model2"} | "1.9" | [] | [] |
"model1" | 1.298351 | "rmse" | "Root Mean Squared Error" | "across_sample" | null | null | null | null | {"float",1.298351,null,null,null,null,null} | {"across_sample",null,"Root Mean Squared Error","model1"} | "1.3" | [] | [] |
Wide presentations remain useful for dashboards. `to_wide()` pivots metric values while preserving formatting hints, which suits quick scorecards:
estimate | mae | rmse |
---|---|---|
enum | str | str |
"model2" | "1.5" | "1.9" |
"model1" | "1.1" | "1.3" |
For ad-hoc layouts, pull the lazy frame to long form and use Polars' native pivot through the `pivot()` convenience wrapper, which gives you full control over the value column and aggregation.
Use `summary()` for a quick diagnostic of the collected dataset or `describe()` for a formatted printout.

`summary()` returns a dict of counts that you can log or feed into tests:
{'n_rows': 4,
'n_metrics': 2,
'n_estimates': 2,
'n_groups': 0,
'n_subgroups': 0,
'metrics': ['mae', 'rmse'],
'estimates': ['model1', 'model2']}
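One way to feed such a dict into a test is a small checker that returns a list of problems. The sketch below uses a literal copy of the dict shown above instead of a live ARD; `check_summary` is a hypothetical helper, and the row-count rule assumes an ungrouped evaluation with one row per metric and estimate pair, as in this example.

```python
# Literal copy of the summary shown above; real code would get it from the ARD.
summary = {
    "n_rows": 4,
    "n_metrics": 2,
    "n_estimates": 2,
    "n_groups": 0,
    "n_subgroups": 0,
    "metrics": ["mae", "rmse"],
    "estimates": ["model1", "model2"],
}


def check_summary(summary: dict, metrics: list, estimates: list) -> list:
    """Return a list of problems; an empty list means the summary looks right."""
    problems = []
    if summary["metrics"] != metrics:
        problems.append(f"unexpected metrics: {summary['metrics']}")
    if summary["estimates"] != estimates:
        problems.append(f"unexpected estimates: {summary['estimates']}")
    # Assumption: without groups, there is one row per metric/estimate pair.
    expected_rows = len(metrics) * len(estimates)
    if summary["n_groups"] == 0 and summary["n_rows"] != expected_rows:
        problems.append(f"expected {expected_rows} rows, got {summary['n_rows']}")
    return problems


problems = check_summary(summary, ["mae", "rmse"], ["model1", "model2"])
print(problems)
```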
`describe()` emits a readable console report with sample rows, handy during notebook exploration:
==================================================
ARD Summary: 4 results
==================================================
Metrics:
- mae
- rmse
Estimates:
- model1
- model2
Preview:
shape: (4, 14)
┌──────────┬──────────┬────────┬──────────────┬───┬─────────────┬──────────┬───────────┬───────────┐
│ estimate ┆ value ┆ metric ┆ label ┆ … ┆ context ┆ stat_fmt ┆ warning ┆ error │
│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │
│ enum ┆ f64 ┆ enum ┆ enum ┆ ┆ struct[4] ┆ str ┆ list[str] ┆ list[str] │
╞══════════╪══════════╪════════╪══════════════╪═══╪═════════════╪══════════╪═══════════╪═══════════╡
│ model2 ┆ 1.471429 ┆ mae ┆ Mean ┆ … ┆ {"across_sa ┆ 1.5 ┆ [] ┆ [] │
│ ┆ ┆ ┆ Absolute ┆ ┆ mple",null, ┆ ┆ ┆ │
│ ┆ ┆ ┆ Error ┆ ┆ "Mean Ab… ┆ ┆ ┆ │
│ model1 ┆ 1.114286 ┆ mae ┆ Mean ┆ … ┆ {"across_sa ┆ 1.1 ┆ [] ┆ [] │
│ ┆ ┆ ┆ Absolute ┆ ┆ mple",null, ┆ ┆ ┆ │
│ ┆ ┆ ┆ Error ┆ ┆ "Mean Ab… ┆ ┆ ┆ │
│ model1 ┆ 1.298351 ┆ rmse ┆ Root Mean ┆ … ┆ {"across_sa ┆ 1.3 ┆ [] ┆ [] │
│ ┆ ┆ ┆ Squared ┆ ┆ mple",null, ┆ ┆ ┆ │
│ ┆ ┆ ┆ Error ┆ ┆ "Root Me… ┆ ┆ ┆ │
│ model2 ┆ 1.913486 ┆ rmse ┆ Root Mean ┆ … ┆ {"across_sa ┆ 1.9 ┆ [] ┆ [] │
│ ┆ ┆ ┆ Squared ┆ ┆ mple",null, ┆ ┆ ┆ │
│ ┆ ┆ ┆ Error ┆ ┆ "Root Me… ┆ ┆ ┆ │
└──────────┴──────────┴────────┴──────────────┴───┴─────────────┴──────────┴───────────┴───────────┘