Analysis Results Data (ARD)

Fixed-schema container for evaluation metrics

import polars as pl
from polars_eval_metrics import MetricDefine, MetricEvaluator
from polars_eval_metrics.ard import ARD
from data_generator import generate_sample_data

pl.Config.set_tbl_rows(8)

Overview

The Analysis Results Data (ARD) container holds every evaluation result produced by MetricEvaluator. Each ARD instance wraps a Polars LazyFrame so results stay lazy until you explicitly collect them. The container standardises the schema for downstream processing, keeps metric metadata attached, and preserves display-friendly ordering through enum-backed columns.

Why ARD?

Canonical layout: every result set shares the same structured columns, regardless of metric mix or grouping configuration.
Type preservation: statistical values live in a typed struct that keeps floats, integers, booleans, strings, and nested payloads distinct.
Lazy by default: filters and joins can be composed on the underlying lazy frame and collected only when needed.
Display metadata: columns such as metric, label, and estimate use Enum types so tables respect the metric definition order instead of alphabetical sorting.

Canonical Columns

The minimum schema that every ARD provides is shown below. Calling ARD.schema exposes these columns.

Column	Type	Description
`groups`	`pl.Struct`	Primary grouping keys (e.g. treatment, site).
`subgroups`	`pl.Struct`	Subgroup breakdowns (e.g. gender, race).
`estimate`	`pl.Enum`	Estimate identifier for the model or prediction column.
`metric`	`pl.Enum`	Metric name registered in the metric registry.
`label`	`pl.Enum`	Human-readable display label.
`stat`	`pl.Struct`	Typed value container (see below).
`stat_fmt`	`pl.Utf8`	Default formatted presentation of the statistic.
`context`	`pl.Struct`	Metadata describing how the metric was computed.
`warning`	`pl.List(pl.Utf8)`	Captured warnings generated while evaluating the metric.
`error`	`pl.List(pl.Utf8)`	Captured errors when evaluation falls back to placeholders.
`id`	`pl.Struct`	Entity identifiers for within-subject / visit metrics.

The stat struct splits the stored result across typed channels:

type: value hint such as "float", "int", "bool", "string", or "struct"
value_float, value_int, value_bool, value_str, value_struct: mutually exclusive slots for the actual statistic
format: optional Python format string used when rendering.

The rendered value is kept in stat_fmt so downstream code can use either the raw struct or the default text representation without recomputing formatting.
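
Reading a value back from the typed channels could look like the following sketch. It works on a plain dict shaped like one row of the stat struct. The helper names stat_value and render_stat are hypothetical and not part of the library, and treating format as a str.format template (for example "{:.1f}") is an assumption; the library may interpret the format string differently.

```python
from typing import Any

# Map each type hint to the slot that holds the value (names from the list above).
SLOT_BY_TYPE = {
    "float": "value_float",
    "int": "value_int",
    "bool": "value_bool",
    "string": "value_str",
    "struct": "value_struct",
}


def stat_value(stat: dict) -> Any:
    """Return the raw statistic from the slot named by stat['type']."""
    return stat[SLOT_BY_TYPE[stat["type"]]]


def render_stat(stat: dict) -> str:
    """Render the statistic; assumes `format` is a str.format template."""
    value = stat_value(stat)
    template = stat.get("format")
    if template:
        return template.format(value)
    return str(value)


# Example row shaped like the stat struct; the format string is illustrative.
example = {
    "type": "float",
    "value_float": 1.114285714285714,
    "value_int": None,
    "value_bool": None,
    "value_str": None,
    "value_struct": None,
    "format": "{:.1f}",
}
print(render_stat(example))
```

Keeping one slot per type means a consumer never has to guess whether a value was an integer count or a float that happened to be whole.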

Any warnings or errors that occur while evaluating a metric are preserved in the warning and error list columns. When evaluation succeeds both columns contain empty lists; if an expression fails, the ARD still returns a placeholder row with the captured diagnostic message so the failure is visible without breaking collection.
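
The capture pattern described above can be sketched with the standard library alone. This is not the library's implementation: evaluate_safely is a hypothetical helper, and the row layout is reduced to three keys for brevity. It only shows how a failing computation can still yield a placeholder row with its diagnostics attached.

```python
import warnings
from typing import Callable


def evaluate_safely(compute: Callable[[], float]) -> dict:
    """Run `compute` and return its value plus any captured warnings and errors."""
    row = {"value": None, "warning": [], "error": []}
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        try:
            row["value"] = compute()
        except Exception as exc:
            # Keep a placeholder row (value stays None) instead of raising.
            row["error"].append(f"{type(exc).__name__}: {exc}")
        row["warning"] = [str(w.message) for w in caught]
    return row


def noisy_metric() -> float:
    warnings.warn("few samples")
    return 2.0


ok = evaluate_safely(lambda: 1.5)
warned = evaluate_safely(noisy_metric)
failed = evaluate_safely(lambda: 1 / 0)
```

A successful call leaves both lists empty, which matches the convention that empty lists mean a clean evaluation.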

Producing ARD from an Evaluation

The evaluator returns a Polars DataFrame by default, keeping columns such as label, metric, estimate, and value ready for display. When you need the full ARD structure, request the lazy output with evaluate(collect=False) and wrap it with the polars_eval_metrics.ard.ARD helper.

# Sample data with two estimates, grouped by treatment and gender
raw = generate_sample_data(n_subjects=4, n_visits=2, n_groups=2)
metrics = [
    MetricDefine(name="mae", label="Mean Absolute Error"),
    MetricDefine(name="rmse", label="Root Mean Squared Error"),
]

evaluator = MetricEvaluator(
    df=raw,
    metrics=metrics,
    ground_truth="actual",
    estimates=["model1", "model2"],
)

result = evaluator.evaluate()
result

shape: (4, 6)

estimate	metric	label	value	metric_type	scope
enum	enum	enum	str	str	str
"model1"	"mae"	"Mean Absolute Error"	"1.1"	"across_sample"	null
"model2"	"mae"	"Mean Absolute Error"	"1.5"	"across_sample"	null
"model1"	"rmse"	"Root Mean Squared Error"	"1.3"	"across_sample"	null
"model2"	"rmse"	"Root Mean Squared Error"	"1.9"	"across_sample"	null

ARD Container

Convert the evaluation output to an ARD object:

lazy_result = evaluator.evaluate(collect=False)
ard = ARD(lazy_result)
ard

ARD(summary={'n_rows': 4, 'n_metrics': 2, 'n_estimates': 2, 'n_groups': 0, 'n_subgroups': 0, 'metrics': ['mae', 'rmse'], 'estimates': ['model1', 'model2']})

ard.schema

{'estimate': Enum(categories=['model1', 'model2']),
 'value': Float64,
 'metric': Enum(categories=['mae', 'rmse']),
 'label': Enum(categories=['Mean Absolute Error', 'Root Mean Squared Error']),
 'metric_type': String,
 'scope': String,
 'id': Null,
 'groups': Null,
 'subgroups': Null,
 'stat': Struct({'type': String, 'value_float': Float64, 'value_int': Int64, 'value_bool': Boolean, 'value_str': String, 'value_struct': Null, 'format': String}),
 'context': Struct({'metric_type': String, 'scope': String, 'label': String, 'estimate_label': String}),
 'stat_fmt': String,
 'warning': List(String),
 'error': List(String)}

Collecting produces the backward-compatible table with struct columns:

canonical = ard.collect()
canonical

shape: (4, 11)

id	groups	subgroups	estimate	metric	label	stat	stat_fmt	warning	error	context
null	null	null	enum	enum	enum	struct[7]	str	list[str]	list[str]	struct[4]
null	null	null	"model1"	"mae"	"Mean Absolute Error"	{"float",1.114286,null,null,null,null,null}	"1.1"	[]	[]	{"across_sample",null,"Mean Absolute Error","model1"}
null	null	null	"model2"	"mae"	"Mean Absolute Error"	{"float",1.471429,null,null,null,null,null}	"1.5"	[]	[]	{"across_sample",null,"Mean Absolute Error","model2"}
null	null	null	"model2"	"rmse"	"Root Mean Squared Error"	{"float",1.913486,null,null,null,null,null}	"1.9"	[]	[]	{"across_sample",null,"Root Mean Squared Error","model2"}
null	null	null	"model1"	"rmse"	"Root Mean Squared Error"	{"float",1.298351,null,null,null,null,null}	"1.3"	[]	[]	{"across_sample",null,"Root Mean Squared Error","model1"}

The typed stat payload is preserved even though the value column is purely presentational:

canonical["stat"][0]

{'type': 'float',
 'value_float': 1.114285714285714,
 'value_int': None,
 'value_bool': None,
 'value_str': None,
 'value_struct': None,
 'format': None}

Individual fields of a struct column can be extracted with struct.field() to inspect its components:

canonical.select([
    pl.col("estimate"),
    pl.col("metric"),
    pl.col("stat").struct.field("value_float").alias("value_float"),
    pl.col("stat").struct.field("format").alias("format"),
])

shape: (4, 4)

estimate	metric	value_float	format
enum	enum	f64	str
"model1"	"mae"	1.114286	null
"model2"	"mae"	1.471429	null
"model2"	"rmse"	1.913486	null
"model1"	"rmse"	1.298351	null

ARD.get_stats() offers a quick way to access raw values with optional metadata.

Call it without arguments when you just want the canonical metric/value pairs:

ard.get_stats()

shape: (4, 2)

metric	value
enum	f64
"mae"	1.114286
"mae"	1.471429
"rmse"	1.913486
"rmse"	1.298351

Pass include_metadata=True if you need to see the stored stat type tag and format hint alongside the value:

ard.get_stats(include_metadata=True)

shape: (4, 5)

metric	value	type	format	formatted
enum	f64	str	null	str
"mae"	1.114286	"float"	null	"1.1"
"mae"	1.471429	"float"	null	"1.5"
"rmse"	1.913486	"float"	null	"1.9"
"rmse"	1.298351	"float"	null	"1.3"

Normalising Struct Columns

Struct columns may hold either empty placeholder structs or nulls, depending on what downstream code expects. The helper methods on ARD let you toggle between empty and null representations:

with_empty_as_null() collapses all-null structs and blank estimates to proper nulls so filters behave as expected:

ard.with_empty_as_null().collect()

shape: (4, 11)

id	groups	subgroups	estimate	metric	label	stat	stat_fmt	warning	error	context
null	null	null	enum	enum	enum	struct[7]	str	list[str]	list[str]	struct[4]
null	null	null	"model2"	"mae"	"Mean Absolute Error"	{"float",1.471429,null,null,null,null,null}	"1.5"	[]	[]	{"across_sample",null,"Mean Absolute Error","model2"}
null	null	null	"model1"	"mae"	"Mean Absolute Error"	{"float",1.114286,null,null,null,null,null}	"1.1"	[]	[]	{"across_sample",null,"Mean Absolute Error","model1"}
null	null	null	"model1"	"rmse"	"Root Mean Squared Error"	{"float",1.298351,null,null,null,null,null}	"1.3"	[]	[]	{"across_sample",null,"Root Mean Squared Error","model1"}
null	null	null	"model2"	"rmse"	"Root Mean Squared Error"	{"float",1.913486,null,null,null,null,null}	"1.9"	[]	[]	{"across_sample",null,"Root Mean Squared Error","model2"}

with_null_as_empty() does the reverse—replacing nulls with empty structs so templating code can access the fields safely:

ard.with_null_as_empty().collect()

shape: (4, 11)

id	groups	subgroups	estimate	metric	label	stat	stat_fmt	warning	error	context
null	null	null	enum	enum	enum	struct[7]	str	list[str]	list[str]	struct[4]
null	null	null	"model2"	"mae"	"Mean Absolute Error"	{"float",1.471429,null,null,null,null,null}	"1.5"	[]	[]	{"across_sample",null,"Mean Absolute Error","model2"}
null	null	null	"model1"	"mae"	"Mean Absolute Error"	{"float",1.114286,null,null,null,null,null}	"1.1"	[]	[]	{"across_sample",null,"Mean Absolute Error","model1"}
null	null	null	"model1"	"rmse"	"Root Mean Squared Error"	{"float",1.298351,null,null,null,null,null}	"1.3"	[]	[]	{"across_sample",null,"Root Mean Squared Error","model1"}
null	null	null	"model2"	"rmse"	"Root Mean Squared Error"	{"float",1.913486,null,null,null,null,null}	"1.9"	[]	[]	{"across_sample",null,"Root Mean Squared Error","model2"}

Transformation

to_long() flattens the lazy frame into a DataFrame where grouping metadata is expanded and the formatted stat value is exposed as value:

ard.to_long()

shape: (4, 14)

estimate	value	metric	label	metric_type	scope	id	groups	subgroups	stat	context	stat_fmt	warning	error
enum	str	enum	enum	str	str	null	null	null	struct[7]	struct[4]	str	list[str]	list[str]
"model2"	"1.5"	"mae"	"Mean Absolute Error"	"across_sample"	null	null	null	null	{"float",1.471429,null,null,null,null,null}	{"across_sample",null,"Mean Absolute Error","model2"}	"1.5"	[]	[]
"model1"	"1.1"	"mae"	"Mean Absolute Error"	"across_sample"	null	null	null	null	{"float",1.114286,null,null,null,null,null}	{"across_sample",null,"Mean Absolute Error","model1"}	"1.1"	[]	[]
"model2"	"1.9"	"rmse"	"Root Mean Squared Error"	"across_sample"	null	null	null	null	{"float",1.913486,null,null,null,null,null}	{"across_sample",null,"Root Mean Squared Error","model2"}	"1.9"	[]	[]
"model1"	"1.3"	"rmse"	"Root Mean Squared Error"	"across_sample"	null	null	null	null	{"float",1.298351,null,null,null,null,null}	{"across_sample",null,"Root Mean Squared Error","model1"}	"1.3"	[]	[]

unnest() is useful when you just need the struct columns expanded in place without the extra pivoting logic:

ard.unnest()

shape: (4, 14)

estimate	value	metric	label	metric_type	scope	id	groups	subgroups	stat	context	stat_fmt	warning	error
enum	f64	enum	enum	str	str	null	null	null	struct[7]	struct[4]	str	list[str]	list[str]
"model2"	1.471429	"mae"	"Mean Absolute Error"	"across_sample"	null	null	null	null	{"float",1.471429,null,null,null,null,null}	{"across_sample",null,"Mean Absolute Error","model2"}	"1.5"	[]	[]
"model1"	1.114286	"mae"	"Mean Absolute Error"	"across_sample"	null	null	null	null	{"float",1.114286,null,null,null,null,null}	{"across_sample",null,"Mean Absolute Error","model1"}	"1.1"	[]	[]
"model2"	1.913486	"rmse"	"Root Mean Squared Error"	"across_sample"	null	null	null	null	{"float",1.913486,null,null,null,null,null}	{"across_sample",null,"Root Mean Squared Error","model2"}	"1.9"	[]	[]
"model1"	1.298351	"rmse"	"Root Mean Squared Error"	"across_sample"	null	null	null	null	{"float",1.298351,null,null,null,null,null}	{"across_sample",null,"Root Mean Squared Error","model1"}	"1.3"	[]	[]

Wide presentations remain useful for dashboards. to_wide() pivots metric values while preserving formatting hints—perfect for quick scorecards:

ard.to_wide(index=["estimate"], columns=["metric"]).head()

shape: (2, 3)

estimate	mae	rmse
enum	str	str
"model2"	"1.5"	"1.9"
"model1"	"1.1"	"1.3"

For ad-hoc layouts, the pivot() convenience wrapper brings the lazy frame to long form and applies Polars’ native pivot, giving you full control over the value column and aggregation:

ard.pivot(on="metric", index=["estimate"], values="stat")

shape: (2, 3)

estimate	mae	rmse
enum	f64	f64
"model1"	1.114286	1.298351
"model2"	1.471429	1.913486

Summaries

Use summary() for a quick diagnostic of the collected dataset, or describe() for a formatted printout.

summary() returns a dict of counts that you can log or feed into tests:

ard.summary()

{'n_rows': 4,
 'n_metrics': 2,
 'n_estimates': 2,
 'n_groups': 0,
 'n_subgroups': 0,
 'metrics': ['mae', 'rmse'],
 'estimates': ['model1', 'model2']}
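
Feeding the dict into a test could look like the sketch below. check_summary is a hypothetical helper, not part of the library, and the summary literal is copied from the output above so the example runs without the evaluator.

```python
def check_summary(summary: dict, expected_metrics: list, expected_estimates: list) -> list:
    """Return a list of problems; an empty list means the summary matches."""
    problems = []
    if summary["metrics"] != expected_metrics:
        problems.append(f"metrics: {summary['metrics']} != {expected_metrics}")
    if summary["estimates"] != expected_estimates:
        problems.append(f"estimates: {summary['estimates']} != {expected_estimates}")
    # The counts should agree with the lists they describe.
    if summary["n_metrics"] != len(summary["metrics"]):
        problems.append("n_metrics does not match the metrics list")
    if summary["n_estimates"] != len(summary["estimates"]):
        problems.append("n_estimates does not match the estimates list")
    return problems


# Literal copied from the summary() output shown above.
summary = {
    "n_rows": 4,
    "n_metrics": 2,
    "n_estimates": 2,
    "n_groups": 0,
    "n_subgroups": 0,
    "metrics": ["mae", "rmse"],
    "estimates": ["model1", "model2"],
}

problems = check_summary(summary, ["mae", "rmse"], ["model1", "model2"])
print(problems)
```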

describe() emits a readable console report with sample rows—handy during notebook exploration:

ard.describe()

==================================================
ARD Summary: 4 results
==================================================

Metrics:
  - mae
  - rmse

Estimates:
  - model1
  - model2

Preview:
shape: (4, 14)
┌──────────┬──────────┬────────┬──────────────┬───┬─────────────┬──────────┬───────────┬───────────┐
│ estimate ┆ value    ┆ metric ┆ label        ┆ … ┆ context     ┆ stat_fmt ┆ warning   ┆ error     │
│ ---      ┆ ---      ┆ ---    ┆ ---          ┆   ┆ ---         ┆ ---      ┆ ---       ┆ ---       │
│ enum     ┆ f64      ┆ enum   ┆ enum         ┆   ┆ struct[4]   ┆ str      ┆ list[str] ┆ list[str] │
╞══════════╪══════════╪════════╪══════════════╪═══╪═════════════╪══════════╪═══════════╪═══════════╡
│ model2   ┆ 1.471429 ┆ mae    ┆ Mean         ┆ … ┆ {"across_sa ┆ 1.5      ┆ []        ┆ []        │
│          ┆          ┆        ┆ Absolute     ┆   ┆ mple",null, ┆          ┆           ┆           │
│          ┆          ┆        ┆ Error        ┆   ┆ "Mean Ab…   ┆          ┆           ┆           │
│ model1   ┆ 1.114286 ┆ mae    ┆ Mean         ┆ … ┆ {"across_sa ┆ 1.1      ┆ []        ┆ []        │
│          ┆          ┆        ┆ Absolute     ┆   ┆ mple",null, ┆          ┆           ┆           │
│          ┆          ┆        ┆ Error        ┆   ┆ "Mean Ab…   ┆          ┆           ┆           │
│ model1   ┆ 1.298351 ┆ rmse   ┆ Root Mean    ┆ … ┆ {"across_sa ┆ 1.3      ┆ []        ┆ []        │
│          ┆          ┆        ┆ Squared      ┆   ┆ mple",null, ┆          ┆           ┆           │
│          ┆          ┆        ┆ Error        ┆   ┆ "Root Me…   ┆          ┆           ┆           │
│ model2   ┆ 1.913486 ┆ rmse   ┆ Root Mean    ┆ … ┆ {"across_sa ┆ 1.9      ┆ []        ┆ []        │
│          ┆          ┆        ┆ Squared      ┆   ┆ mple",null, ┆          ┆           ┆           │
│          ┆          ┆        ┆ Error        ┆   ┆ "Root Me…   ┆          ┆           ┆           │
└──────────┴──────────┴────────┴──────────────┴───┴─────────────┴──────────┴───────────┴───────────┘