Analysis Results Data (ARD)

Fixed-schema container for evaluation metrics

import polars as pl
from polars_eval_metrics import MetricDefine, MetricEvaluator
from polars_eval_metrics.ard import ARD
from data_generator import generate_sample_data

pl.Config.set_tbl_rows(8)

Overview

The Analysis Results Data (ARD) container holds every evaluation result produced by MetricEvaluator. Each ARD instance wraps a Polars LazyFrame so results stay lazy until you explicitly collect them. The container standardises the schema for downstream processing, keeps metric metadata attached, and preserves display-friendly ordering through enum-backed columns.

Why ARD?

  • Canonical layout: every result set shares the same structured columns, regardless of metric mix or grouping configuration.
  • Type preservation: statistical values live in a typed struct that keeps floats, integers, booleans, strings, and nested payloads distinct.
  • Lazy by default: filters and joins can be composed on the underlying lazy frame and collected only when needed.
  • Display metadata: columns such as metric, label, and estimate use Enum types so tables respect the metric definition order instead of alphabetical sorting.
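
The display-ordering point can be illustrated without Polars. This is a minimal pure-Python sketch, not the library's implementation: `metric_order` and the row values are hypothetical. It contrasts a plain alphabetical sort with a sort that ranks rows by their declared order, which is the behaviour an Enum-typed column provides.

```python
# Hypothetical definition order: the order in which metrics were declared.
metric_order = ["rmse", "mae", "bias"]

# Illustrative result rows arriving in arbitrary order.
rows = [
    {"metric": "mae", "value": 1.1},
    {"metric": "bias", "value": 0.2},
    {"metric": "rmse", "value": 1.3},
]

# Alphabetical sorting ignores the declared order.
alphabetical = [r["metric"] for r in sorted(rows, key=lambda r: r["metric"])]

# Ranking by position in the definition order mimics Enum-based sorting.
rank = {name: i for i, name in enumerate(metric_order)}
by_definition = [r["metric"] for r in sorted(rows, key=lambda r: rank[r["metric"]])]

print(alphabetical)   # ['bias', 'mae', 'rmse']
print(by_definition)  # ['rmse', 'mae', 'bias']
```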

Canonical Columns

The minimum schema that every ARD provides is shown below. Calling ARD.schema exposes these columns.

Column     Type              Description
groups     pl.Struct         Primary grouping keys (e.g. treatment, site).
subgroups  pl.Struct         Subgroup breakdowns (e.g. gender, race).
estimate   pl.Enum           Estimate identifier for the model or prediction column.
metric     pl.Enum           Metric name registered in the metric registry.
label      pl.Enum           Human-readable display label.
stat       pl.Struct         Typed value container (see below).
stat_fmt   pl.Utf8           Default formatted presentation of the statistic.
context    pl.Struct         Metadata describing how the metric was computed.
warning    pl.List(pl.Utf8)  Captured warnings generated while evaluating the metric.
error      pl.List(pl.Utf8)  Captured errors when evaluation falls back to placeholders.
id         pl.Struct         Entity identifiers for within-subject / visit metrics.

The stat struct splits the stored result across typed channels:

  • type: value hint such as "float", "int", "bool", "string", or "struct".
  • value_float, value_int, value_bool, value_str, value_struct: mutually exclusive slots for the actual statistic.
  • format: optional Python format string used when rendering. The rendered value is kept in stat_fmt so downstream code can use either the raw struct or the default text representation without recomputing formatting.
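
To make the typed channels concrete, here is a minimal pure-Python sketch of how a value could be packed into exactly one slot and rendered with an optional format string. The helpers `pack_stat` and `render_stat` are hypothetical and exist only for illustration; the `str()` fallback used when no format is set is an assumption, not the library's documented default.

```python
from typing import Any, Optional

SLOTS = ("value_float", "value_int", "value_bool", "value_str", "value_struct")

def pack_stat(value: Any, fmt: Optional[str] = None) -> dict:
    """Store `value` in exactly one typed slot and record a type hint."""
    stat = {"type": None, **{slot: None for slot in SLOTS}, "format": fmt}
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        stat["type"], stat["value_bool"] = "bool", value
    elif isinstance(value, int):
        stat["type"], stat["value_int"] = "int", value
    elif isinstance(value, float):
        stat["type"], stat["value_float"] = "float", value
    elif isinstance(value, str):
        stat["type"], stat["value_str"] = "string", value
    elif isinstance(value, dict):
        stat["type"], stat["value_struct"] = "struct", value
    else:
        raise TypeError(f"unsupported statistic type: {type(value).__name__}")
    return stat

def render_stat(stat: dict) -> str:
    """Apply the optional format string; fall back to str() (an assumption)."""
    value = next(stat[s] for s in SLOTS if stat[s] is not None)
    fmt = stat["format"]
    return fmt.format(value) if fmt else str(value)

stat = pack_stat(1.114285714285714, fmt="{:.1f}")
print(stat["type"], render_stat(stat))  # float 1.1
```

Keeping the slots mutually exclusive means a consumer never has to guess whether a number was originally an integer count or a float estimate.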

Any warnings or errors that occur while evaluating a metric are preserved in the warning and error list columns. When evaluation succeeds both columns contain empty lists; if an expression fails, the ARD still returns a placeholder row with the captured diagnostic message so the failure is visible without breaking collection.
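
The capture-instead-of-raise behaviour described above can be sketched with the standard library alone. `safe_evaluate` is a hypothetical helper, not part of polars_eval_metrics; it only illustrates the idea of returning a placeholder row whose warning and error lists carry the diagnostics.

```python
import warnings
from typing import Callable

def safe_evaluate(metric: str, compute: Callable[[], float]) -> dict:
    """Run `compute`, capturing warnings and errors instead of raising."""
    row = {"metric": metric, "value": None, "warning": [], "error": []}
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        try:
            row["value"] = compute()
        except Exception as exc:  # keep a placeholder row on failure
            row["error"].append(f"{type(exc).__name__}: {exc}")
    row["warning"] = [str(w.message) for w in caught]
    return row

def noisy_mean() -> float:
    warnings.warn("only two observations")
    return (1.0 + 2.0) / 2

ok = safe_evaluate("mean", noisy_mean)
failed = safe_evaluate("ratio", lambda: 1 / 0)
print(ok)
print(failed)
```

A successful row keeps an empty error list, while the failed row keeps a null value and a non-empty error list, so both can sit in the same table.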

Producing ARD from an Evaluation

The evaluator returns a Polars DataFrame by default, keeping columns such as label, metric, estimate, and value ready for display. When you need the full ARD structure, wrap the evaluator's lazy output with the polars_eval_metrics.ard.ARD helper.

# Sample data with two estimates, grouped by treatment and gender
raw = generate_sample_data(n_subjects=4, n_visits=2, n_groups=2)
metrics = [
    MetricDefine(name="mae", label="Mean Absolute Error"),
    MetricDefine(name="rmse", label="Root Mean Squared Error"),
]

evaluator = MetricEvaluator(
    df=raw,
    metrics=metrics,
    ground_truth="actual",
    estimates=["model1", "model2"],
)

result = evaluator.evaluate()
result
shape: (4, 6)
estimate  metric  label                      value  metric_type      scope
enum      enum    enum                       str    str              str
"model1"  "mae"   "Mean Absolute Error"      "1.1"  "across_sample"  null
"model2"  "mae"   "Mean Absolute Error"      "1.5"  "across_sample"  null
"model1"  "rmse"  "Root Mean Squared Error"  "1.3"  "across_sample"  null
"model2"  "rmse"  "Root Mean Squared Error"  "1.9"  "across_sample"  null

ARD Container

Convert the evaluation output to an ARD object:

lazy_result = evaluator.evaluate(collect=False)
ard = ARD(lazy_result)
ard
ARD(summary={'n_rows': 4, 'n_metrics': 2, 'n_estimates': 2, 'n_groups': 0, 'n_subgroups': 0, 'metrics': ['mae', 'rmse'], 'estimates': ['model1', 'model2']})
ard.schema
{'estimate': Enum(categories=['model1', 'model2']),
 'value': Float64,
 'metric': Enum(categories=['mae', 'rmse']),
 'label': Enum(categories=['Mean Absolute Error', 'Root Mean Squared Error']),
 'metric_type': String,
 'scope': String,
 'id': Null,
 'groups': Null,
 'subgroups': Null,
 'stat': Struct({'type': String, 'value_float': Float64, 'value_int': Int64, 'value_bool': Boolean, 'value_str': String, 'value_struct': Null, 'format': String}),
 'context': Struct({'metric_type': String, 'scope': String, 'label': String, 'estimate_label': String}),
 'stat_fmt': String,
 'warning': List(String),
 'error': List(String)}

Collecting produces the backward-compatible table with struct columns:

canonical = ard.collect()
canonical
shape: (4, 11)
id groups subgroups estimate metric label stat stat_fmt warning error context
null null null enum enum enum struct[7] str list[str] list[str] struct[4]
null null null "model1" "mae" "Mean Absolute Error" {"float",1.114286,null,null,null,null,null} "1.1" [] [] {"across_sample",null,"Mean Absolute Error","model1"}
null null null "model2" "mae" "Mean Absolute Error" {"float",1.471429,null,null,null,null,null} "1.5" [] [] {"across_sample",null,"Mean Absolute Error","model2"}
null null null "model2" "rmse" "Root Mean Squared Error" {"float",1.913486,null,null,null,null,null} "1.9" [] [] {"across_sample",null,"Root Mean Squared Error","model2"}
null null null "model1" "rmse" "Root Mean Squared Error" {"float",1.298351,null,null,null,null,null} "1.3" [] [] {"across_sample",null,"Root Mean Squared Error","model1"}

The typed stat payload is preserved even though the value column is purely presentational:

canonical["stat"][0]
{'type': 'float',
 'value_float': 1.114285714285714,
 'value_int': None,
 'value_bool': None,
 'value_str': None,
 'value_struct': None,
 'format': None}
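
Once a stat struct has been collected into a Python dict, the type tag can be used to pick the populated slot. The sketch below is illustrative only: `SLOT_FOR_TYPE` and `stat_value` are hypothetical names, and the tag-to-slot mapping (in particular "string" to value_str) is inferred from the field names listed earlier rather than taken from the library.

```python
# The dict shown above, as returned by canonical["stat"][0].
stat = {
    "type": "float",
    "value_float": 1.114285714285714,
    "value_int": None,
    "value_bool": None,
    "value_str": None,
    "value_struct": None,
    "format": None,
}

# Assumed mapping from each type tag to the slot that holds its value.
SLOT_FOR_TYPE = {
    "float": "value_float",
    "int": "value_int",
    "bool": "value_bool",
    "string": "value_str",
    "struct": "value_struct",
}

def stat_value(stat: dict):
    """Return the raw statistic stored in the slot named by the type tag."""
    return stat[SLOT_FOR_TYPE[stat["type"]]]

print(stat_value(stat))
```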

Individual fields of a struct column can be extracted with struct.field() to inspect specific components:

canonical.select([
    pl.col("estimate"),
    pl.col("metric"),
    pl.col("stat").struct.field("value_float").alias("value_float"),
    pl.col("stat").struct.field("format").alias("format"),
])
shape: (4, 4)
estimate  metric  value_float  format
enum      enum    f64          str
"model1"  "mae"   1.114286     null
"model2"  "mae"   1.471429     null
"model2"  "rmse"  1.913486     null
"model1"  "rmse"  1.298351     null

ARD.get_stats() offers a quick way to access raw values, with optional metadata.

Call it without arguments when you just want the canonical metric/value pairs:

ard.get_stats()
shape: (4, 2)
metric  value
enum    f64
"mae"   1.114286
"mae"   1.471429
"rmse"  1.913486
"rmse"  1.298351

Pass include_metadata=True if you need to see the stored stat type tag and format hint alongside the value:

ard.get_stats(include_metadata=True)
shape: (4, 5)
metric  value     type     format  formatted
enum    f64       str      null    str
"mae"   1.114286  "float"  null    "1.1"
"mae"   1.471429  "float"  null    "1.5"
"rmse"  1.913486  "float"  null    "1.9"
"rmse"  1.298351  "float"  null    "1.3"

Normalising Struct Columns

Empty structs are often produced when downstream code prefers explicit placeholders. The helper methods on ARD let you toggle between empty and null representations.

with_empty_as_null() collapses all-null structs and blank estimates to proper nulls so filters behave as expected:

ard.with_empty_as_null().collect()
shape: (4, 11)
id groups subgroups estimate metric label stat stat_fmt warning error context
null null null enum enum enum struct[7] str list[str] list[str] struct[4]
null null null "model2" "mae" "Mean Absolute Error" {"float",1.471429,null,null,null,null,null} "1.5" [] [] {"across_sample",null,"Mean Absolute Error","model2"}
null null null "model1" "mae" "Mean Absolute Error" {"float",1.114286,null,null,null,null,null} "1.1" [] [] {"across_sample",null,"Mean Absolute Error","model1"}
null null null "model1" "rmse" "Root Mean Squared Error" {"float",1.298351,null,null,null,null,null} "1.3" [] [] {"across_sample",null,"Root Mean Squared Error","model1"}
null null null "model2" "rmse" "Root Mean Squared Error" {"float",1.913486,null,null,null,null,null} "1.9" [] [] {"across_sample",null,"Root Mean Squared Error","model2"}

with_null_as_empty() does the reverse, replacing nulls with empty structs so templating code can access the fields safely:

ard.with_null_as_empty().collect()
shape: (4, 11)
id groups subgroups estimate metric label stat stat_fmt warning error context
null null null enum enum enum struct[7] str list[str] list[str] struct[4]
null null null "model2" "mae" "Mean Absolute Error" {"float",1.471429,null,null,null,null,null} "1.5" [] [] {"across_sample",null,"Mean Absolute Error","model2"}
null null null "model1" "mae" "Mean Absolute Error" {"float",1.114286,null,null,null,null,null} "1.1" [] [] {"across_sample",null,"Mean Absolute Error","model1"}
null null null "model1" "rmse" "Root Mean Squared Error" {"float",1.298351,null,null,null,null,null} "1.3" [] [] {"across_sample",null,"Root Mean Squared Error","model1"}
null null null "model2" "rmse" "Root Mean Squared Error" {"float",1.913486,null,null,null,null,null} "1.9" [] [] {"across_sample",null,"Root Mean Squared Error","model2"}
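
The two normalisation directions can be sketched on plain Python dicts standing in for struct values. This is a conceptual illustration under stated assumptions, not the library's implementation: `empty_as_null`, `null_as_empty`, and the `FIELDS` grouping keys are hypothetical.

```python
from typing import Optional

FIELDS = ("treatment", "site")  # hypothetical grouping keys

def empty_as_null(struct: Optional[dict]) -> Optional[dict]:
    """Collapse a struct whose fields are all None into None."""
    if struct is None or all(v is None for v in struct.values()):
        return None
    return struct

def null_as_empty(struct: Optional[dict], fields=FIELDS) -> dict:
    """Replace None with a struct whose fields exist but are all None."""
    return {name: None for name in fields} if struct is None else struct

placeholder = {"treatment": None, "site": None}
populated = {"treatment": "A", "site": None}

print(empty_as_null(placeholder))  # None
print(null_as_empty(None))         # {'treatment': None, 'site': None}
print(empty_as_null(populated))    # {'treatment': 'A', 'site': None}
```

The null form makes `is None` style filters straightforward, while the empty form guarantees that field lookups in templates do not fail.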

Transformation

to_long() flattens the lazy frame into a DataFrame where grouping metadata is expanded and the formatted stat value is exposed as value:

ard.to_long()
shape: (4, 14)
estimate value metric label metric_type scope id groups subgroups stat context stat_fmt warning error
enum str enum enum str str null null null struct[7] struct[4] str list[str] list[str]
"model2" "1.5" "mae" "Mean Absolute Error" "across_sample" null null null null {"float",1.471429,null,null,null,null,null} {"across_sample",null,"Mean Absolute Error","model2"} "1.5" [] []
"model1" "1.1" "mae" "Mean Absolute Error" "across_sample" null null null null {"float",1.114286,null,null,null,null,null} {"across_sample",null,"Mean Absolute Error","model1"} "1.1" [] []
"model2" "1.9" "rmse" "Root Mean Squared Error" "across_sample" null null null null {"float",1.913486,null,null,null,null,null} {"across_sample",null,"Root Mean Squared Error","model2"} "1.9" [] []
"model1" "1.3" "rmse" "Root Mean Squared Error" "across_sample" null null null null {"float",1.298351,null,null,null,null,null} {"across_sample",null,"Root Mean Squared Error","model1"} "1.3" [] []

unnest() is useful when you just need the struct columns expanded in place without the extra pivoting logic:

ard.unnest()
shape: (4, 14)
estimate value metric label metric_type scope id groups subgroups stat context stat_fmt warning error
enum f64 enum enum str str null null null struct[7] struct[4] str list[str] list[str]
"model2" 1.471429 "mae" "Mean Absolute Error" "across_sample" null null null null {"float",1.471429,null,null,null,null,null} {"across_sample",null,"Mean Absolute Error","model2"} "1.5" [] []
"model1" 1.114286 "mae" "Mean Absolute Error" "across_sample" null null null null {"float",1.114286,null,null,null,null,null} {"across_sample",null,"Mean Absolute Error","model1"} "1.1" [] []
"model2" 1.913486 "rmse" "Root Mean Squared Error" "across_sample" null null null null {"float",1.913486,null,null,null,null,null} {"across_sample",null,"Root Mean Squared Error","model2"} "1.9" [] []
"model1" 1.298351 "rmse" "Root Mean Squared Error" "across_sample" null null null null {"float",1.298351,null,null,null,null,null} {"across_sample",null,"Root Mean Squared Error","model1"} "1.3" [] []

Wide presentations remain useful for dashboards. to_wide() pivots metric values into one column per metric while keeping their formatted representation, which suits quick scorecards:

ard.to_wide(index=["estimate"], columns=["metric"]).head()
shape: (2, 3)
estimate  mae    rmse
enum      str    str
"model2"  "1.5"  "1.9"
"model1"  "1.1"  "1.3"

For ad-hoc layouts, the pivot() convenience wrapper brings the lazy frame to long form and applies Polars’ native pivot, giving you full control over the value column and aggregation:

ard.pivot(on="metric", index=["estimate"], values="stat")
shape: (2, 3)
estimate  mae       rmse
enum      f64       f64
"model1"  1.114286  1.298351
"model2"  1.471429  1.913486

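The long-to-wide reshaping behind these calls can be sketched in pure Python. The helper `pivot_wide` is hypothetical and only shows the idea of turning one row per estimate/metric pair into one row per estimate; the input tuples reuse the formatted values from the tables above.

```python
# Long-form rows: (estimate, metric, formatted value), taken from the tables above.
long_rows = [
    ("model1", "mae", "1.1"),
    ("model2", "mae", "1.5"),
    ("model1", "rmse", "1.3"),
    ("model2", "rmse", "1.9"),
]

def pivot_wide(rows):
    """Group rows by estimate and spread each metric into its own key."""
    wide = {}
    for estimate, metric, value in rows:
        wide.setdefault(estimate, {})[metric] = value
    return wide

wide = pivot_wide(long_rows)
for estimate, cells in wide.items():
    print(estimate, cells)
# model1 {'mae': '1.1', 'rmse': '1.3'}
# model2 {'mae': '1.5', 'rmse': '1.9'}
```
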
Summaries

Use summary() for a quick diagnostic of the collected dataset or describe() for a formatted printout.

summary() returns a dict of counts that you can log or feed into tests:

ard.summary()
{'n_rows': 4,
 'n_metrics': 2,
 'n_estimates': 2,
 'n_groups': 0,
 'n_subgroups': 0,
 'metrics': ['mae', 'rmse'],
 'estimates': ['model1', 'model2']}
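
As one way to feed the summary into tests, the sketch below checks a summary dict against expectations. `check_summary` is a hypothetical helper, and the rule that an ungrouped result has one row per metric/estimate pair is an assumption that happens to hold for the example above (2 metrics x 2 estimates = 4 rows).

```python
# The dict shown above, copied as a literal so the sketch is self-contained.
summary = {
    "n_rows": 4,
    "n_metrics": 2,
    "n_estimates": 2,
    "n_groups": 0,
    "n_subgroups": 0,
    "metrics": ["mae", "rmse"],
    "estimates": ["model1", "model2"],
}

def check_summary(summary: dict, metrics: list, estimates: list) -> list:
    """Return a list of problems; an empty list means the summary matches."""
    problems = []
    if summary["metrics"] != metrics:
        problems.append(f"unexpected metrics: {summary['metrics']}")
    if summary["estimates"] != estimates:
        problems.append(f"unexpected estimates: {summary['estimates']}")
    # Assumption: without groups, expect one row per metric/estimate pair.
    ungrouped = summary["n_groups"] == 0 and summary["n_subgroups"] == 0
    if ungrouped and summary["n_rows"] != len(metrics) * len(estimates):
        problems.append(f"unexpected row count: {summary['n_rows']}")
    return problems

print(check_summary(summary, ["mae", "rmse"], ["model1", "model2"]))  # []
```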

describe() emits a readable console report with sample rows, which is handy during notebook exploration:

ard.describe()
==================================================
ARD Summary: 4 results
==================================================

Metrics:
  - mae
  - rmse

Estimates:
  - model1
  - model2

Preview:
shape: (4, 14)
┌──────────┬──────────┬────────┬──────────────┬───┬─────────────┬──────────┬───────────┬───────────┐
│ estimate ┆ value    ┆ metric ┆ label        ┆ … ┆ context     ┆ stat_fmt ┆ warning   ┆ error     │
│ ---      ┆ ---      ┆ ---    ┆ ---          ┆   ┆ ---         ┆ ---      ┆ ---       ┆ ---       │
│ enum     ┆ f64      ┆ enum   ┆ enum         ┆   ┆ struct[4]   ┆ str      ┆ list[str] ┆ list[str] │
╞══════════╪══════════╪════════╪══════════════╪═══╪═════════════╪══════════╪═══════════╪═══════════╡
│ model2   ┆ 1.471429 ┆ mae    ┆ Mean         ┆ … ┆ {"across_sa ┆ 1.5      ┆ []        ┆ []        │
│          ┆          ┆        ┆ Absolute     ┆   ┆ mple",null, ┆          ┆           ┆           │
│          ┆          ┆        ┆ Error        ┆   ┆ "Mean Ab…   ┆          ┆           ┆           │
│ model1   ┆ 1.114286 ┆ mae    ┆ Mean         ┆ … ┆ {"across_sa ┆ 1.1      ┆ []        ┆ []        │
│          ┆          ┆        ┆ Absolute     ┆   ┆ mple",null, ┆          ┆           ┆           │
│          ┆          ┆        ┆ Error        ┆   ┆ "Mean Ab…   ┆          ┆           ┆           │
│ model1   ┆ 1.298351 ┆ rmse   ┆ Root Mean    ┆ … ┆ {"across_sa ┆ 1.3      ┆ []        ┆ []        │
│          ┆          ┆        ┆ Squared      ┆   ┆ mple",null, ┆          ┆           ┆           │
│          ┆          ┆        ┆ Error        ┆   ┆ "Root Me…   ┆          ┆           ┆           │
│ model2   ┆ 1.913486 ┆ rmse   ┆ Root Mean    ┆ … ┆ {"across_sa ┆ 1.9      ┆ []        ┆ []        │
│          ┆          ┆        ┆ Squared      ┆   ┆ mple",null, ┆          ┆           ┆           │
│          ┆          ┆        ┆ Error        ┆   ┆ "Root Me…   ┆          ┆           ┆           │
└──────────┴──────────┴────────┴──────────────┴───┴─────────────┴──────────┴───────────┴───────────┘