MetricDefine

MetricDefine is the entry point for describing metrics in the polars-eval-metrics framework. A definition bundles the metric identifier, optional labels, the aggregation pattern, and the Polars expressions that power lazy evaluation. Once defined, the metric can be registered globally or handed directly to MetricEvaluator.
Key Parameters
| Argument | Type | Default | Notes |
|---|---|---|---|
| name | str | required | Unique identifier. Use the metric:summary form (for example "mae:mean") to compose built-ins. |
| label | str \| None | derived from name | Optional display name. When omitted the name string is reused. |
| type | MetricType | across_sample | Controls the aggregation level. Determines whether within_expr or across_expr is required. |
| scope | MetricScope \| None | None | Narrows where the metric is applied (global, model, or group). When omitted the evaluator keeps its default grouping over estimates and group columns. |
| within_expr | list[str \| pl.Expr \| MetricInfo] \| None | None | First-level aggregations. Accepts built-in metric names, Polars expressions, or MetricInfo instances. Required for custom across_subject / across_visit metrics. |
| across_expr | str \| pl.Expr \| MetricInfo \| None | None | Second-level aggregation or final expression. Strings reference summary names in MetricRegistry. |
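As a sketch of how these parameters fit together, the following hedged example composes a built-in metric with a registry summary; "rmse" and "p90" appear in the registry tables later on this page, while the name and label here are purely illustrative:

MetricDefine(
    name="subject_rmse_p90",                      # hypothetical identifier
    label="90th Percentile of Subject RMSEs",
    type="across_subject",
    within_expr="rmse",                           # built-in metric name
    across_expr="p90",                            # summary name in MetricRegistry
)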
Metric Types
- across_sample: Aggregates directly from the sample-level errors.
- within_subject / within_visit: Produces metrics inside each entity and keeps identifiers in the result id struct.
- across_subject / across_visit: Uses within_expr to summarise inside the entity, then across_expr for the overall statistic. A sketch of one definition per level follows this list.
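A minimal, hedged sketch of the three aggregation levels, assuming the built-in "mae" name resolves for within_* types the same way it does for across_* types:

MetricDefine(name="mae", type="across_sample")        # pool every sample into one MAE
MetricDefine(name="mae", type="within_subject")       # one MAE row per subject, ids kept in the id struct
MetricDefine(name="mae:mean", type="across_subject")  # per-subject MAE, then the mean across subjects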
Metric Scopes
scope is orthogonal to the aggregation type and controls where a metric is computed:

- global: Run once for the entire dataset.
- model: Run per model, ignoring group splits.
- group: Run per group, ignoring per-model splits.

If scope is omitted the evaluator inherits its default behaviour.
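For example, a dataset-wide sample count could be pinned to the global scope. This is a hedged sketch, assuming scope accepts the same string shorthand that type does:

MetricDefine(name="n_sample", scope="global")  # computed once, regardless of model or group splits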
Setup

import polars as pl

pl.Config.set_tbl_rows(200)
pl.Config.set_fmt_str_lengths(-1)

from polars_eval_metrics import MetricDefine
Quick Start: Mean Absolute Error
A minimal definition creates the built-in MAE metric. The evaluator injects the absolute_error
column, so only the metric name is required.
="mae") MetricDefine(name
MetricDefine(name='mae', type=across_sample)
Label: 'mae'
Across-entity expression:
- [mae] col("absolute_error").mean()
(
pl.LazyFrame
.select(col("absolute_error").mean().alias("value"))
)
- name and label: how other components reference the metric.
- type=across_sample: MAE operates over all samples without grouping.
- summary: the Polars expression backing the metric. Built-ins automatically alias the result to value.
- pl chain: the lazy operations the evaluator applies when executing the definition.
Inspect Built-in Metrics
The registry exposes the available metrics and summaries. The expressions are returned as Polars lazy operations that fit directly into within_expr
or across_expr
.
Built-in metrics always populate the value column unless you extend the registry with a custom MetricInfo that carries a different payload.
from polars_eval_metrics import MetricRegistry

metrics_data = [
    {"name": name, "expression": str(MetricRegistry.get_metric(name))}
    for name in sorted(MetricRegistry.list_metrics())
]
pl.DataFrame(metrics_data)
name | expression |
---|---|
str | str |
"mae" | "MetricInfo(expr=<Expr ['col("absolute_error").mean()'] at 0x7FA39EC29250>, value_kind='float', format=None)" |
"mape" | "MetricInfo(expr=<Expr ['col("absolute_percent_error").…'] at 0x7FA39EC29410>, value_kind='float', format=None)" |
"me" | "MetricInfo(expr=<Expr ['col("error").mean()'] at 0x7FA39EC29210>, value_kind='float', format=None)" |
"mpe" | "MetricInfo(expr=<Expr ['col("percent_error").mean()'] at 0x7FA39EC29390>, value_kind='float', format=None)" |
"mse" | "MetricInfo(expr=<Expr ['col("squared_error").mean()'] at 0x7FA39EC29290>, value_kind='float', format=None)" |
"n_sample" | "MetricInfo(expr=<Expr ['col("sample_index").n_unique()'] at 0x7FA39EC29DD0>, value_kind='int', format=None)" |
"n_sample_with_data" | "MetricInfo(expr=<Expr ['col("error").is_not_null().sum…'] at 0x7FA39EC29E10>, value_kind='int', format=None)" |
"n_subject" | "MetricInfo(expr=<Expr ['col("subject_id").n_unique()'] at 0x7FA39EC29490>, value_kind='int', format=None)" |
"n_subject_with_data" | "MetricInfo(expr=<Expr ['col("subject_id").filter(col("…'] at 0x7FA3B828BE10>, value_kind='int', format=None)" |
"n_visit" | "MetricInfo(expr=<Expr ['col("subject_id").as_struct([c…'] at 0x7FA39EC295D0>, value_kind='int', format=None)" |
"n_visit_with_data" | "MetricInfo(expr=<Expr ['col("subject_id").as_struct([c…'] at 0x7FA3AF3A6B10>, value_kind='int', format=None)" |
"pct_sample_with_data" | "MetricInfo(expr=<Expr ['[(col("error").is_not_null().m…'] at 0x7FA39EC2AE50>, value_kind='float', format=None)" |
"pct_subject_with_data" | "MetricInfo(expr=<Expr ['[([(col("subject_id").filter(c…'] at 0x7FA39EC12090>, value_kind='float', format=None)" |
"pct_visit_with_data" | "MetricInfo(expr=<Expr ['[([(col("subject_id").as_struc…'] at 0x7FA39EC29F10>, value_kind='float', format=None)" |
"rmse" | "MetricInfo(expr=<Expr ['col("squared_error").mean().sq…'] at 0x7FA39EC292D0>, value_kind='float', format=None)" |
summaries_data = [
    {"name": name, "expression": str(MetricRegistry.get_summary(name))}
    for name in sorted(MetricRegistry.list_summaries())
]
pl.DataFrame(summaries_data)
name | expression |
---|---|
str | str |
"max" | "col("value").max()" |
"mean" | "col("value").mean()" |
"median" | "col("value").median()" |
"min" | "col("value").min()" |
"p1" | "col("value").quantile()" |
"p25" | "col("value").quantile()" |
"p5" | "col("value").quantile()" |
"p75" | "col("value").quantile()" |
"p90" | "col("value").quantile()" |
"p95" | "col("value").quantile()" |
"p99" | "col("value").quantile()" |
"sqrt" | "col("value").sqrt()" |
"std" | "col("value").std()" |
"sum" | "col("value").sum()" |
Hierarchical Aggregation Patterns
Many analyses require metrics at multiple levels (for example subject or visit summaries). Built-in metric names support the colon convention, so "mae:mean"
computes MAE within each subject and then averages the result.
="mae:mean", type="across_subject") MetricDefine(name
MetricDefine(name='mae:mean', type=across_subject)
Label: 'mae:mean'
Within-entity expressions:
- [mae] col("absolute_error").mean()
Across-entity expression:
- [mean] col("value").mean()
(
pl.LazyFrame
.group_by('subject_id')
.agg(col("absolute_error").mean().alias("value"))
.select(col("value").mean().alias("value"))
)
Internally the definition resolves to:

- [mae] col("absolute_error").mean() computes the per-subject MAE.
- [mean] col("value").mean() averages the per-subject values.
- The Polars lazy chain is equivalent to .group_by("subject_id").agg(...).select(...).
Apply the same metric:summary
pattern to visits or any other second-level summary that lives in the registry.
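Other combinations follow the same shape; this sketch is hedged on "median" and "p90" resolving through the summary registry shown above:

MetricDefine(name="mae:median", type="across_subject")  # median of per-subject MAEs
MetricDefine(name="rmse:p90", type="across_visit")      # 90th percentile of per-visit RMSEs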
Custom Expressions
To go beyond the built-ins, supply Polars expressions. Strings in within_expr
or across_expr
still resolve through the registry, while expressions give you full control over column math.
MetricDefine(="pct_within_1",
name="% Predictions Within +/- 1",
labeltype="across_sample",
=(pl.col("absolute_error") < 1).mean() * 100,
across_expr )
MetricDefine(name='pct_within_1', type=across_sample)
Label: '% Predictions Within +/- 1'
Across-entity expression:
- [custom] [([(col("absolute_error")) < (dyn int: 1)].mean()) * (dyn int: 100)]
(
pl.LazyFrame
.select(((col("absolute_error")) < (1).mean()) * (100).alias("value"))
)
When a within_expr
list returns more than one expression, alias each output so you can reference it from across_expr
:
MetricDefine(="mae_p90_by_subject",
name="90th Percentile of Subject MAEs",
labeltype="across_subject",
="mae", # resolves to the built-in metric, stored in `value`
within_expr=pl.col("value").quantile(0.9, interpolation="linear"),
across_expr )
MetricDefine(name='mae_p90_by_subject', type=across_subject)
Label: '90th Percentile of Subject MAEs'
Within-entity expressions:
- [mae] col("absolute_error").mean()
Across-entity expression:
- [custom] col("value").quantile()
(
pl.LazyFrame
.group_by('subject_id')
.agg(col("absolute_error").mean().alias("value"))
.select(col("value").quantile().alias("value"))
)
Mixing Built-ins and Custom Outputs
Combining built-ins with additional columns lets you calculate more advanced statistics, such as a weighted average that includes per-subject weights.
MetricDefine(="weighted_mae",
name="Weighted Average of Subject MAEs",
labeltype="across_subject",
=[
within_expr"mae", # stored as `value`
"weight").mean().alias("avg_weight"),
pl.col(
],=(
across_expr"value") * pl.col("avg_weight")).sum()
(pl.col(/ pl.col("avg_weight").sum()
), )
MetricDefine(name='weighted_mae', type=across_subject)
Label: 'Weighted Average of Subject MAEs'
Within-entity expressions:
- [mae] col("absolute_error").mean()
- [custom] col("weight").mean().alias("avg_weight")
Across-entity expression:
- [custom] [([(col("value")) * (col("avg_weight"))].sum()) / (col("avg_weight").sum())]
(
pl.LazyFrame
.group_by('subject_id')
.agg(
[
col("absolute_error").mean().alias("value"),
col("weight").mean().alias("avg_weight")
]
)
.select(((col("value")) * (col("avg_weight")).sum()) /
(col("avg_weight").sum()).alias("value"))
)
Inline struct output: mean +/- sd
You can return richer payloads without registering the metric globally by passing a MetricInfo
directly to MetricDefine
. The evaluator inspects the expression output to determine the stored type, while the optional format
string controls how stat_fmt
renders the value:
from polars_eval_metrics.metric_registry import MetricInfo

MetricDefine(
    name="mean_sd_inline",
    type="across_sample",
    across_expr=MetricInfo(
        expr=pl.struct([
            pl.col("absolute_error").mean().alias("mean"),
            pl.col("absolute_error").std().alias("sd"),
        ]),
        format="{0[mean]:.1f} +/- {0[sd]:.1f}",
    ),
)
MetricDefine(name='mean_sd_inline', type=across_sample)
Label: 'mean_sd_inline'
Across-entity expression:
- [custom] col("absolute_error").mean().alias("mean").as_struct([col("absolute_error").std().alias("sd")])
(
pl.LazyFrame
.select(col("absolute_error").mean().alias("mean").as_struct([col("absolute_error").std().alias("sd")))
)
The resulting metric stores the struct in stat.value_struct
while stat_fmt
(and the value
column when formatted) displays the familiar mean ± sd
string.
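The format string follows standard Python str.format syntax, so, assuming stat_fmt feeds the struct fields through as a mapping, you can preview the rendering on a plain dict:

row = {"mean": 1.234, "sd": 0.456}
"{0[mean]:.1f} +/- {0[sd]:.1f}".format(row)  # -> '1.2 +/- 0.5'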
Evaluate with MetricEvaluator
MetricDefine
instances plug directly into MetricEvaluator
. The evaluator feeds error columns, applies scopes, and materialises results as lazy frames.
from polars_eval_metrics import MetricEvaluator
from data_generator import generate_sample_data

data = generate_sample_data(n_subjects=6, n_visits=3, n_groups=2)

metrics = [
    MetricDefine(name="mae"),
    MetricDefine(
        name="pct_within_1",
        type="across_sample",
        across_expr=(pl.col("absolute_error") < 1).mean() * 100,
    ),
    MetricDefine(name="mae:mean", type="across_subject"),
]

evaluator = MetricEvaluator(
    df=data,
    metrics=metrics,
    ground_truth="actual",
    estimates=["model1", "model2"],
    group_by=["treatment"],
)
evaluator.evaluate()
estimate | metric | label | value | treatment | metric_type | scope |
---|---|---|---|---|---|---|
enum | enum | enum | str | str | str | str |
"model1" | "mae" | "mae" | "1.0" | "A" | "across_sample" | null |
"model2" | "mae" | "mae" | "1.7" | "A" | "across_sample" | null |
"model1" | "pct_within_1" | "pct_within_1" | "66.7" | "A" | "across_sample" | null |
"model2" | "pct_within_1" | "pct_within_1" | "0.0" | "A" | "across_sample" | null |
"model1" | "mae:mean" | "mae:mean" | "1.0" | "A" | "across_subject" | null |
"model2" | "mae:mean" | "mae:mean" | "1.7" | "A" | "across_subject" | null |
"model1" | "mae" | "mae" | "1.0" | "B" | "across_sample" | null |
"model2" | "mae" | "mae" | "1.3" | "B" | "across_sample" | null |
"model1" | "pct_within_1" | "pct_within_1" | "25.0" | "B" | "across_sample" | null |
"model2" | "pct_within_1" | "pct_within_1" | "37.5" | "B" | "across_sample" | null |
"model1" | "mae:mean" | "mae:mean" | "1.1" | "B" | "across_subject" | null |
"model2" | "mae:mean" | "mae:mean" | "1.3" | "B" | "across_subject" | null |
The resulting frame keeps a formatted value
column plus a stat
struct that preserves typed payloads. When you register metrics with MetricRegistry.register_metric
, use MetricInfo(value_kind=...)
to surface non-float structures that downstream consumers can unwrap.
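As a hedged sketch of that registration path (the exact register_metric signature may differ, and "medae" is a hypothetical name):

from polars_eval_metrics import MetricRegistry
from polars_eval_metrics.metric_registry import MetricInfo

# Assumption: register_metric takes a name plus a MetricInfo
MetricRegistry.register_metric(
    "medae",
    MetricInfo(expr=pl.col("absolute_error").median(), value_kind="float"),
)
MetricDefine(name="medae")  # now resolvable like any built-in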
Advanced: Custom Functions
You can opt into user-defined functions when a vectorised Polars expression is not available. The following example keeps the earlier weighted MAE structure but performs the across-entity step inside a NumPy call.
import numpy as np

weighted_average = pl.struct(["value", "avg_weight"]).map_batches(
    lambda rows: pl.Series(
        [
            np.average(
                rows.struct.field("value"),
                weights=rows.struct.field("avg_weight"),
            )
        ]
    ),
    return_dtype=pl.Float64,
)
MetricDefine(="weighted_mae_numpy",
name="Weighted Average of Subject MAEs (NumPy)",
labeltype="across_subject",
=[
within_expr"mae",
"weight").mean().alias("avg_weight"),
pl.col(
],=weighted_average,
across_expr )
MetricDefine(name='weighted_mae_numpy', type=across_subject)
Label: 'Weighted Average of Subject MAEs (NumPy)'
Within-entity expressions:
- [mae] col("absolute_error").mean()
- [custom] col("weight").mean().alias("avg_weight")
Across-entity expression:
- [custom] col("value").as_struct([col("avg_weight")]).python_udf()
(
pl.LazyFrame
.group_by('subject_id')
.agg(
[
col("absolute_error").mean().alias("value"),
col("weight").mean().alias("avg_weight")
]
)
.select(col("value").as_struct([col("avg_weight")).python_udf().alias("value"))
)
Batch UDFs disable some Polars optimisations and can be slower than pure expressions. Prefer native expressions when possible, and reserve UDFs for cases where vectorised operations do not exist.
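For this particular statistic the UDF is avoidable: the pure-expression across_expr from the weighted_mae example above computes the same weighted average while keeping the whole plan inside the Polars optimiser.

# Native-expression equivalent of the NumPy weighted average above
weighted_average_native = (
    (pl.col("value") * pl.col("avg_weight")).sum() / pl.col("avg_weight").sum()
)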