MetricDefine

MetricDefine is the entry point for describing metrics in the polars-eval-metrics framework. A definition bundles the metric identifier, optional labels, the aggregation pattern, and the Polars expressions that power lazy evaluation. Once defined, the metric can be registered globally or handed directly to MetricEvaluator.
Key Parameters
| Argument | Type | Default | Notes |
|---|---|---|---|
| name | str | required | Unique identifier. Use the metric:summary form (for example "mae:mean") to compose built-ins. |
| label | str \| None | derived from name | Optional display name. When omitted the name string is reused. |
| type | MetricType | across_sample | Controls the aggregation level. Determines whether within_expr or across_expr is required. |
| scope | MetricScope \| None | None | Narrows where the metric is applied (global, model, or group). When omitted the evaluator keeps its default grouping over estimates and group columns. |
| within_expr | list[str \| pl.Expr \| MetricInfo] \| None | None | First-level aggregations. Accepts built-in metric names, Polars expressions, or MetricInfo instances. Required for custom across_subject / across_visit metrics. |
| across_expr | str \| pl.Expr \| MetricInfo \| None | None | Second-level aggregation or final expression. Strings reference summary names in MetricRegistry. |
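As a sketch of how these parameters fit together, the following hedged example composes a built-in metric with a registry summary; "rmse" and "p90" appear in the registry tables later on this page, while the name and label here are purely illustrative:

MetricDefine(
    name="subject_rmse_p90",                      # hypothetical identifier
    label="90th Percentile of Subject RMSEs",
    type="across_subject",
    within_expr="rmse",                           # built-in metric name
    across_expr="p90",                            # summary name in MetricRegistry
)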
Metric Types
- across_sample: Aggregates directly from the sample-level errors.
- within_subject / within_visit: Produces metrics inside each entity and keeps identifiers in the result id struct.
- across_subject / across_visit: Uses within_expr to summarise inside the entity, then across_expr for the overall statistic. A sketch of one definition per level follows this list.
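A minimal, hedged sketch of the three aggregation levels, assuming the built-in "mae" name resolves for within_* types the same way it does for across_* types:

MetricDefine(name="mae", type="across_sample")        # pool every sample into one MAE
MetricDefine(name="mae", type="within_subject")       # one MAE row per subject, ids kept in the id struct
MetricDefine(name="mae:mean", type="across_subject")  # per-subject MAE, then the mean across subjects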
Metric Scopes
scope is orthogonal to the aggregation type and controls where a metric is computed:

- global: Run once for the entire dataset.
- model: Run per model, ignoring group splits.
- group: Run per group, ignoring per-model splits.

If scope is omitted the evaluator inherits its default behaviour.
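For example, a dataset-wide sample count could be pinned to the global scope. This is a hedged sketch, assuming scope accepts the same string shorthand that type does:

MetricDefine(name="n_sample", scope="global")  # computed once, regardless of model or group splits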
Setup

import polars as pl

pl.Config.set_tbl_rows(200)
pl.Config.set_fmt_str_lengths(-1)

from polars_eval_metrics import MetricDefine
Quick Start: Mean Absolute Error
A minimal definition creates the built-in MAE metric. The evaluator injects the absolute_error
column, so only the metric name is required.
="mae") MetricDefine(name
MetricDefine(name='mae', type=across_sample)
Label: 'mae'
Across-entity expression:
- [mae] col("absolute_error").mean()
(
pl.LazyFrame
.select(col("absolute_error").mean().alias("value"))
)
- name and label: how other components reference the metric.
- type=across_sample: MAE operates over all samples without grouping.
- summary: the Polars expression backing the metric. Built-ins automatically alias the result to value.
- pl chain: the lazy operations the evaluator applies when executing the definition.
Inspect Built-in Metrics
The registry exposes the available metrics and summaries. The expressions are returned as Polars lazy operations that fit directly into within_expr
or across_expr
.
Built-in metrics always populate the value column unless you extend the registry with a custom MetricInfo that carries a different payload.
from polars_eval_metrics import MetricRegistry

metrics_data = [
    {"name": name, "expression": str(MetricRegistry.get_metric(name))}
    for name in sorted(MetricRegistry.list_metrics())
]
pl.DataFrame(metrics_data)
name | expression |
---|---|
str | str |
"mae" | "MetricInfo(expr=<Expr ['col("absolute_error").mean()'] at 0x7FA39EC29250>, value_kind='float', format=None)" |
"mape" | "MetricInfo(expr=<Expr ['col("absolute_percent_error").…'] at 0x7FA39EC29410>, value_kind='float', format=None)" |
"me" | "MetricInfo(expr=<Expr ['col("error").mean()'] at 0x7FA39EC29210>, value_kind='float', format=None)" |
"mpe" | "MetricInfo(expr=<Expr ['col("percent_error").mean()'] at 0x7FA39EC29390>, value_kind='float', format=None)" |
"mse" | "MetricInfo(expr=<Expr ['col("squared_error").mean()'] at 0x7FA39EC29290>, value_kind='float', format=None)" |
"n_sample" | "MetricInfo(expr=<Expr ['col("sample_index").n_unique()'] at 0x7FA39EC29DD0>, value_kind='int', format=None)" |
"n_sample_with_data" | "MetricInfo(expr=<Expr ['col("error").is_not_null().sum…'] at 0x7FA39EC29E10>, value_kind='int', format=None)" |
"n_subject" | "MetricInfo(expr=<Expr ['col("subject_id").n_unique()'] at 0x7FA39EC29490>, value_kind='int', format=None)" |
"n_subject_with_data" | "MetricInfo(expr=<Expr ['col("subject_id").filter(col("…'] at 0x7FA3B828BE10>, value_kind='int', format=None)" |
"n_visit" | "MetricInfo(expr=<Expr ['col("subject_id").as_struct([c…'] at 0x7FA39EC295D0>, value_kind='int', format=None)" |
"n_visit_with_data" | "MetricInfo(expr=<Expr ['col("subject_id").as_struct([c…'] at 0x7FA3AF3A6B10>, value_kind='int', format=None)" |
"pct_sample_with_data" | "MetricInfo(expr=<Expr ['[(col("error").is_not_null().m…'] at 0x7FA39EC2AE50>, value_kind='float', format=None)" |
"pct_subject_with_data" | "MetricInfo(expr=<Expr ['[([(col("subject_id").filter(c…'] at 0x7FA39EC12090>, value_kind='float', format=None)" |
"pct_visit_with_data" | "MetricInfo(expr=<Expr ['[([(col("subject_id").as_struc…'] at 0x7FA39EC29F10>, value_kind='float', format=None)" |
"rmse" | "MetricInfo(expr=<Expr ['col("squared_error").mean().sq…'] at 0x7FA39EC292D0>, value_kind='float', format=None)" |
summaries_data = [
    {"name": name, "expression": str(MetricRegistry.get_summary(name))}
    for name in sorted(MetricRegistry.list_summaries())
]
pl.DataFrame(summaries_data)
name | expression |
---|---|
str | str |
"max" | "col("value").max()" |
"mean" | "col("value").mean()" |
"median" | "col("value").median()" |
"min" | "col("value").min()" |
"p1" | "col("value").quantile()" |
"p25" | "col("value").quantile()" |
"p5" | "col("value").quantile()" |
"p75" | "col("value").quantile()" |
"p90" | "col("value").quantile()" |
"p95" | "col("value").quantile()" |
"p99" | "col("value").quantile()" |
"sqrt" | "col("value").sqrt()" |
"std" | "col("value").std()" |
"sum" | "col("value").sum()" |
Hierarchical Aggregation Patterns
Many analyses require metrics at multiple levels (for example subject or visit summaries). Built-in metric names support the colon convention, so "mae:mean"
computes MAE within each subject and then averages the result.
="mae:mean", type="across_subject") MetricDefine(name
MetricDefine(name='mae:mean', type=across_subject)
Label: 'mae:mean'
Within-entity expressions:
- [mae] col("absolute_error").mean()
Across-entity expression:
- [mean] col("value").mean()
(
pl.LazyFrame
.group_by('subject_id')
.agg(col("absolute_error").mean().alias("value"))
.select(col("value").mean().alias("value"))
)
Internally the definition resolves to:

- [mae] col("absolute_error").mean() computes the per-subject MAE.
- [mean] col("value").mean() averages the per-subject values.
- The Polars lazy chain is equivalent to .group_by("subject_id").agg(...).select(...).
Apply the same metric:summary
pattern to visits or any other second-level summary that lives in the registry.
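Other combinations follow the same shape; this sketch is hedged on "median" and "p90" resolving through the summary registry shown above:

MetricDefine(name="mae:median", type="across_subject")  # median of per-subject MAEs
MetricDefine(name="rmse:p90", type="across_visit")      # 90th percentile of per-visit RMSEs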
Custom Expressions
To go beyond the built-ins, supply Polars expressions. Strings in within_expr
or across_expr
still resolve through the registry, while expressions give you full control over column math.
MetricDefine(="pct_within_1",
name="% Predictions Within +/- 1",
labeltype="across_sample",
=(pl.col("absolute_error") < 1).mean() * 100,
across_expr )
MetricDefine(name='pct_within_1', type=across_sample)
Label: '% Predictions Within +/- 1'
Across-entity expression:
- [custom] [([(col("absolute_error")) < (dyn int: 1)].mean()) * (dyn int: 100)]
(
pl.LazyFrame
.select(((col("absolute_error")) < (1).mean()) * (100).alias("value"))
)
When a within_expr
list returns more than one expression, alias each output so you can reference it from across_expr
:
MetricDefine(="mae_p90_by_subject",
name="90th Percentile of Subject MAEs",
labeltype="across_subject",
="mae", # resolves to the built-in metric, stored in `value`
within_expr=pl.col("value").quantile(0.9, interpolation="linear"),
across_expr )
MetricDefine(name='mae_p90_by_subject', type=across_subject)
Label: '90th Percentile of Subject MAEs'
Within-entity expressions:
- [mae] col("absolute_error").mean()
Across-entity expression:
- [custom] col("value").quantile()
(
pl.LazyFrame
.group_by('subject_id')
.agg(col("absolute_error").mean().alias("value"))
.select(col("value").quantile().alias("value"))
)
Mixing Built-ins and Custom Outputs
Combining built-ins with additional columns lets you calculate more advanced statistics, such as a weighted average that includes per-subject weights.
MetricDefine(="weighted_mae",
name="Weighted Average of Subject MAEs",
labeltype="across_subject",
=[
within_expr"mae", # stored as `value`
"weight").mean().alias("avg_weight"),
pl.col(
],=(
across_expr"value") * pl.col("avg_weight")).sum()
(pl.col(/ pl.col("avg_weight").sum()
), )
MetricDefine(name='weighted_mae', type=across_subject)
Label: 'Weighted Average of Subject MAEs'
Within-entity expressions:
- [mae] col("absolute_error").mean()
- [custom] col("weight").mean().alias("avg_weight")
Across-entity expression:
- [custom] [([(col("value")) * (col("avg_weight"))].sum()) / (col("avg_weight").sum())]
(
pl.LazyFrame
.group_by('subject_id')
.agg(
[
col("absolute_error").mean().alias("value"),
col("weight").mean().alias("avg_weight")
]
)
.select(((col("value")) * (col("avg_weight")).sum()) /
(col("avg_weight").sum()).alias("value"))
)
Inline struct output: mean +/- sd
You can return richer payloads without registering the metric globally by passing a MetricInfo
directly to MetricDefine
. The evaluator inspects the expression output to determine the stored type, while the optional format
string controls how stat_fmt
renders the value:
from polars_eval_metrics.metric_registry import MetricInfo

MetricDefine(
    name="mean_sd_inline",
    type="across_sample",
    across_expr=MetricInfo(
        expr=pl.struct([
            pl.col("absolute_error").mean().alias("mean"),
            pl.col("absolute_error").std().alias("sd"),
        ]),
        format="{0[mean]:.1f} +/- {0[sd]:.1f}",
    ),
)
MetricDefine(name='mean_sd_inline', type=across_sample)
Label: 'mean_sd_inline'
Across-entity expression:
- [custom] col("absolute_error").mean().alias("mean").as_struct([col("absolute_error").std().alias("sd")])
(
pl.LazyFrame
.select(col("absolute_error").mean().alias("mean").as_struct([col("absolute_error").std().alias("sd")))
)
The resulting metric stores the struct in stat.value_struct
while stat_fmt
(and the value
column when formatted) displays the familiar mean ± sd
string.
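The format string follows standard Python str.format syntax, so, assuming stat_fmt feeds the struct fields through as a mapping, you can preview the rendering on a plain dict:

row = {"mean": 1.234, "sd": 0.456}
"{0[mean]:.1f} +/- {0[sd]:.1f}".format(row)  # -> '1.2 +/- 0.5'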
Evaluate with MetricEvaluator
MetricDefine
instances plug directly into MetricEvaluator
. The evaluator feeds error columns, applies scopes, and materialises results as lazy frames.
from polars_eval_metrics import MetricEvaluator
from data_generator import generate_sample_data

data = generate_sample_data(n_subjects=6, n_visits=3, n_groups=2)

metrics = [
    MetricDefine(name="mae"),
    MetricDefine(
        name="pct_within_1",
        type="across_sample",
        across_expr=(pl.col("absolute_error") < 1).mean() * 100,
    ),
    MetricDefine(name="mae:mean", type="across_subject"),
]

evaluator = MetricEvaluator(
    df=data,
    metrics=metrics,
    ground_truth="actual",
    estimates=["model1", "model2"],
    group_by=["treatment"],
)
evaluator.evaluate()
estimate | metric | label | value | treatment | metric_type | scope |
---|---|---|---|---|---|---|
enum | enum | enum | str | str | str | str |
"model1" | "mae" | "mae" | "1.0" | "A" | "across_sample" | null |
"model2" | "mae" | "mae" | "1.7" | "A" | "across_sample" | null |
"model1" | "pct_within_1" | "pct_within_1" | "66.7" | "A" | "across_sample" | null |
"model2" | "pct_within_1" | "pct_within_1" | "0.0" | "A" | "across_sample" | null |
"model1" | "mae:mean" | "mae:mean" | "1.0" | "A" | "across_subject" | null |
"model2" | "mae:mean" | "mae:mean" | "1.7" | "A" | "across_subject" | null |
"model1" | "mae" | "mae" | "1.0" | "B" | "across_sample" | null |
"model2" | "mae" | "mae" | "1.3" | "B" | "across_sample" | null |
"model1" | "pct_within_1" | "pct_within_1" | "25.0" | "B" | "across_sample" | null |
"model2" | "pct_within_1" | "pct_within_1" | "37.5" | "B" | "across_sample" | null |
"model1" | "mae:mean" | "mae:mean" | "1.1" | "B" | "across_subject" | null |
"model2" | "mae:mean" | "mae:mean" | "1.3" | "B" | "across_subject" | null |
The resulting frame keeps a formatted value
column plus a stat
struct that preserves typed payloads. When you register metrics with MetricRegistry.register_metric
, use MetricInfo(value_kind=...)
to surface non-float structures that downstream consumers can unwrap.
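As a hedged sketch of that registration path (the exact register_metric signature may differ, and "medae" is a hypothetical name):

from polars_eval_metrics import MetricRegistry
from polars_eval_metrics.metric_registry import MetricInfo

# Assumption: register_metric takes a name plus a MetricInfo
MetricRegistry.register_metric(
    "medae",
    MetricInfo(expr=pl.col("absolute_error").median(), value_kind="float"),
)
MetricDefine(name="medae")  # now resolvable like any built-in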
Advanced: Custom Functions
You can opt into user-defined functions when a vectorised Polars expression is not available. The following example keeps the earlier weighted MAE structure but performs the across-entity step inside a NumPy call.
import numpy as np

weighted_average = pl.struct(["value", "avg_weight"]).map_batches(
    lambda rows: pl.Series(
        [
            np.average(
                rows.struct.field("value"),
                weights=rows.struct.field("avg_weight"),
            )
        ]
    ),
    return_dtype=pl.Float64,
)
MetricDefine(="weighted_mae_numpy",
name="Weighted Average of Subject MAEs (NumPy)",
labeltype="across_subject",
=[
within_expr"mae",
"weight").mean().alias("avg_weight"),
pl.col(
],=weighted_average,
across_expr )
MetricDefine(name='weighted_mae_numpy', type=across_subject)
Label: 'Weighted Average of Subject MAEs (NumPy)'
Within-entity expressions:
- [mae] col("absolute_error").mean()
- [custom] col("weight").mean().alias("avg_weight")
Across-entity expression:
- [custom] col("value").as_struct([col("avg_weight")]).python_udf()
(
pl.LazyFrame
.group_by('subject_id')
.agg(
[
col("absolute_error").mean().alias("value"),
col("weight").mean().alias("avg_weight")
]
)
.select(col("value").as_struct([col("avg_weight")).python_udf().alias("value"))
)
Batch UDFs disable some Polars optimisations and can be slower than pure expressions. Prefer native expressions when possible, and reserve UDFs for cases where vectorised operations do not exist.
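For this particular statistic the UDF is avoidable: the pure-expression across_expr from the weighted_mae example above computes the same weighted average while keeping the whole plan inside the Polars optimiser.

# Native-expression equivalent of the NumPy weighted average above
weighted_average_native = (
    (pl.col("value") * pl.col("avg_weight")).sum() / pl.col("avg_weight").sum()
)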