model.log(...). Those rows go to
history.jsonl in the run directory and, if W&B logging is enabled, to W&B.
Use this page for three things:
- understand the metrics ART emits automatically
- add task-specific metrics from your own rollout code
- track external judge and API spend alongside training metrics
What ART logs automatically
When you call await model.train(...) or await model.log(train_groups, split="train"),
ART already logs most of the metrics you need to monitor a run.
| Type | Examples |
|---|---|
| Reward | reward/mean, reward/std_dev, reward/exception_rate |
| Loss | loss/train, loss/entropy, loss/kl_div, loss/grad_norm, loss/learning_rate |
| Data | data/step_num_scenarios, data/step_num_trajectories, data/step_num_groups_submitted, data/step_num_groups_trainable |
| Train summary | train/num_groups_submitted, train/num_groups_trainable, train/num_trajectories |
| Time | time/wall_clock_sec, time/step_wall_s, time/step_trainer_s |
| Cost | costs/gpu on LocalBackend when GPU pricing is known |
ART also derives:
- cumulative metrics such as time/cum/trainer_s, data/cum/num_unique_scenarios, and costs/cum/all
- cost rollups such as costs/train, costs/eval, and costs/all
- throughput metrics such as throughput/avg_trainer_tok_per_s and throughput/avg_actor_tok_per_s
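All of these rows land in history.jsonl, so you can post-process them with the standard library alone. A minimal sketch, assuming rows are flat JSON objects keyed by the metric names above (the sample rows and temporary path here are illustrative, not real output):

```python
import json
import tempfile
from pathlib import Path

# Two sample rows shaped like history.jsonl output. The flat key schema is
# an assumption; real rows carry whichever metrics each step produced.
sample_rows = [
    {"step": 1, "reward/mean": 0.25, "loss/kl_div": 0.01},
    {"step": 2, "reward/mean": 0.75, "loss/kl_div": 0.02},
]

run_dir = Path(tempfile.mkdtemp())  # stand-in for the real run directory
history = run_dir / "history.jsonl"
history.write_text("\n".join(json.dumps(r) for r in sample_rows))

# Post-process: average reward/mean across the steps that logged it.
rows = [json.loads(line) for line in history.read_text().splitlines()]
reward_means = [r["reward/mean"] for r in rows if "reward/mean" in r]
avg_reward = sum(reward_means) / len(reward_means)
print(avg_reward)  # → 0.5
```

Because every step is one JSON line, the same pattern works for any metric in the tables above.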
Some metrics only appear when the backend or your code provides the underlying
inputs. For example, throughput/avg_actor_tok_per_s requires both
data/step_actor_tokens and time/step_actor_s.
Add task-specific outcome metrics
Attach metrics directly to each Trajectory when your rollout code knows whether
an attempt succeeded, how many tools it called, or any other task-specific
signal.
ART logs trajectory metrics under the reward/ namespace, such as reward/correct and reward/tool_calls.
If you want to record one value per TrajectoryGroup instead of one per
trajectory, pass metrics={...} when you build the group. ART logs those once
per group, using keys like reward/group_difficulty on train steps.
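The two attachment points can be sketched with stand-in dataclasses. These mirror where the metrics dicts live on art.Trajectory and art.TrajectoryGroup but are not the real classes, which carry more fields:

```python
from dataclasses import dataclass, field

# Stand-ins for art.Trajectory / art.TrajectoryGroup, only to illustrate
# where the metrics dicts attach; the real constructors take more arguments.
@dataclass
class Trajectory:
    reward: float
    metrics: dict = field(default_factory=dict)  # one value per trajectory

@dataclass
class TrajectoryGroup:
    trajectories: list
    metrics: dict = field(default_factory=dict)  # one value per group

traj = Trajectory(
    reward=1.0,
    # Logged per trajectory as reward/correct and reward/tool_calls.
    metrics={"correct": 1.0, "tool_calls": 3},
)
group = TrajectoryGroup(
    trajectories=[traj],
    # Logged once per group as reward/group_difficulty.
    metrics={"group_difficulty": 0.8},
)
print(group.metrics["group_difficulty"])  # → 0.8
```

Per-trajectory metrics get aggregated across the group, while group metrics are recorded once.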
Add step-level metrics ART cannot infer
Use model.metrics_builder() for metrics that live outside individual
trajectories, such as actor-side timing, token counts, or idle time.
- log scenario_ids to unlock data/cum/num_unique_scenarios
- log both data/step_actor_tokens and time/step_actor_s to unlock actor throughput metrics
- log time/step_eval_s when eval runs happen outside the backend
- use fully qualified keys like time/step_actor_s or data/step_actor_tokens for builder-managed metrics
Builder metrics are attached to the next model.log(...) or
model.train(...) call.
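The keys above can be sketched as a plain payload; the real model.metrics_builder() API shape is not reproduced here, only the fully qualified keys and the derivation they unlock:

```python
# Stand-in payload illustrating the builder-managed keys; how the real
# model.metrics_builder() accepts them is an assumption not shown here.
step_metrics = {
    "scenario_ids": ["scenario-1", "scenario-2"],  # unlocks data/cum/num_unique_scenarios
    "data/step_actor_tokens": 48_000,
    "time/step_actor_s": 12.0,
    "time/step_eval_s": 3.5,  # only needed when evals run outside the backend
}

# With both inputs present, ART can derive actor throughput for the step:
tok_per_s = step_metrics["data/step_actor_tokens"] / step_metrics["time/step_actor_s"]
print(tok_per_s)  # → 4000.0
```

This is why throughput/avg_actor_tok_per_s only appears when both inputs are logged: the ratio is undefined without them.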
Track judge and API costs
Use @track_api_cost when a function returns a provider response object with
token usage. Wrap the relevant part of your code in a metrics context so ART
knows whether the spend belongs to training or evaluation.
ART then logs:
- per-call keys such as costs/train/llm_judge/correctness or costs/eval/llm_judge/correctness
- rollups such as costs/train, costs/eval, and costs/all
- cumulative totals such as costs/cum/all
For supported models, pass provider and model_name to @track_api_cost so ART can look up pricing.
For custom pricing or unsupported models, register pricing on the builder:
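The mechanics can be sketched with a stand-in decorator and pricing table. The real @track_api_cost signature and the builder's pricing-registration API are not reproduced here; everything below is an illustrative assumption of how usage-based cost accounting works:

```python
from dataclasses import dataclass
from functools import wraps

# Stand-in pricing table: (input, output) USD per 1M tokens. Real custom
# pricing is registered on the metrics builder, not a module-level dict.
PRICING = {("openai", "gpt-4o-mini"): (0.15, 0.60)}

@dataclass
class Usage:
    prompt_tokens: int
    completion_tokens: int

@dataclass
class Response:  # stand-in for a provider response with token usage
    usage: Usage

total_cost = {"llm_judge/correctness": 0.0}

def track_api_cost(provider, model_name, key):
    """Stand-in decorator: prices the wrapped call from its token usage."""
    in_price, out_price = PRICING[(provider, model_name)]
    def deco(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            resp = fn(*args, **kwargs)
            cost = (resp.usage.prompt_tokens * in_price
                    + resp.usage.completion_tokens * out_price) / 1_000_000
            total_cost[key] += cost
            return resp
        return wrapper
    return deco

@track_api_cost("openai", "gpt-4o-mini", "llm_judge/correctness")
def judge():
    return Response(usage=Usage(prompt_tokens=1_000_000, completion_tokens=0))

judge()
print(total_cost["llm_judge/correctness"])  # → 0.15
```

The metrics context decides whether that accumulated spend lands under costs/train/... or costs/eval/....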
Track GPU cost on LocalBackend
LocalBackend can log costs/gpu automatically on train steps. ART currently
auto-detects H200 pricing at $3/hour per GPU. For other hardware, pass an
explicit override:
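The underlying computation is simple: hourly rate times GPU count times step wall time. A stand-in sketch using the $3/hour H200 default from above (the actual LocalBackend override parameter name is not shown here):

```python
# Stand-in for the costs/gpu computation on train steps:
# hourly rate × number of GPUs × step wall-clock time.
def gpu_cost(step_wall_s: float, num_gpus: int, usd_per_gpu_hour: float = 3.0) -> float:
    # 3.0 matches the auto-detected H200 rate; pass a different rate
    # for other hardware.
    return step_wall_s / 3600 * num_gpus * usd_per_gpu_hour

print(gpu_cost(step_wall_s=1800, num_gpus=8))  # → 12.0
```

A 30-minute step on 8 GPUs at the default rate therefore logs $12 to costs/gpu.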