Edge-Triage Back to demo
Metrics source

Current validated frontier.

A judge-friendly summary of what the numbers mean, which runs support the public claims, and why the public story uses a trusted full-50 frontier instead of the raw maximum row in a research ledger. The canonical repository file is docs/CURRENT_FRONTIER.md.

What these numbers mean

Two profiles, two response needs.

Volunteer Speed Profile is for high-volume field sorting; Critical Accuracy Profile is for slower, more careful review. Both are measured on the same 50-sample MEDIC/QCRI gold set and stay far below the 4-second response budget, so the tradeoff is about operational fit, not waiting minutes for a model.

Decision

We present a Pareto frontier, not a single magic number.

Speed matters for volunteer sorting. Accuracy matters for critical incident review. Judges can inspect both profiles in Optimization Mode while the Web UI keeps only two top-level experiences.

KEEP · 50 samples

Volunteer Speed Profile

0.9794

Latency: 158.61 ms

Best for high-volume field reports where responders need fast safe routing.

Run: EDG-307 r0 / 20260427T200056Z

CANDIDATE · 50 samples

Critical Accuracy Profile

0.9818

Latency: 237.97 ms

Best for high-stakes review where a small latency cost is acceptable.

Run: EDG-480 r2 / 20260515T093558Z

FIELD BUDGET

4,000 ms ceiling

< 0.3s

The current profiles preserve most of the time budget for human judgment and communications.

Metric discipline

Why not just show the raw maximum?

results.tsv is an experiment ledger, not a leaderboard. It includes diagnostics, guarded CPU runs, low-VRAM checks, and historical artifact rows. Public claims therefore use trustworthy comparable full-50 rows from docs/CURRENT_FRONTIER.md, with exact run IDs preserved for auditability.

Readable source details

Open the evidence without leaving the demo.

For safety and judge usability, this page exposes curated, versioned excerpts instead of serving every raw research log or repository file from the public site.

Canonical file docs/CURRENT_FRONTIER.md View excerpt

The repository source of truth says to present two validated profiles rather than one headline number:

  • Volunteer Speed Profile: 0.9794 weighted F1 at 158.61 ms.
  • Critical Accuracy Profile: 0.9818 weighted F1 at 237.97 ms.
  • Decision: EDG-480 is the current Pareto/frontier candidate; EDG-479 ablations should not replace it alone.
Experiment ledger results.tsv View rows

Judge-facing rows from the run ledger:

RunWeighted F1LatencyDecision
EDG-307 r00.9794158.61 msKeep for speed
EDG-478 r10.9798286.26 msSuperseded by EDG-480
EDG-480 r20.9818237.97 msAccuracy candidate
EDG-479 r10.9594260.33 msDo not use alone
EDG-479 r20.9595214.79 msDo not use alone
Demo data site/data.json View data

The interactive cards are powered by a small static JSON file. It is safe to expose because it contains only public metrics and curated demo examples.

{
  "frontier": [
    { "profile": "Volunteer Speed Profile", "f1": 0.9794, "latencyMs": 158.61 },
    { "profile": "Critical Accuracy Profile", "f1": 0.9818, "latencyMs": 237.97 }
  ]
}

Open full demo JSON