# NER Comparison & Annotation Tool

> A browser-based tool for comparing named-entity recognition (NER) model outputs side by side and for building labeled gold datasets by hand or by correcting model predictions. JSON in, JSON out. No account, no server — all state lives in browser localStorage.

This file is a full-content mirror of the in-app documentation, intended for LLM agents that operate the tool on a user's behalf. The same content is embedded in `index.html` as a `
` block and as Schema.org JSON-LD.

## The two modes

1. **Compare model outputs** — upload a JSON file containing one or more `modelResponses` per `example` and review them side by side. The tool computes per-model precision, recall, and F1 against any `humanAnnotations`, surfaces disagreements, and exposes a cross-model label confusion matrix with suggest-merge.
2. **Annotate text** — start a new project (or upload existing annotations) and label spans by hand. Each annotation goes into `humanAnnotations`; export the file to keep your gold dataset.

You can switch modes at any time by clicking **Clear** in the header and picking the other card on the landing page.

## JSON file format

The canonical schema is `/data-schema.json` (JSON Schema Draft-07). Uploads are validated with Ajv at upload time; mismatches surface a path-keyed error.

### Minimal example

```json
{
  "schemaVersion": 2,
  "examples": [
    {
      "id": "q1",
      "text": "OpenAI was founded in San Francisco in 2015.",
      "modelResponses": [
        {
          "modelName": "Model A",
          "inferenceTime": 0.045,
          "entities": [
            { "text": "OpenAI", "label": "ORG", "start": 0, "end": 6, "confidence": 0.95 },
            { "text": "San Francisco", "label": "LOC", "start": 22, "end": 35, "confidence": 0.92 }
          ]
        }
      ],
      "humanAnnotations": [
        { "text": "OpenAI", "label": "Organization", "start": 0, "end": 6 }
      ]
    }
  ],
  "modelNames": ["Model A"]
}
```

### Required fields

- `examples` — array of objects.
- `examples[].id` — unique string identifier.
- `examples[].text` — the source text.
- `examples[].modelResponses` — array (may be empty for hand-annotation projects).
- `examples[].modelResponses[].modelName` — string.
- `examples[].modelResponses[].entities` — array of `{ text, label }`.
- `modelNames` — array of model name strings (top-level).

### Optional fields

- `schemaVersion` — integer; current schema is `2`.
- `examples[].humanAnnotations` — array of gold entities (treated as the ground truth for P/R/F1).
- `examples[].rejectedPredictions` — predictions the user has dismissed; excluded from FP counts and the errors-vs-gold filter.
- `entity.start`, `entity.end` — character offsets into `text` (integers, `start <= end`).
- `entity.confidence` — number in `[0, 1]`.
- `customLabelColors` — map of `label name -> color class or hex triple` (`bg:#fee2e2|text:#991b1b|border:#dc2626`).
- `savedThemes` — array of reusable color themes.
- `labelDefinitions` — map of `label name -> { description, examples, counterExamples }` for per-label annotation guidance.

### Vocabulary notes

- Use `examples` (not `questions`) and `confidence` (not `score`). Older payloads with the legacy keys still load (back-compat shim) but are migrated on save.
- The synthetic annotator name `Gold` refers to `humanAnnotations`; do NOT add a real model with that name.

## Common tasks — Compare mode

- **Find errors fast**: switch the filter mode to *Errors vs Gold* to see only examples where at least one model disagrees with the human annotations.
- **Find label-pair merge candidates**: open *Show label confusion* above the example list, pick two annotators (any two models, or one model vs Gold), and look for off-diagonal cells with score ≥ 0.8 (lightning-bolt icon).
- **Promote a model prediction to gold**: click ✓ next to a model entity. If a gold entity with the same span but a different label exists, you'll get a confirm modal; if not, it's added directly. An undo banner appears for 5 seconds.
- **Reject a model prediction**: click ✗. The prediction is excluded from FP counts and the errors-vs-gold filter. Undo within 5 seconds.
- **Score a model**: click stars in the scoring panel; pick a category (Best / Good / Fair / Poor); add notes. The Model Summary aggregates totals.

## Common tasks — Annotate mode

- **Annotate a span**: highlight text in any example card. Pick from existing labels or type to create a new one.
- **Edit a label across the project**: open the Entity Label Reference panel, click the pencil next to a label, and use the Batch Operations section to rename, merge, or delete. When a filter is active, choose between "All examples" and "Current filter" scopes.
- **Document a label**: open the label tooltip in the reference panel and add a description / examples / counter-examples. Stored under `labelDefinitions` and exported with the file.
- **Add a new example**: enable "Allow adding examples" in Settings, then use the inline separator between examples to insert a new one.

## Export

The **Export JSON** button writes a JSON file with the same shape as the upload, plus:

- A `scores` map keyed by example id and model name (1–5 star ratings, category, notes).
- A `metadata` block with totals and per-model P/R/F1.

When a filter is active you'll get a chooser dialog: *Export filtered (N of M)* (default) vs *Export all (M)*.

## Operating the tool as an LLM agent

The tool is a single-page React app served at `/`. Key URLs:

- `GET /` — landing page (or active workspace if state is in localStorage).
- `GET /data-schema.json` — JSON Schema for the upload format.
- `GET /example-data.json` — full working example, hooked up to the "Try sample data" CTA.
- `GET /llms.txt` — this file.
- `GET /how-to-use` — not a route; the static `
` in `/index.html` carries the same content for non-JS consumers.

To prepare an upload programmatically, generate a JSON document matching `/data-schema.json` and upload it via the `Upload model output JSON` or `Upload existing annotations` button. The Ajv validator returns errors keyed by path, e.g. `examples[2].modelResponses[0].entities[5].start must be a number`. Fix the reported path and retry.

## License

MIT.
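
## Sketch: building an upload file in Python

The upload preparation described under *Operating the tool as an LLM agent* can be sketched as follows. This is a minimal illustration, not part of the tool. The helper names `make_entity` and `check_payload` are hypothetical, and the checks are a local pre-flight approximation of a few documented constraints, not the Ajv validation against `/data-schema.json`. The offset check also assumes that an entity's `text` equals the slice `text[start:end]`, which this file does not state explicitly.

```python
import json


def make_entity(text, surface, label, confidence=None):
    """Build an entity dict with offsets for the first occurrence of `surface` (hypothetical helper)."""
    start = text.find(surface)
    if start < 0:
        raise ValueError(f"{surface!r} not found in text")
    entity = {"text": surface, "label": label, "start": start, "end": start + len(surface)}
    if confidence is not None:
        entity["confidence"] = confidence
    return entity


def check_payload(payload):
    """Return a list of problems from a few local sanity checks (NOT the tool's Ajv validation)."""
    problems = []
    seen_ids = set()
    for i, example in enumerate(payload.get("examples", [])):
        if example.get("id") in seen_ids:
            problems.append(f"examples[{i}].id is not unique")
        seen_ids.add(example.get("id"))
        text = example.get("text", "")
        for j, response in enumerate(example.get("modelResponses", [])):
            if response.get("modelName") == "Gold":
                problems.append(f"examples[{i}].modelResponses[{j}].modelName must not be 'Gold'")
            for k, ent in enumerate(response.get("entities", [])):
                path = f"examples[{i}].modelResponses[{j}].entities[{k}]"
                # ASSUMPTION: entity text should equal the slice it points at.
                if "start" in ent and "end" in ent:
                    if text[ent["start"]:ent["end"]] != ent["text"]:
                        problems.append(f"{path} offsets do not match its text")
                conf = ent.get("confidence")
                if conf is not None and not 0 <= conf <= 1:
                    problems.append(f"{path}.confidence must be in [0, 1]")
    return problems


text = "OpenAI was founded in San Francisco in 2015."
payload = {
    "schemaVersion": 2,
    "examples": [
        {
            "id": "q1",
            "text": text,
            "modelResponses": [
                {
                    "modelName": "Model A",
                    "entities": [
                        make_entity(text, "OpenAI", "ORG", 0.95),
                        make_entity(text, "San Francisco", "LOC", 0.92),
                    ],
                }
            ],
            "humanAnnotations": [],
        }
    ],
    "modelNames": ["Model A"],
}

problems = check_payload(payload)
upload_json = json.dumps(payload, indent=2)  # save this string as a .json file and upload it
```

An empty `problems` list only means these local checks passed; the tool's own validator remains the authority. Python offsets count code points, so text containing characters outside the Basic Multilingual Plane may need extra care if the tool counts offsets differently.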
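
## Sketch: scoring predictions against gold

The per-model precision, recall, and F1 mentioned under *The two modes* can be approximated offline as shown below. This is a sketch under stated assumptions, not the tool's implementation: this file does not specify the matching rule, so the code assumes a prediction counts as correct only when `start`, `end`, and `label` all match a gold entity exactly. It also assumes `rejectedPredictions` holds entity-shaped objects, and the function names `span_key` and `score_model` are hypothetical.

```python
def span_key(entity):
    """Matching identity: (start, end, label). ASSUMPTION: exact span and label match."""
    return (entity.get("start"), entity.get("end"), entity["label"])


def score_model(examples, model_name):
    """Compute precision, recall, and F1 for one model over all examples."""
    tp = fp = fn = 0
    for example in examples:
        gold = {span_key(e) for e in example.get("humanAnnotations", [])}
        # ASSUMPTION: rejected predictions are entity-shaped and apply to any model.
        rejected = {span_key(e) for e in example.get("rejectedPredictions", [])}
        predicted = set()
        for response in example.get("modelResponses", []):
            if response["modelName"] == model_name:
                predicted |= {span_key(e) for e in response["entities"]}
        tp += len(predicted & gold)
        fp += len(predicted - gold - rejected)  # rejected predictions are not counted as FPs
        fn += len(gold - predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


examples = [
    {
        "id": "q1",
        "text": "OpenAI was founded in San Francisco in 2015.",
        "modelResponses": [
            {
                "modelName": "Model A",
                "entities": [
                    {"text": "OpenAI", "label": "ORG", "start": 0, "end": 6},
                    {"text": "2015", "label": "DATE", "start": 39, "end": 43},
                ],
            }
        ],
        "humanAnnotations": [
            {"text": "OpenAI", "label": "ORG", "start": 0, "end": 6},
            {"text": "San Francisco", "label": "LOC", "start": 22, "end": 35},
        ],
    }
]

scores = score_model(examples, "Model A")

# Dismissing the unmatched prediction removes it from the false-positive count.
examples[0]["rejectedPredictions"] = [{"text": "2015", "label": "DATE", "start": 39, "end": 43}]
scores_after_reject = score_model(examples, "Model A")
```

Under these assumptions the first call gives precision, recall, and F1 of 0.5 each; after the rejection, precision rises to 1.0 while recall stays at 0.5. Numbers from the tool may differ if it matches spans or labels more leniently.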