# NER Comparison & Annotation Tool

> A browser-based tool for comparing named-entity recognition (NER) model outputs side by side and for building labeled gold datasets by hand or by correcting model predictions. JSON in, JSON out. No account, no server — all state lives in browser localStorage.

This file is a full-content mirror of the in-app documentation, intended for LLM agents that operate the tool on a user's behalf. The same content is embedded in `index.html` as a `<section id="how-to-use">` block and as Schema.org JSON-LD.

## The two modes

1. **Compare model outputs** — upload a JSON file containing one or more `modelResponses` per `example` and review them side by side. The tool computes per-model precision, recall, and F1 against any `humanAnnotations`, surfaces disagreements, and exposes a cross-model label confusion matrix with suggest-merge.
2. **Annotate text** — start a new project (or upload existing annotations) and label spans by hand. Each annotation goes into `humanAnnotations`; export the file to keep your gold dataset.

You can switch modes at any time by clicking **Clear** in the header and picking the other card on the landing page.
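
The per-model precision, recall, and F1 reported in Compare mode can be approximated offline to sanity-check a file before loading it. The sketch below assumes an exact-match rule: a prediction counts as correct only when its `(start, end, label)` triple also appears in `humanAnnotations`. The source does not state the tool's matching rule, so treat the function names `to_spans` and `prf1` and the rule itself as illustrative assumptions.

```python
from typing import Iterable

Span = tuple[int, int, str]  # (start, end, label)


def to_spans(entities: Iterable[dict]) -> set[Span]:
    """Collect (start, end, label) triples; entities without offsets are skipped."""
    return {
        (e["start"], e["end"], e["label"])
        for e in entities
        if "start" in e and "end" in e
    }


def prf1(predicted: set[Span], gold: set[Span]) -> tuple[float, float, float]:
    """Exact-match precision, recall, and F1 (assumed rule, not the tool's documented one)."""
    tp = len(predicted & gold)   # predicted and in gold
    fp = len(predicted - gold)   # predicted but not in gold
    fn = len(gold - predicted)   # in gold but not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Under this rule a label mismatch on an otherwise identical span counts as both a false positive and a false negative, which is one reason merging near-duplicate labels can change the scores noticeably.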

## JSON file format

The canonical schema is `/data-schema.json` (JSON Schema Draft-07). Every upload is validated against it with Ajv, and any mismatch is reported as an error keyed by the path of the offending field.

### Minimal example

```json
{
  "schemaVersion": 2,
  "examples": [
    {
      "id": "q1",
      "text": "OpenAI was founded in San Francisco in 2015.",
      "modelResponses": [
        {
          "modelName": "Model A",
          "inferenceTime": 0.045,
          "entities": [
            { "text": "OpenAI", "label": "ORG", "start": 0, "end": 6, "confidence": 0.95 },
            { "text": "San Francisco", "label": "LOC", "start": 22, "end": 35, "confidence": 0.92 }
          ]
        }
      ],
      "humanAnnotations": [
        { "text": "OpenAI", "label": "Organization", "start": 0, "end": 6 }
      ]
    }
  ],
  "modelNames": ["Model A"]
}
```

### Required fields

- `examples` — array of objects.
- `examples[].id` — unique string identifier.
- `examples[].text` — the source text.
- `examples[].modelResponses` — array (may be empty for hand-annotation projects).
- `examples[].modelResponses[].modelName` — string.
- `examples[].modelResponses[].entities` — array of `{ text, label }`.
- `modelNames` — array of model name strings (top-level).
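
As an illustration of the required fields, here is a minimal sketch that assembles an upload document from raw predictions. The function name `build_upload` and the input shape (`rows` as `(id, text, {model_name: [(surface, label), ...]})`) are assumptions made for this example. Offsets are filled in with a naive first-occurrence search, which is only adequate when each surface string appears once in the text.

```python
import json


def build_upload(rows, model_names):
    """Build a document with the required fields listed above.

    rows: list of (example_id, text, {model_name: [(surface, label), ...]}).
    This input shape is hypothetical; adapt it to your own pipeline.
    """
    examples = []
    for example_id, text, predictions in rows:
        responses = []
        for name in model_names:
            entities = []
            for surface, label in predictions.get(name, []):
                entity = {"text": surface, "label": label}
                start = text.find(surface)  # first occurrence only
                if start != -1:
                    # Optional offsets; end is exclusive, as in the minimal example.
                    entity["start"] = start
                    entity["end"] = start + len(surface)
                entities.append(entity)
            responses.append({"modelName": name, "entities": entities})
        examples.append(
            {"id": example_id, "text": text, "modelResponses": responses}
        )
    return {
        "schemaVersion": 2,
        "examples": examples,
        "modelNames": list(model_names),
    }
```

Usage: `json.dumps(build_upload(rows, ["Model A"]), indent=2)` yields text that can be saved to a file and loaded through the upload button.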

### Optional fields

- `schemaVersion` — integer; current schema is `2`.
- `examples[].humanAnnotations` — array of gold entities (treated as the ground truth for P/R/F1).
- `examples[].rejectedPredictions` — predictions the user has dismissed; excluded from FP counts and the errors-vs-gold filter.
- `entity.start`, `entity.end` — character offsets into `text` (integers, `start <= end`).
- `entity.confidence` — number in `[0, 1]`.
- `customLabelColors` — map of `label name -> color class or hex triple` (`bg:#fee2e2|text:#991b1b|border:#dc2626`).
- `savedThemes` — array of reusable color themes.
- `labelDefinitions` — map of `label name -> { description, examples, counterExamples }` for per-label annotation guidance.
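
The constraints on the optional entity fields can be checked before upload. The sketch below is a hypothetical helper, `check_entity`, that tests the documented rules (integer offsets, `start <= end`, `confidence` in `[0, 1]`) plus two assumptions the source does not state explicitly: that offsets must lie within `text`, and that `text[start:end]` should equal the entity's `text`.

```python
def _is_int(value) -> bool:
    """True for integers, excluding booleans (which Python treats as ints)."""
    return isinstance(value, int) and not isinstance(value, bool)


def check_entity(entity: dict, text: str) -> list[str]:
    """Return a list of problems found in one entity; empty means it looks fine."""
    problems = []
    start, end = entity.get("start"), entity.get("end")
    if start is not None or end is not None:
        if not (_is_int(start) and _is_int(end)):
            problems.append("start and end must both be integers")
        elif not (0 <= start <= end <= len(text)):
            # Bounds check is an assumption; the source only requires start <= end.
            problems.append("offsets must satisfy 0 <= start <= end <= len(text)")
        elif text[start:end] != entity.get("text"):
            # Assumed consistency rule, not a documented schema constraint.
            problems.append("text[start:end] does not match entity text")
    confidence = entity.get("confidence")
    if confidence is not None and not (
        isinstance(confidence, (int, float)) and 0 <= confidence <= 1
    ):
        problems.append("confidence must be a number in [0, 1]")
    return problems
```

One caveat: Python indexes strings by code point, while JavaScript strings are indexed by UTF-16 code unit. If the tool counts offsets the JavaScript way (not stated in the source), text containing characters outside the Basic Multilingual Plane, such as most emoji, would need its offsets converted.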

### Vocabulary notes

- Use `examples` (not `questions`) and `confidence` (not `score`). Older payloads with the legacy keys still load (back-compat shim) but are migrated on save.
- The synthetic annotator name `Gold` refers to `humanAnnotations`; do NOT add a real model with that name.
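
If you prefer to normalize legacy payloads yourself rather than rely on the tool's shim, the renaming can be sketched as follows. This is not the tool's own migration code. It assumes that `questions` was the top-level key now called `examples`, and that `score` appeared on entity objects where `confidence` now does; the source names the keys but not their positions.

```python
import copy


def migrate_legacy(doc: dict) -> dict:
    """Return a copy of doc with legacy keys renamed (assumed key positions)."""
    out = copy.deepcopy(doc)  # leave the caller's document untouched
    if "questions" in out and "examples" not in out:
        out["examples"] = out.pop("questions")
    for example in out.get("examples", []):
        # Visit every entity list: each model response plus the gold annotations.
        groups = [r.get("entities", []) for r in example.get("modelResponses", [])]
        groups.append(example.get("humanAnnotations", []))
        for entities in groups:
            for entity in entities:
                if "score" in entity and "confidence" not in entity:
                    entity["confidence"] = entity.pop("score")
    return out
```

Only entity-level `score` keys are renamed, so a top-level `scores` map in an exported file is left alone.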

## Common tasks — Compare mode

- **Find errors fast**: switch the filter mode to *Errors vs Gold* to see only examples where at least one model disagrees with the human annotations.
- **Find label-pair merge candidates**: open *Show label confusion* above the example list, pick two annotators (any two models, or one model vs Gold), and look for off-diagonal cells with score ≥ 0.8 (lightning-bolt icon).
- **Promote a model prediction to gold**: click ✓ next to a model entity. If a gold entity with the same span but a different label already exists, you'll get a confirmation modal; otherwise the prediction is added directly. An undo banner appears for 5 seconds.
- **Reject a model prediction**: click ✗. The prediction is excluded from FP counts and the errors-vs-gold filter. You can undo within 5 seconds.
- **Score a model**: click stars in the scoring panel; pick a category (Best / Good / Fair / Poor); add notes. The Model Summary aggregates totals.
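
To make the label confusion view concrete, here is a sketch that counts label pairs for two annotators over spans they both marked. The source does not define how the tool computes the cell score behind the ≥ 0.8 threshold, so this example stops at raw counts. The helper names `annotator_entities` and `label_confusion` are hypothetical, and matching spans by identical `(start, end)` is an assumption.

```python
from collections import Counter


def annotator_entities(example: dict, annotator: str) -> list[dict]:
    """Entities for one annotator; the name "Gold" maps to humanAnnotations."""
    if annotator == "Gold":
        return example.get("humanAnnotations", [])
    for response in example.get("modelResponses", []):
        if response.get("modelName") == annotator:
            return response.get("entities", [])
    return []


def label_confusion(examples, annotator_a: str, annotator_b: str) -> Counter:
    """Count (label_a, label_b) pairs on spans that both annotators marked."""
    counts = Counter()
    for example in examples:
        labels_b = {}
        for e in annotator_entities(example, annotator_b):
            if "start" in e and "end" in e:
                labels_b[(e["start"], e["end"])] = e["label"]
        for e in annotator_entities(example, annotator_a):
            if "start" in e and "end" in e:
                label_b = labels_b.get((e["start"], e["end"]))
                if label_b is not None:
                    counts[(e["label"], label_b)] += 1
    return counts
```

Applied to the minimal example with `"Model A"` and `"Gold"`, the only shared span is `(0, 6)`, which produces the off-diagonal pair `("ORG", "Organization")`: the kind of cell that points to a merge candidate.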

## Common tasks — Annotate mode

- **Annotate a span**: highlight text in any example card. Pick from existing labels or type to create a new one.
- **Edit a label across the project**: open the Entity Label Reference panel, click the pencil next to a label, and use the Batch Operations section to rename, merge, or delete. When a filter is active, choose between "All examples" and "Current filter" scopes.
- **Document a label**: open the label tooltip in the reference panel and add a description / examples / counter-examples. Stored under `labelDefinitions` and exported with the file.
- **Add a new example**: enable "Allow adding examples" in Settings, then use the inline separator between examples to insert a new one.

## Export

The **Export JSON** button writes a JSON file with the same shape as the upload, plus:

- A `scores` map keyed by example id and model name (1–5 star ratings, category, notes).
- A `metadata` block with totals and per-model P/R/F1.

When a filter is active you'll get a chooser dialog: *Export filtered (N of M)* (default) vs *Export all (M)*.

## Operating the tool as an LLM agent

The tool is a single-page React app served at `/`. Key URLs:

- `GET /` — landing page (or active workspace if state is in localStorage).
- `GET /data-schema.json` — JSON Schema for the upload format.
- `GET /example-data.json` — full working example, hooked up to the "Try sample data" CTA.
- `GET /llms.txt` — this file.
- `GET /how-to-use` — not a route; the static `<section id="how-to-use">` in `/index.html` carries the same content for non-JS consumers.

To prepare an upload programmatically, generate a JSON document matching `/data-schema.json` and load it through the `Upload model output JSON` or `Upload existing annotations` button. There is no server endpoint to POST to; validation and storage both happen in the browser. The Ajv validator reports errors keyed by path, e.g. `examples[2].modelResponses[0].entities[5].start must be a number`. Fix the field at the reported path and retry.
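
An agent can catch the most common structural mistakes before reaching the upload button. The sketch below is a standard-library pre-flight check covering only the required fields, with error paths formatted in the same style as the example above. It is not a substitute for validating against `/data-schema.json`: the function names `format_path` and `preflight` are hypothetical, the messages are this sketch's own rather than Ajv's, and the string types for `id`, `text`, `modelName`, and `label` are inferred from the minimal example.

```python
def format_path(parts) -> str:
    """Render ["examples", 2, "id"] as "examples[2].id"."""
    out = ""
    for part in parts:
        if isinstance(part, int):
            out += f"[{part}]"
        else:
            out += f".{part}" if out else part
    return out


def preflight(doc: dict) -> list[str]:
    """Return path-keyed messages for missing or mistyped required fields."""
    errors = []

    def require(container, key, expected, path):
        # Fetch container[key]; record an error unless it has the expected type.
        value = container.get(key) if isinstance(container, dict) else None
        if not isinstance(value, expected):
            errors.append(f"{format_path(path + [key])} must be a {expected.__name__}")
            return None
        return value

    require(doc, "modelNames", list, [])
    for i, example in enumerate(require(doc, "examples", list, []) or []):
        base = ["examples", i]
        require(example, "id", str, base)
        require(example, "text", str, base)
        responses = require(example, "modelResponses", list, base) or []
        for j, response in enumerate(responses):
            rbase = base + ["modelResponses", j]
            require(response, "modelName", str, rbase)
            for k, entity in enumerate(require(response, "entities", list, rbase) or []):
                ebase = rbase + ["entities", k]
                require(entity, "text", str, ebase)
                require(entity, "label", str, ebase)
    return errors
```

An empty list from `preflight(doc)` means the required structure is present; the tool's own Ajv validation still has the final say on upload.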

## License

MIT.