# Label and Count Policy

## Why There Are Two Configurations

The original binary benchmark contains 1,078 independently human-validated examples. In the five-class annotation table, 40 of these are labeled `Ambiguous` and one remains unresolved with a label outside the paper's five-class taxonomy. The corrected release therefore provides:

- `binary`: all 1,078 examples;
- `five_class`: 1,037 examples with one of the five defined taxonomy labels.

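Ambiguous examples are excluded from `five_class`; they are not converted to Neutral.

The full annotated benchmark contains 1,078 examples, all of which appear in `binary`. The `five_class` configuration contains 1,037 examples after removing the 41 ambiguous or unsupported-label cases (40 `Ambiguous` plus the one unresolved example).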
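The filtering rule can be sketched in Python. This is a minimal illustration, assuming each annotation record is a dict with a hypothetical `five_class_label` field holding the raw label string; the real table's schema and label spellings may differ. The toy records are not benchmark data.

```python
# Sketch only: the "five_class_label" field name and raw label strings are
# hypothetical; adapt them to the actual annotation table schema.
FIVE_CLASS_LABELS = {"Invalidation", "Neutral", "Support", "Validation", "Escalation"}


def five_class_subset(records):
    """Keep records whose label is one of the five taxonomy labels.

    Ambiguous or unsupported labels are dropped, never relabeled as Neutral.
    """
    return [r for r in records if r.get("five_class_label") in FIVE_CLASS_LABELS]


# Count check from the text: 1,078 total - 40 Ambiguous - 1 unresolved.
assert 1078 - 40 - 1 == 1037

# Toy records for illustration (not benchmark data).
toy = [
    {"id": 1, "five_class_label": "Support"},
    {"id": 2, "five_class_label": "Ambiguous"},
    {"id": 3, "five_class_label": "Neutral"},
]
print([r["id"] for r in five_class_subset(toy)])  # [1, 3]
```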

## Standardized Five-Class IDs

The corrected public release uses:

| ID | Label |
|---:|---|
| 0 | Invalidation |
| 1 | Neutral |
| 2 | Support |
| 3 | Validation |
| 4 | Escalation |

These IDs differ from the temporary encoding used in the internal annotation table. Consumers of the public release should rely on the standardized release IDs.
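A minimal sketch of the standardized mapping, assuming labels arrive as the exact strings in the table above. `LABEL_TO_ID`, `encode`, and `decode` are hypothetical helper names, not part of the release.

```python
# Standardized public five-class IDs, copied from the table above.
# The helper names here are illustrative, not part of the release.
LABEL_TO_ID = {
    "Invalidation": 0,
    "Neutral": 1,
    "Support": 2,
    "Validation": 3,
    "Escalation": 4,
}
ID_TO_LABEL = {label_id: label for label, label_id in LABEL_TO_ID.items()}


def encode(label: str) -> int:
    """Map a five-class label name to its public release ID.

    Raises KeyError for labels outside the taxonomy (e.g. "Ambiguous"),
    since such examples are not part of the five_class configuration.
    """
    return LABEL_TO_ID[label]


def decode(label_id: int) -> str:
    """Map a public release ID back to its label name."""
    return ID_TO_LABEL[label_id]


print(encode("Validation"))  # 3
print(decode(4))  # Escalation
```

Failing loudly on unknown labels, rather than defaulting to some ID, mirrors the policy of never folding `Ambiguous` into Neutral.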

## Independent Annotation Tasks

Binary sycophancy detection and five-class response classification were annotated as separate tasks. The five-class labels are not defined in terms of the binary ones: Invalidation and Neutral are not defined as `NON-SYCOPHANTIC`, nor are Support, Validation, and Escalation defined as `SYCOPHANTIC`. No mapping or agreement constraint should be assumed between the two tasks.

The public configurations keep their labels separate. Researchers should evaluate binary predictions against the binary configuration and five-class predictions against the five-class configuration.
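One way to keep the two evaluations apart, sketched under assumptions: gold labels and predictions are dicts keyed by a hypothetical example ID, and plain accuracy stands in for whatever metric is actually reported. The hypothetical `score_config` helper scores only the IDs present in the chosen configuration's gold labels and never derives one task's label from the other.

```python
def score_config(gold_by_id, pred_by_id):
    """Accuracy of predictions against ONE configuration's gold labels.

    gold_by_id / pred_by_id map a (hypothetical) example ID to a label.
    Only IDs present in the gold set are scored; no label is derived
    from the other task.
    """
    if not gold_by_id:
        raise ValueError("gold labels are empty")
    missing = set(gold_by_id) - set(pred_by_id)
    if missing:
        raise ValueError(f"missing predictions for {len(missing)} example(s)")
    correct = sum(pred_by_id[i] == label for i, label in gold_by_id.items())
    return correct / len(gold_by_id)


# Toy labels for illustration only.
binary_gold = {"a": "SYCOPHANTIC", "b": "NON-SYCOPHANTIC", "c": "SYCOPHANTIC"}
binary_pred = {"a": "SYCOPHANTIC", "b": "SYCOPHANTIC", "c": "SYCOPHANTIC"}
five_gold = {"a": 2, "b": 1}  # "c" is absent, e.g. an Ambiguous example
five_pred = {"a": 2, "b": 1, "c": 0}

print(score_config(binary_gold, binary_pred))  # 0.6666666666666666
print(score_config(five_gold, five_pred))  # 1.0
```

Because `five_class` omits the 41 removed examples, an example such as `c` can carry a binary gold label while having no five-class gold label; the extra five-class prediction for it is simply ignored.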

## Evidence Annotations

Evidence annotations are retained but are not always exact substrings of the comment. `evidence_is_exact_span` indicates whether the supplied annotation appears verbatim as a span of the released, redacted comment. Non-exact annotations should be treated as explanatory evidence notes pending manual correction.
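The flag can be used to decide how each annotation is consumed. The sketch below assumes records are dicts carrying `evidence_is_exact_span`; `evidence_offsets` and `split_evidence` are hypothetical helpers. `evidence_offsets` only locates the first verbatim occurrence of a string, so it illustrates the exact-span idea rather than replacing the released flag.

```python
def evidence_offsets(evidence, comment):
    """Return (start, end) character offsets of the first verbatim
    occurrence of evidence in comment, or None if it is not an exact span."""
    if not evidence:
        return None
    start = comment.find(evidence)
    if start == -1:
        return None
    return start, start + len(evidence)


def split_evidence(records):
    """Partition records by the released evidence_is_exact_span flag:
    exact spans can be highlighted, the rest are explanatory notes."""
    spans, notes = [], []
    for r in records:
        (spans if r["evidence_is_exact_span"] else notes).append(r)
    return spans, notes


comment = "What a great idea!"
print(evidence_offsets("great idea", comment))  # (7, 17)
print(evidence_offsets("a truly great idea", comment))  # None
```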
