Identification Metrics
What you’ll learn
- What ID accuracy measures and how it is calculated
- Which predictions are included and which are excluded
- What low accuracy means for your identification workflow
Overview
Identification is the final phase of finwave’s ML pipeline. After a feature is detected and its side is classified, an identification model compares the annotation against known individuals in the population and predicts which animal it belongs to. The identification metrics track how often those predictions are correct.
ID Accuracy
What it measures: Of all confirmed ML annotations where the model predicted a specific individual, how many had the correct ID without needing a human correction.
Accuracy = (Confirmed − AnimalIdChanged) / Confirmed × 100
When a researcher reviews an ML annotation and changes the predicted individual ID before confirming, that generates an AnimalIdChanged revision. The metric subtracts those corrections from the total to determine how often the model’s ID prediction was right.
What is included
Only annotations where the model actually predicted an individual are counted. If the model returned “unknown” or made no ID prediction, that annotation is excluded from the accuracy calculation. This keeps the metric focused on the quality of the model’s positive predictions rather than penalizing it for being appropriately uncertain.
Specifically:
- AnimalIdChanged revisions are only counted when the OldValue (the model’s original prediction) is not null — meaning the model did predict a specific individual.
- If a researcher changes an ID from “unknown” to a specific individual, that is not counted as a model error because the model did not claim to know who it was.
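The filtering rule can be sketched as follows. The revision records here are hypothetical — the dictionary keys (`type`, `old_value`) stand in for whatever schema finwave actually stores — but the null check mirrors the rule above.

```python
def count_model_errors(revisions: list[dict]) -> int:
    """Count AnimalIdChanged revisions that represent true model errors.

    A revision only counts when old_value is not None, i.e. the model
    actually predicted a specific individual that was then corrected.
    """
    return sum(
        1
        for r in revisions
        if r["type"] == "AnimalIdChanged" and r["old_value"] is not None
    )

revisions = [
    # Model predicted F-042, researcher corrected to F-017: counted.
    {"type": "AnimalIdChanged", "old_value": "F-042", "new_value": "F-017"},
    # Model made no prediction, researcher assigned F-003: excluded.
    {"type": "AnimalIdChanged", "old_value": None, "new_value": "F-003"},
]
print(count_model_errors(revisions))  # → 1
```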
Interpreting the values
- High accuracy (e.g., 90%+) — When the model predicts an individual, it is usually correct. Researchers can trust the suggestions and focus their review on confirming rather than correcting.
- Moderate accuracy (e.g., 70-90%) — The model makes useful predictions but requires regular correction. Researchers should always verify ID suggestions against the catalog.
- Low accuracy (below 70%) — The model frequently misidentifies individuals. This could indicate that the catalog has grown or changed significantly since the model was last trained, or that image quality varies widely.
Null values
If no annotations with model-predicted IDs have been confirmed yet (Confirmed = 0), the metric displays as “---”. A “Preliminary” badge appears when the number of reviewed annotations falls below the sample threshold (default: 100).
When to consider retraining
A drop in ID accuracy over time — visible in the ML Center’s trend charts — often signals that the model needs retraining. Common triggers include:
- New individuals added to the population that the model has never seen.
- Catalog growth that increases the number of possible matches, making the task harder.
- Changing image conditions such as new camera equipment or different survey locations.
The trend data in the ML Center helps you distinguish between a gradual decline (likely needs retraining) and a one-time dip (possibly a batch of unusual images).
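One crude way to make that distinction programmatic is to compare each period against the first period’s accuracy. This heuristic is entirely illustrative — the ML Center’s trend charts are meant for human review, and the 5-point drop threshold is an arbitrary assumption.

```python
def classify_trend(accuracy_by_period: list[float], drop: float = 5.0) -> str:
    """Label a chronological series of ID-accuracy values (percentages).

    "gradual decline" = the most recent periods are all well below the
    baseline; "one-time dip" = an isolated bad period that recovered.
    """
    baseline = accuracy_by_period[0]
    below = [a < baseline - drop for a in accuracy_by_period[1:]]
    if not any(below):
        return "stable"
    if len(below) >= 3 and all(below[-3:]):
        return "gradual decline"  # likely needs retraining
    return "one-time dip"  # possibly a batch of unusual images

print(classify_trend([95, 92, 88, 85, 82]))  # → 'gradual decline'
print(classify_trend([95, 82, 94, 95, 96]))  # → 'one-time dip'
```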
Related
- ML Center Overview — How the ML Center works and how metrics are refreshed
- Detection Metrics — Precision, recall, and bounding box quality
- Classification Metrics — Side accuracy for classification models
- Revision Rate — Overall human correction effort
- Individual Profiles — How individuals are tracked in the catalog