🌍 Open-set Classification

Overview

Real-world fungal datasets are never closed: previously unseen species appear in the wild each year.
The open-set benchmark asks: Can your model detect when a specimen belongs to an unknown class?

Use Cases

Discovering new or rare species
Deploying models that must avoid confident mistakes on out-of-distribution data

Data & Splits

Training: Species observed up to the end of 2021.
Validation: Contains species first observed in 2022.
Test: Contains species first observed in 2023.
New classes, i.e. species that do not appear in the training set, are given the "Unknown" label.

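As a rough illustration of the labelling rule in the splits above, the sketch below maps every species that is absent from the training set to "Unknown". The function name `relabel_unknown` and the example species lists are hypothetical and are not taken from the benchmark's code.

```python
def relabel_unknown(labels, known_species, unknown_label="Unknown"):
    """Replace every label that is not a known (training) species.

    `labels` and `known_species` are plain iterables of species names;
    both are placeholders for whatever the real metadata provides.
    """
    known = set(known_species)
    return [label if label in known else unknown_label for label in labels]


# Hypothetical example: one validation species was never seen in training.
train_species = ["Amanita muscaria", "Boletus edulis"]
val_labels = ["Boletus edulis", "Russula emetica", "Amanita muscaria"]
val_targets = relabel_unknown(val_labels, train_species)
```
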
Evaluation Protocol

Primary Metric: Area Under the ROC Curve (AUC)
Secondary Metric: True Negative Rate at 95% True Positive Rate (TNR95)
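The two metrics can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the benchmark's evaluation code: it assumes that "Unknown" is the positive class and that a higher score means "more likely unknown"; the official protocol may use the opposite convention. The function names `auc_score` and `tnr_at_tpr` are hypothetical.

```python
import math
import numpy as np


def auc_score(scores, is_unknown):
    """AUC as the probability that a random unknown sample scores higher
    than a random known sample (ties count as one half).

    Compares all pairs, so memory grows with n_unknown * n_known; for a
    large test set a library routine is the better choice.
    """
    scores = np.asarray(scores, dtype=float)
    is_unknown = np.asarray(is_unknown, dtype=bool)
    pos = scores[is_unknown]   # assumed positive class: unknown species
    neg = scores[~is_unknown]  # assumed negative class: known species
    diff = pos[:, None] - neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


def tnr_at_tpr(scores, is_unknown, tpr=0.95):
    """True negative rate at the threshold that keeps TPR >= `tpr`."""
    scores = np.asarray(scores, dtype=float)
    is_unknown = np.asarray(is_unknown, dtype=bool)
    pos = np.sort(scores[is_unknown])[::-1]  # descending
    neg = scores[~is_unknown]
    # Number of positives that must be at or above the threshold;
    # the small epsilon guards against floating-point round-up.
    k = max(1, math.ceil(tpr * len(pos) - 1e-9))
    threshold = pos[k - 1]
    # A sample is predicted "unknown" when score >= threshold.
    return float((neg < threshold).mean())


# Toy example with four unknown and four known samples.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.65, 0.3, 0.2, 0.1]
toy_unknown = [1, 1, 1, 1, 0, 0, 0, 0]
toy_auc = auc_score(toy_scores, toy_unknown)
toy_tnr = tnr_at_tpr(toy_scores, toy_unknown)
```

In the toy example, 15 of the 16 unknown/known pairs are ranked correctly, and one of the four known samples lies above the threshold needed to recover all unknown samples.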

Baselines & Results

Baselines include Max Softmax Probability, Max Logit Score, and Nearest Mean approaches, each evaluated on both supervised and pre-trained (DINOv2, BEiT) backbones.
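To make the three baseline families concrete, here is a hedged NumPy sketch of how each one can turn model outputs into an "unknown-ness" score (higher means more likely unknown). The function names, the sign conventions, and the use of Euclidean distance for Nearest Mean are assumptions made for illustration; the benchmark's own implementation may differ.

```python
import numpy as np


def msp_score(logits):
    """Max Softmax Probability: 1 - highest class probability."""
    logits = np.asarray(logits, dtype=float)
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(z)
    probs /= probs.sum(axis=1, keepdims=True)
    return 1.0 - probs.max(axis=1)


def max_logit_score(logits):
    """Max Logit: negated highest raw logit."""
    logits = np.asarray(logits, dtype=float)
    return -logits.max(axis=1)


def nearest_mean_score(embeddings, train_embeddings, train_labels):
    """Nearest Mean: distance to the closest class mean in feature space.

    Euclidean distance is an assumption made for this sketch.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    train_embeddings = np.asarray(train_embeddings, dtype=float)
    train_labels = np.asarray(train_labels)
    classes = np.unique(train_labels)
    means = np.stack(
        [train_embeddings[train_labels == c].mean(axis=0) for c in classes]
    )
    dists = np.linalg.norm(embeddings[:, None, :] - means[None, :, :], axis=2)
    return dists.min(axis=1)


# Toy inputs: one confident prediction and one completely uncertain one.
toy_logits = np.array([[5.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
toy_msp = msp_score(toy_logits)
toy_max_logit = max_logit_score(toy_logits)

# Toy features: two classes with means (0, 1) and (10, 11).
toy_train = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]])
toy_train_labels = np.array([0, 0, 1, 1])
toy_nm = nearest_mean_score(
    np.array([[0.0, 1.0], [5.0, 6.0]]), toy_train, toy_train_labels
)
```

The logits would come from a supervised classifier head, while the embeddings could come from a pre-trained backbone; the resulting scores can be fed to any ranking-based metric.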

Detailed results and code: Baselines & Models

Quick Start

Data splits and scripts are available in the repo and on Kaggle.
Tutorial: usage/evaluation.md