# 🧪 Evaluation Protocols
Accurate, fair evaluation is key to FungiTastic!
This page explains how to evaluate your models on all main benchmarks, including metrics, submission formats, and leaderboard submission.
## 1. Standard Metrics
| Task | Main Metric(s) |
|---|---|
| Closed-set Classification | Top-1 Accuracy, Macro F1, Top-3 Accuracy |
| Open-set Classification | AUC, TNR@95%TPR |
| Few-shot Learning | Top-1 Accuracy, Macro F1 |
| Segmentation | mIoU, mAP |
| Cost-sensitive | Custom weighted loss, see below |
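Top-1 accuracy and macro F1 can be computed with scikit-learn (see the example in the next section). Top-3 accuracy needs per-class scores rather than hard labels; below is a minimal sketch assuming you keep an array of class probabilities per image (`probs` and `y_true` are placeholder names, not part of the dataset tooling).

```python
import numpy as np

# Placeholder inputs: `probs` has shape (num_images, num_classes) with
# per-class scores, `y_true` holds integer class indices.
probs = np.random.rand(5, 10)
y_true = np.array([3, 1, 7, 0, 9])

# Top-3 accuracy: the true class is among the three highest-scoring classes.
top3 = np.argsort(probs, axis=1)[:, -3:]
top3_acc = np.mean([y in row for y, row in zip(y_true, top3)])
print("Top-3 accuracy:", top3_acc)

# scikit-learn (>= 0.24) offers the same metric directly:
# from sklearn.metrics import top_k_accuracy_score
# top_k_accuracy_score(y_true, probs, k=3)
```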
## 2. How to Evaluate
- Use the provided `*_test.csv` files for each benchmark.
- After inference, output a CSV with your predictions (a minimal writing sketch follows this list):
  - For classification: `image_id,predicted_label`
  - For open-set: add a column for `is_unknown` or use `"unknown"` as a label
  - For segmentation: follow the COCO or Pascal VOC format
- Python example (classification):
  ```python
  import pandas as pd
  from sklearn.metrics import f1_score, accuracy_score

  # Load ground-truth labels and your predictions, e.g. with pd.read_csv(...)
  # on the benchmark's *_test.csv and your submission CSV.
  y_true = [...]  # ground-truth labels
  y_pred = [...]  # your predictions

  print("Accuracy:", accuracy_score(y_true, y_pred))
  print("Macro F1:", f1_score(y_true, y_pred, average="macro"))
  ```
- Open-set: use ROC/AUC and TNR@95%TPR; see Open-set Models (an AUC/TNR sketch also follows below)
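As a rough sketch of the classification submission format (the file name, image ids, and labels here are made up; match them to the ids in the corresponding `*_test.csv`):

```python
import pandas as pd

# Hypothetical predictions; replace with your model's outputs and the real
# image ids from the benchmark's *_test.csv.
predictions = {
    "image_id": ["img_0001.jpg", "img_0002.jpg"],
    "predicted_label": ["Amanita muscaria", "Boletus edulis"],
}

# Write the two-column submission file: image_id,predicted_label.
pd.DataFrame(predictions).to_csv("submission.csv", index=False)
```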
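For the open-set metrics, here is a minimal sketch of AUC and TNR@95%TPR computed from per-image "known-class" confidence scores (the variable names and values are illustrative; this is not the official scoring script):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Illustrative inputs: 1 = known class, 0 = unknown, plus a confidence score
# that the image belongs to a known class.
is_known = np.array([1, 1, 0, 1, 0, 0, 1, 0])
scores = np.array([0.9, 0.8, 0.4, 0.7, 0.2, 0.6, 0.95, 0.1])

print("AUC:", roc_auc_score(is_known, scores))

# TNR@95%TPR: take the first ROC point whose TPR reaches 0.95
# and report 1 - FPR there.
fpr, tpr, _ = roc_curve(is_known, scores)
idx = np.argmax(tpr >= 0.95)
print("TNR@95%TPR:", 1 - fpr[idx])
```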
## 3. Cost-sensitive Evaluation
- Download or define a cost matrix that assigns higher penalties to dangerous mistakes (e.g., a poisonous species predicted as edible).
- Use the sample code from Baselines & Models or Appendix A in the paper; a minimal sketch also follows below.
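For orientation only, here is a minimal sketch of cost-sensitive scoring with a toy cost matrix (the class order and penalty values are invented for illustration; use the official cost matrix from the repository or the paper):

```python
import numpy as np

# Toy cost matrix: rows = true class, columns = predicted class.
# Hypothetical class order: [edible, inedible, poisonous].
cost = np.array([
    [0,   1, 1],  # true: edible
    [1,   0, 1],  # true: inedible
    [100, 1, 0],  # true: poisonous -> predicting "edible" is heavily penalized
])

y_true = np.array([0, 2, 1, 2])  # placeholder ground-truth class indices
y_pred = np.array([0, 0, 1, 2])  # placeholder predicted class indices

# Mean cost over the test set: lower is better.
print("Mean cost:", cost[y_true, y_pred].mean())
```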
## 4. Submitting to Leaderboards
- For official challenges and leaderboards, follow the instructions in the README or in GitHub Issues.
Need more? Check Benchmarks for protocol details or open an issue!