🧪 Evaluation Protocols

Accurate, fair evaluation is key to FungiTastic!
This page explains how to evaluate your models on all main benchmarks, covering metrics, submission formats, and leaderboard submission.


1. Standard Metrics

Task                        Main Metric(s)
--------------------------  ----------------------------------------
Closed-set Classification   Top-1 Accuracy, Macro F1, Top-3 Accuracy
Open-set Classification     AUC, TNR@95%TPR
Few-shot Learning           Top-1 Accuracy, Macro F1
Segmentation                mIoU, mAP
Cost-sensitive              Custom weighted loss (see below)
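
Top-1 Accuracy and Macro F1 are computed from hard predictions (see the example in section 2); Top-3 Accuracy needs per-class scores. A minimal scikit-learn sketch, with illustrative toy values rather than dataset results:

    import numpy as np
    from sklearn.metrics import top_k_accuracy_score

    # Toy per-class scores for 4 samples over 4 classes (illustrative only).
    y_true = np.array([0, 1, 2, 3])
    y_score = np.array([[0.60, 0.20, 0.15, 0.05],
                        [0.10, 0.50, 0.25, 0.15],
                        [0.20, 0.10, 0.30, 0.40],
                        [0.40, 0.30, 0.20, 0.10]])

    print("Top-1 Accuracy:", top_k_accuracy_score(y_true, y_score, k=1))
    print("Top-3 Accuracy:", top_k_accuracy_score(y_true, y_score, k=3))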

2. How to Evaluate

  • Use the provided *_test.csv files for each benchmark.
  • After inference, output a CSV (a minimal writer sketch appears at the end of this section):

    • For classification: image_id,predicted_label
    • For open-set: add an is_unknown column or use "unknown" as the predicted label
    • For segmentation: follow the COCO or Pascal VOC format
  • Python example (classification):

    import pandas as pd
    from sklearn.metrics import f1_score, accuracy_score
    
    y_true = [...]  # ground-truth labels
    y_pred = [...]  # your predictions
    
    print("Accuracy:", accuracy_score(y_true, y_pred))
    print("Macro F1:", f1_score(y_true, y_pred, average="macro"))
    

  • Open-set: use ROC AUC and TNR@95%TPR; see Open-set Models (a sketch follows below)
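
A minimal sketch of both open-set metrics with scikit-learn, assuming a per-sample score that is higher for known classes and a binary label with 1 = known; the benchmark's exact operating-point convention may differ:

    import numpy as np
    from sklearn.metrics import roc_auc_score, roc_curve

    # Illustrative values: score = confidence that the sample is a known class.
    known_score = np.array([0.9, 0.8, 0.7, 0.4, 0.2, 0.1])
    is_known    = np.array([1,   1,   1,   0,   1,   0])

    print("AUC:", roc_auc_score(is_known, known_score))

    # TNR@95%TPR: fraction of unknowns rejected at the threshold that still
    # accepts 95% of the known samples.
    fpr, tpr, _ = roc_curve(is_known, known_score)
    idx = np.searchsorted(tpr, 0.95)  # first ROC point with TPR >= 0.95
    print("TNR@95%TPR:", 1.0 - fpr[idx])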

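Finally, a sketch of writing the classification submission file described above with pandas; the image IDs and labels here are placeholders, not real dataset entries:

    import pandas as pd

    # Placeholder predictions: take image_id from the benchmark's *_test.csv
    # and predicted_label from your model's output.
    submission = pd.DataFrame({
        "image_id": ["example_0001", "example_0002"],
        "predicted_label": ["Amanita muscaria", "unknown"],  # "unknown" for open-set
    })
    submission.to_csv("submission.csv", index=False)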

3. Cost-sensitive Evaluation

  • Download or define a cost matrix with higher penalties for dangerous mistakes (e.g., a poisonous species predicted as edible).
  • Use the sample code from Baselines & Models or Appendix A of the paper; a minimal sketch of the mechanics follows below.
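
As a rough illustration of the mechanics (not the official matrix), assuming a square cost matrix indexed as cost[true_class, predicted_class]:

    import numpy as np

    # Toy 3-class cost matrix: cost[true, predicted]. Dangerous confusions,
    # e.g. a poisonous species (class 2) predicted as edible (class 0),
    # carry a much larger penalty than ordinary mistakes.
    cost = np.array([[  0.0,  1.0,  1.0],
                     [  1.0,  0.0,  1.0],
                     [100.0,  1.0,  0.0]])

    y_true = np.array([0, 1, 2, 2])
    y_pred = np.array([0, 2, 0, 2])

    print("Mean misclassification cost:", cost[y_true, y_pred].mean())

If your model outputs class probabilities, a common cost-sensitive decision rule is to pick the class that minimises the expected cost, i.e. (probs @ cost).argmin(axis=1), rather than the plain argmax.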

4. Submitting to Leaderboards


Need more? Check Benchmarks for protocol details or open an issue!