
🗃️ FungiTastic Dataset

FungiTastic is a large-scale, multi-modal benchmark dataset for computer vision, machine learning, and biodiversity research, centered on observations of wild fungi. It offers expert-labeled data covering roughly 5,000 species, collected over 20 years (2003–2023), and is designed to support research in fine-grained recognition, domain adaptation, multi-modal learning, open-set recognition, and more.


🚀 What’s Inside?

  • Size: ~350,000 observations, >600,000 images, ~5,000 species.
  • Modalities: Photographs, satellite data, climate time series, segmentation masks, expert taxon labels, rich metadata, and image captions.
  • Expert Curation: Species labels are expert-verified; a subset includes DNA-sequenced ground truth.
  • Region: Predominantly Denmark and Northern Europe.
  • Time span: 2003–2023.

🧑‍🔬 What Can You Use It For?

FungiTastic is designed for developing and benchmarking ML models for:

  • Fine-grained image classification (closed-set, open-set)
  • Few-shot learning and rare species recognition
  • Multi-modal and multi-task learning (combine visual, tabular, geospatial, and textual data)
  • Domain adaptation and temporal shift (study seasonal, habitat, and year-to-year distribution changes)
  • Vision-language modeling (with detailed image captions)
  • Semantic and instance segmentation
  • Cost-sensitive classification (e.g., recognizing poisonous vs. edible species)

See the Benchmarks section for a deep dive on supported challenges.
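To make the cost-sensitive item above concrete, here is a minimal sketch of a minimum-expected-cost decision rule. The two-class setup, the cost values, and the function name are illustrative assumptions, not the cost function used by the FungiTastic benchmark.

```python
from typing import Sequence


def min_expected_cost_class(
    probs: Sequence[float], cost: Sequence[Sequence[float]]
) -> int:
    """Return the prediction index with the lowest expected cost.

    cost[t][p] is the cost of predicting class p when the true class is t.
    """
    n = len(probs)
    expected = [sum(probs[t] * cost[t][p] for t in range(n)) for p in range(n)]
    return min(range(n), key=expected.__getitem__)


# Hypothetical classes: 0 = edible, 1 = poisonous.
# Assumed costs: calling a poisonous specimen edible costs 100,
# calling an edible specimen poisonous costs 1.
COST = [
    [0.0, 1.0],    # true edible:    [predict edible, predict poisonous]
    [100.0, 0.0],  # true poisonous: [predict edible, predict poisonous]
]

probs = [0.9, 0.1]  # made-up model output: 90% edible, 10% poisonous
argmax_choice = max(range(len(probs)), key=probs.__getitem__)
cost_aware_choice = min_expected_cost_class(probs, COST)
```

With these assumed costs, the plain argmax picks "edible" while the cost-aware rule picks "poisonous", because a 10% chance of the expensive mistake outweighs a 90% chance of the cheap one.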


📚 Dataset Subsets

FungiTastic is split into several subsets, each tailored for different research tasks:

1. FungiTastic (Full)

  • The main benchmark set: >346k observations, >4,500 species, with all modalities.
  • Use: Closed-set & open-set classification, multi-modal modeling.
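As a rough illustration of how closed-set and open-set prediction differ, the sketch below rejects low-confidence predictions as "unknown" by thresholding the top class probability. This is a common simple baseline, not necessarily the method used in the FungiTastic baselines; the threshold value and the `UNKNOWN` label are assumptions.

```python
from typing import Sequence

UNKNOWN = -1  # assumed label for "species not seen during training"


def open_set_predict(probs: Sequence[float], threshold: float = 0.5) -> int:
    """Return the top class index, or UNKNOWN if its probability is too low."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else UNKNOWN


confident = open_set_predict([0.7, 0.2, 0.1])     # top class is clear
uncertain = open_set_predict([0.4, 0.35, 0.25])   # no class reaches 0.5
```

A closed-set classifier would always return the argmax; the open-set variant can abstain when no known species is a convincing match.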

2. FungiTastic-Mini (FungiTastic–M)

  • A compact subset from 6 challenging genera (e.g., Russula, Amanita, Boletus).
  • Includes body part segmentation masks for ~70k images.
  • Use: Fast prototyping, segmentation, few-shot, open-set.

3. FungiTastic–FS (Few-shot)

  • Observations from species with <5 training samples.
  • Use: Few-shot learning & rare species recognition.
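The selection rule for this subset (species with fewer than 5 training samples) can be sketched as a simple count over training labels. The label list is made up, and counting one label per training sample is an assumption; the actual subset is distributed with the dataset and does not need to be rebuilt.

```python
from collections import Counter
from typing import Iterable, Set


def few_shot_species(train_labels: Iterable[str], max_samples: int = 5) -> Set[str]:
    """Return species that occur fewer than `max_samples` times in training."""
    counts = Counter(train_labels)
    return {species for species, n in counts.items() if n < max_samples}


# Made-up training labels, one entry per training sample.
labels = (
    ["Amanita muscaria"] * 12
    + ["Russula emetica"] * 5
    + ["Boletus edulis"] * 4
    + ["Mycena rosea"] * 1
)
rare = few_shot_species(labels)
```

Here `rare` contains the two species with fewer than 5 samples; a species with exactly 5 samples is excluded because the comparison is strict.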

Details and statistics for each subset are provided in the Subsets and Benchmarks pages.


🔗 Available Data Modalities

Each observation can include a combination of:


| Data Type              | Description                            | Link                   |
|------------------------|----------------------------------------|------------------------|
| Photographs & Captions | Images and generated text descriptions | Photographs & Captions |
| Taxonomic Metadata     | Labels, toxicity, environment info     | Metadata               |
| Segmentation Masks     | Body part annotations (Mini subset)    | Segmentation Masks     |
| Satellite Data         | Local satellite/environmental context  | Satellite Data         |
| Climate Series         | Long-term climate for each location    | Climate Time Series    |
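Because an observation may carry only some of these modalities, code that consumes the data typically needs to treat each one as optional. The record below is a hypothetical sketch of that idea; every field name is an assumption for illustration and does not reflect the actual file layout or column names of the dataset.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Observation:
    """Hypothetical in-memory record for one observation."""

    observation_id: str
    species: str
    image_paths: List[str] = field(default_factory=list)
    captions: List[str] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)
    mask_paths: List[str] = field(default_factory=list)  # Mini subset only
    satellite_path: Optional[str] = None
    climate_series: Optional[List[float]] = None

    def available_modalities(self) -> List[str]:
        """List the modalities that are actually present on this record."""
        present = {
            "photographs": bool(self.image_paths),
            "captions": bool(self.captions),
            "metadata": bool(self.metadata),
            "segmentation_masks": bool(self.mask_paths),
            "satellite": self.satellite_path is not None,
            "climate": self.climate_series is not None,
        }
        return [name for name, ok in present.items() if ok]


# Made-up example: an observation with photos and metadata only.
obs = Observation(
    observation_id="obs-0001",
    species="Amanita muscaria",
    image_paths=["images/obs-0001_a.jpg"],
    metadata={"habitat": "coniferous forest"},
)
modalities = obs.available_modalities()
```

Checking `available_modalities()` before building a multi-modal batch avoids assuming that, for example, segmentation masks exist outside the Mini subset.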

📥 How to Download

Instructions for access and download are provided in the Dataset Download Guide.


📝 Citations & Further Reading

For detailed methodology, statistics, and baselines, see our paper and the Citation page.


Still have questions? Explore the FAQ or reach out via GitHub Issues.