🗃️ FungiTastic Dataset

FungiTastic is a large-scale, multi-modal benchmark dataset for computer vision, machine learning, and biodiversity research, built from observations of wild fungi. It provides expert-labeled data for roughly 5,000 species, collected over about 20 years (2003–2023), and is designed to support research in fine-grained recognition, domain adaptation, multi-modal learning, and open-set recognition, among other tasks.

🚀 What’s Inside?

Size: ~350,000 observations, >600,000 images, ~5,000 species.
Modalities: Photographs, satellite data, climate time series, segmentation masks, expert taxon labels, rich metadata, and image captions.
Expert Curation: Species labels are expert-verified; a subset includes DNA-sequenced ground truth.
Region: Predominantly Denmark and Northern Europe.
Time span: 2003–2023.

🧑‍🔬 What Can You Use It For?

FungiTastic is specifically designed to benchmark and develop advanced ML models for:

Fine-grained image classification (closed-set, open-set)
Few-shot learning and rare species recognition
Multi-modal and multi-task learning (combine visual, tabular, geospatial, and textual data)
Domain adaptation and temporal shift (study seasonal, habitat, and year-to-year distribution changes)
Vision-language modeling (with detailed image captions)
Semantic and instance segmentation
Cost-sensitive classification (e.g., recognizing poisonous vs. edible species)

See the Benchmarks section for a deep dive on supported challenges.
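
As an illustration of the cost-sensitive setting listed above, the sketch below scores edible/poisonous predictions with asymmetric penalties, so that mistaking a poisonous species for an edible one costs more than the reverse. The function name and both cost weights are assumptions chosen for this example; they are not the benchmark's official metric or values.

```python
from typing import Sequence


def cost_sensitive_error(
    true_poisonous: Sequence[bool],
    pred_poisonous: Sequence[bool],
    cost_poisonous_as_edible: float = 100.0,  # assumed weight, not from the benchmark
    cost_edible_as_poisonous: float = 1.0,    # assumed weight, not from the benchmark
) -> float:
    """Return the mean misclassification cost over all predictions."""
    if len(true_poisonous) != len(pred_poisonous):
        raise ValueError("inputs must have the same length")
    if not true_poisonous:
        raise ValueError("inputs must be non-empty")

    total = 0.0
    for truth, pred in zip(true_poisonous, pred_poisonous):
        if truth and not pred:
            # Dangerous error: a poisonous specimen was predicted as edible.
            total += cost_poisonous_as_edible
        elif pred and not truth:
            # Milder error: an edible specimen was predicted as poisonous.
            total += cost_edible_as_poisonous
    return total / len(true_poisonous)


# Toy labels: one dangerous error and one mild error among four predictions.
truth = [True, False, False, True]
preds = [False, False, True, True]
print(cost_sensitive_error(truth, preds))  # (100 + 1) / 4 = 25.25
```

With equal weights this reduces to the ordinary error rate; raising the first weight makes a model that misses poisonous species rank worse than one that is merely over-cautious.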

📚 Dataset Subsets

FungiTastic is split into several subsets, each tailored for different research tasks:

1. FungiTastic (Full)

The main benchmark set: >346k observations, >4,500 species, with all modalities.
Use: Closed-set & open-set classification, multi-modal modeling.

2. FungiTastic-Mini (FungiTastic–M)

A compact subset from 6 challenging genera (e.g., Russula, Amanita, Boletus).
Includes body part segmentation masks for ~70k images.
Use: Fast prototyping, segmentation, few-shot learning, open-set recognition.

3. FungiTastic–FS (Few-shot)

Observations from species with <5 training samples.
Use: Few-shot learning & rare species recognition.
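
The selection rule above (species with fewer than 5 training samples) can be sketched as a simple count over training labels. This is only an illustration of the criterion: the helper name, the constant, and the toy label list are hypothetical, and the released subset may have been built with additional filtering.

```python
from collections import Counter
from typing import Iterable, Set

# Threshold taken from the subset description: fewer than 5 training samples.
FEW_SHOT_MAX_SAMPLES = 5


def few_shot_species(train_labels: Iterable[str]) -> Set[str]:
    """Return the species that have fewer than FEW_SHOT_MAX_SAMPLES training samples."""
    counts = Counter(train_labels)
    return {species for species, n in counts.items() if n < FEW_SHOT_MAX_SAMPLES}


# Toy training labels (illustrative only, not real dataset counts).
train_labels = (
    ["Amanita muscaria"] * 6
    + ["Russula emetica"] * 2
    + ["Boletus edulis"] * 4
)
print(sorted(few_shot_species(train_labels)))
# ['Boletus edulis', 'Russula emetica']
```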

Details and statistics for each subset are provided in the Subsets and Benchmarks pages.

🔗 Available Data Modalities

Each observation can include a combination of:

Photographs & Captions: High-quality images (including some spore micrographs) and automatically generated, detailed text descriptions.
Taxonomic Labels: Full biological hierarchy and toxicity (edible/poisonous).
Body Part Segmentation Masks: For toadstool-type fungi in the Mini subset.
Tabular Metadata: Date, location, habitat, substrate, elevation, land cover, etc.
Remote Sensing Data: 64×64-pixel multi-band satellite images at 10 m spatial resolution.
Climate Time Series: 20 years of temperature/precipitation data and bioclimatic variables.
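
To give a sense of scale for the remote sensing patches, the short calculation below derives the ground footprint of a 64×64 patch at 10 m resolution. It assumes square pixels and that the 10 m figure is the per-pixel ground sampling distance; the function name is made up for this example.

```python
from typing import Tuple

PATCH_SIZE_PX = 64   # patch width and height in pixels, from the description above
RESOLUTION_M = 10    # metres per pixel, from the description above


def patch_footprint(size_px: int = PATCH_SIZE_PX, resolution_m: int = RESOLUTION_M) -> Tuple[int, int]:
    """Return (side length in metres, area in square metres) of a square patch."""
    side_m = size_px * resolution_m
    return side_m, side_m * side_m


side_m, area_m2 = patch_footprint()
print(side_m, area_m2)  # 640 409600
```

Under these assumptions each patch covers a 640 m × 640 m square, about 0.41 km², around the observation.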

🗂️ Quick Links to Data Sections

Data Type	Description	Link
Photographs & Captions	Images and generated text descriptions	Photographs & Captions
Taxonomic Metadata	Labels, toxicity, environment info	Metadata
Segmentation Masks	Body part annotations (Mini subset)	Segmentation Masks
Satellite Data	Local satellite/environmental context	Satellite Data
Climate Series	Long-term climate for each location	Climate Time Series

📥 How to Download

Instructions for access and download are provided in the Dataset Download Guide.

📝 Citations & Further Reading

For detailed methodology, statistics, and baselines, see our paper and the Citation page.

Still have questions? Explore the FAQ or reach out via GitHub Issues.