Welcome to FungiTastic!
FungiTastic is a large-scale, expert-verified, multi-modal dataset and toolkit for benchmarking and research in wild fungi recognition, discovery, and biodiversity.
Whether you're here to push the limits of vision models, explore new multi-modal learning, or build better biodiversity tools, you're in the right place!
Key Resources
Resource | Description | Link |
---|---|---|
Dataset Paper | CVPR 2025 (FGVC Workshop), dataset details & benchmarks | arXiv Paper (PDF) |
GitHub Repository | Code, loaders, scripts, and baselines | FungiTastic Repo |
Starter Notebooks | Baseline pipelines and scripts | Kaggle Code Notebooks |
Download Instructions | How to access subsets and modalities | Download & Usage Guide |
Dataset Overview
FungiTastic is a large-scale (~350,000 observations and >600,000 images), multi-modal benchmark dataset for computer vision, machine learning, and biodiversity research, centered around wild fungi observations. It offers expert-labeled data across more than 5,000 species, collected over 20+ years, and is designed to power research in fine-grained recognition, domain adaptation, multi-modal learning, open-set recognition, and more.
Figure 1: A fungi observation includes one or more photos [🟩] with expert-verified labels, sometimes spores, and rich contextual data: captions [🟦], metadata [🟧], geospatial [🟫], and climatic time-series [🟦]. For a subset (~70k images), body part masks [🟥] are included.
What Can You Do With FungiTastic?
- Fine-grained classification (closed-set, open-set)
- Few-shot learning and rare species recognition
- Multi-modal and multi-task learning (mix visual, tabular, geospatial, text)
- Domain adaptation & temporal shift (yearly, seasonal, and habitat variation)
- Vision-language modeling (rich captions)
- Semantic/instance segmentation
- Cost-sensitive classification (e.g., edible vs. poisonous)
See Benchmarks for benchmark challenges and usage.
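
As a rough illustration of the cost-sensitive item in the list above, the sketch below picks the prediction with the lowest expected cost rather than the most probable class. The two-class setup and the cost values are illustrative assumptions, not the costs used by the FungiTastic benchmarks.

```python
# Hypothetical cost matrix, COST[true_class][predicted_class].
# Classes: 0 = edible, 1 = poisonous. The values are made up for illustration.
COST = [
    [0.0, 1.0],    # truly edible: predicting "poisonous" only wastes a meal
    [100.0, 0.0],  # truly poisonous: predicting "edible" is far more costly
]


def expected_costs(probs, cost=COST):
    """Return the expected cost of each possible prediction."""
    n = len(cost)
    return [sum(probs[t] * cost[t][p] for t in range(n)) for p in range(n)]


def min_cost_prediction(probs, cost=COST):
    """Return the class index whose expected cost is lowest."""
    costs = expected_costs(probs, cost)
    return min(range(len(costs)), key=costs.__getitem__)


probs = [0.9, 0.1]  # model output: 90% edible, 10% poisonous
print(max(range(len(probs)), key=probs.__getitem__))  # plain argmax -> 0
print(min_cost_prediction(probs))                      # cost-aware  -> 1
```

With these assumed costs, a 10% chance of a poisonous specimen is already enough to switch the decision to "poisonous", which is the behaviour a cost-sensitive evaluation is meant to reward.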
Dataset Subsets
- Full: ~346k observations, all modalities; the benchmark for classification and discovery
- Mini (FungiTastic-M): focused on 6 genera, ~70k images with masks; suited to fast prototyping, segmentation, and few-shot experiments
- Few-shot (FungiTastic-FS): species with <5 training samples; for testing few-shot models and rare class learning
See Dataset for full subset details and statistics.
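
To make the "<5 training samples" criterion concrete, here is a minimal sketch that finds such species from a list of training labels. The species names and the helper `few_shot_species` are hypothetical, and the way the released few-shot subset was actually built (for example, whether observations or images are counted) may differ.

```python
from collections import Counter


def few_shot_species(train_labels, max_samples=5):
    """Return the set of species with fewer than `max_samples` training samples."""
    counts = Counter(train_labels)
    return {species for species, n in counts.items() if n < max_samples}


# Toy label list; one entry per training sample (illustrative only).
train_labels = (
    ["Amanita muscaria"] * 7
    + ["Boletus edulis"] * 4
    + ["Russula emetica"] * 1
)

print(sorted(few_shot_species(train_labels)))
# ['Boletus edulis', 'Russula emetica']
```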
Downloading the Data
Two options:
- Kaggle download: Contains most of the data, with images at 500px resolution (~50 GB)
- Download script (recommended):
  Download only what you need (by subset, modality, or resolution). See the Download Guide for all options.

  ```bash
  git clone https://github.com/bohemianvra/FungiTastic.git
  cd FungiTastic/dataset
  python download.py --metadata --images --subset "m" --size "300" --save_path "./"
  ```
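
If you script several downloads, it can help to assemble the command in Python. The sketch below is a hypothetical helper, `build_download_command`, that uses only the flags shown in the command above. Which `--subset` and `--size` values are accepted is not covered here, so check the Download Guide before relying on it.

```python
import shlex


def build_download_command(subset, size, save_path="./", metadata=True, images=True):
    """Build the argument list for download.py using the flags from the README."""
    cmd = ["python", "download.py"]
    if metadata:
        cmd.append("--metadata")
    if images:
        cmd.append("--images")
    cmd += ["--subset", str(subset), "--size", str(size), "--save_path", save_path]
    return cmd


cmd = build_download_command("m", 300)
print(shlex.join(cmd))
# python download.py --metadata --images --subset m --size 300 --save_path ./

# To run it, call subprocess.run(cmd, cwd="FungiTastic/dataset", check=True)
# from the directory where the repository was cloned.
```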
Get Involved
- Issues or help? Open an Issue
- Request a feature or contribute? Fork & PR!
Citation
- If you use FungiTastic in your work, please cite the following reference.
```bibtex
@InProceedings{Picek_2025_CVPR,
    author    = {Picek, Lukas and Janouskova, Klara and Cermak, Vojtech and Matas, Jiri},
    title     = {FungiTastic: A Multi-Modal Dataset and Benchmark for Image Categorization},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Workshops},
    month     = {June},
    year      = {2025},
    pages     = {2046-2056}
}
```
Enjoy!