FungiTastic

🍄 Welcome to FungiTastic!

FungiTastic is a large-scale, expert-verified, multi-modal dataset and toolkit for benchmarking and research in wild fungi recognition, discovery, and biodiversity.
Whether you’re here to push the limits of vision models, explore new multi-modal learning, or build better biodiversity tools, you’re in the right place!


🔎 Key Resources

Resource | Description | Link
📄 Dataset Paper | CVPR 2025 (FGVC Workshop), dataset details & benchmarks | arXiv Paper (PDF)
🧠 GitHub Repository | Code, loaders, scripts, and baselines | FungiTastic Repo
🚀 Starter Notebooks | Baseline pipelines and scripts | Kaggle Code Notebooks
📦 Download Instructions | How to access subsets and modalities | Download & Usage Guide

🏞️ Dataset Overview

FungiTastic is a large-scale (~350,000 observations and >600,000 images), multi-modal benchmark dataset for computer vision, machine learning, and biodiversity research, centered around wild fungi observations. It offers expert-labeled data across more than 5,000 species, collected over 20+ years, and is designed to power research in fine-grained recognition, domain adaptation, multi-modal learning, open-set recognition, and more.

Figure 1: A fungi observation includes one or more photos [🟩] with expert-verified labels, sometimes spores, and rich contextual data: captions [🟦], metadata [🟧], geospatial data [🟫], and climatic time-series [🟦]. For a subset (~70k images), body part masks [🟥] are included.
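
To make the structure of an observation concrete, here is a minimal sketch of how one might hold a single observation in memory. The class name, field names, and example values are all hypothetical and chosen for illustration; they are not the dataset's actual schema or column names, so check the Dataset page for the real ones.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Observation:
    """Hypothetical container for one observation (NOT the official schema)."""

    observation_id: str
    species: str                       # expert-verified label
    image_paths: List[str]             # one or more photos
    caption: Optional[str] = None      # text caption, if available
    metadata: Dict[str, str] = field(default_factory=dict)     # tabular attributes
    latitude: Optional[float] = None   # geospatial context
    longitude: Optional[float] = None
    climate_series: List[float] = field(default_factory=list)  # climatic time-series
    mask_paths: List[str] = field(default_factory=list)        # body part masks (subset only)

    def __post_init__(self) -> None:
        # Every observation has at least one photo.
        if not self.image_paths:
            raise ValueError("an observation needs at least one photo")

    def modalities(self) -> List[str]:
        """List which modalities are actually present for this observation."""
        present = ["image"]
        if self.caption:
            present.append("caption")
        if self.metadata:
            present.append("metadata")
        if self.latitude is not None and self.longitude is not None:
            present.append("geospatial")
        if self.climate_series:
            present.append("climate")
        if self.mask_paths:
            present.append("masks")
        return present


# Made-up example values, for illustration only.
obs = Observation("obs-1", "Amanita muscaria", ["photo_1.jpg"], caption="red cap")
print(obs.modalities())
```

Keeping the optional modalities as empty defaults lets a multi-modal pipeline check `modalities()` and fall back to image-only processing when context is missing.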


🧑‍🔬 What Can You Do With FungiTastic?

  • Fine-grained classification (closed-set, open-set)
  • Few-shot learning and rare species recognition
  • Multi-modal and multi-task learning (mix visual, tabular, geospatial, text)
  • Domain adaptation & temporal shift (yearly, seasonal, and habitat variation)
  • Vision-language modeling (rich captions)
  • Semantic/instance segmentation
  • Cost-sensitive classification (e.g., edible vs. poisonous)

See Benchmarks for benchmark challenges and usage.
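
As an illustration of the cost-sensitive item in the list above, the sketch below picks the label with the lowest expected cost rather than the highest probability. The cost values and the two-class setup are assumptions made up for this example; they are not the costs or the protocol used by the FungiTastic benchmarks.

```python
from typing import Dict

# Hypothetical costs, indexed as COST[true_label][predicted_label].
# Calling a poisonous mushroom edible is made far more costly than the reverse.
COST: Dict[str, Dict[str, float]] = {
    "edible": {"edible": 0.0, "poisonous": 1.0},
    "poisonous": {"edible": 100.0, "poisonous": 0.0},
}


def expected_cost(probs: Dict[str, float], predicted: str) -> float:
    """Expected cost of answering `predicted`, given class probabilities."""
    return sum(p * COST[true][predicted] for true, p in probs.items())


def min_cost_decision(probs: Dict[str, float]) -> str:
    """Choose the label that minimizes expected cost."""
    return min(COST, key=lambda label: expected_cost(probs, label))


# A model that is 90% sure the mushroom is edible.
probs = {"edible": 0.9, "poisonous": 0.1}
print(max(probs, key=probs.get))   # most probable label
print(min_cost_decision(probs))    # cost-aware label
```

With these assumed costs, answering "edible" has expected cost 0.1 × 100 = 10 and answering "poisonous" has expected cost 0.9 × 1 = 0.9, so the cost-aware decision is "poisonous" even though "edible" is the more probable class.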


📚 Dataset Subsets

  • Full: ~346k observations, all modalities; benchmark for classification and discovery
  • Mini (FungiTastic–M): Focused on 6 genera, ~70k images with masks; for fast prototyping, segmentation, and few-shot learning
  • Few-shot (FungiTastic–FS): Species with <5 training samples; for testing few-shot models and rare-class learning

Full subset details and statistics are in Dataset.
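
The few-shot criterion above (species with fewer than 5 training samples) can be sketched as a simple count over training labels. This is only an illustration of the rule as stated here: the function name and the toy labels are hypothetical, and the official FungiTastic–FS split may be built with additional rules.

```python
from collections import Counter
from typing import Iterable, Set


def few_shot_species(train_labels: Iterable[str], max_samples: int = 5) -> Set[str]:
    """Return the species that have fewer than `max_samples` training samples."""
    counts = Counter(train_labels)
    return {species for species, n in counts.items() if n < max_samples}


# Toy labels, made up for illustration: one label per training sample.
labels = ["species_a"] * 5 + ["species_b"] * 4 + ["species_c"]
print(sorted(few_shot_species(labels)))
```

Here `species_a` has exactly 5 samples and is excluded, while `species_b` and `species_c` fall below the threshold.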


💾 Downloading the Data

Two options:

  1. Kaggle download: Contains most of the data, with images at 500px resolution (~50GB)
  2. Download script (recommended):
    Download only what you need (by subset, modality, or resolution).
    git clone https://github.com/bohemianvra/FungiTastic.git
    cd FungiTastic/dataset
    python download.py --metadata --images --subset "m" --size "300" --save_path "./"
    
    See the Download Guide for all options.

📣 Get Involved

  • Issues or help? Open an Issue
  • Request a feature or contribute? Fork & PR!

Citation

  • If you use FungiTastic, please cite the following reference.
    @InProceedings{Picek_2025_CVPR,
        author    = {Picek, Lukas and Janouskova, Klara and Cermak, Vojtech and Matas, Jiri},
        title     = {FungiTastic: A Multi-Modal Dataset and Benchmark for Image Categorization},
        booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Workshops},
        month     = {June},
        year      = {2025},
        pages     = {2046-2056}
    }
    

Enjoy! 🍄