Overview

The Annotation Garden Initiative (AGI) establishes an open infrastructure for collaborative, multi-layered annotation of stimuli used in neuroscience research. Building on emerging standards including Stim-BIDS (BEP044) and HED specifications, AGI addresses the critical gap between increasingly complex naturalistic stimuli and the need for standardized, reusable annotations.

Through GitHub-based version control and modern web interfaces, AGI enables researchers to share, refine, and build upon stimulus annotations across studies, transforming how we document and analyze brain responses to complex sensory experiences.


The Problem We’re Solving

Modern neuroscience increasingly employs naturalistic stimuli, from the 73,000 COCO images prepared for the Natural Scenes Dataset to the 2-hour audiovisual experience of Forrest Gump and the diverse movie-watching paradigms in the Child Mind Institute’s Healthy Brain Network. Each dataset represents thousands of hours of duplicated researcher effort, as labs independently annotate the same stimuli for scene boundaries, emotional valence, semantic content, acoustic features, and narrative structure.

Yet these annotations remain siloed within individual studies, limiting reproducibility and preventing the field from building cumulative knowledge about brain-behavior relationships.

Current annotation efforts are fragmented across:

  • Tools: ELAN, Praat, custom scripts
  • Formats: CSV, JSON, proprietary
  • Storage: Lab servers, supplementary materials, forgotten hard drives

Architecture

Each stimulus becomes a Git repository with branches serving as annotation layers. Different perspectives (scene boundaries, emotional arcs, object detection) can be developed independently and merged through standard Git workflows.

Repository Structure

Each stimulus repository follows a consistent structure aligned with Stim-BIDS:

stimulus-name/
├── stimuli/
│   └── [stimulus files or pointers]
├── annotations/
│   ├── visual-saliency/
│   │   ├── events.tsv
│   │   └── events.json
│   ├── emotional-ratings/
│   └── scene-boundaries/
├── README.md
└── LICENSE
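Each annotation layer in the tree above pairs a BIDS-style `events.tsv` with a JSON sidecar describing its columns. The sketch below, written under the assumption of standard BIDS events conventions, generates a minimal `scene-boundaries` layer; the `trial_type` levels and descriptions are illustrative, not part of any published AGI layer.

```python
import csv
import json
from pathlib import Path

layer = Path("annotations/scene-boundaries")
layer.mkdir(parents=True, exist_ok=True)

# BIDS events files require onset and duration columns, in seconds;
# the trial_type values here are illustrative placeholders.
rows = [
    {"onset": 0.0, "duration": 12.4, "trial_type": "scene"},
    {"onset": 12.4, "duration": 33.1, "trial_type": "scene"},
]
with open(layer / "events.tsv", "w", newline="") as f:
    writer = csv.DictWriter(
        f, fieldnames=["onset", "duration", "trial_type"], delimiter="\t"
    )
    writer.writeheader()
    writer.writerows(rows)

# The sidecar JSON documents each column, per the BIDS specification.
sidecar = {
    "onset": {"Description": "Event onset relative to stimulus start", "Units": "s"},
    "duration": {"Description": "Event duration", "Units": "s"},
    "trial_type": {"Description": "Annotation category for this layer"},
}
with open(layer / "events.json", "w") as f:
    json.dump(sidecar, f, indent=2)
```

Because each layer lives in its own directory (and, in the Git model, its own branch), layers can be added or revised without touching one another.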

Flagship Datasets

Natural Scenes Dataset (NSD)

The dataset comprises 73,000 COCO images, with each of eight subjects viewing approximately 10,000 distinct images during 7T fMRI scanning. AGI hosts annotation layers including object detection, scene categorization, emotional ratings, and visual saliency maps.

Forrest Gump (studyforrest.org)

The 2-hour movie stimulus has published annotations: 870 shot boundaries, 2,500+ spoken sentences with speaker identification, emotion characterizations, and frame-wise perceptual features. AGI unifies these layers into HED-compatible formats.

HBN Movies

The diverse movie collection from the Child Mind Institute (Despicable Me, The Present, Pixar shorts) serves as the foundation for the NeurIPS 2025 EEG Foundation Challenge. AGI handles copyright complexities through a pointer-based architecture.
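In a pointer-based architecture, the repository stores only metadata and a checksum for each copyrighted stimulus, never the media itself; researchers obtain the file through licensed channels and verify it locally. The pointer schema below is a hypothetical illustration, not AGI's actual format:

```python
import hashlib
from pathlib import Path

# Hypothetical pointer-file contents: metadata plus a checksum stand in
# for the copyrighted media file, which is never committed to the repo.
pointer = {
    "name": "despicable_me_segment.mp4",
    "source": "obtain through the HBN distribution (not redistributed here)",
    "sha256": "…",  # checksum of the expected stimulus file
}


def verify_local_copy(pointer: dict, local_path: Path) -> bool:
    """Check that a locally obtained stimulus matches the pointer's checksum."""
    digest = hashlib.sha256(local_path.read_bytes()).hexdigest()
    return digest == pointer["sha256"]
```

Annotations can then reference the stimulus by name, confident that every lab verifying against the same checksum is annotating the identical file.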


Current Status

The platform at annotation.garden hosts:

  • VLM-generated annotations for NSD Shared 1000 images (see Image Annotation Tool)
  • Infrastructure for hosting and versioning stimulus annotations
  • Plans for HED integration to standardize semantic annotations

Tools and Resources

Image Annotation Tool

The Image Annotation Tool provides a web-based interface for VLM-powered image annotation, starting with the Natural Scenes Dataset.

Standards Integration

  • BIDS: Brain Imaging Data Structure; Stim-BIDS (BEP044) for standardized annotation format
  • HED: Hierarchical Event Descriptors for semantic annotation framework
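HED integration works by mapping the categorical values in an annotation layer's `events.tsv` to HED tag strings in the JSON sidecar, so that semantic content becomes machine-readable across datasets. A minimal sketch, assuming standard HED schema tags (`Sensory-event`, `Visual-presentation`, etc.) and illustrative `trial_type` levels of my own invention:

```python
import json

# Sketch of a HED-annotated sidecar: each trial_type level maps to a HED
# tag string. The tag names follow the standard HED schema; the specific
# levels (scene_onset, speech) are illustrative, not from a real layer.
sidecar = {
    "trial_type": {
        "HED": {
            "scene_onset": "Sensory-event, Visual-presentation, (Scene, Onset)",
            "speech": "Sensory-event, Auditory-presentation, Speech",
        }
    }
}
print(json.dumps(sidecar, indent=2))
```

With sidecars like this, HED-aware tools can query events semantically (e.g. all auditory events across stimuli) without parsing dataset-specific column conventions.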

Access Information

License

  • License: CC BY-NC-SA 4.0

References

  • Allen, E. J., et al. (2022). A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence. Nature Neuroscience, 25(1), 116-126.
  • Hanke, M., et al. (2014). A high-resolution 7-Tesla fMRI dataset from complex natural stimulation with an audio movie. Scientific Data, 1, 140003.
  • Shirazi, S. Y., et al. (2024). HBN-EEG: The FAIR implementation of the Healthy Brain Network electroencephalography dataset. bioRxiv.

© 2025 Seyed Yahya Shirazi