Overview
The Image Annotation Tool generates annotations for static image datasets using open-source Vision Language Models (VLMs). We have used this tool to create an annotation bank for the NSD Shared 1000 images, the subset of images viewed by all 8 subjects in the Natural Scenes Dataset study.
The tool runs VLMs locally via OLLAMA. The current annotation bank includes outputs from six models: Qwen2.5-VL (7B, 32B), Gemma3 (4B, 12B, 27B), and Mistral-Small3.2 (24B). Quality assessment across models is in progress.
Key Features
Models Used
All annotations are generated locally via OLLAMA:
- Qwen2.5-VL: 7B and 32B parameter versions
- Gemma3: 4B, 12B, and 27B parameter versions
- Mistral-Small3.2: 24B parameters
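For reference, these models can be addressed programmatically through a small mapping of OLLAMA model tags. A minimal sketch; the exact tags are assumptions and should be verified against the OLLAMA model library (e.g., with "ollama list"):

# OLLAMA tags for the models above (tags are assumptions; verify with "ollama list")
MODEL_TAGS = {
    "qwen2.5-vl-7b": "qwen2.5vl:7b",
    "qwen2.5-vl-32b": "qwen2.5vl:32b",
    "gemma3-4b": "gemma3:4b",
    "gemma3-12b": "gemma3:12b",
    "gemma3-27b": "gemma3:27b",
    "mistral-small3.2-24b": "mistral-small3.2:24b",
}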
Multi-Prompt Annotation
Each image is annotated using multiple prompts (general description, foreground/background, entities and interactions, mood and emotions) across all models to capture different aspects of the scene.
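In code, the prompt set can be sketched as a mapping from prompt type to instruction. The wording below is illustrative, not the tool's actual prompts:

# Illustrative prompt set (not the tool's exact wording)
PROMPTS = {
    "general_description": "Describe the image: setting, main elements, colors, lighting, composition.",
    "foreground_background": "Describe the foreground and the background of the image separately.",
    "entities_interactions": "List the entities in the image and describe how they interact.",
    "mood_emotions": "Describe the mood of the scene and the emotions it conveys.",
}

# Every image is annotated with every prompt under every model,
# yielding len(PROMPTS) * n_models annotations per image.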
HED Integration (Planned)
Integration with Hierarchical Event Descriptors (HED) is the next development priority:
- Mapping VLM annotations to HED tags
- Validation against HED schema
- Export in BIDS-compliant events.tsv format
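Because this integration is still planned, the following is only a sketch of what a keyword-to-HED mapping could look like; the tag strings are illustrative and would still need validation against the HED schema:

# Hypothetical keyword-to-HED mapping (illustrative tags, not schema-validated)
KEYWORD_TO_HED = {
    "person": "Human",
    "dog": "Animal",
    "grass": "Outdoors",
}

def annotation_to_hed(text: str) -> str:
    """Collect candidate HED tags whose trigger keyword appears in a VLM annotation."""
    tags = [tag for keyword, tag in KEYWORD_TO_HED.items() if keyword in text.lower()]
    return ", ".join(["Sensory-event", "Visual-presentation"] + tags)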
BIDS Compliance
Annotations follow stimuli-BIDS specifications:
- Standardized events.tsv format
- JSON sidecars with annotation schema
- Compatible with neuroimaging datasets
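As a minimal sketch of the target format (the annotation column and the file names are assumptions; only onset and duration are required by BIDS), writing one events.tsv row with its JSON sidecar could look like this:

import csv
import json

# One BIDS-style events.tsv row; columns beyond onset/duration are assumptions
rows = [{"onset": 0.0, "duration": 3.0,
         "stim_file": "images/nsd_shared_0001.png",
         "annotation": "A dog running across a grassy field."}]
with open("task-nsd_events.tsv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys(), delimiter="\t")
    writer.writeheader()
    writer.writerows(rows)

# JSON sidecar documenting the custom column
sidecar = {"annotation": {"Description": "VLM-generated description of the stimulus image"}}
with open("task-nsd_events.json", "w") as f:
    json.dump(sidecar, f, indent=2)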
Web Dashboard
Interactive visualization with Annotation Garden Initiative (AGI) branding, real-time annotation preview, and easy navigation through large datasets.
Annotation Types
The tool supports multiple annotation types for comprehensive image description:
General Description
Detailed natural language descriptions of image content, setting, main elements, colors, lighting, and overall composition.
Object Detection
Identification and localization of objects within images, compatible with COCO categories.
Scene Categorization
Classification of scenes into semantic categories for cross-dataset analysis.
Emotional Ratings
Valence and arousal ratings for affective neuroscience applications.
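Taken together, a single annotation record could be stored as a JSON object along these lines; the field names are assumptions, not the tool's exact schema:

# Hypothetical annotation record (field names are assumptions)
record = {
    "image_id": "nsd_shared_0001",
    "model": "qwen2.5vl:7b",
    "prompt_type": "general_description",
    "description": "A dog running across a grassy field at sunset.",
    "objects": [{"label": "dog", "bbox": [120, 80, 340, 260]}],  # COCO-style category
    "scene_category": "outdoor/natural",
    "valence": 7.2,  # assumed 1-9 rating scale
    "arousal": 4.8,
}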
Natural Scenes Dataset (NSD)
The tool is optimized for the NSD Shared 1000, the 1,000 images viewed by all 8 subjects in the Natural Scenes Dataset study. This shared subset enables:
- Cross-subject analysis of neural representations
- Benchmark annotations for model comparison
- Foundation for the broader 73,000 image collection
Architecture
Backend
- FastAPI: Python API for VLM orchestration
- OLLAMA: Local model serving on GPU
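A minimal sketch of this orchestration pattern: a FastAPI route that forwards an image and a prompt to OLLAMA's standard /api/generate REST endpoint. The /annotate route and its parameters are assumptions, not the tool's actual API:

import base64

import requests
from fastapi import FastAPI

app = FastAPI()
OLLAMA_URL = "http://localhost:11434/api/generate"

@app.post("/annotate")  # route name and parameters are assumptions
def annotate(image_path: str, model: str = "qwen2.5vl:7b",
             prompt: str = "Describe the image."):
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    # OLLAMA's generate endpoint accepts base64-encoded images for multimodal models
    resp = requests.post(OLLAMA_URL, json={"model": model, "prompt": prompt,
                                           "images": [image_b64], "stream": False})
    resp.raise_for_status()
    return {"annotation": resp.json()["response"]}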
Frontend
- Next.js: Modern React framework
- AGI Branding: Consistent design with Annotation Garden
- Real-time Updates: Live annotation preview
Storage
- JSON files with comprehensive metrics
- Database support for large datasets
- Version control through Git
Performance Benchmarks
All performance metrics were generated using an NVIDIA GeForce RTX 4090 GPU with OLLAMA for local model inference.
Annotation Tools
Utilities for post-processing annotations, shown here via the Python API:
from image_annotation.utils import reorder_annotations, remove_model, export_to_csv
# Reorder model annotations by quality
reorder_annotations("annotations/", ["best_model", "second_best"])
# Remove underperforming models
remove_model("annotations/", "poor_model")
# Export for analysis
export_to_csv("annotations/", "results.csv", include_metrics=True)
Quick Start
Prerequisites
- Python 3.11+
- Node.js 18+
- OLLAMA
- GPU with sufficient VRAM for target models
Installation
# Clone from Annotation Garden
git clone https://github.com/Annotation-Garden/image-annotation.git
cd image-annotation
# Python environment
conda create -n torch-312 python=3.12
conda activate torch-312
pip install -e .
# Frontend
cd frontend && npm install
Usage
# Start OLLAMA (for local models)
ollama serve
# Pull at least one supported VLM (tags may differ; check the OLLAMA model library)
ollama pull qwen2.5vl:7b
# Run frontend dashboard
cd frontend && npm run dev
# Visit http://localhost:3000
Access Information
Links
- Live Tool: annotation.garden/image-annotation
- GitHub Repository: github.com/Annotation-Garden/image-annotation
- Parent Project: annotation.garden
License
- CC-BY-NC-SA 4.0
Related Projects
- Annotation Garden Initiative: Parent organization for collaborative annotation
- HED-MCP: Model Context Protocol for HED integration
- Natural Scenes Dataset: Source of stimulus images
© 2025 Seyed Yahya Shirazi