Overview

The Image Annotation Tool generates annotations for static image datasets using open-source Vision Language Models (VLMs). We have used this tool to create an annotation bank for the NSD Shared 1000 images, the subset of images viewed by all 8 subjects in the Natural Scenes Dataset study.

The tool runs VLMs locally via OLLAMA. The current annotation bank includes outputs from six models: Qwen2.5-VL (7B, 32B), Gemma3 (4B, 12B, 27B), and Mistral-Small3.2 (24B). Quality assessment across models is in progress.
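
As a minimal example, a single annotation request through the OLLAMA Python client looks roughly like the sketch below; the model tag and image path are placeholders, and any of the models above can be substituted:

import ollama

# "qwen2.5vl:7b" is an assumed tag; substitute any pulled vision model.
response = ollama.chat(
    model="qwen2.5vl:7b",
    messages=[{"role": "user",
               "content": "Describe this image in detail.",
               "images": ["shared1000/image_0001.png"]}],  # hypothetical path
)
print(response["message"]["content"])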


Key Features

Models Used

All annotations are generated locally via OLLAMA:


  • Qwen2.5-VL: 7B and 32B parameter versions
  • Gemma3: 4B, 12B, and 27B parameter versions
  • Mistral-Small3.2: 24B parameters

Multi-Prompt Annotation

Each image is annotated using multiple prompts (general description, foreground/background, entities and interactions, mood and emotions) across all models to capture different aspects of the scene.
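
A minimal sketch of this loop, assuming hypothetical prompt wordings and OLLAMA model tags (verify the exact tags against the OLLAMA library):

import ollama

# Hypothetical prompt set; the shipped prompts may differ in wording.
PROMPTS = {
    "general_description": "Describe the image in detail.",
    "foreground_background": "Describe the foreground and the background separately.",
    "entities_interactions": "List the entities in the scene and how they interact.",
    "mood_emotions": "Describe the mood and emotional tone of the scene.",
}

# Assumed OLLAMA tags for the six models.
MODELS = ["qwen2.5vl:7b", "qwen2.5vl:32b", "gemma3:4b",
          "gemma3:12b", "gemma3:27b", "mistral-small3.2:24b"]

def annotate_image(image_path: str) -> dict:
    """Return one annotation per (model, prompt) pair for a single image."""
    return {
        model: {
            name: ollama.chat(
                model=model,
                messages=[{"role": "user", "content": prompt,
                           "images": [image_path]}],
            )["message"]["content"]
            for name, prompt in PROMPTS.items()
        }
        for model in MODELS
    }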

HED Integration (Planned)

Integration with Hierarchical Event Descriptors (HED) is the next development priority:

  • Mapping VLM annotations to HED tags
  • Validation against HED schema
  • Export in BIDS-compliant events.tsv format
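
Because this integration is still planned, the following is only a speculative sketch of the tag-mapping step; the keyword table and HED tag names are illustrative placeholders and have not been validated against the HED schema:

# Speculative keyword-to-HED mapping; tag names are placeholders and
# would need validation against the official HED schema before use.
KEYWORD_TO_HED = {
    "dog": "Animal-agent",
    "beach": "Outdoors",
    "running": "Move",
}

def annotation_to_hed(description: str) -> str:
    """Build a comma-separated HED string from keywords found in free text."""
    found = [tag for keyword, tag in KEYWORD_TO_HED.items()
             if keyword in description.lower()]
    return ", ".join(found)

print(annotation_to_hed("A dog running on a beach."))  # Animal-agent, Outdoors, Move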

BIDS Compliance

Annotations follow stimuli-BIDS specifications:

  • Standardized events.tsv format
  • JSON sidecars with annotation schema
  • Compatible with neuroimaging datasets
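
As a concrete, hypothetical example of the export target, the sketch below writes one events.tsv row and a matching JSON sidecar; onset and duration are the BIDS-required columns, while the stimulus and annotation column names (and the output filenames) are assumptions:

import csv
import json

# "onset" and "duration" are required by BIDS; the other columns
# are hypothetical names.
rows = [{"onset": 0.0, "duration": 3.0, "stim_file": "image_0001.png",
         "general_description": "A dog running on a beach."}]

with open("task-nsd_events.tsv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys(), delimiter="\t")
    writer.writeheader()
    writer.writerows(rows)

# JSON sidecar documenting the annotation column.
sidecar = {"general_description":
           {"Description": "VLM-generated description of the stimulus."}}
with open("task-nsd_events.json", "w") as f:
    json.dump(sidecar, f, indent=2)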

Web Dashboard

Interactive visualization with AGI branding, real-time annotation preview, and easy navigation through large datasets.


Annotation Types

The tool supports multiple annotation types for comprehensive image description:

General Description

Detailed natural language descriptions of image content, setting, main elements, colors, lighting, and overall composition.

Object Detection

Identification and localization of objects within images, compatible with COCO categories.

Scene Categorization

Classification of scenes into semantic categories for cross-dataset analysis.

Emotional Ratings

Valence and arousal ratings for affective neuroscience applications.
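
Taken together, a single image's record might look like the hypothetical structure below; field names and rating scales are illustrative, not the tool's exact schema:

# Hypothetical combined record for one image.
annotation_record = {
    "image_id": "shared1000_0001",
    "general_description": "A dog running on a sunlit beach at low tide.",
    "objects": [{"label": "dog", "bbox": [120, 80, 340, 260]}],  # COCO-style box
    "scene_category": "beach",
    "emotion": {"valence": 7.2, "arousal": 5.1},  # e.g., on 1-9 scales
}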


Natural Scenes Dataset (NSD)

The tool is optimized for the NSD Shared 1000, the set of 1,000 images viewed by all 8 subjects in the NSD study. This shared subset enables:

  • Cross-subject analysis of neural representations
  • Benchmark annotations for model comparison
  • Foundation for the broader 73,000-image collection

Architecture

Backend

  • FastAPI: Python API for VLM orchestration
  • OLLAMA: Local model serving on GPU
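
A minimal sketch of how such an endpoint might forward an image to OLLAMA; the route, parameters, and default model tag are illustrative, not the tool's actual API:

import ollama
from fastapi import FastAPI, UploadFile

app = FastAPI()

@app.post("/annotate")
async def annotate(file: UploadFile, model: str = "qwen2.5vl:7b"):
    # Forward the uploaded image to the locally served VLM and
    # return its free-text description.
    image_bytes = await file.read()
    response = ollama.chat(
        model=model,
        messages=[{"role": "user",
                   "content": "Describe this image in detail.",
                   "images": [image_bytes]}],
    )
    return {"model": model, "annotation": response["message"]["content"]}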

Frontend

  • Next.js: Modern React framework
  • AGI Branding: Consistent design with Annotation Garden
  • Real-time Updates: Live annotation preview

Storage

  • JSON files with comprehensive metrics
  • Database support for large datasets
  • Version control through Git
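
A plausible on-disk layout under this scheme, assuming one JSON file per image (actual file naming may differ):

annotations/
  shared1000_0001.json   # all models' annotations plus quality metrics
  shared1000_0002.json
  ...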

Performance Benchmarks

All performance metrics were generated using an NVIDIA GeForce RTX 4090 GPU with OLLAMA for local model inference.

Annotation Tools

Python utilities for post-processing annotations:

from image_annotation.utils import reorder_annotations, remove_model, export_to_csv

# Reorder model annotations by quality
reorder_annotations("annotations/", ["best_model", "second_best"])

# Remove underperforming models
remove_model("annotations/", "poor_model")

# Export for analysis
export_to_csv("annotations/", "results.csv", include_metrics=True)

Quick Start

Prerequisites

  • Python 3.11+
  • Node.js 18+
  • OLLAMA
  • GPU with sufficient VRAM for target models

Installation

# Clone from Annotation Garden
git clone https://github.com/Annotation-Garden/image-annotation.git
cd image-annotation

# Python environment
conda create -n torch-312 python=3.12
conda activate torch-312
pip install -e .

# Frontend
cd frontend && npm install

Usage

# Start OLLAMA (for local models)
ollama serve

# Run frontend dashboard
cd frontend && npm run dev
# Visit http://localhost:3000
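
Each model must also be pulled once before first use. With the OLLAMA server running, this can be done from Python; the tags below are assumptions based on OLLAMA library naming and should be verified:

import ollama

# Assumed OLLAMA tags for the six models; verify before pulling.
for tag in ["qwen2.5vl:7b", "qwen2.5vl:32b", "gemma3:4b",
            "gemma3:12b", "gemma3:27b", "mistral-small3.2:24b"]:
    ollama.pull(tag)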

Access Information

License

  • License: CC-BY-NC-SA 4.0


© 2025 Seyed Yahya Shirazi