Hyser: High-Density Surface EMG Dataset for Neural Interface Research

Overview

The Hyser (High densitY Surface Electromyogram Recordings) dataset is an open-access collection of high-density surface EMG (HD-sEMG) recordings from 20 subjects performing hand and finger tasks. It addresses gaps in earlier sEMG datasets by providing cross-day recordings, random task combinations, and individual finger force measurements synchronized with 256-channel HD-sEMG signals.

Developed by researchers at Fudan University and collaborating institutions, Hyser targets three needs in neural interface research that earlier datasets left open: studying arbitrary switching between degrees of freedom (DoFs), evaluating cross-day robustness, and estimating finger-level force for dexterous prosthetic applications.

The dataset is particularly suited to developing neural interfaces that offer intuitive, proportional control of individual prosthetic fingers rather than discrete gesture recognition alone.

Data Description

HD-sEMG Data: 256-channel high-density surface EMG recorded at 2048 Hz using four 8×8 electrode arrays placed on forearm extensor and flexor muscles
Force Measurements: Individual finger force recordings at 100 Hz for four of the five sub-datasets (all except PR), enabling proportional control research
Cross-Day Sessions: Each subject recorded twice on different days (3-25 day intervals) to evaluate temporal robustness
Multiple Tasks: Five distinct experimental paradigms covering gesture recognition and force control applications
Annotations: Labels for gestures, force trajectories, and experimental conditions
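
Because EMG (2048 Hz) and force (100 Hz) are sampled at different rates, pairing the two streams needs an index mapping. The sketch below assumes both streams start at the same instant, which the text implies by calling them synchronized but does not spell out; the function name is illustrative and is not part of the Hyser toolbox.

```python
from fractions import Fraction

EMG_FS = 2048   # Hz, from the data description
FORCE_FS = 100  # Hz, from the data description


def emg_span_for_force_sample(k):
    """Return the half-open EMG sample range [start, stop) that falls
    inside the k-th force sampling interval.

    Assumes sample 0 of both streams is simultaneous (an assumption).
    Exact rational arithmetic avoids drift from the non-integer ratio 20.48.
    """
    if k < 0:
        raise ValueError("force sample index must be non-negative")
    ratio = Fraction(EMG_FS, FORCE_FS)  # EMG samples per force sample
    start = int(k * ratio)              # int() floors non-negative Fractions
    stop = int((k + 1) * ratio)
    return start, stop


if __name__ == "__main__":
    for k in (0, 1, 24):
        print(k, emg_span_for_force_sample(k))
```

Spans alternate between 20 and 21 EMG samples, so any fixed-length windowing scheme would need to choose its own rounding rule.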

Electrode Configuration

The 256 channels are arranged using four 8×8 electrode arrays (64 channels each):

ED (Extensor-Distal): Distal end of extensor muscles
EP (Extensor-Proximal): Proximal end of extensor muscles
FD (Flexor-Distal): Distal end of flexor muscles
FP (Flexor-Proximal): Proximal end of flexor muscles

Each electrode array uses 5 mm × 2.8 mm elliptical electrodes with 10 mm inter-electrode spacing, providing high spatial resolution for muscle activation mapping and motor unit decomposition.
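
For spatial analyses it helps to translate a flat channel index into an array name and grid position. The following is a minimal sketch only: the storage order of the four arrays (ED, EP, FD, FP) and the row-major layout within each grid are assumptions made for illustration, so check the dataset documentation for the actual channel ordering before relying on it.

```python
ARRAYS = ("ED", "EP", "FD", "FP")  # assumed storage order, not confirmed
GRID = 8                           # each array is an 8 x 8 grid
PITCH_MM = 10.0                    # inter-electrode spacing from the text


def locate_channel(ch):
    """Map a 0-based channel index (0..255) to (array, row, col).

    Assumes 64 consecutive channels per array, stored row by row.
    """
    if not 0 <= ch < len(ARRAYS) * GRID * GRID:
        raise ValueError("channel index out of range")
    array_idx, within = divmod(ch, GRID * GRID)
    row, col = divmod(within, GRID)
    return ARRAYS[array_idx], row, col


def electrode_offset_mm(row, col):
    """Offset of an electrode centre from the grid's (0, 0) electrode."""
    return row * PITCH_MM, col * PITCH_MM


if __name__ == "__main__":
    print(locate_channel(70))         # a channel in the second array
    print(electrode_offset_mm(7, 7))  # far corner of one grid
```

With 10 mm spacing, the electrode centres of one grid span 70 mm along each axis.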

Five Sub-Datasets

1. Pattern Recognition (PR) Dataset

Purpose: Hand gesture classification research
Tasks: 34 commonly used hand gestures including individual finger extensions, wrist movements, and combined motions
Data: 204 dynamic tasks (1 s transitions) + 68 maintenance tasks (4 s holds) per subject per session
Applications: Prosthetic control via discrete gesture commands, human-computer interfaces

Figure: All 34 hand gestures. The complete set of gestures included in the Pattern Recognition dataset, covering individual finger movements, wrist motions, and combined actions commonly used in daily activities.

2. Maximal Voluntary Contraction (MVC) Dataset

Purpose: Force normalization and maximum strength assessment
Tasks: MVC measurements for flexion and extension of each finger
Data: 2 trials × 5 fingers × 2 directions × 10 s duration per subject per session
Applications: Normalizing force data, understanding individual strength capabilities
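
As an illustration of the force normalization mentioned above, the sketch below converts raw force samples to signed %MVC. The sign convention (positive for flexion, negative for extension) and the function name are assumptions for this example; the toolbox may normalize differently.

```python
def normalize_to_mvc(force, mvc_flexion, mvc_extension):
    """Convert raw force samples for one finger to signed %MVC.

    Assumed convention: positive values are flexion, negative values are
    extension. Each direction is scaled by its own MVC magnitude.
    """
    if mvc_flexion <= 0 or mvc_extension <= 0:
        raise ValueError("MVC magnitudes must be positive")
    normalized = []
    for sample in force:
        reference = mvc_flexion if sample >= 0 else mvc_extension
        normalized.append(100.0 * sample / reference)
    return normalized


if __name__ == "__main__":
    # Made-up numbers in arbitrary force units, for illustration only.
    print(normalize_to_mvc([10.0, -5.0, 0.0], mvc_flexion=40.0, mvc_extension=20.0))
```

Scaling flexion and extension separately matters because a finger's maximal force usually differs between the two directions.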

3. One Degree-of-Freedom (1-DoF) Dataset

Purpose: Single-finger proportional control research
Tasks: Force tracking with individual fingers (30% MVC flexion to 30% MVC extension)
Data: 3 trials × 5 fingers × 25 s duration with triangular force trajectories
Applications: Proportional control of individual prosthetic fingers

4. N Degrees-of-Freedom (N-DoF) Dataset

Purpose: Multi-finger coordination and control
Tasks: 15 different finger combinations with prescribed force trajectories
Data: 2 trials × 15 combinations × 25 s duration, including both synchronized and opposing finger movements
Applications: Coordinated multi-finger prosthetic control, studying finger interaction effects

5. Random Task Dataset

Purpose: Realistic, unconstrained control scenarios
Tasks: Free-form finger contractions without prescribed trajectories
Data: 5 trials × 25 s duration of spontaneous finger force combinations
Applications: Developing robust controllers for real-world prosthetic use
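
The trial counts and durations listed for the five sub-datasets can be turned into a rough per-session budget. This sketch multiplies out only the figures stated above; it ignores rest periods and any recording overhead, so actual file lengths may differ.

```python
EMG_FS = 2048  # Hz, from the data description

# Nominal task seconds per subject per session, from the listed counts.
NOMINAL_SECONDS = {
    "PR": 204 * 1 + 68 * 4,     # dynamic (1 s) + maintenance (4 s) tasks
    "MVC": 2 * 5 * 2 * 10,      # trials x fingers x directions x 10 s
    "1-DoF": 3 * 5 * 25,        # trials x fingers x 25 s
    "N-DoF": 2 * 15 * 25,       # trials x combinations x 25 s
    "Random": 5 * 25,           # trials x 25 s
}


def emg_samples_per_channel(seconds):
    """Number of EMG samples one channel accumulates in `seconds`."""
    return seconds * EMG_FS


if __name__ == "__main__":
    for name, seconds in NOMINAL_SECONDS.items():
        print(f"{name:>7}: {seconds:4d} s, "
              f"{emg_samples_per_channel(seconds)} samples per channel")
    print("total:", sum(NOMINAL_SECONDS.values()), "s per subject per session")
```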

Special Research Application: Bracelet vs Sleeve EMG

Optimizing Wearable EMG Placement for Practical Neural Interfaces

One compelling research direction using the Hyser dataset is to compare the information content of wrist-based “bracelet” EMG configurations with that of traditional forearm “sleeve” arrangements. This has significant implications for developing practical, everyday-wearable neural interfaces.

Research Question

How much neural control information is lost when using a compact wrist-mounted EMG bracelet compared to a full forearm sleeve, and what is the optimal electrode count for each configuration?

Methodology Using Hyser Data

The high spatial resolution of Hyser’s 256-channel arrays enables systematic comparison of different electrode subsets:

Bracelet Configuration:

Select electrodes closest to the wrist from all four arrays
Compare 8, 16, and 32 electrode subsets arranged circumferentially around the wrist
Focus on distal muscle activation patterns

Sleeve Configuration:

Use distributed electrodes across the entire forearm coverage area
Match electrode counts (8, 16, 32) but with broader spatial sampling
Capture both proximal and distal muscle activations
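
The two configurations above can be sketched as electrode-subset selectors. Everything about grid orientation here is an assumption made for illustration: row 7 is treated as the row nearest the wrist in every array, and the sleeve pattern (one electrode per selected row, with staggered columns) is just one simple way to spread electrodes over a grid. Electrodes are identified as (array, row, col) tuples.

```python
ARRAYS = ("ED", "EP", "FD", "FP")
GRID = 8
DISTAL_ROW = 7              # assumption: row 7 is closest to the wrist
ALLOWED_COUNTS = (8, 16, 32)


def _per_array(n):
    """Electrodes taken from each of the four arrays for a total of n."""
    if n not in ALLOWED_COUNTS:
        raise ValueError(f"n must be one of {ALLOWED_COUNTS}")
    return n // len(ARRAYS)


def bracelet_subset(n):
    """n electrodes taken only from the assumed distal-most row of each array."""
    step = GRID // _per_array(n)
    return [(a, DISTAL_ROW, col) for a in ARRAYS for col in range(0, GRID, step)]


def sleeve_subset(n):
    """n electrodes spread over the full extent of each array."""
    k = _per_array(n)
    step = GRID // k
    rows = [step * i + step // 2 for i in range(k)]
    # Multiplying by 3 (coprime with 8) staggers the columns across rows.
    return [(a, row, (3 * row) % GRID) for a in ARRAYS for row in rows]


if __name__ == "__main__":
    print(bracelet_subset(8))
    print(sleeve_subset(8))
```

Matching the electrode count between the two selectors keeps the comparison about spatial coverage rather than channel count.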

Expected Insights

Information Loss Quantification: Measure degradation in gesture classification accuracy and force estimation precision
Optimal Electrode Placement: Identify which wrist locations provide maximum information density
Task-Dependent Performance: Some gestures may be more robust to wrist-only sensing than others
Individual Differences: Subject-specific variations in optimal electrode placement

Practical Implications

This research directly addresses the usability vs. performance tradeoff in wearable neural interfaces:

Usability: Wrist bracelets may be more convenient, less conspicuous, and easier to don/doff
Performance: Full forearm coverage traditionally provides richer control signals
Finding the Sweet Spot: Determine minimum electrode requirements for acceptable performance

The results could guide development of next-generation wearable devices that balance practicality with control fidelity, making neural interfaces more accessible for daily use.

Data Management and Access

Dataset Structure

The dataset is organized hierarchically by sub-dataset, then by subject and session:

/dataset_name/
├── pr_dataset/           # Pattern recognition (37.1 GB)
├── mvc_dataset/          # Maximal voluntary contraction (7.8 GB)  
├── 1dof_dataset/         # Single finger control (29.3 GB)
├── ndof_dataset/         # Multi-finger control (58.6 GB)
├── random_dataset/       # Free-form tasks (9.8 GB)
├── SHA256SUMS.txt        # Data integrity verification
├── equipment_info.pdf    # Technical specifications
└── readme.txt            # Dataset documentation
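
Given the size of the download, checking files against SHA256SUMS.txt is worthwhile. The sketch below assumes the manifest uses the common `sha256sum` layout (hex digest, whitespace, relative path per line); that layout is an assumption, so inspect the file first. Function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(root, manifest_name="SHA256SUMS.txt"):
    """Return relative paths that are missing or whose digest differs.

    Assumes each manifest line is '<hex digest> <relative path>'.
    """
    root = Path(root)
    failures = []
    for line in (root / manifest_name).read_text().splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        rel_path = rel_path.lstrip("*")  # sha256sum marks binary mode with '*'
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected.lower():
            failures.append(rel_path)
    return failures
```

An empty list from `verify_checksums("/path/to/dataset")` would indicate that every listed file is present and intact.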

Each subdirectory contains folders for individual subjects and sessions:

/pr_dataset/
├── subject01_session1/
├── subject01_session2/
├── subject02_session1/
└── ...
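
The folder naming shown above lends itself to programmatic path construction, for example when training on session 1 and testing on session 2. This is a minimal sketch based only on the names visible in the listing; the helper names are hypothetical and the root path is a placeholder.

```python
from pathlib import Path

SUB_DATASETS = ("pr_dataset", "mvc_dataset", "1dof_dataset",
                "ndof_dataset", "random_dataset")
N_SUBJECTS = 20
SESSIONS = (1, 2)


def session_dir(root, sub_dataset, subject, session):
    """Folder for one subject and session, e.g. subject01_session2."""
    if sub_dataset not in SUB_DATASETS:
        raise ValueError(f"unknown sub-dataset: {sub_dataset}")
    if not 1 <= subject <= N_SUBJECTS:
        raise ValueError("subject must be in 1..20")
    if session not in SESSIONS:
        raise ValueError("session must be 1 or 2")
    return Path(root) / sub_dataset / f"subject{subject:02d}_session{session}"


def cross_day_split(root, sub_dataset):
    """Session-1 folders for training, session-2 folders for testing."""
    subjects = range(1, N_SUBJECTS + 1)
    train = [session_dir(root, sub_dataset, s, 1) for s in subjects]
    test = [session_dir(root, sub_dataset, s, 2) for s in subjects]
    return train, test


if __name__ == "__main__":
    print(session_dir("dataset_name", "pr_dataset", 1, 2))
```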

File Formats

EMG Data: WFDB format (.dat + .hea files) for efficient storage and MATLAB compatibility
Force Data: WFDB format with synchronized timestamps
Labels: Comma-separated text files for gesture classifications
Preprocessing: Both raw and preprocessed versions provided
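
A WFDB record pairs a binary .dat file with a text .hea header whose first non-comment line gives the record name, number of signals, sampling frequency, and number of samples. The sketch below parses only that line using the standard library; the example header text is made up for illustration and is not copied from the dataset. For reading the signals themselves, the `wfdb` Python package (for example `wfdb.rdrecord`) is the usual route.

```python
def parse_wfdb_record_line(header_text):
    """Parse the record line of a WFDB .hea header into a dict.

    Handles the common fields only: name, signal count, sampling
    frequency, and sample count. Optional suffixes on the frequency
    field (counter frequency, base counter) are discarded.
    """
    record_line = next(
        (ln.strip() for ln in header_text.splitlines()
         if ln.strip() and not ln.lstrip().startswith("#")),
        None,
    )
    if record_line is None:
        raise ValueError("header contains no record line")
    fields = record_line.split()
    if len(fields) < 2:
        raise ValueError("record line needs a name and a signal count")
    info = {
        "record_name": fields[0].split("/")[0],  # drop any segment count
        "n_signals": int(fields[1]),
    }
    if len(fields) > 2:
        info["fs"] = float(fields[2].split("/")[0].split("(")[0])
    if len(fields) > 3:
        info["n_samples"] = int(fields[3])
    return info


# Hypothetical header: 256 signals at 2048 Hz, 51200 samples per signal.
EXAMPLE_HEADER = "# made-up header for illustration\nexample_record 256 2048 51200\n"
info = parse_wfdb_record_line(EXAMPLE_HEADER)
duration_s = info["n_samples"] / info["fs"]
```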

Getting Started

The dataset is accompanied by a MATLAB toolbox with example scripts:

demo_pr.m - Pattern recognition analysis
demo_1dof.m - Single finger force estimation
demo_ndof.m - Multi-finger coordination analysis
main_decomposition.m - Motor unit decomposition via ICA

Access Information

Download Locations

Primary Source: PhysioNet - DOI: 10.13026/ym7v-bh53
Toolbox: GitHub Repository - MATLAB analysis functions and examples

Dataset Statistics

Total Size: 142.6 GB across all sub-datasets
Subjects: 20 healthy volunteers (8 female, 12 male, ages 21-34)
Sessions: 2 per subject on different days (3-25 day intervals)
Sampling Rates: 2048 Hz (EMG), 100 Hz (force)
Total Recording Time: ~33 hours of synchronized EMG and force data

License and Usage

License: Open Data Commons Attribution License v1.0
Commercial Use: Permitted with attribution
Citation Required: Please cite the original dataset paper when using this data

Performance Benchmarks

The dataset paper provides baseline performance metrics for common neural interface tasks:

Gesture Recognition Results

LDA-based Method: 96.86% accuracy (dynamic tasks), 93.80% accuracy (maintenance tasks)
CNN-based Method: 88.96% accuracy (dynamic tasks), 89.84% accuracy (maintenance tasks)
34-class Classification: Both methods score well above the random chance level of 2.94% (1/34)

Force Estimation Results

Average RMSE: 18.28 ± 5.82% MVC across all subjects and fingers
Correlation: 0.8611 ± 0.0358 between estimated and actual forces
Individual Finger Performance: Consistent across thumb, index, middle, ring, and little fingers
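
The two force-estimation metrics reported above can be computed as follows. This is a generic sketch of RMSE and Pearson correlation; how the paper windows the signals or averages across fingers and subjects is not described here, so the code makes no claim to reproduce the reported numbers.

```python
import math


def rmse(estimated, measured):
    """Root-mean-square error; in %MVC if both inputs are in %MVC."""
    if len(estimated) != len(measured) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((e - m) ** 2 for e, m in zip(estimated, measured))
    return math.sqrt(squared / len(estimated))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Made-up force traces in %MVC, for illustration only.
    measured = [0.0, 10.0, 20.0, 30.0, 20.0, 10.0]
    estimated = [2.0, 9.0, 17.0, 26.0, 22.0, 12.0]
    print(rmse(estimated, measured), pearson_r(estimated, measured))
```

RMSE captures amplitude error, while correlation captures whether the estimate follows the shape of the trajectory, which is why both are usually reported together.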

These benchmarks provide reference points for evaluating new algorithms developed using the dataset.

Future Research Directions

The Hyser dataset enables investigation of numerous important questions in neural interface research:

Cross-Day Robustness: How do EMG-based controllers degrade over time? What adaptation strategies work best?
Individual Differences: Can we develop personalized control algorithms that account for anatomical and physiological variations?
Motor Unit Analysis: What insights can motor unit decomposition provide for prosthetic control?
Compressed Sensing: How can we reduce the computational burden of 256-channel processing for real-time applications?
Transfer Learning: Can models trained on one population generalize to new users with minimal calibration?

References

If you use this dataset, please cite the following publications:

Hyser Dataset Paper: Jiang et al., IEEE TNSRE 2021 - “Open Access Dataset, Toolbox and Benchmark Processing Results of High-Density Surface Electromyogram Recordings”
PhysioNet Entry: DOI: 10.13026/ym7v-bh53