Serialization Round-Trip Examples

These examples show how to serialize a Recording to the biosigIO columnar and serving formats and read it back. Parquet and Arrow/Feather give lossless, analytics-friendly round trips; Zarr provides a cloud-native serving store with a multi-resolution view pyramid.

These formats need optional dependencies. Parquet and Arrow/Feather use the arrow extra (the pyarrow package); Zarr uses the zarr extra (zarr v3):

# With uv (editable dev install)
uv sync --extra arrow      # Parquet / Arrow / Feather
uv sync --extra zarr       # Zarr serving store

# Or into an existing environment (UV-only project; use uv pip)
uv pip install 'biosigio[arrow]'
uv pip install 'biosigio[zarr]'

The examples below build a small Recording in memory so they run without any external data file. In practice you would load a recording from any supported format (EDF, WFDB, XDF, ...) and serialize that instead.

import numpy as np
from biosigio import Recording

# Build a small two-channel recording (1 second at 1000 Hz)
fs = 1000.0
t = np.arange(fs) / fs
rec = Recording()
rec.add_channel('EMG1', np.sin(2 * np.pi * 20 * t), fs, 'mV', 'EMG')
rec.add_channel('EMG2', np.sin(2 * np.pi * 35 * t), fs, 'mV', 'EMG')
rec.add_event(onset=0.25, duration=0.10, description='burst')

Parquet Round-Trip

Recording.to_parquet() writes a single self-describing columnar table: signal channels become columns (the time index is preserved), while channel descriptions, events, and recording metadata travel in the file's schema metadata. The round trip is lossless and bit-exact: sample values are stored verbatim, so no quantization or resampling occurs.

# Export
rec.to_parquet('out.parquet')

# Read back; the '.parquet' extension selects the tabular importer automatically
restored = Recording.from_file('out.parquet')

# Signals are recovered exactly
assert np.array_equal(rec.signals.to_numpy(), restored.signals.to_numpy())
print(list(restored.channels.keys()))   # ['EMG1', 'EMG2']
print(restored.events.iloc[0]['description'])  # 'burst'

Parquet is a good choice when you want to query signals with analytics engines (DuckDB, Polars, pandas, Spark) while keeping the full biosigIO metadata intact.

Arrow / Feather Round-Trip

Recording.to_arrow() writes the same self-describing schema as Parquet, but in the Arrow/Feather inter-process communication (IPC) format for fast, zero-copy reads. It is also lossless and bit-exact.

# Export (.feather or .arrow are both accepted)
rec.to_arrow('out.feather')

# Read back; the extension selects the tabular importer automatically
restored = Recording.from_file('out.feather')

assert np.array_equal(rec.signals.to_numpy(), restored.signals.to_numpy())

Both to_parquet() and to_arrow() return the written file path.

Zarr Export and Read

Recording.to_zarr() writes a sharded Zarr v3 serving store. This is a derived serving copy, not an archival source: level 0 of each channel group is the anti-aliased, per-modality-resampled inference signal, with a min/max render pyramid stacked above it for fast viewing. Channels are grouped by (modality, native rate), so a single-rate EMG recording produces one group named like emg_1000hz.
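
The min/max render pyramid can be pictured as repeated block reduction: each level keeps only the minimum and maximum of fixed-size blocks from the level below, so a viewer can draw a signal envelope without reading every sample. The following is a minimal pure-Python sketch of that idea. The helper name build_minmax_pyramid, the reduction factor of 4, and the list-of-tuples layout are assumptions made for illustration; they are not the biosigIO on-disk contract.

```python
def build_minmax_pyramid(samples, factor=4, min_len=1):
    """Return a list of levels; each level is a list of (min, max) pairs.

    Hypothetical helper: the first level reduces the raw samples, and every
    later level reduces the (min, max) pairs of the level below it.
    """
    levels = []
    # Seed with degenerate (min, max) pairs so every level has the same shape.
    current = [(s, s) for s in samples]
    while len(current) >= factor * min_len:
        reduced = []
        for start in range(0, len(current), factor):
            block = current[start:start + factor]
            reduced.append((min(lo for lo, _ in block),
                            max(hi for _, hi in block)))
        levels.append(reduced)
        current = reduced
    return levels


signal = [0, 3, -2, 5, 1, 1, 4, -7, 2, 2, 9, 0, -1, 6, 3, 3]
pyramid = build_minmax_pyramid(signal, factor=4)
print(pyramid[0])  # [(-2, 5), (-7, 4), (0, 9), (-1, 6)]
print(pyramid[1])  # [(-7, 9)]
```

Because each level stores extremes rather than averages, short bursts stay visible at coarse zoom levels, which is why such a pyramid suits rendering but is not a substitute for the signal itself.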

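By default the store uses int16 with a per-channel scale and offset (half the bytes of float32). Pass dtype='float32' to store samples without int16 quantization. This choice affects only the sample encoding; level 0 is still the resampled serving signal, not the archival source.

# Unquantized store (float32). The default dtype='int16' is smaller but quantized.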
rec.to_zarr('out.zarr', dtype='float32')

# Read back; the '.zarr' extension selects the Zarr importer automatically.
# With a single channel group, no group= selector is needed.
served = Recording.from_file('out.zarr')
print(list(served.channels.keys()))   # ['EMG1', 'EMG2']
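
To make the int16 trade-off concrete, here is a sketch of affine (scale plus offset) quantization of a single channel, using only the standard library. The exact mapping biosigIO applies is not documented here, so the formula and the names quantize_int16 and dequantize are assumptions; the point is only that the reconstruction error of this kind of scheme is bounded by about half a quantization step.

```python
import math

INT16_MIN, INT16_MAX = -32768, 32767


def quantize_int16(values):
    """Map floats onto the int16 range with one scale/offset per channel.

    Hypothetical helper, not the biosigIO encoder.
    """
    lo, hi = min(values), max(values)
    offset = (hi + lo) / 2.0
    # Fall back to a unit scale for a constant channel, where hi == lo.
    scale = (hi - lo) / (INT16_MAX - INT16_MIN) or 1.0
    codes = [round((v - offset) / scale) for v in values]
    # Clamp in case rounding lands just past the int16 limits.
    codes = [max(INT16_MIN, min(INT16_MAX, c)) for c in codes]
    return codes, scale, offset


def dequantize(codes, scale, offset):
    """Invert the affine mapping back to floats."""
    return [c * scale + offset for c in codes]


# Same shape as the EMG1 channel above: a 20 Hz sine, 1 second at 1000 Hz.
fs = 1000
signal = [math.sin(2 * math.pi * 20 * n / fs) for n in range(fs)]
codes, scale, offset = quantize_int16(signal)
restored = dequantize(codes, scale, offset)
max_err = max(abs(a - b) for a, b in zip(signal, restored))
print(f"scale={scale:.3e}, max error={max_err:.3e}")
```

For a signal with a peak-to-peak range of about 2 mV, one step is roughly 2 / 65535 mV, so the worst-case error in this sketch is on the order of tens of nanovolts. Whether that is acceptable depends on the downstream analysis.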

Reading the serving signal, not the original

A Zarr read returns the serving signal, not the original full-rate recording. Recording.from_file reconstructs level 0 of the chosen group at the store's canonical rate, which may be lower than the native acquisition rate. The view pyramid (view/*) is render-only and is never read back as signal. Treat the source recording (for example a BIDS or EDF archive) as authoritative when you need the exact acquisition signal.

# level 0 rate is the per-modality canonical rate (EMG caps at 1000 Hz by default),
# capped to never exceed the native rate.
fs_served = served.channels['EMG1']['sample_frequency']
print(fs_served)   # 1000.0 here; lower than native for, e.g., a 2048 Hz source
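
The rate rule in the comment above amounts to taking the smaller of the canonical and native rates. A tiny sketch of that rule follows. Only the EMG value of 1000 Hz comes from this page; the table CANONICAL_RATE_HZ and the function serving_rate are hypothetical stand-ins, not part of the biosigIO API.

```python
# Only the EMG entry is taken from the text above; the table itself is a
# hypothetical stand-in for whatever defaults biosigIO actually ships.
CANONICAL_RATE_HZ = {"EMG": 1000.0}


def serving_rate(modality, native_rate_hz):
    """Return the canonical rate for a modality, never above the native rate."""
    # Unknown modalities fall back to the native rate in this sketch.
    canonical = CANONICAL_RATE_HZ.get(modality, native_rate_hz)
    return min(canonical, native_rate_hz)


print(serving_rate("EMG", 2048.0))  # 1000.0 (downsampled)
print(serving_rate("EMG", 1000.0))  # 1000.0 (unchanged)
print(serving_rate("EMG", 500.0))   # 500.0  (never upsampled)
```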

Selecting a group in a multi-rate store

A store may hold several (modality, rate) groups that cannot share biosigio's single time grid (for example an EEG group at 250 Hz and an EMG group at 1000 Hz). When more than one group is present, pass the group= selector to choose which one to reconstruct:

# Explicit importer plus group selector for a multi-rate store
emg = Recording.from_file('rec.zarr', importer='zarr', group='emg_1000hz')
eeg = Recording.from_file('rec.zarr', importer='zarr', group='eeg_250hz')

If you omit group= on a multi-group store, from_file raises a ValueError listing the available group names.
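
When a script has to pick a group programmatically, the group names from that error message can be matched against the modality it needs. The sketch below assumes every group name follows the modality_ratehz pattern seen in emg_1000hz, with an integer rate; that pattern, and the helpers parse_group_name and pick_group, are illustrative assumptions rather than documented biosigIO behavior.

```python
import re

# Assumed naming pattern, inferred from examples such as 'emg_1000hz'.
_GROUP_RE = re.compile(r"^(?P<modality>[a-z0-9]+)_(?P<rate>\d+)hz$")


def parse_group_name(name):
    """Split a group name like 'emg_1000hz' into ('emg', 1000.0)."""
    match = _GROUP_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized group name: {name!r}")
    return match["modality"], float(match["rate"])


def pick_group(names, modality):
    """Return the single group for a modality, or raise if ambiguous."""
    matches = [n for n in names if parse_group_name(n)[0] == modality]
    if len(matches) != 1:
        raise ValueError(f"expected one {modality!r} group, found {matches}")
    return matches[0]


groups = ["eeg_250hz", "emg_1000hz"]
print(parse_group_name("emg_1000hz"))  # ('emg', 1000.0)
print(pick_group(groups, "eeg"))       # eeg_250hz
```

The returned name can then be passed as the group= argument shown above.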

Choosing a Format

For when to reach for Parquet, Arrow/Feather, or Zarr, including the lossless-versus-serving trade-offs and the on-disk store contract, see the Serialization Formats reference.