XDF Importer

The XDFImporter class handles importing data from XDF (Extensible Data Format) files, the native format for Lab Streaming Layer (LSL) recordings. It supports multi-stream files with different sampling rates and data types.

Class Documentation

`biosigio.importers.xdf`

XDF (Extensible Data Format) importer for EMG data.

XDF files can contain multiple streams (EMG, EEG, markers, etc.). This module provides tools to explore XDF contents and selectively import specific streams.
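As a rough illustration of selective import, the sketch below filters a list of stream summaries by name, type, and ID. The parameter names mirror the `stream_names`, `stream_types`, and `stream_ids` keyword arguments documented for `Recording.from_file`; the `select_streams` helper, the dictionary layout, and the rule that all supplied filters must match are assumptions made for this example, not the importer's actual implementation.

```python
# Hypothetical stream summaries; a real XDF reader would supply these.
streams = [
    {"id": 1, "name": "Delsys", "type": "EMG", "srate": 2000.0},
    {"id": 2, "name": "BioSemi", "type": "EEG", "srate": 512.0},
    {"id": 3, "name": "Markers", "type": "Markers", "srate": 0.0},
]


def select_streams(streams, stream_names=None, stream_types=None, stream_ids=None):
    """Keep streams that satisfy every filter that was given (None = no filter).

    Combining the filters with AND is an assumption of this sketch.
    """

    def matches(stream):
        return (
            (stream_names is None or stream["name"] in stream_names)
            and (stream_types is None or stream["type"] in stream_types)
            and (stream_ids is None or stream["id"] in stream_ids)
        )

    return [s for s in streams if matches(s)]


emg_streams = select_streams(streams, stream_types=["EMG", "EXG"])
print([s["name"] for s in emg_streams])  # ['Delsys']
```

Because the streams keep their own sampling rates (the `srate` field here), a selection step like this would normally run before any attempt to place the data on a common time grid.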

`logger = logging.getLogger(__name__)` `module-attribute`

`BaseImporter`

Bases: ABC

Base class for EMG data importers.

Source code in biosigio/importers/base.py

class BaseImporter(ABC):
    """Base class for EMG data importers."""

    @abstractmethod
    def load(self, filepath: str) -> Recording:
        """
        Load EMG data from file.

        Args:
            filepath: Path to the input file

        Returns:
            Recording: Recording object containing the loaded data
        """
        pass

`load(filepath)` `abstractmethod`

Load EMG data from file.

Args:

- filepath: Path to the input file

Returns:

- Recording: Recording object containing the loaded data

Source code in biosigio/importers/base.py

@abstractmethod
def load(self, filepath: str) -> Recording:
    """
    Load EMG data from file.

    Args:
        filepath: Path to the input file

    Returns:
        Recording: Recording object containing the loaded data
    """
    pass
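To make the abstract interface concrete, here is a minimal sketch of how a subclass satisfies `load`. Both `Recording` and `BaseImporter` below are reduced stand-ins defined inside the example so that it runs without biosigio installed; `DummyImporter` and the `source_path` metadata key are hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod


class Recording:
    """Stand-in for biosigio's Recording, reduced to two attributes."""

    def __init__(self):
        self.signals = None
        self.metadata = {}


class BaseImporter(ABC):
    """Stand-in mirroring the abstract interface shown above."""

    @abstractmethod
    def load(self, filepath: str) -> Recording:
        """Load data from a file and return a Recording."""


class DummyImporter(BaseImporter):
    """Hypothetical importer that reads nothing and only records the path."""

    def load(self, filepath: str) -> Recording:
        rec = Recording()
        rec.metadata["source_path"] = filepath  # hypothetical metadata key
        return rec


# The abstract base cannot be instantiated until load() is implemented.
try:
    BaseImporter()
    abstract_error = None
except TypeError as exc:
    abstract_error = exc

rec = DummyImporter().load("example.xdf")
print(rec.metadata)  # {'source_path': 'example.xdf'}
```

The point of the pattern is that every importer exposes the same `load(filepath)` entry point, so calling code can treat importers interchangeably.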

`Recording`

Core biosignal recording: signals + channels + events + metadata.

Modality-agnostic container for EEG / EMG / iEEG / MEG / stim / marker data imported from any supported format.

Attributes:

- signals (pd.DataFrame): Raw signal data with time as index.
- metadata (dict): Metadata dictionary containing recording information.
- channels (dict): Channel information including type, unit, sampling frequency.
- events (pd.DataFrame): Annotations or events associated with the signals, with columns 'onset', 'duration', 'description'.
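The attribute layout described above can be pictured with plain Python containers. The sketch below assumes the per-channel dictionary keys `channel_type`, `sample_frequency`, and `modality`, which are the keys this page's source listing reads; the channel names, values, and helper functions are made up for illustration, and plain dictionaries stand in for the pandas DataFrames that the real class holds.

```python
# Per-channel info, keyed by channel name (values are illustrative).
channels = {
    "EMG1": {"channel_type": "EMG", "sample_frequency": 2000.0, "modality": "EMG"},
    "EMG2": {"channel_type": "EMG", "sample_frequency": 2000.0, "modality": "EMG"},
    "ACC1": {"channel_type": "ACC", "sample_frequency": 2000.0, "modality": "MISC"},
}

# One event row with the three documented columns; times are in seconds.
events = [{"onset": 1.5, "duration": 0.0, "description": "stimulus"}]


def channels_by_type(channels, channel_type):
    """Return channel names whose 'channel_type' matches, in insertion order."""
    return [name for name, info in channels.items() if info["channel_type"] == channel_type]


def channel_types(channels):
    """Return the distinct channel types, sorted for a stable result."""
    return sorted({info["channel_type"] for info in channels.values()})


def single_rate(channels):
    """Return the shared sampling rate, or raise if the channels disagree."""
    rates = {info["sample_frequency"] for info in channels.values()}
    if len(rates) != 1:
        raise ValueError(f"Expected one sampling rate, found: {sorted(rates)}")
    return rates.pop()


print(channels_by_type(channels, "EMG"))  # ['EMG1', 'EMG2']
print(channel_types(channels))            # ['ACC', 'EMG']
print(single_rate(channels))              # 2000.0
```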

Source code in biosigio/core/emg.py

class Recording:
    """
    Core biosignal recording: signals + channels + events + metadata.

    Modality-agnostic container for EEG / EMG / iEEG / MEG / stim / marker data
    imported from any supported format.

    Attributes:
        signals (pd.DataFrame): Raw signal data with time as index.
        metadata (dict): Metadata dictionary containing recording information.
        channels (dict): Channel information including type, unit, sampling frequency.
        events (pd.DataFrame): Annotations or events associated with the signals,
                               with columns 'onset', 'duration', 'description'.
    """

    def __init__(self):
        """Initialize an empty recording."""
        self.signals = None
        self.metadata = {}
        self.channels = {}
        # Initialize events as an empty DataFrame with specified columns
        self.events = pd.DataFrame(columns=["onset", "duration", "description"])

    def plot_signals(
        self,
        channels=None,
        time_range=None,
        offset_scale=0.8,
        uniform_scale=True,
        detrend=False,
        grid=True,
        title=None,
        show=True,
        plt_module=None,
    ):
        """
        Plot signals in a single plot with vertical offsets.

        Args:
            channels: List of channels to plot. If None, plot all channels.
            time_range: Tuple of (start_time, end_time) to plot. If None, plot all data.
            offset_scale: Portion of allocated space each signal can use (0.0 to 1.0).
            uniform_scale: Whether to use the same scale for all signals.
            detrend: Whether to remove mean from signals before plotting.
            grid: Whether to show grid lines.
            title: Optional title for the figure.
            show: Whether to display the plot.
            plt_module: Matplotlib pyplot module to use.
        """
        # Delegate to the static plotting function in visualization module
        static_plot_signals(
            rec_object=self,
            channels=channels,
            time_range=time_range,
            offset_scale=offset_scale,
            uniform_scale=uniform_scale,
            detrend=detrend,
            grid=grid,
            title=title,
            show=show,
            plt_module=plt_module,
        )

    @classmethod
    def _infer_importer(cls, filepath: str) -> ImporterName:
        """
        Infer the importer to use based on the file extension.
        """
        # rstrip path separators so a Zarr store passed as a directory with a
        # trailing slash (e.g. "rec.zarr/") still resolves by its ".zarr" suffix.
        extension = os.path.splitext(filepath.rstrip("/\\"))[1].lower()
        if extension in {".edf", ".bdf"}:
            return "edf"
        elif extension in {".set"}:
            return "eeglab"
        elif extension in {".otb", ".otb+"}:
            return "otb"
        elif extension in {".csv", ".txt"}:
            return "csv"
        elif extension in {".hea", ".dat", ".atr"}:
            return "wfdb"
        elif extension in {".xdf", ".xdfz"}:
            return "xdf"
        elif extension in {".fif", ".ds", ".con", ".sqd", ".kdf"}:
            # MEG via MNE: .fif (Neuromag/FIF), CTF .ds directory, KIT/Yokogawa
            # .con/.sqd/.kdf. The MEGImporter dispatches on extension internally.
            return "meg"
        elif extension in {".vhdr"}:
            return "brainvision"
        elif extension in {".parquet", ".feather", ".arrow"}:
            return "tabular"
        elif extension in {
            ".rhd",
            ".rhs",
            ".ns1",
            ".ns2",
            ".ns3",
            ".ns4",
            ".ns5",
            ".ns6",
            ".smr",
            ".smrx",
            ".plx",
            ".pl2",
            ".trc",
            ".ncs",
        }:
            return "neo"
        elif extension == ".zarr":
            return "zarr"
        else:
            raise ValueError(f"Unsupported file extension: {extension}")

    @classmethod
    def from_file(
        cls,
        filepath: str,
        importer: ImporterName | None = None,
        force_csv: bool = False,
        bids_channels: str = "auto",
        mixed_rate: str = "error",
        **kwargs,
    ) -> "Recording":
        """
        Create a Recording object from a file.

        Args:
            filepath: Path to the input file
            importer: Name of the importer to use. Can be one of the following:
                - 'trigno': Delsys Trigno EMG system (CSV)
                - 'otb': OTB/OTB+ EMG system (OTB, OTB+)
                - 'eeglab': EEGLAB .set files (SET)
                - 'edf': EDF/EDF+/BDF/BDF+ format (EDF, BDF)
                - 'csv': Generic CSV (or TXT) files with columnar data
                - 'wfdb': Waveform Database (WFDB)
                - 'xdf': XDF format (multi-stream Lab Streaming Layer files)
                - 'meg': MEG via MNE (.fif, CTF .ds, KIT .con/.sqd/.kdf; requires the 'meg' extra)
                - 'brainvision': BrainVision .vhdr via MNE (requires the 'meg' extra)
                - 'tabular': biosigIO Parquet/Arrow/Feather (requires the 'arrow' extra)
                - 'neo': proprietary electrophysiology formats via python-neo
                  (Intan, Blackrock, Spike2, Plexon, Micromed, Neuralynx, ...;
                  requires the 'neo' extra)
                - 'zarr': biosigIO Zarr serving store (requires the 'zarr' extra)
                If None, the importer is inferred from the file extension.
                For CSV/TXT files, the CSV importer may recognize a specialized
                format automatically (see force_csv).
            force_csv: If True and importer is 'csv', forces using the generic CSV
                      importer even if the file appears to match a specialized format.
            bids_channels: When 'auto' (default), look for a sibling BIDS
                      _channels.tsv next to the file and apply its per-channel
                      type/units over the importer's inferred values. Pass 'off'
                      to disable.
            mixed_rate: Policy for an EDF/BDF file whose signals carry differing
                      per-channel sampling rates (ignored for every other format,
                      which is single-rate). 'error' (default) raises -- biosigIO
                      stores one uniform grid and will not fabricate a common one
                      silently. 'resample' upsamples the slower channels to the
                      fastest rate (a lossy derived view; each channel keeps its
                      native rate as ``original_sample_frequency``).
            **kwargs: Additional arguments passed to the importer.
                For XDF files, useful kwargs include:
                - stream_names: List of stream names to import
                - stream_types: List of stream types to import (e.g., ["EMG", "EXG"])
                - stream_ids: List of stream IDs to import

        Returns:
            Recording: New Recording object with loaded data
        """
        if importer is None:
            importer = cls._infer_importer(filepath)

        importers = {
            "trigno": "TrignoImporter",  # CSV with Delsys Trigno Headers
            "otb": "OTBImporter",  # OTB/OTB+ EMG system data
            "edf": "EDFImporter",  # EDF/EDF+/BDF format
            "eeglab": "EEGLABImporter",  # EEGLAB .set files
            "csv": "CSVImporter",  # Generic CSV/Text files
            "wfdb": "WFDBImporter",  # Waveform Database format
            "xdf": "XDFImporter",  # XDF multi-stream format
            "meg": "MEGImporter",  # MEG via MNE (.fif, CTF .ds, KIT .con/.sqd/.kdf)
            "brainvision": "BrainVisionImporter",  # BrainVision via MNE (.vhdr)
            "tabular": "TabularImporter",  # biosigIO Parquet / Arrow / Feather
            "neo": "NeoImporter",  # proprietary ephys via python-neo
            "zarr": "ZarrImporter",  # biosigIO Zarr serving store
        }

        if importer not in importers:
            raise ValueError(
                f"Unsupported importer: {importer}. "
                f"Available importers: {list(importers.keys())}\n"
                "- trigno: Delsys Trigno EMG system\n"
                "- otb: OTB/OTB+ EMG system\n"
                "- edf: EDF/EDF+/BDF format\n"
                "- eeglab: EEGLAB .set files\n"
                "- csv: Generic CSV/Text files\n"
                "- wfdb: Waveform Database\n"
                "- xdf: XDF multi-stream format\n"
                "- meg: MEG via MNE (.fif, CTF .ds, KIT .con/.sqd/.kdf)\n"
                "- brainvision: BrainVision via MNE (.vhdr)\n"
                "- tabular: biosigIO Parquet/Arrow/Feather (.parquet, .feather, .arrow)\n"
                "- neo: proprietary electrophysiology formats via python-neo "
                "(Intan, Blackrock, Spike2, Plexon, Micromed, Neuralynx, ...)\n"
                "- zarr: biosigIO Zarr serving store (.zarr)"
            )

        # If using CSV importer and force_csv is set, pass it as force_generic
        if importer == "csv":
            kwargs["force_generic"] = force_csv

        # mixed_rate is only meaningful for EDF/BDF, the one format whose signals may
        # carry differing per-channel sampling rates; forward it there and nowhere
        # else (so the default "error" never reaches an importer that can't use it).
        if importer == "edf":
            kwargs["mixed_rate"] = mixed_rate

        # Import the appropriate importer class
        importer_module = __import__(
            f"biosigio.importers.{importer}", globals(), locals(), [importers[importer]]
        )
        importer_class = getattr(importer_module, importers[importer])

        # Create importer instance and load data
        rec = importer_class().load(filepath, **kwargs)

        # Record provenance: which format this recording came from. setdefault so a
        # re-imported serialization file (tabular/zarr) keeps the ORIGINAL
        # source_format restored from its metadata rather than being relabeled.
        rec.metadata.setdefault("source_format", importer)

        # In a BIDS layout, the sibling _channels.tsv is the authoritative source
        # of per-channel type/units; apply it over the importer's header/label
        # guesses unless explicitly disabled with bids_channels="off".
        if bids_channels != "off":
            from ..bids import apply_channels_tsv, find_channels_tsv

            channels_tsv = find_channels_tsv(filepath)
            if channels_tsv:
                apply_channels_tsv(rec, channels_tsv)

        return rec

    def select_channels(
        self,
        channels: str | list[str] | None = None,
        channel_type: str | None = None,
        inplace: bool = False,
        *,
        modality: str | None = None,
    ) -> "Recording":
        """
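        Select specific channels and return them as a Recording.

        Args:
            channels: Channel name or list of channel names to select. If None,
                    all channels matching channel_type (or, failing that,
                    modality) are selected.
            channel_type: Type of channels to select ('EMG', 'ACC', 'GYRO', etc.).
                        If specified with channels, filters the selection to only
                        channels of this type.
            inplace: If False (default), return a new Recording and leave this
                    one untouched. If True, replace this recording's signals,
                    channels, and metadata with the selection and return self.
            modality: Keyword-only. Modality to select or filter by (e.g. 'EMG').
                    If given together with channels or channel_type, narrows the
                    selection to channels of this modality.

        Returns:
            Recording: A new Recording containing only the selected channels, or
                    self when inplace=True.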

        Examples:
            # Select specific channels
            new_rec = rec.select_channels(['EMG1', 'ACC1'])

            # Select all EMG channels
            emg_only = rec.select_channels(channel_type='EMG')

            # Select specific EMG channels only, this example does not select ACC channels
            emg_subset = rec.select_channels(['EMG1', 'ACC1'], channel_type='EMG')
        """
        if self.signals is None:
            raise ValueError("No signals loaded")

        if channels is None and channel_type is None and modality is None:
            raise ValueError("Specify at least one of: channels, channel_type, or modality.")

        # If type/modality specified but no channels, select all matching channels
        if channels is None and channel_type is not None:
            channels = self.get_channels_by_type(channel_type)
            if not channels:
                raise ValueError(f"No channels found of type: {channel_type}")
        elif channels is None and modality is not None:
            channels = self.get_channels_by_modality(modality)
            if not channels:
                raise ValueError(f"No channels found of modality: {modality}")
        elif isinstance(channels, str):
            channels = [channels]

        if channels is None:
            raise ValueError("Specify at least one of: channels, channel_type, or modality.")

        # Validate channels exist
        if not all(ch in self.signals.columns for ch in channels):
            missing = [ch for ch in channels if ch not in self.signals.columns]
            raise ValueError(f"Channels not found: {missing}")

        # Filter by type if specified
        if channel_type is not None:
            channels = [ch for ch in channels if self.channels[ch]["channel_type"] == channel_type]
            if not channels:
                raise ValueError(f"None of the selected channels are of type: {channel_type}")

        # Filter by modality if specified
        if modality is not None:
            canonical_modality = validate_modality(modality)
            channels = [
                ch for ch in channels if self.channels[ch].get("modality") == canonical_modality
            ]
            if not channels:
                raise ValueError(f"None of the selected channels are of modality: {modality}")

        # Create new Recording object
        new_rec = Recording()

        # Copy selected signals and channels
        new_rec.signals = self.signals[channels].copy()
        new_rec.channels = {ch: self.channels[ch].copy() for ch in channels}

        # Copy metadata
        new_rec.metadata = self.metadata.copy()

        if not inplace:
            return new_rec
        else:
            self.signals = new_rec.signals
            self.channels = new_rec.channels
            self.metadata = new_rec.metadata
            return self

    def resample(self, target_rate: float) -> "Recording":
        """Return a NEW, anti-aliased down-sampled copy of this recording.

        Low-resolution demos need a smaller, lighter recording; this rebuilds the
        uniform signal grid at ``target_rate`` using a polyphase resampler
        (``scipy.signal.resample_poly``), which applies a Kaiser-windowed sinc
        anti-alias FIR before decimation. A naive stride-decimation would fold
        energy above the new Nyquist back into the band (aliasing); resample_poly
        removes that energy first, so no aliasing occurs.

        Non-destructive: ``self`` is left untouched and a new Recording is returned,
        mirroring ``select_channels``'s copy semantics.

        Resampling factors come from the integer source/target rates:
        ``g = gcd(int(src), int(target)); up = int(target)//g; down = int(src)//g``
        and ``resample_poly(x, up, down)`` runs once, vectorized over all channels
        along ``axis=0``.

        Args:
            target_rate: Desired sampling rate in Hz. Must be <= the source rate
                (this is a DOWN-sampling helper). A target equal to the source
                returns an unchanged copy; a target above it raises ``ValueError``
                rather than silently up-sampling (up-sampling cannot recover
                detail and is out of scope for the low-res pipeline).

        Returns:
            Recording: A new Recording with the resampled signals, each channel's
                ``sample_frequency`` set to the achieved rate (source * up / down,
                which equals ``target_rate`` for integer rates), and channel/recording
                metadata and events preserved. Events are unchanged because their
                onsets/durations are in SECONDS, which stay valid under any rate
                change (only the per-sample grid shrinks, not wall-clock time).

        Raises:
            ValueError: If no signals are loaded, if channels do not share a single
                ``sample_frequency`` (biosigio stores one uniform grid; mixed-rate
                resampling is out of scope), or if ``target_rate`` exceeds the
                source rate.
        """
        from math import gcd

        from scipy.signal import resample_poly

        if self.signals is None:
            raise ValueError("No signals loaded")

        if target_rate <= 0:
            raise ValueError(f"target_rate must be positive, got {target_rate}")

        # biosigio stores all channels on one uniform-length grid; a per-channel rate
        # mix is out of scope here, matching the exporter's single-rate guard.
        distinct_rates = {info["sample_frequency"] for info in self.channels.values()}
        if len(distinct_rates) > 1:
            raise ValueError(
                "Resampling requires a single sampling rate across all channels, but "
                f"multiple were found: {sorted(distinct_rates)} Hz. biosigio stores one "
                "uniform grid; resample each rate group separately."
            )

        source_rate = float(next(iter(distinct_rates)))

        # Down-sampling only: refuse to up-sample/alias; an equal rate is a no-op
        # copy so callers can resample unconditionally without special-casing.
        if target_rate > source_rate:
            raise ValueError(
                f"target_rate {target_rate} Hz exceeds source rate {source_rate} Hz; "
                "resample() only down-samples (low-res). Up-sampling is out of scope."
            )

        new_rec = Recording()
        new_rec.channels = {ch: info.copy() for ch, info in self.channels.items()}
        new_rec.metadata = self.metadata.copy()
        # Onsets/durations are in SECONDS, so they remain valid after the grid
        # changes; copy them through unchanged.
        new_rec.events = self.events.copy() if self.events is not None else self.events

        if target_rate == source_rate:
            # No grid change: copy signals through untouched (fresh RangeIndex for
            # consistency with the resampled path).
            new_rec.signals = self.signals.copy().reset_index(drop=True)
            return new_rec

        # Rational resampling factors from the integer rates.
        src_i = int(round(source_rate))
        tgt_i = int(round(target_rate))
        g = gcd(src_i, tgt_i)
        up = tgt_i // g
        down = src_i // g

        # The achieved rate is exactly source * up / down. Store THAT, not the
        # requested float, so the metadata can never disagree with the data (a
        # non-integer or odd target snaps to the nearest achievable rational rate;
        # warn so the caller knows). This avoids silently writing e.g. 99.5 Hz
        # onto a grid that resample_poly actually produced at 100 Hz.
        actual_rate = source_rate * up / down
        if abs(actual_rate - target_rate) > 1e-9:
            logging.warning(
                "Requested resample to %g Hz; nearest achievable rational rate is "
                "%g Hz, which is what is stored on the channels.",
                target_rate,
                actual_rate,
            )

        columns = list(self.signals.columns)
        data = self.signals.to_numpy(dtype=float)
        # resample_poly over axis=0 resamples every channel column at once with the
        # shared anti-alias FIR.
        resampled = resample_poly(data, up, down, axis=0)

        new_rec.signals = pd.DataFrame(resampled, columns=columns)
        new_rec.signals.index = pd.RangeIndex(len(new_rec.signals))
        for info in new_rec.channels.values():
            info["sample_frequency"] = actual_rate

        return new_rec

    def get_channel_types(self) -> list[str]:
        """
        Get list of unique channel types in the data.

        Returns:
            List of channel types (e.g., ['EMG', 'ACC', 'GYRO'])
        """
        return list({info["channel_type"] for info in self.channels.values()})

    def get_channels_by_type(self, channel_type: str) -> list[str]:
        """
        Get list of channels of a specific type.

        Args:
            channel_type: Type of channels to get ('EMG', 'ACC', 'GYRO', etc.)

        Returns:
            List of channel names of the specified type
        """
        return [ch for ch, info in self.channels.items() if info["channel_type"] == channel_type]

    def get_modalities(self) -> list[str]:
        """
        Get the list of unique modalities present in the data.

        Returns:
            List of modalities (e.g., ['EEG', 'EMG', 'MISC']).
        """
        return list(
            {info.get("modality") for info in self.channels.values() if info.get("modality")}
        )

    def get_channels_by_modality(self, modality: str) -> list[str]:
        """
        Get the channels belonging to a given modality.

        Args:
            modality: Modality to filter by ('EEG', 'EMG', 'IEEG', 'MEG', 'BEH', 'MISC').

        Returns:
            List of channel names of the specified modality.
        """
        canonical_modality = validate_modality(modality)
        return [
            ch for ch, info in self.channels.items() if info.get("modality") == canonical_modality
        ]

    def to_edf(
        self,
        filepath: str,
        method: str = "both",
        fft_noise_range: tuple | None = None,
        svd_rank: int | None = None,
        precision_threshold: float = 0.01,
        format: Literal["auto", "edf", "bdf"] = "auto",
        bypass_analysis: bool | None = None,
        verify: bool = False,
        verify_tolerance: float = 1e-6,
        verify_channel_map: dict[str, str] | None = None,
        verify_plot: bool = False,
        events_df: pd.DataFrame | None = None,
        create_channels_tsv: bool = True,
        clip_outliers: bool | str = "auto",
        **kwargs,
    ) -> dict | None:
        """
        Export the recording to EDF/BDF format, optionally including events.

        Args:
            filepath: Path to save the EDF/BDF file
            method: Method for signal analysis ('svd', 'fft', or 'both')
                'svd': Uses Singular Value Decomposition for noise floor estimation
                'fft': Uses Fast Fourier Transform for noise floor estimation
                'both': Uses both methods and takes the minimum noise floor (default)
            fft_noise_range: Optional tuple (min_freq, max_freq) specifying frequency range for noise in FFT method
            svd_rank: Optional manual rank cutoff for signal/noise separation in SVD method
            precision_threshold: Maximum acceptable precision loss percentage (default: 0.01%)
            format: Format to use ('auto', 'edf', or 'bdf'). Default is 'auto'.
                    If 'edf' or 'bdf' is specified, that format will be used directly.
                    If 'auto', the format (EDF/16-bit or BDF/24-bit) is chosen based
                    on signal analysis to minimize precision loss while preferring EDF
                    if sufficient.
            bypass_analysis: If True, skip signal analysis step when format is explicitly
                             set to 'edf' or 'bdf'. If None (default), analysis is skipped
                             automatically when format is forced. Set to False to force
                             analysis even with a specified format. Ignored if format='auto'.
            verify: If True, reload the exported file and compare signals with the original
                    to check for data integrity loss. Results are printed. (default: False)
            verify_tolerance: Absolute tolerance used when comparing signals during verification. (default: 1e-6)
            verify_channel_map: Optional dictionary mapping original channel names (keys)
                                to reloaded channel names (values) for verification.
                                Used if `verify` is True and channel names might differ.
            verify_plot: If True and verify is True, plots a comparison of original vs reloaded signals.
            events_df: Optional DataFrame with events ('onset', 'duration', 'description').
                      If None, self.events is exported.
            create_channels_tsv: If True, create a BIDS-compliant channels.tsv file (default: True)
            clip_outliers: Singularity handling for the per-channel physical window.
                'auto' (default) keeps the full range losslessly but clips rare extreme
                outliers to a robust window only when keeping them would crater the bulk
                signal's resolution at the chosen format (with a warning); True always
                clips to the robust window; False never clips. See EDFExporter.export for
                the advanced ``outlier_sigmas`` / ``min_effective_bits`` knobs.
            **kwargs: Additional arguments for the EDF exporter

        Returns:
            dict | None: If verify is True, a dictionary of verification results
                         (or an error entry if reloading or comparison failed).
                         Otherwise, None.

        Raises:
            ValueError: If no signals are loaded
        """
        from ..exporters.edf import EDFExporter  # Local import

        if self.signals is None:
            raise ValueError("No signals loaded")

        # --- Determine if analysis should be bypassed ---
        final_bypass_analysis = False
        if format.lower() == "auto":
            if bypass_analysis is True:
                logging.warning(
                    "bypass_analysis=True ignored because format='auto'. Analysis is required."
                )
            # Analysis is always needed for 'auto' format
            final_bypass_analysis = False
        elif format.lower() in ["edf", "bdf"]:
            if bypass_analysis is None:
                # Default behaviour: skip analysis if format is forced
                final_bypass_analysis = True
                msg = (
                    f"Format forced to '{format}'. Skipping signal analysis for faster export. "
                    "Set bypass_analysis=False to force analysis."
                )
                logging.log(logging.CRITICAL, msg)
            elif bypass_analysis is True:
                final_bypass_analysis = True
                logging.log(logging.CRITICAL, "bypass_analysis=True set. Skipping signal analysis.")
            else:  # bypass_analysis is False
                final_bypass_analysis = False
                logging.info(
                    f"Format forced to '{format}' but bypass_analysis=False. Performing signal analysis."
                )
        else:
            # Should not happen if Literal type hint works, but good practice
            logging.warning(
                f"Unknown format '{format}'. Defaulting to 'auto' behavior (analysis enabled)."
            )
            format = "auto"
            final_bypass_analysis = False

        # Determine which events DataFrame to use
        if events_df is None:
            events_to_export = self.events
        else:
            events_to_export = events_df

        # Combine parameters
        all_params: dict[str, Any] = {
            "precision_threshold": precision_threshold,
            "method": method,
            "fft_noise_range": fft_noise_range,
            "svd_rank": svd_rank,
            "format": format,
            "bypass_analysis": final_bypass_analysis,
            "events_df": events_to_export,  # Pass the events dataframe
            "create_channels_tsv": create_channels_tsv,
            "clip_outliers": clip_outliers,
            **kwargs,
        }

        EDFExporter.export(self, filepath, **all_params)

        verification_report_dict = None
        if verify:
            logging.info(f"Verification requested. Reloading exported file: {filepath}")
            try:
                # Reload the exported file
                reloaded_rec = Recording.from_file(filepath, importer="edf")

                logging.info("Comparing original signals with reloaded signals...")
                # Compare signals using the imported function
                verification_results = compare_signals(
                    self, reloaded_rec, tolerance=verify_tolerance, channel_map=verify_channel_map
                )

                # Generate and log report using the imported function
                report_verification_results(verification_results, verify_tolerance)
                verification_report_dict = verification_results

                # Plot comparison using imported function if requested
                summary = verification_results.get("channel_summary", {})
                comparison_mode = summary.get("comparison_mode", "unknown")
                compared_count = sum(1 for k in verification_results if k != "channel_summary")

                if verify_plot and compared_count > 0 and comparison_mode != "failed":
                    plot_comparison(self, reloaded_rec, channel_map=verify_channel_map)
                elif verify_plot:
                    logging.warning(
                        "Skipping verification plot: No channels were successfully compared."
                    )

            except Exception as e:
                logging.error(f"Verification failed during reload or comparison: {e}")
                verification_report_dict = {
                    "error": str(e),
                    "channel_summary": {"comparison_mode": "failed"},
                }

        return verification_report_dict

    def to_parquet(self, filepath: str) -> str:
        """Export to a self-describing biosigIO Parquet file.

        Signals are stored as a columnar table (channels = columns, time index
        preserved); channels/events/metadata travel in the file's schema metadata,
        so ``Recording.from_file`` round-trips it losslessly. Great for analytics
        (DuckDB/Polars/pandas/Spark). Requires the ``arrow`` extra (pyarrow).

        Args:
            filepath: Output ``.parquet`` path.

        Returns:
            str: The written file path.
        """
        from ..exporters.tabular import TabularExporter

        return TabularExporter.to_parquet(self, filepath)

    def to_arrow(self, filepath: str) -> str:
        """Export to a biosigIO Arrow/Feather file (fast zero-copy IPC).

        Same self-describing schema as :meth:`to_parquet`; round-trips via
        ``Recording.from_file``. Requires the ``arrow`` extra (pyarrow).

        Args:
            filepath: Output ``.feather`` / ``.arrow`` path.

        Returns:
            str: The written file path.
        """
        from ..exporters.tabular import TabularExporter

        return TabularExporter.to_arrow(self, filepath)

    def to_zarr(self, filepath: str, **kwargs) -> str:
        """Export to a sharded Zarr v3 serving store with a min/max view pyramid.

        Writes one cloud-native store that serves viewing, inference, and training
        from a single conversion: ``level 0`` of each ``(modality, rate)`` group is
        the anti-aliased, per-modality-resampled inference signal, with a min/max
        render pyramid above it (flagged not-for-inference). A derived serving copy,
        not the archival source (BIDS/EDF stay authoritative). Requires the ``zarr``
        extra (zarr v3). See :class:`~biosigio.exporters.zarr.ZarrExporter` for the
        tuning knobs (``modality_rates``, ``dtype``, chunk/shard sizing, ...).

        Args:
            filepath: Output store path (``.zarr`` appended if missing).
            **kwargs: Forwarded to :meth:`ZarrExporter.export`.

        Returns:
            str: The written store path.
        """
        from ..exporters.zarr import ZarrExporter

        # The empty-signal guard lives once, in ZarrExporter.export ("No signals
        # loaded"), matching the tabular path; no duplicate guard here.
        return ZarrExporter.export(self, filepath, **kwargs)

    def set_metadata(self, key: str, value: Any) -> None:
        """
        Set metadata value.

        Args:
            key: Metadata key
            value: Metadata value
        """
        self.metadata[key] = value

    def get_metadata(self, key: str) -> Any:
        """
        Get metadata value.

        Args:
            key: Metadata key

        Returns:
            Value associated with the key
        """
        return self.metadata.get(key)

    def has_metadata(self, key: str) -> bool:
        """Return True if ``key`` is present in the recording metadata."""
        return key in self.metadata

    def get_n_channels(self) -> int:
        """Number of channels in the recording."""
        return len(self.channels)

    def get_n_samples(self) -> int:
        """Number of time samples per channel (0 if no signals are loaded)."""
        return 0 if self.signals is None else len(self.signals)

    def get_sampling_frequency(self) -> float:
        """Sampling frequency in Hz, when all channels share a single rate.

        Raises:
            ValueError: if no channels are loaded, or channels have differing
                sampling frequencies; for a mixed-rate recording read
                ``channels[ch]["sample_frequency"]`` per channel instead.
        """
        if not self.channels:
            raise ValueError("No channels loaded")
        rates = {info["sample_frequency"] for info in self.channels.values()}
        if len(rates) > 1:
            raise ValueError(
                "Channels have differing sampling frequencies; read "
                "channels[ch]['sample_frequency'] per channel instead."
            )
        return float(next(iter(rates)))

    def get_duration(self) -> float:
        """Total recording duration in seconds (n_samples / sampling_frequency).

        Computed from the time index spacing, so it is the full window length
        (one sample period longer than the last sample's timestamp). Returns 0.0
        when fewer than two samples are loaded (a single sample has no inferable
        sample period).
        """
        if self.signals is None or len(self.signals) < 2:
            return 0.0
        index = self.signals.index
        sample_period = float(index[1] - index[0])
        return len(index) * sample_period

    def add_channel(
        self,
        label: str,
        data: np.ndarray,
        sample_frequency: float,
        physical_dimension: str,
        channel_type: str,
        *,
        modality: str | None = None,
        prefilter: str = "n/a",
    ) -> None:
        """
        Add a new channel to the recording.

        Args:
            label: Channel label or name (as per EDF specification)
            data: Channel data
            sample_frequency: Sampling frequency in Hz (as per EDF specification)
            physical_dimension: Physical dimension/unit of measurement (as per EDF specification)
            channel_type: BIDS channel type ('EEG', 'EMG', 'ECG', 'ACC', 'SEEG', ...).
                Required; validated against the modality vocabulary. There is no
                default (a missing type must be explicit, e.g. 'OTHER'/'MISC').
            modality: Coarse modality ('EEG', 'EMG', 'IEEG', 'MEG', 'BEH', 'MISC').
                If None, it is inferred from ``channel_type``.
            prefilter: Pre-filtering applied to the channel (keyword-only).
        """
        canonical_type = validate_channel_type(channel_type)
        canonical_modality = (
            infer_modality_from_channel_type(canonical_type)
            if modality is None
            else validate_modality(modality)
        )

        if self.signals is None:
            # Create DataFrame with time index
            time = np.arange(len(data)) / sample_frequency
            self.signals = pd.DataFrame(index=time)

        self.signals[label] = data
        self.channels[label] = {
            "sample_frequency": sample_frequency,
            "physical_dimension": physical_dimension,
            "prefilter": prefilter,
            "channel_type": canonical_type,
            "modality": canonical_modality,
        }

    def set_channel(
        self,
        label: str,
        *,
        channel_type: str | None = None,
        modality: str | None = None,
        physical_dimension: str | None = None,
        prefilter: str | None = None,
    ) -> None:
        """
        Update metadata of an existing channel (the supported relabel path).

        Args:
            label: Existing channel label.
            channel_type: New BIDS channel type (validated). When given without an
                explicit ``modality``, the modality is re-derived from it.
            modality: New coarse modality (validated).
            physical_dimension: New physical unit.
            prefilter: New prefilter string.

        Raises:
            KeyError: If ``label`` is not an existing channel.
            ValueError: If ``channel_type`` or ``modality`` is not in the
                modality vocabulary.
        """
        if label not in self.channels:
            raise KeyError(f"Channel not found: {label}")
        info = self.channels[label]
        if channel_type is not None:
            info["channel_type"] = validate_channel_type(channel_type)
            if modality is None:
                info["modality"] = infer_modality_from_channel_type(info["channel_type"])
        if modality is not None:
            info["modality"] = validate_modality(modality)
        if physical_dimension is not None:
            info["physical_dimension"] = physical_dimension
        if prefilter is not None:
            info["prefilter"] = prefilter

    def add_event(self, onset: float, duration: float, description: str) -> None:
        """
        Add an event/annotation to the recording.

        Args:
            onset: Event onset time in seconds.
            duration: Event duration in seconds.
            description: Event description string.
        """
        new_event = pd.DataFrame(
            [{"onset": float(onset), "duration": float(duration), "description": description}]
        )
        # Avoid concatenating onto the empty, object-dtype events frame, which
        # would coerce the numeric columns to object. Start from the typed
        # new_event when there are no existing events.
        if self.events is None or self.events.empty:
            self.events = new_event
        else:
            self.events = pd.concat([self.events, new_event], ignore_index=True)
        # Sort events by onset time for consistency
        self.events = self.events.sort_values(by="onset").reset_index(drop=True)

`__init__()` ¶

Initialize an empty recording.

Source code in biosigio/core/emg.py

def __init__(self):
    """Initialize an empty recording."""
    self.signals = None
    self.metadata = {}
    self.channels = {}
    # Initialize events as an empty DataFrame with specified columns
    self.events = pd.DataFrame(columns=["onset", "duration", "description"])

`add_channel(label, data, sample_frequency, physical_dimension, channel_type, *, modality=None, prefilter='n/a')` ¶

Add a new channel to the recording.

Args: label: Channel label or name (as per EDF specification) data: Channel data sample_frequency: Sampling frequency in Hz (as per EDF specification) physical_dimension: Physical dimension/unit of measurement (as per EDF specification) channel_type: BIDS channel type ('EEG', 'EMG', 'ECG', 'ACC', 'SEEG', ...). Required; validated against the modality vocabulary. There is no default (a missing type must be explicit, e.g. 'OTHER'/'MISC'). modality: Coarse modality ('EEG', 'EMG', 'IEEG', 'MEG', 'BEH', 'MISC'). If None, it is inferred from channel_type. prefilter: Pre-filtering applied to the channel (keyword-only).

Source code in biosigio/core/emg.py

def add_channel(
    self,
    label: str,
    data: np.ndarray,
    sample_frequency: float,
    physical_dimension: str,
    channel_type: str,
    *,
    modality: str | None = None,
    prefilter: str = "n/a",
) -> None:
    """
    Add a new channel to the recording.

    Args:
        label: Channel label or name (as per EDF specification)
        data: Channel data
        sample_frequency: Sampling frequency in Hz (as per EDF specification)
        physical_dimension: Physical dimension/unit of measurement (as per EDF specification)
        channel_type: BIDS channel type ('EEG', 'EMG', 'ECG', 'ACC', 'SEEG', ...).
            Required; validated against the modality vocabulary. There is no
            default (a missing type must be explicit, e.g. 'OTHER'/'MISC').
        modality: Coarse modality ('EEG', 'EMG', 'IEEG', 'MEG', 'BEH', 'MISC').
            If None, it is inferred from ``channel_type``.
        prefilter: Pre-filtering applied to the channel (keyword-only).
    """
    canonical_type = validate_channel_type(channel_type)
    canonical_modality = (
        infer_modality_from_channel_type(canonical_type)
        if modality is None
        else validate_modality(modality)
    )

    if self.signals is None:
        # Create DataFrame with time index
        time = np.arange(len(data)) / sample_frequency
        self.signals = pd.DataFrame(index=time)

    self.signals[label] = data
    self.channels[label] = {
        "sample_frequency": sample_frequency,
        "physical_dimension": physical_dimension,
        "prefilter": prefilter,
        "channel_type": canonical_type,
        "modality": canonical_modality,
    }

`add_event(onset, duration, description)` ¶

Add an event/annotation to the recording.

Args: onset: Event onset time in seconds. duration: Event duration in seconds. description: Event description string.

Source code in biosigio/core/emg.py

def add_event(self, onset: float, duration: float, description: str) -> None:
    """
    Add an event/annotation to the recording.

    Args:
        onset: Event onset time in seconds.
        duration: Event duration in seconds.
        description: Event description string.
    """
    new_event = pd.DataFrame(
        [{"onset": float(onset), "duration": float(duration), "description": description}]
    )
    # Avoid concatenating onto the empty, object-dtype events frame, which
    # would coerce the numeric columns to object. Start from the typed
    # new_event when there are no existing events.
    if self.events is None or self.events.empty:
        self.events = new_event
    else:
        self.events = pd.concat([self.events, new_event], ignore_index=True)
    # Sort events by onset time for consistency
    self.events = self.events.sort_values(by="onset").reset_index(drop=True)
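The ordering behaviour of `add_event` can be illustrated without pandas. The following is a minimal sketch, assuming events are held as a plain list of dicts; `insert_event` is a hypothetical helper written for this illustration and is not part of biosigIO.

```python
def insert_event(events, onset, duration, description):
    """Return a new onset-sorted event list (hypothetical stand-in for add_event)."""
    new_event = {
        "onset": float(onset),        # coerce to float, as add_event does
        "duration": float(duration),
        "description": description,
    }
    # Append, then re-sort by onset so the list stays chronological
    # regardless of the order in which events were added.
    return sorted(events + [new_event], key=lambda ev: ev["onset"])


events = []
events = insert_event(events, 5, 0.5, "stimulus")
events = insert_event(events, 1.25, 0, "start")
onsets = [ev["onset"] for ev in events]
```

Here the event added second ends up first because its onset is earlier, which is the same effect the `sort_values(by="onset")` call has on the events frame.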

`from_file(filepath, importer=None, force_csv=False, bids_channels='auto', mixed_rate='error', **kwargs)` `classmethod` ¶

Create a Recording object from a file.

Args:

filepath: Path to the input file

importer: Name of the importer to use. Can be one of the following:
- 'trigno': Delsys Trigno EMG system (CSV)
- 'otb': OTB/OTB+ EMG system (OTB, OTB+)
- 'eeglab': EEGLAB .set files (SET)
- 'edf': EDF/EDF+/BDF/BDF+ format (EDF, BDF)
- 'csv': Generic CSV (or TXT) files with columnar data
- 'wfdb': Waveform Database (WFDB)
- 'xdf': XDF format (multi-stream Lab Streaming Layer files)
- 'meg': MEG via MNE (.fif, CTF .ds, KIT .con/.sqd/.kdf; requires the 'meg' extra)
- 'brainvision': BrainVision .vhdr via MNE (requires the 'meg' extra)
- 'tabular': biosigIO Parquet/Arrow/Feather (requires the 'arrow' extra)
- 'neo': proprietary electrophysiology formats via python-neo (Intan, Blackrock, Spike2, Plexon, Micromed, Neuralynx, ...; requires the 'neo' extra)
- 'zarr': biosigIO Zarr serving store (requires the 'zarr' extra)

If None, the importer will be inferred from the file extension. Automatic import is supported for CSV/TXT files.

force_csv: If True and importer is 'csv', forces using the generic CSV importer even if the file appears to match a specialized format.

bids_channels: When 'auto' (default), look for a sibling BIDS _channels.tsv next to the file and apply its per-channel type/units over the importer's inferred values. Pass 'off' to disable.

mixed_rate: Policy for an EDF/BDF file whose signals carry differing per-channel sampling rates (ignored for every other format, which is single-rate). 'error' (default) raises -- biosigIO stores one uniform grid and will not fabricate a common one silently. 'resample' upsamples the slower channels to the fastest rate (a lossy derived view; each channel keeps its native rate as original_sample_frequency).

**kwargs: Additional arguments passed to the importer. For XDF files, useful kwargs include:
- stream_names: List of stream names to import
- stream_types: List of stream types to import (e.g., ["EMG", "EXG"])
- stream_ids: List of stream IDs to import

Returns: Recording: New Recording object with loaded data
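How `force_csv` and `mixed_rate` reach the selected importer can be sketched as follows. This is an illustration only: `build_importer_kwargs` is a hypothetical helper that mirrors the forwarding rules in the `from_file` source, and it does not exist in biosigIO.

```python
def build_importer_kwargs(importer, force_csv=False, mixed_rate="error", **kwargs):
    """Hypothetical helper mirroring how from_file forwards its options."""
    forwarded = dict(kwargs)  # copy so the caller's dict is not mutated
    if importer == "csv":
        # force_csv is passed to the CSV importer under the name force_generic
        forwarded["force_generic"] = force_csv
    if importer == "edf":
        # mixed_rate is forwarded to the EDF/BDF importer and nowhere else
        forwarded["mixed_rate"] = mixed_rate
    return forwarded


csv_kwargs = build_importer_kwargs("csv", force_csv=True)
edf_kwargs = build_importer_kwargs("edf", mixed_rate="resample")
xdf_kwargs = build_importer_kwargs("xdf", stream_types=["EMG", "EXG"])
```

Under these rules an XDF import receives only the stream-selection arguments the caller supplied, so the default `mixed_rate="error"` never reaches an importer that does not accept it.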

Source code in biosigio/core/emg.py

@classmethod
def from_file(
    cls,
    filepath: str,
    importer: ImporterName | None = None,
    force_csv: bool = False,
    bids_channels: str = "auto",
    mixed_rate: str = "error",
    **kwargs,
) -> "Recording":
    """
    Create a Recording object from a file.

    Args:
        filepath: Path to the input file
        importer: Name of the importer to use. Can be one of the following:
            - 'trigno': Delsys Trigno EMG system (CSV)
            - 'otb': OTB/OTB+ EMG system (OTB, OTB+)
            - 'eeglab': EEGLAB .set files (SET)
            - 'edf': EDF/EDF+/BDF/BDF+ format (EDF, BDF)
            - 'csv': Generic CSV (or TXT) files with columnar data
            - 'wfdb': Waveform Database (WFDB)
            - 'xdf': XDF format (multi-stream Lab Streaming Layer files)
            - 'meg': MEG via MNE (.fif, CTF .ds, KIT .con/.sqd/.kdf; requires the 'meg' extra)
            - 'brainvision': BrainVision .vhdr via MNE (requires the 'meg' extra)
            - 'tabular': biosigIO Parquet/Arrow/Feather (requires the 'arrow' extra)
            - 'neo': proprietary electrophysiology formats via python-neo
              (Intan, Blackrock, Spike2, Plexon, Micromed, Neuralynx, ...;
              requires the 'neo' extra)
            - 'zarr': biosigIO Zarr serving store (requires the 'zarr' extra)
            If None, the importer will be inferred from the file extension.
            Automatic import is supported for CSV/TXT files.
        force_csv: If True and importer is 'csv', forces using the generic CSV
                  importer even if the file appears to match a specialized format.
        bids_channels: When 'auto' (default), look for a sibling BIDS
                  _channels.tsv next to the file and apply its per-channel
                  type/units over the importer's inferred values. Pass 'off'
                  to disable.
        mixed_rate: Policy for an EDF/BDF file whose signals carry differing
                  per-channel sampling rates (ignored for every other format,
                  which is single-rate). 'error' (default) raises -- biosigIO
                  stores one uniform grid and will not fabricate a common one
                  silently. 'resample' upsamples the slower channels to the
                  fastest rate (a lossy derived view; each channel keeps its
                  native rate as ``original_sample_frequency``).
        **kwargs: Additional arguments passed to the importer.
            For XDF files, useful kwargs include:
            - stream_names: List of stream names to import
            - stream_types: List of stream types to import (e.g., ["EMG", "EXG"])
            - stream_ids: List of stream IDs to import

    Returns:
        Recording: New Recording object with loaded data
    """
    if importer is None:
        importer = cls._infer_importer(filepath)

    importers = {
        "trigno": "TrignoImporter",  # CSV with Delsys Trigno Headers
        "otb": "OTBImporter",  # OTB/OTB+ EMG system data
        "edf": "EDFImporter",  # EDF/EDF+/BDF format
        "eeglab": "EEGLABImporter",  # EEGLAB .set files
        "csv": "CSVImporter",  # Generic CSV/Text files
        "wfdb": "WFDBImporter",  # Waveform Database format
        "xdf": "XDFImporter",  # XDF multi-stream format
        "meg": "MEGImporter",  # MEG via MNE (.fif, CTF .ds, KIT .con/.sqd/.kdf)
        "brainvision": "BrainVisionImporter",  # BrainVision via MNE (.vhdr)
        "tabular": "TabularImporter",  # biosigIO Parquet / Arrow / Feather
        "neo": "NeoImporter",  # proprietary ephys via python-neo
        "zarr": "ZarrImporter",  # biosigIO Zarr serving store
    }

    if importer not in importers:
        raise ValueError(
            f"Unsupported importer: {importer}. "
            f"Available importers: {list(importers.keys())}\n"
            "- trigno: Delsys Trigno EMG system\n"
            "- otb: OTB/OTB+ EMG system\n"
            "- edf: EDF/EDF+/BDF format\n"
            "- eeglab: EEGLAB .set files\n"
            "- csv: Generic CSV/Text files\n"
            "- wfdb: Waveform Database\n"
            "- xdf: XDF multi-stream format\n"
            "- meg: MEG via MNE (.fif, CTF .ds, KIT .con/.sqd/.kdf)\n"
            "- brainvision: BrainVision via MNE (.vhdr)\n"
            "- tabular: biosigIO Parquet/Arrow/Feather (.parquet, .feather, .arrow)\n"
            "- neo: proprietary electrophysiology formats via python-neo "
            "(Intan, Blackrock, Spike2, Plexon, Micromed, Neuralynx, ...)\n"
            "- zarr: biosigIO Zarr serving store (.zarr)"
        )

    # If using CSV importer and force_csv is set, pass it as force_generic
    if importer == "csv":
        kwargs["force_generic"] = force_csv

    # mixed_rate is only meaningful for EDF/BDF, the one format whose signals may
    # carry differing per-channel sampling rates; forward it there and nowhere
    # else (so the default "error" never reaches an importer that can't use it).
    if importer == "edf":
        kwargs["mixed_rate"] = mixed_rate

    # Import the appropriate importer class
    importer_module = __import__(
        f"biosigio.importers.{importer}", globals(), locals(), [importers[importer]]
    )
    importer_class = getattr(importer_module, importers[importer])

    # Create importer instance and load data
    rec = importer_class().load(filepath, **kwargs)

    # Record provenance: which format this recording came from. setdefault so a
    # re-imported serialization file (tabular/zarr) keeps the ORIGINAL
    # source_format restored from its metadata rather than being relabeled.
    rec.metadata.setdefault("source_format", importer)

    # In a BIDS layout, the sibling _channels.tsv is the authoritative source
    # of per-channel type/units; apply it over the importer's header/label
    # guesses unless explicitly disabled with bids_channels="off".
    if bids_channels != "off":
        from ..bids import apply_channels_tsv, find_channels_tsv

        channels_tsv = find_channels_tsv(filepath)
        if channels_tsv:
            apply_channels_tsv(rec, channels_tsv)

    return rec
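The name-to-class lookup in `from_file` follows a common registry pattern: validate the key, import the module on demand, then fetch the class by name. Below is a minimal sketch of that pattern using `importlib.import_module` in place of `__import__`. The registry contents and the `resolve` function are assumptions made for illustration; standard-library targets are used so the sketch runs without biosigIO installed.

```python
import importlib

# Hypothetical registry: key -> (module path, attribute name).
REGISTRY = {
    "json": ("json", "JSONDecoder"),
    "csv": ("csv", "DictReader"),
}


def resolve(name):
    """Look up a registry key and import the class it names on demand."""
    if name not in REGISTRY:
        raise ValueError(
            f"Unsupported importer: {name}. Available importers: {list(REGISTRY)}"
        )
    module_path, attr = REGISTRY[name]
    module = importlib.import_module(module_path)  # imported only when requested
    return getattr(module, attr)


decoder_cls = resolve("json")
```

A plausible benefit of importing lazily in this way is that a module backing an optional extra is only imported when its importer is actually selected.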

`get_channel_types()` ¶

Get list of unique channel types in the data.

Returns: List of channel types (e.g., ['EMG', 'ACC', 'GYRO'])

Source code in biosigio/core/emg.py

def get_channel_types(self) -> list[str]:
    """
    Get list of unique channel types in the data.

    Returns:
        List of channel types (e.g., ['EMG', 'ACC', 'GYRO'])
    """
    return list({info["channel_type"] for info in self.channels.values()})
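Because the list is built from a set, the order of the returned types is not guaranteed. A caller that needs a stable order can sort the result. The sketch below uses a plain dict as a stand-in for `Recording.channels`; the labels and types are illustrative.

```python
# Stand-in for Recording.channels: label -> channel info (illustrative values).
channels = {
    "biceps": {"channel_type": "EMG"},
    "triceps": {"channel_type": "EMG"},
    "wrist_x": {"channel_type": "ACC"},
}

# Same set comprehension as get_channel_types(), then sorted for a stable order.
unique_types = sorted({info["channel_type"] for info in channels.values()})
```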

`get_channels_by_modality(modality)` ¶

Get the channels belonging to a given modality.

Args: modality: Modality to filter by ('EEG', 'EMG', 'IEEG', 'MEG', 'BEH', 'MISC').

Returns: List of channel names of the specified modality.

Source code in biosigio/core/emg.py

def get_channels_by_modality(self, modality: str) -> list[str]:
    """
    Get the channels belonging to a given modality.

    Args:
        modality: Modality to filter by ('EEG', 'EMG', 'IEEG', 'MEG', 'BEH', 'MISC').

    Returns:
        List of channel names of the specified modality.
    """
    canonical_modality = validate_modality(modality)
    return [
        ch for ch, info in self.channels.items() if info.get("modality") == canonical_modality
    ]

`get_channels_by_type(channel_type)` ¶

Get list of channels of a specific type.

Args: channel_type: Type of channels to get ('EMG', 'ACC', 'GYRO', etc.)

Returns: List of channel names of the specified type

Source code in biosigio/core/emg.py

def get_channels_by_type(self, channel_type: str) -> list[str]:
    """
    Get list of channels of a specific type.

    Args:
        channel_type: Type of channels to get ('EMG', 'ACC', 'GYRO', etc.)

    Returns:
        List of channel names of the specified type
    """
    return [ch for ch, info in self.channels.items() if info["channel_type"] == channel_type]
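The comparison in `get_channels_by_type` is an exact string match with no normalization of the argument, unlike `get_channels_by_modality`, which passes its argument through `validate_modality`. The following sketch reproduces the filter on a plain dict; `channels_by_type` and the sample channels are stand-ins written for this illustration.

```python
# Stand-in for Recording.channels (illustrative labels and types).
channels = {
    "biceps": {"channel_type": "EMG"},
    "triceps": {"channel_type": "EMG"},
    "wrist_x": {"channel_type": "ACC"},
}


def channels_by_type(channels, channel_type):
    """Same list comprehension as get_channels_by_type, on a plain dict."""
    return [ch for ch, info in channels.items() if info["channel_type"] == channel_type]


emg = channels_by_type(channels, "EMG")
lowercase = channels_by_type(channels, "emg")  # different case, so nothing matches
```

Passing the type in the same form in which it is stored (for example `'EMG'` rather than `'emg'`) avoids an unexpectedly empty result.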

`get_duration()` ¶

Total recording duration in seconds (n_samples / sampling_frequency).

Computed from the time index spacing, so it is the full window length (one sample period longer than the last sample's timestamp). Returns 0.0 when fewer than two samples are loaded (a single sample has no inferable sample period).

Source code in biosigio/core/emg.py

def get_duration(self) -> float:
    """Total recording duration in seconds (n_samples / sampling_frequency).

    Computed from the time index spacing, so it is the full window length
    (one sample period longer than the last sample's timestamp). Returns 0.0
    when fewer than two samples are loaded (a single sample has no inferable
    sample period).
    """
    if self.signals is None or len(self.signals) < 2:
        return 0.0
    index = self.signals.index
    sample_period = float(index[1] - index[0])
    return len(index) * sample_period
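The relationship between the duration and the last timestamp can be checked with a small pure-Python sketch. It assumes a uniform time index built the way `add_channel` builds it (sample `i` at `i / sample_frequency`); the rate and sample count are illustrative.

```python
sample_frequency = 256.0  # Hz (illustrative; a power of two keeps the floats exact)
n_samples = 512

# Time index as add_channel builds it: sample i sits at i / sample_frequency.
index = [i / sample_frequency for i in range(n_samples)]

sample_period = index[1] - index[0]
duration = len(index) * sample_period  # same formula as get_duration()
last_timestamp = index[-1]
```

With these values the duration is `n_samples / sample_frequency`, and it exceeds the last timestamp by exactly one sample period.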

`get_metadata(key)` ¶

Get metadata value.

Args: key: Metadata key

Returns: Value associated with the key

Source code in biosigio/core/emg.py

def get_metadata(self, key: str) -> Any:
    """
    Get metadata value.

    Args:
        key: Metadata key

    Returns:
        Value associated with the key
    """
    return self.metadata.get(key)

`get_modalities()` ¶

Get the list of unique modalities present in the data.

Returns: List of modalities (e.g., ['EEG', 'EMG', 'MISC']).

Source code in biosigio/core/emg.py

def get_modalities(self) -> list[str]:
    """
    Get the list of unique modalities present in the data.

    Returns:
        List of modalities (e.g., ['EEG', 'EMG', 'MISC']).
    """
    return list(
        {info.get("modality") for info in self.channels.values() if info.get("modality")}
    )

`get_n_channels()` ¶

Number of channels in the recording.

Source code in biosigio/core/emg.py

def get_n_channels(self) -> int:
    """Number of channels in the recording."""
    return len(self.channels)

`get_n_samples()` ¶

Number of time samples per channel (0 if no signals are loaded).

Source code in biosigio/core/emg.py

def get_n_samples(self) -> int:
    """Number of time samples per channel (0 if no signals are loaded)."""
    return 0 if self.signals is None else len(self.signals)

`get_sampling_frequency()` ¶

Sampling frequency in Hz, when all channels share a single rate.

Raises: ValueError: if no channels are loaded, or channels have differing sampling frequencies; for a mixed-rate recording read channels[ch]["sample_frequency"] per channel instead.

Source code in biosigio/core/emg.py

def get_sampling_frequency(self) -> float:
    """Sampling frequency in Hz, when all channels share a single rate.

    Raises:
        ValueError: if no channels are loaded, or channels have differing
            sampling frequencies; for a mixed-rate recording read
            ``channels[ch]["sample_frequency"]`` per channel instead.
    """
    if not self.channels:
        raise ValueError("No channels loaded")
    rates = {info["sample_frequency"] for info in self.channels.values()}
    if len(rates) > 1:
        raise ValueError(
            "Channels have differing sampling frequencies; read "
            "channels[ch]['sample_frequency'] per channel instead."
        )
    return float(next(iter(rates)))
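A caller that may encounter mixed-rate recordings can catch the `ValueError` and fall back to per-channel rates, as the error message suggests. The sketch below mirrors the check on a plain dict; `common_rate` and the sample channels are hypothetical stand-ins, not biosigIO code.

```python
def common_rate(channels):
    """Return the shared rate, or raise ValueError (mirrors get_sampling_frequency)."""
    if not channels:
        raise ValueError("No channels loaded")
    rates = {info["sample_frequency"] for info in channels.values()}
    if len(rates) > 1:
        raise ValueError("Channels have differing sampling frequencies")
    return float(next(iter(rates)))


# Illustrative mixed-rate channel table.
mixed = {
    "emg": {"sample_frequency": 2000},
    "acc": {"sample_frequency": 148},
}

try:
    rate = common_rate(mixed)
    per_channel = None
except ValueError:
    rate = None
    # Fallback suggested by the error message: read each channel's own rate.
    per_channel = {ch: info["sample_frequency"] for ch, info in mixed.items()}
```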

`has_metadata(key)` ¶

Return True if key is present in the recording metadata.

Source code in biosigio/core/emg.py

def has_metadata(self, key: str) -> bool:
    """Return True if ``key`` is present in the recording metadata."""
    return key in self.metadata
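Since `get_metadata` relies on `dict.get`, it returns `None` both for a missing key and for a key stored with the value `None`; `has_metadata` distinguishes the two. A minimal sketch with a plain dict standing in for `Recording.metadata` (the keys and values are illustrative):

```python
metadata = {"subject": "S01", "notes": None}  # stand-in for Recording.metadata

# dict.get, as used by get_metadata, returns None for a missing key ...
missing_value = metadata.get("device")
# ... and also for a key that is present but explicitly set to None.
stored_none = metadata.get("notes")

# The membership test used by has_metadata tells the two cases apart.
has_device = "device" in metadata
has_notes = "notes" in metadata
```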

`plot_signals(channels=None, time_range=None, offset_scale=0.8, uniform_scale=True, detrend=False, grid=True, title=None, show=True, plt_module=None)` ¶

Plot signals in a single plot with vertical offsets.

Args: channels: List of channels to plot. If None, plot all channels. time_range: Tuple of (start_time, end_time) to plot. If None, plot all data. offset_scale: Portion of allocated space each signal can use (0.0 to 1.0). uniform_scale: Whether to use the same scale for all signals. detrend: Whether to remove mean from signals before plotting. grid: Whether to show grid lines. title: Optional title for the figure. show: Whether to display the plot. plt_module: Matplotlib pyplot module to use.

Source code in biosigio/core/emg.py

def plot_signals(
    self,
    channels=None,
    time_range=None,
    offset_scale=0.8,
    uniform_scale=True,
    detrend=False,
    grid=True,
    title=None,
    show=True,
    plt_module=None,
):
    """
    Plot signals in a single plot with vertical offsets.

    Args:
        channels: List of channels to plot. If None, plot all channels.
        time_range: Tuple of (start_time, end_time) to plot. If None, plot all data.
        offset_scale: Portion of allocated space each signal can use (0.0 to 1.0).
        uniform_scale: Whether to use the same scale for all signals.
        detrend: Whether to remove mean from signals before plotting.
        grid: Whether to show grid lines.
        title: Optional title for the figure.
        show: Whether to display the plot.
        plt_module: Matplotlib pyplot module to use.
    """
    # Delegate to the static plotting function in visualization module
    static_plot_signals(
        rec_object=self,
        channels=channels,
        time_range=time_range,
        offset_scale=offset_scale,
        uniform_scale=uniform_scale,
        detrend=detrend,
        grid=grid,
        title=title,
        show=show,
        plt_module=plt_module,
    )

`resample(target_rate)` ¶

Return a NEW, anti-aliased down-sampled copy of this recording.

Low-resolution demos need a smaller, lighter recording; this rebuilds the uniform signal grid at target_rate using a polyphase resampler (scipy.signal.resample_poly), which applies a Kaiser-windowed sinc anti-alias FIR before decimation. A naive stride-decimation would fold energy above the new Nyquist back into the band (aliasing); resample_poly attenuates that energy first, which suppresses aliasing.

Non-destructive: self is left untouched and a new Recording is returned, mirroring select_channels's copy semantics.

Resampling factors come from the integer source/target rates: g = gcd(int(src), int(target)); up = int(target)//g; down = int(src)//g and resample_poly(x, up, down) runs once, vectorized over all channels along axis=0.

Args: target_rate: Desired sampling rate in Hz. Must be <= the source rate (this is a DOWN-sampling helper). A target equal to the source returns an unchanged copy; a target above it raises ValueError rather than silently up-sampling (up-sampling cannot recover detail and is out of scope for the low-res pipeline).

Returns: Recording: A new Recording with the resampled signals, each channel's sample_frequency set to the achieved rate (source * up / down, which equals target_rate for integer rates), and channel/recording metadata and events preserved. Events are unchanged because their onsets/durations are in SECONDS, which stay valid under any rate change (only the per-sample grid shrinks, not wall-clock time).

Raises: ValueError: If no signals are loaded, if target_rate is not positive, if channels do not share a single sample_frequency (biosigio stores one uniform grid; mixed-rate resampling is out of scope), or if target_rate exceeds the source rate.
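The integer-gcd rule for the resampling factors can be worked through in isolation. The sketch below is a stand-alone illustration: `resample_factors` is a hypothetical helper that applies the documented formulas and does not call `resample_poly` itself.

```python
from math import gcd


def resample_factors(source_rate, target_rate):
    """Return (up, down, achieved_rate) using the documented integer-gcd rule."""
    src, tgt = int(source_rate), int(target_rate)  # rates are truncated to integers
    g = gcd(src, tgt)
    up, down = tgt // g, src // g
    achieved = source_rate * up / down
    return up, down, achieved


factors_a = resample_factors(2048, 256)      # target divides the source evenly
factors_b = resample_factors(1000, 256)      # needs a rational up/down ratio
factors_c = resample_factors(1000.0, 250.5)  # non-integer target is truncated to 250
```

The last case shows why the achieved rate is only guaranteed to equal `target_rate` for integer rates: the fractional part of the target is dropped before the factors are computed.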

Source code in biosigio/core/emg.py

def resample(self, target_rate: float) -> "Recording":
    """Return a NEW, anti-aliased down-sampled copy of this recording.

    Low-resolution demos need a smaller, lighter recording; this rebuilds the
    uniform signal grid at ``target_rate`` using a polyphase resampler
    (``scipy.signal.resample_poly``), which applies a Kaiser-windowed sinc
    anti-alias FIR before decimation. A naive stride-decimation would fold
    energy above the new Nyquist back into the band (aliasing); resample_poly
    attenuates that energy first, which suppresses aliasing.

    Non-destructive: ``self`` is left untouched and a new Recording is returned,
    mirroring ``select_channels``'s copy semantics.

    Resampling factors come from the integer source/target rates:
    ``g = gcd(int(src), int(target)); up = int(target)//g; down = int(src)//g``
    and ``resample_poly(x, up, down)`` runs once, vectorized over all channels
    along ``axis=0``.

    Args:
        target_rate: Desired sampling rate in Hz. Must be <= the source rate
            (this is a DOWN-sampling helper). A target equal to the source
            returns an unchanged copy; a target above it raises ``ValueError``
            rather than silently up-sampling (up-sampling cannot recover
            detail and is out of scope for the low-res pipeline).

    Returns:
        Recording: A new Recording with the resampled signals, each channel's
            ``sample_frequency`` set to the achieved rate (source * up / down,
            which equals ``target_rate`` for integer rates), and channel/recording
            metadata and events preserved. Events are unchanged because their
            onsets/durations are in SECONDS, which stay valid under any rate
            change (only the per-sample grid shrinks, not wall-clock time).

    Raises:
        ValueError: If no signals are loaded, if ``target_rate`` is not
            positive, if channels do not share a single ``sample_frequency``
            (biosigio stores one uniform grid; mixed-rate resampling is out of
            scope), or if ``target_rate`` exceeds the source rate.
    """
    from math import gcd

    from scipy.signal import resample_poly

    if self.signals is None:
        raise ValueError("No signals loaded")

    if target_rate <= 0:
        raise ValueError(f"target_rate must be positive, got {target_rate}")

    # biosigio stores all channels on one uniform-length grid; a per-channel rate
    # mix is out of scope here, matching the exporter's single-rate guard.
    distinct_rates = {info["sample_frequency"] for info in self.channels.values()}
    if len(distinct_rates) > 1:
        raise ValueError(
            "Resampling requires a single sampling rate across all channels, but "
            f"multiple were found: {sorted(distinct_rates)} Hz. biosigio stores one "
            "uniform grid; resample each rate group separately."
        )

    source_rate = float(next(iter(distinct_rates)))

    # Down-sampling only: refuse to up-sample/alias; an equal rate is a no-op
    # copy so callers can resample unconditionally without special-casing.
    if target_rate > source_rate:
        raise ValueError(
            f"target_rate {target_rate} Hz exceeds source rate {source_rate} Hz; "
            "resample() only down-samples (low-res). Up-sampling is out of scope."
        )

    new_rec = Recording()
    new_rec.channels = {ch: info.copy() for ch, info in self.channels.items()}
    new_rec.metadata = self.metadata.copy()
    # Onsets/durations are in SECONDS, so they remain valid after the grid
    # changes; copy them through unchanged.
    new_rec.events = self.events.copy() if self.events is not None else self.events

    if target_rate == source_rate:
        # No grid change: copy signals through untouched (fresh RangeIndex for
        # consistency with the resampled path).
        new_rec.signals = self.signals.copy().reset_index(drop=True)
        return new_rec

    # Rational resampling factors from the integer rates.
    src_i = int(round(source_rate))
    tgt_i = int(round(target_rate))
    g = gcd(src_i, tgt_i)
    up = tgt_i // g
    down = src_i // g

    # The achieved rate is exactly source * up / down. Store THAT, not the
    # requested float, so the metadata can never disagree with the data (a
    # non-integer or odd target snaps to the nearest achievable rational rate;
    # warn so the caller knows). This avoids silently writing e.g. 99.5 Hz
    # onto a grid that resample_poly actually produced at 100 Hz.
    actual_rate = source_rate * up / down
    if abs(actual_rate - target_rate) > 1e-9:
        logging.warning(
            "Requested resample to %g Hz; nearest achievable rational rate is "
            "%g Hz, which is what is stored on the channels.",
            target_rate,
            actual_rate,
        )

    columns = list(self.signals.columns)
    data = self.signals.to_numpy(dtype=float)
    # resample_poly over axis=0 resamples every channel column at once with the
    # shared anti-alias FIR.
    resampled = resample_poly(data, up, down, axis=0)

    new_rec.signals = pd.DataFrame(resampled, columns=columns)
    new_rec.signals.index = pd.RangeIndex(len(new_rec.signals))
    for info in new_rec.channels.values():
        info["sample_frequency"] = actual_rate

    return new_rec

`select_channels(channels=None, channel_type=None, inplace=False, *, modality=None)` ¶

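Select specific channels from the data. By default a new Recording is returned and the original is left untouched; with inplace=True the selection is applied to this Recording instead.

Args: channels: Channel name or list of channel names to select. If None and channel_type is specified, selects all channels of that type. channel_type: Type of channels to select ('EMG', 'ACC', 'GYRO', etc.). If specified with channels, filters the selection to only channels of this type. inplace: If True, replace this Recording's signals, channels, and metadata with the selection and return self. If False (default), return a new Recording. modality: Keyword-only modality filter (validated). If given without channels or channel_type, selects all channels of that modality; otherwise narrows the selection to channels of that modality.

Returns: Recording: A new Recording object containing only the selected channels, or self when inplace=True.

Raises: ValueError: If no signals are loaded, if none of channels, channel_type, or modality is given, if a requested channel does not exist, or if no channel matches the requested type or modality.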

Examples:

# Select specific channels
new_rec = rec.select_channels(['EMG1', 'ACC1'])

# Select all EMG channels
emg_only = rec.select_channels(channel_type='EMG')

# Select specific EMG channels only, this example does not select ACC channels
emg_subset = rec.select_channels(['EMG1', 'ACC1'], channel_type='EMG')
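The way an explicit channel list combines with a type filter can be sketched without a Recording at all. The function `filter_by_type` and the sample channel table below are illustrative stand-ins that mirror the validation and filtering steps in the source listing.

```python
def filter_by_type(
    channel_info: dict[str, dict], names: list[str], channel_type: str
) -> list[str]:
    """Validate the requested names, then keep only those of the given type."""
    missing = [n for n in names if n not in channel_info]
    if missing:
        raise ValueError(f"Channels not found: {missing}")
    kept = [n for n in names if channel_info[n]["channel_type"] == channel_type]
    if not kept:
        raise ValueError(f"None of the selected channels are of type: {channel_type}")
    return kept


channel_info = {
    "EMG1": {"channel_type": "EMG"},
    "EMG2": {"channel_type": "EMG"},
    "ACC1": {"channel_type": "ACC"},
}

# ACC1 is requested but dropped by the type filter.
subset = filter_by_type(channel_info, ["EMG1", "ACC1"], "EMG")
print(subset)
```

Validation happens before filtering, so a misspelled channel name raises even if the type filter would have removed it anyway.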

Source code in biosigio/core/emg.py

def select_channels(
    self,
    channels: str | list[str] | None = None,
    channel_type: str | None = None,
    inplace: bool = False,
    *,
    modality: str | None = None,
) -> "Recording":
    """
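    Select specific channels from the data.

    By default a new Recording is returned and ``self`` is left untouched; with
    ``inplace=True`` the selection is applied to this Recording instead.

    Args:
        channels: Channel name or list of channel names to select. If None and
                channel_type is specified, selects all channels of that type.
        channel_type: Type of channels to select ('EMG', 'ACC', 'GYRO', etc.).
                    If specified with channels, filters the selection to only
                    channels of this type.
        inplace: If True, replace this Recording's signals, channels, and
                metadata with the selection and return ``self``. Default False.
        modality: Keyword-only modality filter (validated). Selects all channels
                of that modality, or narrows an existing selection to it.

    Returns:
        Recording: A new Recording object containing only the selected channels,
            or ``self`` when ``inplace=True``.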

    Examples:
        # Select specific channels
        new_rec = rec.select_channels(['EMG1', 'ACC1'])

        # Select all EMG channels
        emg_only = rec.select_channels(channel_type='EMG')

        # Select specific EMG channels only, this example does not select ACC channels
        emg_subset = rec.select_channels(['EMG1', 'ACC1'], channel_type='EMG')
    """
    if self.signals is None:
        raise ValueError("No signals loaded")

    if channels is None and channel_type is None and modality is None:
        raise ValueError("Specify at least one of: channels, channel_type, or modality.")

    # If type/modality specified but no channels, select all matching channels
    if channels is None and channel_type is not None:
        channels = self.get_channels_by_type(channel_type)
        if not channels:
            raise ValueError(f"No channels found of type: {channel_type}")
    elif channels is None and modality is not None:
        channels = self.get_channels_by_modality(modality)
        if not channels:
            raise ValueError(f"No channels found of modality: {modality}")
    elif isinstance(channels, str):
        channels = [channels]

    if channels is None:
        raise ValueError("Specify at least one of: channels, channel_type, or modality.")

    # Validate channels exist
    if not all(ch in self.signals.columns for ch in channels):
        missing = [ch for ch in channels if ch not in self.signals.columns]
        raise ValueError(f"Channels not found: {missing}")

    # Filter by type if specified
    if channel_type is not None:
        channels = [ch for ch in channels if self.channels[ch]["channel_type"] == channel_type]
        if not channels:
            raise ValueError(f"None of the selected channels are of type: {channel_type}")

    # Filter by modality if specified
    if modality is not None:
        canonical_modality = validate_modality(modality)
        channels = [
            ch for ch in channels if self.channels[ch].get("modality") == canonical_modality
        ]
        if not channels:
            raise ValueError(f"None of the selected channels are of modality: {modality}")

    # Create new Recording object
    new_rec = Recording()

    # Copy selected signals and channels
    new_rec.signals = self.signals[channels].copy()
    new_rec.channels = {ch: self.channels[ch].copy() for ch in channels}

    # Copy metadata
    new_rec.metadata = self.metadata.copy()

    if not inplace:
        return new_rec
    else:
        self.signals = new_rec.signals
        self.channels = new_rec.channels
        self.metadata = new_rec.metadata
        return self

`set_channel(label, *, channel_type=None, modality=None, physical_dimension=None, prefilter=None)` ¶

Update metadata of an existing channel (the supported relabel path).

Args: label: Existing channel label. channel_type: New BIDS channel type (validated). When given without an explicit modality, the modality is re-derived from it. modality: New coarse modality (validated). physical_dimension: New physical unit. prefilter: New prefilter string.

Raises: KeyError: If label is not an existing channel. ValueError: If channel_type or modality is not in the modality vocabulary.
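The precedence between channel_type and modality can be illustrated with a plain dictionary. This is only a sketch: `TYPE_TO_MODALITY` is a hypothetical stand-in for biosigio's type-to-modality lookup, its entries are invented for the example, and validation is omitted.

```python
from __future__ import annotations

# Hypothetical lookup; the real vocabulary lives inside biosigio.
TYPE_TO_MODALITY = {"EMG": "emg", "EEG": "eeg"}


def update_channel(
    info: dict, channel_type: str | None = None, modality: str | None = None
) -> dict:
    """Apply the same precedence as set_channel to one channel's metadata."""
    if channel_type is not None:
        info["channel_type"] = channel_type
        if modality is None:
            # No explicit modality: re-derive it from the new type.
            info["modality"] = TYPE_TO_MODALITY[channel_type]
    if modality is not None:
        # An explicit modality always wins over the derived one.
        info["modality"] = modality
    return info


derived = update_channel({"channel_type": "EEG", "modality": "eeg"}, channel_type="EMG")
explicit = update_channel(
    {"channel_type": "EEG", "modality": "eeg"}, channel_type="EMG", modality="eeg"
)
print(derived, explicit)
```

Changing only the type keeps type and modality consistent, while passing both lets a caller override the derived value deliberately.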

Source code in biosigio/core/emg.py

def set_channel(
    self,
    label: str,
    *,
    channel_type: str | None = None,
    modality: str | None = None,
    physical_dimension: str | None = None,
    prefilter: str | None = None,
) -> None:
    """
    Update metadata of an existing channel (the supported relabel path).

    Args:
        label: Existing channel label.
        channel_type: New BIDS channel type (validated). When given without an
            explicit ``modality``, the modality is re-derived from it.
        modality: New coarse modality (validated).
        physical_dimension: New physical unit.
        prefilter: New prefilter string.

    Raises:
        KeyError: If ``label`` is not an existing channel.
        ValueError: If ``channel_type`` or ``modality`` is not in the
            modality vocabulary.
    """
    if label not in self.channels:
        raise KeyError(f"Channel not found: {label}")
    info = self.channels[label]
    if channel_type is not None:
        info["channel_type"] = validate_channel_type(channel_type)
        if modality is None:
            info["modality"] = infer_modality_from_channel_type(info["channel_type"])
    if modality is not None:
        info["modality"] = validate_modality(modality)
    if physical_dimension is not None:
        info["physical_dimension"] = physical_dimension
    if prefilter is not None:
        info["prefilter"] = prefilter

`set_metadata(key, value)` ¶

Set metadata value.

Args: key: Metadata key value: Metadata value

Source code in biosigio/core/emg.py

def set_metadata(self, key: str, value: Any) -> None:
    """
    Set metadata value.

    Args:
        key: Metadata key
        value: Metadata value
    """
    self.metadata[key] = value

`to_arrow(filepath)` ¶

Export to a biosigIO Arrow/Feather file (fast zero-copy IPC).

Same self-describing schema as to_parquet; round-trips via Recording.from_file. Requires the arrow extra (pyarrow).

Args: filepath: Output .feather / .arrow path.

Returns: str: The written file path.

Source code in biosigio/core/emg.py

def to_arrow(self, filepath: str) -> str:
    """Export to a biosigIO Arrow/Feather file (fast zero-copy IPC).

    Same self-describing schema as :meth:`to_parquet`; round-trips via
    ``Recording.from_file``. Requires the ``arrow`` extra (pyarrow).

    Args:
        filepath: Output ``.feather`` / ``.arrow`` path.

    Returns:
        str: The written file path.
    """
    from ..exporters.tabular import TabularExporter

    return TabularExporter.to_arrow(self, filepath)

`to_edf(filepath, method='both', fft_noise_range=None, svd_rank=None, precision_threshold=0.01, format='auto', bypass_analysis=None, verify=False, verify_tolerance=1e-06, verify_channel_map=None, verify_plot=False, events_df=None, create_channels_tsv=True, clip_outliers='auto', **kwargs)` ¶

Export the recording to EDF/BDF format, optionally including events.

Args:
filepath: Path to save the EDF/BDF file.
method: Method for signal analysis ('svd', 'fft', or 'both').
    'svd': Uses Singular Value Decomposition for noise floor estimation.
    'fft': Uses Fast Fourier Transform for noise floor estimation.
    'both': Uses both methods and takes the minimum noise floor (default).
fft_noise_range: Optional tuple (min_freq, max_freq) specifying the frequency range for noise in the FFT method.
svd_rank: Optional manual rank cutoff for signal/noise separation in the SVD method.
precision_threshold: Maximum acceptable precision loss percentage (default: 0.01%).
format: Format to use ('auto', 'edf', or 'bdf'). Default is 'auto'. If 'edf' or 'bdf' is specified, that format is used directly. If 'auto', the format (EDF/16-bit or BDF/24-bit) is chosen based on signal analysis to minimize precision loss while preferring EDF if sufficient.
bypass_analysis: If True, skip the signal analysis step when format is explicitly set to 'edf' or 'bdf'. If None (default), analysis is skipped automatically when format is forced. Set to False to force analysis even with a specified format. Ignored if format='auto'.
verify: If True, reload the exported file and compare signals with the original to check for data integrity loss. Results are printed. (default: False)
verify_tolerance: Absolute tolerance used when comparing signals during verification. (default: 1e-6)
verify_channel_map: Optional dictionary mapping original channel names (keys) to reloaded channel names (values) for verification. Used if verify is True and channel names might differ.
verify_plot: If True and verify is True, plots a comparison of original vs reloaded signals.
events_df: Optional DataFrame with events ('onset', 'duration', 'description'). If None, uses self.events.
create_channels_tsv: If True, create a BIDS-compliant channels.tsv file (default: True).
clip_outliers: Singularity handling for the per-channel physical window. 'auto' (default) keeps the full range losslessly but clips rare extreme outliers to a robust window only when keeping them would crater the bulk signal's resolution at the chosen format (with a warning); True always clips to the robust window; False never clips. See EDFExporter.export for the advanced outlier_sigmas / min_effective_bits knobs.
**kwargs: Additional arguments for the EDF exporter.
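The interaction between format and bypass_analysis reduces to a small decision function. The sketch below mirrors the branches in the source listing; `resolve_bypass` is an illustrative name and logging is left out.

```python
from __future__ import annotations


def resolve_bypass(fmt: str, bypass_analysis: bool | None) -> bool:
    """Return True when signal analysis should be skipped."""
    fmt = fmt.lower()
    if fmt == "auto":
        # Analysis is always required to choose between EDF and BDF.
        return False
    if fmt in ("edf", "bdf"):
        # None (default) and True both skip; only an explicit False forces analysis.
        return bypass_analysis is not False
    # Unknown formats fall back to 'auto' behaviour.
    return False


for fmt, flag in [("auto", True), ("edf", None), ("bdf", False), ("EDF", True)]:
    print(fmt, flag, resolve_bypass(fmt, flag))
```

In short, forcing a format is treated as a request for a fast export unless the caller opts back in to analysis with bypass_analysis=False.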

Returns: dict | None: If verify is True, returns a dictionary of verification results (with an "error" entry if the reload or comparison failed). Otherwise, returns None.
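Per the method signature (`-> dict | None`) and the source listing, the verification report is a dictionary that carries a "channel_summary" entry and, on failure, an "error" entry. A caller might check it along these lines; `verification_completed` is an illustrative helper, and the sample reports are made up for the example.

```python
from __future__ import annotations


def verification_completed(report: dict | None) -> bool:
    """True if verification ran to completion (not whether signals matched)."""
    if report is None:
        # verify=False: nothing was checked.
        return False
    if "error" in report:
        # Reload or comparison raised an exception.
        return False
    mode = report.get("channel_summary", {}).get("comparison_mode", "unknown")
    return mode != "failed"


failed = {"error": "boom", "channel_summary": {"comparison_mode": "failed"}}
ok = {"channel_summary": {"comparison_mode": "exact"}, "EMG1": {}}

print(verification_completed(None), verification_completed(failed), verification_completed(ok))
```

The per-channel entries of a real report still need to be inspected to decide whether the signals agree within verify_tolerance; their exact structure is not shown in this section.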

Raises: ValueError: If no signals are loaded
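The trade-off between EDF (16-bit) and BDF (24-bit) comes down to the quantization step over a channel's physical range. The following is a generic worked calculation, not the exporter's analysis; the ±1000 physical window is an assumed example.

```python
def quantization_step(phys_min: float, phys_max: float, bits: int) -> float:
    """Smallest representable amplitude change for a given bit depth."""
    levels = 2 ** bits
    return (phys_max - phys_min) / (levels - 1)


# Same physical window stored at the two bit depths.
edf_step = quantization_step(-1000.0, 1000.0, 16)
bdf_step = quantization_step(-1000.0, 1000.0, 24)

print(f"EDF step: {edf_step:.6f}, BDF step: {bdf_step:.8f}")
```

The 24-bit step is roughly 256 times finer. The same formula shows why a single extreme outlier matters: it widens the physical window and coarsens the step for every other sample in the channel.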

Source code in biosigio/core/emg.py

def to_edf(
    self,
    filepath: str,
    method: str = "both",
    fft_noise_range: tuple | None = None,
    svd_rank: int | None = None,
    precision_threshold: float = 0.01,
    format: Literal["auto", "edf", "bdf"] = "auto",
    bypass_analysis: bool | None = None,
    verify: bool = False,
    verify_tolerance: float = 1e-6,
    verify_channel_map: dict[str, str] | None = None,
    verify_plot: bool = False,
    events_df: pd.DataFrame | None = None,
    create_channels_tsv: bool = True,
    clip_outliers: bool | str = "auto",
    **kwargs,
) -> dict | None:
    """
    Export the recording to EDF/BDF format, optionally including events.

    Args:
        filepath: Path to save the EDF/BDF file
        method: Method for signal analysis ('svd', 'fft', or 'both')
            'svd': Uses Singular Value Decomposition for noise floor estimation
            'fft': Uses Fast Fourier Transform for noise floor estimation
            'both': Uses both methods and takes the minimum noise floor (default)
        fft_noise_range: Optional tuple (min_freq, max_freq) specifying frequency range for noise in FFT method
        svd_rank: Optional manual rank cutoff for signal/noise separation in SVD method
        precision_threshold: Maximum acceptable precision loss percentage (default: 0.01%)
        format: Format to use ('auto', 'edf', or 'bdf'). Default is 'auto'.
                If 'edf' or 'bdf' is specified, that format will be used directly.
                If 'auto', the format (EDF/16-bit or BDF/24-bit) is chosen based
                on signal analysis to minimize precision loss while preferring EDF
                if sufficient.
        bypass_analysis: If True, skip signal analysis step when format is explicitly
                         set to 'edf' or 'bdf'. If None (default), analysis is skipped
                         automatically when format is forced. Set to False to force
                         analysis even with a specified format. Ignored if format='auto'.
        verify: If True, reload the exported file and compare signals with the original
                to check for data integrity loss. Results are printed. (default: False)
        verify_tolerance: Absolute tolerance used when comparing signals during verification. (default: 1e-6)
        verify_channel_map: Optional dictionary mapping original channel names (keys)
                            to reloaded channel names (values) for verification.
                            Used if `verify` is True and channel names might differ.
        verify_plot: If True and verify is True, plots a comparison of original vs reloaded signals.
        events_df: Optional DataFrame with events ('onset', 'duration', 'description').
                  If None, uses self.events. (This provides flexibility)
        create_channels_tsv: If True, create a BIDS-compliant channels.tsv file (default: True)
        clip_outliers: Singularity handling for the per-channel physical window.
            'auto' (default) keeps the full range losslessly but clips rare extreme
            outliers to a robust window only when keeping them would crater the bulk
            signal's resolution at the chosen format (with a warning); True always
            clips to the robust window; False never clips. See EDFExporter.export for
            the advanced ``outlier_sigmas`` / ``min_effective_bits`` knobs.
        **kwargs: Additional arguments for the EDF exporter

    Returns:
        dict | None: If verify is True, returns a dictionary of verification
                     results (with an "error" entry if the reload or comparison
                     failed). Otherwise, returns None.

    Raises:
        ValueError: If no signals are loaded
    """
    from ..exporters.edf import EDFExporter  # Local import

    if self.signals is None:
        raise ValueError("No signals loaded")

    # --- Determine if analysis should be bypassed ---
    final_bypass_analysis = False
    if format.lower() == "auto":
        if bypass_analysis is True:
            logging.warning(
                "bypass_analysis=True ignored because format='auto'. Analysis is required."
            )
        # Analysis is always needed for 'auto' format
        final_bypass_analysis = False
    elif format.lower() in ["edf", "bdf"]:
        if bypass_analysis is None:
            # Default behaviour: skip analysis if format is forced
            final_bypass_analysis = True
            msg = (
                f"Format forced to '{format}'. Skipping signal analysis for faster export. "
                "Set bypass_analysis=False to force analysis."
            )
            logging.log(logging.CRITICAL, msg)
        elif bypass_analysis is True:
            final_bypass_analysis = True
            logging.log(logging.CRITICAL, "bypass_analysis=True set. Skipping signal analysis.")
        else:  # bypass_analysis is False
            final_bypass_analysis = False
            logging.info(
                f"Format forced to '{format}' but bypass_analysis=False. Performing signal analysis."
            )
    else:
        # Should not happen if Literal type hint works, but good practice
        logging.warning(
            f"Unknown format '{format}'. Defaulting to 'auto' behavior (analysis enabled)."
        )
        format = "auto"
        final_bypass_analysis = False

    # Determine which events DataFrame to use
    if events_df is None:
        events_to_export = self.events
    else:
        events_to_export = events_df

    # Combine parameters
    all_params: dict[str, Any] = {
        "precision_threshold": precision_threshold,
        "method": method,
        "fft_noise_range": fft_noise_range,
        "svd_rank": svd_rank,
        "format": format,
        "bypass_analysis": final_bypass_analysis,
        "events_df": events_to_export,  # Pass the events dataframe
        "create_channels_tsv": create_channels_tsv,
        "clip_outliers": clip_outliers,
        **kwargs,
    }

    EDFExporter.export(self, filepath, **all_params)

    verification_report_dict = None
    if verify:
        logging.info(f"Verification requested. Reloading exported file: {filepath}")
        try:
            # Reload the exported file
            reloaded_rec = Recording.from_file(filepath, importer="edf")

            logging.info("Comparing original signals with reloaded signals...")
            # Compare signals using the imported function
            verification_results = compare_signals(
                self, reloaded_rec, tolerance=verify_tolerance, channel_map=verify_channel_map
            )

            # Generate and log report using the imported function
            report_verification_results(verification_results, verify_tolerance)
            verification_report_dict = verification_results

            # Plot comparison using imported function if requested
            summary = verification_results.get("channel_summary", {})
            comparison_mode = summary.get("comparison_mode", "unknown")
            compared_count = sum(1 for k in verification_results if k != "channel_summary")

            if verify_plot and compared_count > 0 and comparison_mode != "failed":
                plot_comparison(self, reloaded_rec, channel_map=verify_channel_map)
            elif verify_plot:
                logging.warning(
                    "Skipping verification plot: No channels were successfully compared."
                )

        except Exception as e:
            logging.error(f"Verification failed during reload or comparison: {e}")
            verification_report_dict = {
                "error": str(e),
                "channel_summary": {"comparison_mode": "failed"},
            }

    return verification_report_dict

`to_parquet(filepath)` ¶

Export to a self-describing biosigIO Parquet file.

Signals are stored as a columnar table (channels = columns, time index preserved); channels/events/metadata travel in the file's schema metadata, so Recording.from_file round-trips it losslessly. Great for analytics (DuckDB/Polars/pandas/Spark). Requires the arrow extra (pyarrow).

Args: filepath: Output .parquet path.

Returns: str: The written file path.

Source code in biosigio/core/emg.py

def to_parquet(self, filepath: str) -> str:
    """Export to a self-describing biosigIO Parquet file.

    Signals are stored as a columnar table (channels = columns, time index
    preserved); channels/events/metadata travel in the file's schema metadata,
    so ``Recording.from_file`` round-trips it losslessly. Great for analytics
    (DuckDB/Polars/pandas/Spark). Requires the ``arrow`` extra (pyarrow).

    Args:
        filepath: Output ``.parquet`` path.

    Returns:
        str: The written file path.
    """
    from ..exporters.tabular import TabularExporter

    return TabularExporter.to_parquet(self, filepath)

`to_zarr(filepath, **kwargs)` ¶

Export to a sharded Zarr v3 serving store with a min/max view pyramid.

Writes one cloud-native store that serves viewing, inference, and training from a single conversion: level 0 of each (modality, rate) group is the anti-aliased, per-modality-resampled inference signal, with a min/max render pyramid above it (flagged not-for-inference). A derived serving copy, not the archival source (BIDS/EDF stay authoritative). Requires the zarr extra (zarr v3). See biosigio.exporters.zarr.ZarrExporter for the tuning knobs (modality_rates, dtype, chunk/shard sizing, ...).

Args: filepath: Output store path (.zarr appended if missing). **kwargs: Forwarded to ZarrExporter.export.

Returns: str: The written store path.
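As a rough illustration of what one min/max pyramid level contains, the sketch below reduces a signal to per-bin (min, max) pairs. The bin size, the function name `minmax_level`, and the list-of-tuples layout are assumptions made for the example; ZarrExporter's actual layout is not shown in this section.

```python
def minmax_level(samples: list[float], factor: int) -> list[tuple[float, float]]:
    """Collapse each run of `factor` samples into its (min, max) envelope."""
    return [
        (min(samples[i : i + factor]), max(samples[i : i + factor]))
        for i in range(0, len(samples), factor)
    ]


signal = [0.0, 3.0, -1.0, 2.0, 5.0, 4.0]
level1 = minmax_level(signal, 2)
print(level1)
```

An envelope like this keeps peaks visible when a long recording is drawn at low zoom, but it is not a valid time series, which is consistent with the pyramid levels being flagged not-for-inference.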

Source code in biosigio/core/emg.py

def to_zarr(self, filepath: str, **kwargs) -> str:
    """Export to a sharded Zarr v3 serving store with a min/max view pyramid.

    Writes one cloud-native store that serves viewing, inference, and training
    from a single conversion: ``level 0`` of each ``(modality, rate)`` group is
    the anti-aliased, per-modality-resampled inference signal, with a min/max
    render pyramid above it (flagged not-for-inference). A derived serving copy,
    not the archival source (BIDS/EDF stay authoritative). Requires the ``zarr``
    extra (zarr v3). See :class:`~biosigio.exporters.zarr.ZarrExporter` for the
    tuning knobs (``modality_rates``, ``dtype``, chunk/shard sizing, ...).

    Args:
        filepath: Output store path (``.zarr`` appended if missing).
        **kwargs: Forwarded to :meth:`ZarrExporter.export`.

    Returns:
        str: The written store path.
    """
    from ..exporters.zarr import ZarrExporter

    # The empty-signal guard lives once, in ZarrExporter.export ("No signals
    # loaded"), matching the tabular path; no duplicate guard here.
    return ZarrExporter.export(self, filepath, **kwargs)

`XDFImporter` ¶

Bases: BaseImporter

Importer for XDF (Extensible Data Format) files.

XDF files can contain multiple data streams. This importer allows selective import of specific streams by name, type, or ID.

Memory Optimization: For large XDF files, use stream selection parameters to load only the streams you need. The importer uses pyxdf's native stream selection to avoid loading unnecessary data into memory.
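A back-of-the-envelope estimate helps decide which streams to select before loading. The formula below (samples × channels × 8 bytes, reported in GiB) is an assumption for illustration; the importer's own estimate behind max_memory_gb is not shown here and may differ.

```python
def estimate_gb(
    duration_s: float, rate_hz: float, n_channels: int, bytes_per_sample: int = 8
) -> float:
    """Approximate in-memory size of one stream, assuming float64 samples."""
    return duration_s * rate_hz * n_channels * bytes_per_sample / 1024**3


# One hour of 64-channel data at 2048 Hz.
est = estimate_gb(3600, 2048, 64)
print(f"{est:.2f} GiB")

max_memory_gb = 2.0  # caller-chosen budget
if est > max_memory_gb:
    print("Over budget: select fewer streams before loading.")
```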

Example:

>>> # First, explore the file (memory-efficient)
>>> from biosigio.importers.xdf import summarize_xdf
>>> summary = summarize_xdf("recording.xdf")
>>> print(summary)
>>>
>>> # Import specific streams (only loads selected streams)
>>> importer = XDFImporter()
>>> rec = importer.load("recording.xdf", stream_names=["EMG_stream"])
>>>
>>> # Or import by type
>>> rec = importer.load("recording.xdf", stream_types=["EMG", "EXG"])
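Selection by name, type, or ID is a union: a stream is included if it matches any of the criteria, with names compared case-insensitively. The sketch below shows that rule on stand-in objects; `StreamInfo` and `matching_ids` are illustrative, with field names borrowed from the attributes the source listing reads from the file summary.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StreamInfo:
    stream_id: int
    name: str
    stream_type: str


def matching_ids(streams, names=None, types=None, ids=None) -> set[int]:
    """Return IDs of streams matching ANY of the given criteria."""
    wanted_names = {n.lower() for n in names or []}
    wanted_types = {t.upper() for t in types or []}
    wanted_ids = set(ids or [])
    return {
        s.stream_id
        for s in streams
        if s.name.lower() in wanted_names
        or s.stream_type.upper() in wanted_types
        or s.stream_id in wanted_ids
    }


streams = [
    StreamInfo(1, "EMG_stream", "EMG"),
    StreamInfo(2, "Markers", "Markers"),
    StreamInfo(3, "Cap", "EEG"),
]
selected = matching_ids(streams, names=["emg_STREAM"], types=["eeg"])
print(sorted(selected))
```

Note that this helper returns an empty set when no criteria are given, whereas the importer treats the no-criteria case as "load all streams with numeric data".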

Source code in biosigio/importers/xdf.py

class XDFImporter(BaseImporter):
    """
    Importer for XDF (Extensible Data Format) files.

    XDF files can contain multiple data streams. This importer allows selective
    import of specific streams by name, type, or ID.

    Memory Optimization:
        For large XDF files, use stream selection parameters to load only the
        streams you need. The importer uses pyxdf's native stream selection
        to avoid loading unnecessary data into memory.

    Example:
        >>> # First, explore the file (memory-efficient)
        >>> from biosigio.importers.xdf import summarize_xdf
        >>> summary = summarize_xdf("recording.xdf")
        >>> print(summary)
        >>>
        >>> # Import specific streams (only loads selected streams)
        >>> importer = XDFImporter()
        >>> rec = importer.load("recording.xdf", stream_names=["EMG_stream"])
        >>>
        >>> # Or import by type
        >>> rec = importer.load("recording.xdf", stream_types=["EMG", "EXG"])
    """

    def load(
        self,
        filepath: str,
        stream_names: list[str] | None = None,
        stream_types: list[str] | None = None,
        stream_ids: list[int] | None = None,
        sync_streams: bool = True,
        default_channel_type: str = "OTHER",
        include_timestamps: bool = False,
        reference_stream: str | None = None,
        max_memory_gb: float | None = None,
    ) -> Recording:
        """
        Load EMG data from an XDF file.

        Streams can be selected by name, type, or ID. If multiple selection
        criteria are provided, streams matching ANY criterion are included.
        If no selection criteria are provided, all streams with numeric data
        are loaded.

        Memory Optimization:
            - Stream selection is passed directly to pyxdf, so only requested
              streams are loaded into memory. This significantly reduces RAM
              usage for large files with multiple streams.
            - Use summarize_xdf() first to explore file contents without loading data.
            - The max_memory_gb parameter can warn or raise if estimated memory
              usage exceeds the limit.

        Args:
            filepath: Path to the XDF file
            stream_names: List of stream names to import (case-insensitive)
            stream_types: List of stream types to import (e.g., ["EMG", "EXG"])
            stream_ids: List of stream IDs to import
            sync_streams: If True, synchronize streams to common timestamps.
                         If False, streams are loaded without synchronization.
            default_channel_type: Default channel type for channels without
                                 explicit type info (default: "OTHER"; no silent
                                 EMG assumption)
            include_timestamps: If True, add a timestamp channel for each stream
                               named "{stream_name}_LSL_timestamps" containing
                               the original LSL timestamps. Useful for preserving
                               timing information when exporting to formats like
                               EDF that require regular sampling.
            reference_stream: Optional stream name to use as the time base
                             reference. If not specified, the stream with the
                             highest sampling rate is used (recommended to
                             avoid data loss from downsampling).
            max_memory_gb: Optional maximum memory usage in GB. If specified,
                          raises MemoryError if estimated memory exceeds this
                          limit. Use summarize_xdf() to estimate memory needs.

        Returns:
            Recording: Recording object containing the loaded data

        Raises:
            ValueError: If no matching streams found or file cannot be read
            ImportError: If pyxdf is not installed
            MemoryError: If estimated memory exceeds max_memory_gb
        """
        try:
            import pyxdf
        except ImportError as e:
            raise ImportError(
                "pyxdf is required for XDF file support. Install it with: pip install pyxdf"
            ) from e

        filepath = str(filepath)

        # Check memory usage before loading if max_memory_gb is specified
        if max_memory_gb is not None:
            self._check_memory_usage(
                filepath, stream_names, stream_types, stream_ids, max_memory_gb
            )

        # Build pyxdf select_streams parameter for efficient loading
        select_streams = self._build_select_streams(
            filepath, stream_names, stream_types, stream_ids
        )

        # Load only selected streams (pyxdf handles the filtering at load time)
        data, header = pyxdf.load_xdf(filepath, select_streams=select_streams)

        if not data:
            raise ValueError(f"No streams found in XDF file: {filepath}")

        # Filter streams based on selection criteria (for additional filtering)
        selected_streams = self._select_streams(data, stream_names, stream_types, stream_ids)

        if not selected_streams:
            # If no criteria specified, select all streams with numeric data
            if stream_names is None and stream_types is None and stream_ids is None:
                selected_streams = [
                    s
                    for s in data
                    if isinstance(s["time_series"], np.ndarray)
                    and s["time_series"].dtype.kind in "iufc"
                ]
            if not selected_streams:
                raise ValueError(
                    "No matching streams found. Use summarize_xdf() to explore the file."
                )

        # Create Recording object
        rec = Recording()

        # Store metadata
        rec.set_metadata("source_file", filepath)
        rec.set_metadata("device", "XDF")
        rec.set_metadata("stream_count", len(selected_streams))

        if sync_streams and len(selected_streams) > 1:
            self._load_synchronized_streams(
                rec, selected_streams, default_channel_type, include_timestamps, reference_stream
            )
        else:
            # Load streams (uses highest sample rate as reference unless specified)
            self._load_streams(
                rec, selected_streams, default_channel_type, include_timestamps, reference_stream
            )

        return rec

    def _build_select_streams(
        self,
        filepath: str,
        stream_names: list[str] | None,
        stream_types: list[str] | None,
        stream_ids: list[int] | None,
    ) -> list[dict] | list[int] | None:
        """
        Build the value for pyxdf's ``select_streams`` parameter for efficient loading.

        pyxdf accepts ``select_streams`` as either:
        - a list of integer stream IDs (e.g. ``[1, 2, 3]``), or
        - a list of dictionaries with selection criteria (e.g. ``[{"name": "EMG"}]``).

        This method converts our selection criteria into that native format, returning
        either a list of IDs or ``None`` (to load all streams). We resolve names and
        types to IDs using the memory-efficient metadata parser.
        """
        if stream_names is None and stream_types is None and stream_ids is None:
            return None  # Load all streams

        # Start from any explicitly requested IDs (copied, so the caller's list
        # is never mutated)
        matching_ids: set[int] = set(stream_ids or [])

        # Names and types must be resolved to IDs using the memory-efficient
        # metadata parser. IDs, names, and types are combined with OR logic, so
        # a stream is loaded if it matches ANY criterion (as documented in load()).
        if stream_names or stream_types:
            summary = summarize_xdf(filepath)
            wanted_names = {n.lower() for n in stream_names or []}
            wanted_types = {t.upper() for t in stream_types or []}

            for stream in summary.streams:
                # Check name match (separate ifs, not elif, to check all criteria)
                if stream.name.lower() in wanted_names:
                    matching_ids.add(stream.stream_id)
                # Check type match
                if stream.stream_type.upper() in wanted_types:
                    matching_ids.add(stream.stream_id)

        # Criteria were specified, so never fall back to None here: an empty
        # list avoids loading all streams when nothing matched
        return sorted(matching_ids)

    def _check_memory_usage(
        self,
        filepath: str,
        stream_names: list[str] | None,
        stream_types: list[str] | None,
        stream_ids: list[int] | None,
        max_memory_gb: float,
    ) -> None:
        """
        Estimate memory usage and raise MemoryError if it exceeds the limit.

        Uses the memory-efficient summarize_xdf() to estimate data size.
        """
        # Data type sizes in bytes
        dtype_sizes = {
            "float32": 4,
            "double64": 8,
            "int8": 1,
            "int16": 2,
            "int32": 4,
            "int64": 8,
            "string": 50,  # Estimate for strings
        }

        summary = summarize_xdf(filepath)
        total_bytes = 0

        for stream in summary.streams:
            # Check if this stream would be selected (use separate ifs to match
            # _build_select_streams OR logic: include if ANY criterion matches)
            include = False
            if stream_names is None and stream_types is None and stream_ids is None:
                include = True
            if stream_names and stream.name.lower() in [n.lower() for n in stream_names]:
                include = True
            if stream_types and stream.stream_type.upper() in [t.upper() for t in stream_types]:
                include = True
            if stream_ids and stream.stream_id in stream_ids:
                include = True

            if include:
                dtype_size = dtype_sizes.get(stream.channel_format, 8)
                # Estimate: (samples * channels * dtype_size) + (samples * 8 for timestamps)
                stream_bytes = stream.sample_count * stream.channel_count * dtype_size
                stream_bytes += stream.sample_count * 8  # timestamps (float64)
                total_bytes += stream_bytes

        # Add overhead for pandas DataFrame and processing (~50% extra)
        estimated_gb = (total_bytes * 1.5) / (1024**3)

        if estimated_gb > max_memory_gb:
            raise MemoryError(
                f"Estimated memory usage ({estimated_gb:.2f} GB) exceeds limit "
                f"({max_memory_gb:.2f} GB). Consider:\n"
                f"  - Loading fewer streams using stream_names, stream_types, or stream_ids\n"
                f"  - Using summarize_xdf() to identify which streams you need\n"
                f"  - Processing the file in chunks"
            )

    def _select_streams(
        self,
        data: list[dict],
        stream_names: list[str] | None,
        stream_types: list[str] | None,
        stream_ids: list[int] | None,
    ) -> list[dict]:
        """Select streams based on criteria."""
        if stream_names is None and stream_types is None and stream_ids is None:
            return []  # Return empty to trigger "all streams" behavior

        selected = []
        for stream in data:
            info = stream["info"]
            name = info["name"][0] if "name" in info else ""
            stype = info["type"][0] if "type" in info else ""
            sid = info.get("stream_id", 0)

            # Check name match (case-insensitive)
            if stream_names and any(name.lower() == n.lower() for n in stream_names):
                selected.append(stream)
                continue

            # Check type match (case-insensitive)
            if stream_types and any(stype.upper() == t.upper() for t in stream_types):
                selected.append(stream)
                continue

            # Check ID match
            if stream_ids and sid in stream_ids:
                selected.append(stream)
                continue

        return selected

    def _load_streams(
        self,
        rec: Recording,
        streams: list[dict],
        default_channel_type: str,
        include_timestamps: bool = False,
        reference_stream: str | None = None,
    ) -> None:
        """Load streams and resample to a common time base.

        By default, uses the stream with the highest sampling rate as the
        reference to avoid data loss from downsampling. A specific reference
        stream can be specified by name.
        """
        # First pass: collect stream info and find reference stream
        stream_info_list: list[dict[str, Any]] = []
        for stream in streams:
            # pyxdf returns deeply dynamic dict values; cast at this boundary.
            info = cast(dict[str, Any], stream["info"])
            stream_name = info["name"][0] if "name" in info else "Unknown"
            time_series = stream["time_series"]
            timestamps = stream["time_stamps"]

            # Skip non-numpy arrays (e.g., marker streams are lists) or non-numeric data
            if not isinstance(time_series, np.ndarray):
                continue
            if time_series.dtype.kind not in "iufc" or len(time_series) == 0:
                continue

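            # Get sampling rate: pyxdf stores the measured rate under
            # stream["info"]["effective_srate"]; fall back to the nominal rate
            srate = info.get("effective_srate")
            if not srate:
                srate = float(info["nominal_srate"][0]) if "nominal_srate" in info else 0.0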

            stream_info_list.append(
                {
                    "stream": stream,
                    "name": stream_name,
                    "info": info,
                    "time_series": time_series,
                    "timestamps": timestamps,
                    "srate": srate,
                }
            )

        if not stream_info_list:
            raise ValueError("No valid data found in selected streams")

        # Determine reference stream: user-specified, or highest sample rate
        ref_stream_info = None
        if reference_stream:
            # Find the user-specified reference stream
            for si in stream_info_list:
                if si["name"].lower() == reference_stream.lower():
                    ref_stream_info = si
                    break
            if ref_stream_info is None:
                raise ValueError(
                    f"Reference stream '{reference_stream}' not found in selected streams. "
                    f"Available: {[si['name'] for si in stream_info_list]}"
                )
        else:
            # Use stream with highest sampling rate (avoids downsampling data loss)
            ref_stream_info = max(stream_info_list, key=lambda x: x["srate"] or 0)

        base_srate = ref_stream_info["srate"]
        base_timestamps = cast(np.ndarray, ref_stream_info["timestamps"])

        # Second pass: collect all channel data
        all_data: dict[str, dict[str, Any]] = {}
        stream_timestamp_data: dict[str, dict[str, Any]] = {}  # Store timestamp data per stream

        for si in stream_info_list:
            stream_name = si["name"]
            time_series = cast(np.ndarray, si["time_series"])
            timestamps = cast(np.ndarray, si["timestamps"])
            srate = si["srate"]
            info = cast(dict[str, Any], si["info"])

            # Store timestamp data for this stream if requested
            if include_timestamps:
                stream_timestamp_data[stream_name] = {
                    "timestamps": timestamps,
                    "srate": srate,
                }

            # Get channel info
            channel_labels, channel_types, channel_units = self._extract_channel_info(
                info, time_series.shape[1] if time_series.ndim > 1 else 1, stream_name
            )

            # Handle 1D data (single channel)
            if time_series.ndim == 1:
                time_series = time_series.reshape(-1, 1)

            # Add channels
            for i, label in enumerate(channel_labels):
                if i < time_series.shape[1]:
                    # Make label unique if needed
                    unique_label = label
                    counter = 1
                    while unique_label in all_data:
                        unique_label = f"{label}_{counter}"
                        counter += 1

                    all_data[unique_label] = {
                        "data": time_series[:, i],
                        "timestamps": timestamps,
                        "srate": srate,
                        "unit": channel_units[i] if i < len(channel_units) else "a.u.",
                        "type": channel_types[i]
                        if i < len(channel_types) and channel_types[i]
                        else default_channel_type,
                    }

        # Create time index from reference stream
        # Convert to relative time starting from 0
        if base_timestamps is not None and len(base_timestamps) > 0:
            time_index = base_timestamps - base_timestamps[0]
        else:
            # Fallback: create time index from sample count and rate
            n_samples = len(ref_stream_info["time_series"])
            if base_srate and base_srate > 0:
                time_index = np.arange(n_samples) / base_srate
            else:
                # If no valid sample rate, use sample indices as time
                time_index = np.arange(n_samples, dtype=float)

        # Create DataFrame
        df = pd.DataFrame(index=time_index)

        for label, ch_info in all_data.items():
            # Resample if needed (different stream lengths)
            ch_data = ch_info["data"]
            ch_timestamps = ch_info["timestamps"]
            if len(ch_data) != len(time_index):
                if len(ch_timestamps) > 0:
                    # Interpolate to match base timestamps
                    relative_ch_ts = ch_timestamps - ch_timestamps[0]
                    ch_data = np.interp(time_index, relative_ch_ts, ch_data)
                else:
                    # No timestamps available to resample mismatched data
                    raise ValueError(
                        f"Length mismatch for channel '{label}': "
                        f"{len(ch_data)} samples vs {len(time_index)} time points, "
                        "and no timestamps available for interpolation."
                    )

            df[label] = ch_data

            # Validate the stream-derived type against the modality vocabulary;
            # arbitrary LSL type strings (e.g. "Markers", "Kinematics") map to
            # OTHER rather than being stored unvalidated (which would later fail
            # at export) or silently assumed to be EMG.
            try:
                ch_type = validate_channel_type(ch_info["type"])
            except ValueError:
                ch_type = "OTHER"
            rec.channels[label] = {
                "sample_frequency": ch_info["srate"] if ch_info["srate"] else base_srate,
                "physical_dimension": ch_info["unit"],
                "prefilter": "n/a",
                "channel_type": ch_type,
                "modality": infer_modality_from_channel_type(ch_type),
            }

        # Add timestamp channels if requested
        if include_timestamps and stream_timestamp_data:
            for stream_name, ts_info in stream_timestamp_data.items():
                ts_label = f"{stream_name}_LSL_timestamps"
                original_timestamps = ts_info["timestamps"]

                # Resample timestamps to match the common time index
                if len(original_timestamps) == 0:
                    # No timestamps available; create a NaN-filled array
                    resampled_ts = np.full(len(time_index), np.nan, dtype=float)
                elif len(original_timestamps) != len(time_index):
                    relative_ts = original_timestamps - original_timestamps[0]
                    resampled_ts = np.interp(time_index, relative_ts, original_timestamps)
                else:
                    resampled_ts = original_timestamps

                df[ts_label] = resampled_ts

                rec.channels[ts_label] = {
                    "sample_frequency": ts_info["srate"] if ts_info["srate"] else base_srate,
                    "physical_dimension": "s",  # seconds
                    "prefilter": "n/a",
                    "channel_type": "MISC",  # Miscellaneous channel type
                    "modality": "MISC",
                }

        rec.signals = df
        rec.set_metadata("srate", base_srate)

    def _load_synchronized_streams(
        self,
        rec: Recording,
        streams: list[dict],
        default_channel_type: str,
        include_timestamps: bool = False,
        reference_stream: str | None = None,
    ) -> None:
        """Load streams with timestamp synchronization.

        This method is an intentional wrapper around _load_streams, serving as
        a dedicated extension point for future synchronization enhancements.
        Currently, pyxdf handles clock synchronization during file loading,
        so this delegates to _load_streams without additional processing.
        """
        self._load_streams(rec, streams, default_channel_type, include_timestamps, reference_stream)

    def _extract_channel_info(
        self,
        info: dict,
        n_channels: int,
        stream_name: str,
    ) -> tuple:
        """Extract channel labels, types, and units from stream info.

        This method safely extracts channel metadata from XDF stream info,
        handling malformed or missing metadata gracefully.
        """
        channel_labels = []
        channel_types = []
        channel_units = []

        try:
            if "desc" in info and info["desc"] and info["desc"][0]:
                desc = info["desc"][0]
                if isinstance(desc, dict) and "channels" in desc and desc["channels"]:
                    channels_info = desc["channels"][0]
                    if isinstance(channels_info, dict) and "channel" in channels_info:
                        for ch in channels_info["channel"]:
                            if isinstance(ch, dict):
                                # Safely extract label
                                label = ""
                                if "label" in ch:
                                    label_val = ch.get("label", [""])
                                    if isinstance(label_val, list) and label_val:
                                        label = str(label_val[0]) if label_val[0] else ""
                                    elif isinstance(label_val, str):
                                        label = label_val

                                # Safely extract type
                                ch_type = ""
                                if "type" in ch:
                                    type_val = ch.get("type", [""])
                                    if isinstance(type_val, list) and type_val:
                                        ch_type = str(type_val[0]) if type_val[0] else ""
                                    elif isinstance(type_val, str):
                                        ch_type = type_val

                                # Safely extract unit
                                unit = ""
                                if "unit" in ch:
                                    unit_val = ch.get("unit", [""])
                                    if isinstance(unit_val, list) and unit_val:
                                        unit = str(unit_val[0]) if unit_val[0] else ""
                                    elif isinstance(unit_val, str):
                                        unit = unit_val

                                channel_labels.append(
                                    label if label else f"{stream_name}_Ch{len(channel_labels) + 1}"
                                )
                                # Infer type from label if not explicitly provided
                                if not ch_type and label:
                                    ch_type = _determine_channel_type_from_label(label)
                                channel_types.append(ch_type)
                                # Default to a.u. (arbitrary units); specific units like uV
                                # should be provided in stream metadata
                                channel_units.append(unit if unit else "a.u.")
        except (KeyError, IndexError, TypeError, AttributeError):
            # If metadata parsing fails, we'll fall back to default labels below
            pass

        # Fill in missing labels
        while len(channel_labels) < n_channels:
            channel_labels.append(f"{stream_name}_Ch{len(channel_labels) + 1}")
            channel_types.append("")
            channel_units.append("a.u.")

        return channel_labels, channel_types, channel_units

`load(filepath, stream_names=None, stream_types=None, stream_ids=None, sync_streams=True, default_channel_type='OTHER', include_timestamps=False, reference_stream=None, max_memory_gb=None)` ¶

Load EMG data from an XDF file.

Streams can be selected by name, type, or ID. If multiple selection criteria are provided, streams matching ANY criterion are included. If no selection criteria are provided, all streams with numeric data are loaded.
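The "ANY criterion" rule can be illustrated with a small standalone sketch. `StreamMeta` and `matches_any` are hypothetical names used only for this illustration; they are not part of the importer's API.

```python
from dataclasses import dataclass


@dataclass
class StreamMeta:
    """Hypothetical stand-in for the per-stream metadata the importer inspects."""

    stream_id: int
    name: str
    stream_type: str


def matches_any(stream, names=None, types=None, ids=None):
    """Return True if the stream matches ANY supplied criterion (OR logic)."""
    if names is None and types is None and ids is None:
        # No criteria at all: every stream is a candidate (the importer then
        # keeps only those with numeric data).
        return True
    by_name = bool(names) and stream.name.lower() in {n.lower() for n in names}
    by_type = bool(types) and stream.stream_type.upper() in {t.upper() for t in types}
    by_id = bool(ids) and stream.stream_id in ids
    return by_name or by_type or by_id


streams = [
    StreamMeta(1, "Delsys", "EMG"),
    StreamMeta(2, "BrainAmp", "EEG"),
    StreamMeta(3, "Keyboard", "Markers"),
]

# One name and one type are given; a stream needs to satisfy only one of them.
picked = [s.stream_id for s in streams if matches_any(s, names=["keyboard"], types=["emg"])]
```

Here `picked` contains streams 1 and 3: the first matches by type, the third by name, and both comparisons ignore case.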

Memory Optimization:

- Stream selection is passed directly to pyxdf, so only requested streams are loaded into memory. This significantly reduces RAM usage for large files with multiple streams.
- Use summarize_xdf() first to explore file contents without loading data.
- If max_memory_gb is set, a MemoryError is raised before loading when the estimated memory usage exceeds the limit.
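The estimate behind max_memory_gb is simple arithmetic, sketched below from the constants in `_check_memory_usage`. The function name `estimate_memory_gb` is an assumption made for this example; the importer does not expose such a helper.

```python
# Bytes per sample value, keyed by XDF channel format ("string" is a rough guess).
DTYPE_SIZES = {
    "float32": 4,
    "double64": 8,
    "int8": 1,
    "int16": 2,
    "int32": 4,
    "int64": 8,
    "string": 50,
}


def estimate_memory_gb(sample_count, channel_count, channel_format, overhead=1.5):
    """Estimate the memory for one stream, in GB (hypothetical helper)."""
    dtype_size = DTYPE_SIZES.get(channel_format, 8)  # unknown formats count as 8 bytes
    data_bytes = sample_count * channel_count * dtype_size
    timestamp_bytes = sample_count * 8  # one float64 timestamp per sample
    # The 1.5 factor is the ~50% allowance for DataFrame and processing overhead.
    return (data_bytes + timestamp_bytes) * overhead / 1024**3


one_hour = 2000 * 3600  # samples in one hour at 2 kHz
estimate = estimate_memory_gb(one_hour, 16, "float32")
```

For one hour of 16-channel float32 data at 2 kHz this comes to roughly 0.72 GB, so a limit of `max_memory_gb=0.5` would reject the load while `max_memory_gb=1.0` would allow it.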

Args:

- filepath: Path to the XDF file.
- stream_names: List of stream names to import (case-insensitive).
- stream_types: List of stream types to import (e.g., ["EMG", "EXG"]).
- stream_ids: List of stream IDs to import.
- sync_streams: If True and more than one stream is selected, streams are loaded through the synchronized path. pyxdf already applies clock synchronization when it reads the file, so both settings currently produce the same result.
- default_channel_type: Default channel type for channels without explicit type info (default: "OTHER"; no silent EMG assumption).
- include_timestamps: If True, add a timestamp channel for each stream named "{stream_name}_LSL_timestamps" containing the original LSL timestamps. Useful for preserving timing information when exporting to formats like EDF that require regular sampling.
- reference_stream: Optional stream name to use as the time base reference. If not specified, the stream with the highest sampling rate is used (recommended to avoid data loss from downsampling).
- max_memory_gb: Optional maximum memory usage in GB. If specified, raises MemoryError if estimated memory exceeds this limit. Use summarize_xdf() to estimate memory needs.

Returns: Recording: Recording object containing the loaded data

Raises:

- ValueError: If no matching streams found or file cannot be read
- ImportError: If pyxdf is not installed
- MemoryError: If estimated memory exceeds max_memory_gb
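How streams with different sampling rates end up on one time base can be sketched without any dependencies. The example below follows the approach in `_load_streams` (reference is the highest-rate stream, other streams are linearly interpolated onto its relative time index), but it is a simplified illustration: the input layout and the names `interp` and `resample_to_reference` are assumptions, and the importer itself uses `numpy.interp`.

```python
from bisect import bisect_right


def interp(x_new, xp, fp):
    """Linear interpolation with end clamping (same convention as numpy.interp)."""
    out = []
    for x in x_new:
        if x <= xp[0]:
            out.append(float(fp[0]))
        elif x >= xp[-1]:
            out.append(float(fp[-1]))
        else:
            i = bisect_right(xp, x)  # xp[i - 1] <= x < xp[i]
            weight = (x - xp[i - 1]) / (xp[i] - xp[i - 1])
            out.append(fp[i - 1] + weight * (fp[i] - fp[i - 1]))
    return out


def resample_to_reference(streams):
    """streams maps name -> (srate, timestamps, values); hypothetical layout."""
    # The reference is the stream with the highest sampling rate.
    ref_name = max(streams, key=lambda name: streams[name][0])
    ref_ts = streams[ref_name][1]
    time_index = [t - ref_ts[0] for t in ref_ts]  # relative time starting at 0
    columns = {}
    for name, (_, ts, values) in streams.items():
        if len(values) != len(time_index):
            relative = [t - ts[0] for t in ts]
            values = interp(time_index, relative, values)
        columns[name] = list(values)
    return ref_name, time_index, columns


streams = {
    "fast": (4.0, [100.0, 100.25, 100.5, 100.75, 101.0], [0, 1, 2, 3, 4]),
    "slow": (2.0, [100.0, 100.5, 101.0], [10.0, 20.0, 30.0]),
}
ref_name, time_index, columns = resample_to_reference(streams)
```

The slow stream is upsampled to five points, so every column shares the index of the fast stream. Passing `reference_stream` to `load()` replaces the highest-rate choice with a named stream.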

Source code in biosigio/importers/xdf.py

def load(
    self,
    filepath: str,
    stream_names: list[str] | None = None,
    stream_types: list[str] | None = None,
    stream_ids: list[int] | None = None,
    sync_streams: bool = True,
    default_channel_type: str = "OTHER",
    include_timestamps: bool = False,
    reference_stream: str | None = None,
    max_memory_gb: float | None = None,
) -> Recording:
    """
    Load EMG data from an XDF file.

    Streams can be selected by name, type, or ID. If multiple selection
    criteria are provided, streams matching ANY criterion are included.
    If no selection criteria are provided, all streams with numeric data
    are loaded.

    Memory Optimization:
        - Stream selection is passed directly to pyxdf, so only requested
          streams are loaded into memory. This significantly reduces RAM
          usage for large files with multiple streams.
        - Use summarize_xdf() first to explore file contents without loading data.
        - If max_memory_gb is set, a MemoryError is raised before loading
          when the estimated memory usage exceeds the limit.

    Args:
        filepath: Path to the XDF file
        stream_names: List of stream names to import (case-insensitive)
        stream_types: List of stream types to import (e.g., ["EMG", "EXG"])
        stream_ids: List of stream IDs to import
        sync_streams: If True and more than one stream is selected, load
                     through the synchronized path. pyxdf already applies
                     clock synchronization when it reads the file, so both
                     settings currently produce the same result.
        default_channel_type: Default channel type for channels without
                             explicit type info (default: "OTHER"; no silent
                             EMG assumption)
        include_timestamps: If True, add a timestamp channel for each stream
                           named "{stream_name}_LSL_timestamps" containing
                           the original LSL timestamps. Useful for preserving
                           timing information when exporting to formats like
                           EDF that require regular sampling.
        reference_stream: Optional stream name to use as the time base
                         reference. If not specified, the stream with the
                         highest sampling rate is used (recommended to
                         avoid data loss from downsampling).
        max_memory_gb: Optional maximum memory usage in GB. If specified,
                      raises MemoryError if estimated memory exceeds this
                      limit. Use summarize_xdf() to estimate memory needs.

    Returns:
        Recording: Recording object containing the loaded data

    Raises:
        ValueError: If no matching streams found or file cannot be read
        ImportError: If pyxdf is not installed
        MemoryError: If estimated memory exceeds max_memory_gb
    """
    try:
        import pyxdf
    except ImportError as e:
        raise ImportError(
            "pyxdf is required for XDF file support. Install it with: pip install pyxdf"
        ) from e

    filepath = str(filepath)

    # Check memory usage before loading if max_memory_gb is specified
    if max_memory_gb is not None:
        self._check_memory_usage(
            filepath, stream_names, stream_types, stream_ids, max_memory_gb
        )

    # Build pyxdf select_streams parameter for efficient loading
    select_streams = self._build_select_streams(
        filepath, stream_names, stream_types, stream_ids
    )

    # Load only selected streams (pyxdf handles the filtering at load time)
    data, header = pyxdf.load_xdf(filepath, select_streams=select_streams)

    if not data:
        raise ValueError(f"No streams found in XDF file: {filepath}")

    # Filter streams based on selection criteria (for additional filtering)
    selected_streams = self._select_streams(data, stream_names, stream_types, stream_ids)

    if not selected_streams:
        # If no criteria specified, select all streams with numeric data
        if stream_names is None and stream_types is None and stream_ids is None:
            selected_streams = [
                s
                for s in data
                if isinstance(s["time_series"], np.ndarray)
                and s["time_series"].dtype.kind in "iufc"
            ]
        if not selected_streams:
            raise ValueError(
                "No matching streams found. Use summarize_xdf() to explore the file."
            )

    # Create Recording object
    rec = Recording()

    # Store metadata
    rec.set_metadata("source_file", filepath)
    rec.set_metadata("device", "XDF")
    rec.set_metadata("stream_count", len(selected_streams))

    if sync_streams and len(selected_streams) > 1:
        self._load_synchronized_streams(
            rec, selected_streams, default_channel_type, include_timestamps, reference_stream
        )
    else:
        # Load streams (uses highest sample rate as reference unless specified)
        self._load_streams(
            rec, selected_streams, default_channel_type, include_timestamps, reference_stream
        )

    return rec

`XDFStreamInfo` `dataclass` ¶

Information about a single XDF stream.

Source code in biosigio/importers/xdf.py

@dataclass
class XDFStreamInfo:
    """Information about a single XDF stream."""

    stream_id: int
    name: str
    stream_type: str
    channel_count: int
    nominal_srate: float
    effective_srate: float | None
    channel_format: str
    source_id: str
    hostname: str
    sample_count: int
    duration_seconds: float
    channel_labels: list[str]
    channel_types: list[str]
    channel_units: list[str]

    def __str__(self) -> str:
        """Human-readable string representation."""
        lines = [
            f"Stream {self.stream_id}: {self.name}",
            f"  Type: {self.stream_type}",
            f"  Channels: {self.channel_count}",
            f"  Nominal srate: {self.nominal_srate} Hz",
        ]
        if self.effective_srate:
            lines.append(f"  Effective srate: {self.effective_srate:.2f} Hz")
        lines.extend(
            [
                f"  Samples: {self.sample_count}",
                f"  Duration: {self.duration_seconds:.2f} s",
                f"  Format: {self.channel_format}",
            ]
        )
        if self.channel_labels:
            labels_preview = ", ".join(self.channel_labels[:5])
            if len(self.channel_labels) > 5:
                labels_preview += f", ... (+{len(self.channel_labels) - 5} more)"
            lines.append(f"  Channel labels: {labels_preview}")
        return "\n".join(lines)

`__str__()` ¶

Human-readable string representation.

Source code in biosigio/importers/xdf.py

def __str__(self) -> str:
    """Human-readable string representation."""
    lines = [
        f"Stream {self.stream_id}: {self.name}",
        f"  Type: {self.stream_type}",
        f"  Channels: {self.channel_count}",
        f"  Nominal srate: {self.nominal_srate} Hz",
    ]
    if self.effective_srate:
        lines.append(f"  Effective srate: {self.effective_srate:.2f} Hz")
    lines.extend(
        [
            f"  Samples: {self.sample_count}",
            f"  Duration: {self.duration_seconds:.2f} s",
            f"  Format: {self.channel_format}",
        ]
    )
    if self.channel_labels:
        labels_preview = ", ".join(self.channel_labels[:5])
        if len(self.channel_labels) > 5:
            labels_preview += f", ... (+{len(self.channel_labels) - 5} more)"
        lines.append(f"  Channel labels: {labels_preview}")
    return "\n".join(lines)

`XDFSummary` `dataclass` ¶

Summary of an XDF file's contents.

Source code in biosigio/importers/xdf.py

@dataclass
class XDFSummary:
    """Summary of an XDF file's contents."""

    filepath: str
    streams: list[XDFStreamInfo]
    header_info: dict[str, Any]

    def __str__(self) -> str:
        """Human-readable string representation."""
        lines = [
            f"XDF File: {self.filepath}",
            f"Number of streams: {len(self.streams)}",
            "",
        ]
        for stream in self.streams:
            lines.append(str(stream))
            lines.append("")
        return "\n".join(lines)

    def get_streams_by_type(self, stream_type: str) -> list[XDFStreamInfo]:
        """Get all streams of a specific type (case-insensitive)."""
        return [s for s in self.streams if s.stream_type.upper() == stream_type.upper()]

    def get_stream_by_name(self, name: str) -> XDFStreamInfo | None:
        """Get a stream by name (case-insensitive)."""
        for stream in self.streams:
            if stream.name.lower() == name.lower():
                return stream
        return None

    def get_stream_by_id(self, stream_id: int) -> XDFStreamInfo | None:
        """Get a stream by its ID."""
        for stream in self.streams:
            if stream.stream_id == stream_id:
                return stream
        return None

`__str__()` ¶

Human-readable string representation.

Source code in biosigio/importers/xdf.py

def __str__(self) -> str:
    """Human-readable string representation."""
    lines = [
        f"XDF File: {self.filepath}",
        f"Number of streams: {len(self.streams)}",
        "",
    ]
    for stream in self.streams:
        lines.append(str(stream))
        lines.append("")
    return "\n".join(lines)

`get_stream_by_id(stream_id)` ¶

Get a stream by its ID.

Source code in biosigio/importers/xdf.py

def get_stream_by_id(self, stream_id: int) -> XDFStreamInfo | None:
    """Get a stream by its ID."""
    for stream in self.streams:
        if stream.stream_id == stream_id:
            return stream
    return None

`get_stream_by_name(name)` ¶

Get a stream by name (case-insensitive).

Source code in biosigio/importers/xdf.py

def get_stream_by_name(self, name: str) -> XDFStreamInfo | None:
    """Get a stream by name (case-insensitive)."""
    for stream in self.streams:
        if stream.name.lower() == name.lower():
            return stream
    return None

`get_streams_by_type(stream_type)` ¶

Get all streams of a specific type (case-insensitive).

Source code in biosigio/importers/xdf.py

def get_streams_by_type(self, stream_type: str) -> list[XDFStreamInfo]:
    """Get all streams of a specific type (case-insensitive)."""
    return [s for s in self.streams if s.stream_type.upper() == stream_type.upper()]

`_determine_channel_type_from_label(label)` ¶

Determine channel type based on label naming conventions.
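The naming conventions amount to an ordered list of substring rules plus an exact-match list of 10-20 electrode sites. The table-driven version below is a condensed sketch of that precedence, not the library's code; `infer_type`, `RULES`, and `EEG_SITES` are names chosen for this example.

```python
# Exact-match electrode names that are treated as EEG.
EEG_SITES = {
    "FP1", "FP2", "F3", "F4", "C3", "C4", "P3", "P4", "O1", "O2",
    "F7", "F8", "T3", "T4", "T5", "T6", "FZ", "CZ", "PZ", "OZ",
}

# Ordered (keywords, type) rules: the first rule with a matching substring wins.
RULES = [
    (("EMG", "MUS"), "EMG"),
    (("ACC",), "ACC"),
    (("GYRO",), "GYRO"),
    (("EEG",), "EEG"),
    (("ECG", "EKG"), "ECG"),
    (("EOG",), "EOG"),
    (("TRIG", "MARKER", "EVENT"), "TRIG"),
]


def infer_type(label):
    """Return a channel type for the label, or "" when no rule applies."""
    upper = label.upper()
    # Site names contain none of the earlier keywords, so checking them first
    # gives the same result as checking them inside the EEG rule.
    if upper in EEG_SITES:
        return "EEG"
    for keywords, channel_type in RULES:
        if any(keyword in upper for keyword in keywords):
            return channel_type
    return ""
```

Because the rules match substrings, unrelated labels can be classified by accident: a label such as "Music" contains "MUS" and is therefore treated as EMG. An empty result means the caller falls back to default_channel_type.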

Source code in biosigio/importers/xdf.py

def _determine_channel_type_from_label(label: str) -> str:
    """Determine channel type based on label naming conventions."""
    label_upper = label.upper()

    if "EMG" in label_upper or "MUS" in label_upper:
        return "EMG"
    elif "ACC" in label_upper:
        return "ACC"
    elif "GYRO" in label_upper:
        return "GYRO"
    elif "EEG" in label_upper or label_upper in [
        "FP1",
        "FP2",
        "F3",
        "F4",
        "C3",
        "C4",
        "P3",
        "P4",
        "O1",
        "O2",
        "F7",
        "F8",
        "T3",
        "T4",
        "T5",
        "T6",
        "FZ",
        "CZ",
        "PZ",
        "OZ",
    ]:
        return "EEG"
    elif "ECG" in label_upper or "EKG" in label_upper:
        return "ECG"
    elif "EOG" in label_upper:
        return "EOG"
    elif "TRIG" in label_upper or "MARKER" in label_upper or "EVENT" in label_upper:
        return "TRIG"

    return ""
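
Because the checks run in a fixed order and most of them are substring tests, the first matching rule wins and a label can be classified by a fragment of its name. The following is a condensed, table-driven sketch of the same idea; the names `_RULES`, `_EEG_10_20` and `guess_type` are illustrative and not part of the module:

```python
# Ordered (substrings, type) rules; the first rule that matches wins.
_RULES = [
    (("EMG", "MUS"), "EMG"),
    (("ACC",), "ACC"),
    (("GYRO",), "GYRO"),
    (("EEG",), "EEG"),
    (("ECG", "EKG"), "ECG"),
    (("EOG",), "EOG"),
    (("TRIG", "MARKER", "EVENT"), "TRIG"),
]

# Electrode names that count as EEG only on an exact (case-insensitive) match.
_EEG_10_20 = frozenset({
    "FP1", "FP2", "F3", "F4", "C3", "C4", "P3", "P4", "O1", "O2",
    "F7", "F8", "T3", "T4", "T5", "T6", "FZ", "CZ", "PZ", "OZ",
})


def guess_type(label: str) -> str:
    """Return a channel type guessed from the label, or "" if no rule matches."""
    upper = label.upper()
    for substrings, channel_type in _RULES:
        if any(s in upper for s in substrings):
            return channel_type
        if channel_type == "EEG" and upper in _EEG_10_20:
            return channel_type
    return ""


guesses = {
    label: guess_type(label)
    for label in ("emg_biceps", "Cz", "EKG1", "AccX", "Ch1", "Fz_ref", "MUSIC_TRIG")
}
```

Two consequences of the rule order are worth noting: `"Fz_ref"` is not recognised because electrode names must match exactly, and `"MUSIC_TRIG"` is classified as EMG because the `"MUS"` substring is tested before `"TRIG"`.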

`_parse_xdf_metadata_only(filepath)` ¶

Parse XDF file metadata without loading signal data.

This function reads only the structural chunks (FileHeader, StreamHeader, StreamFooter) and skips over Samples chunks entirely, resulting in minimal memory usage even for large files.

Args: filepath: Path to the XDF file

Returns: Tuple of (streams_data, header_info) where:

- streams_data: dict mapping stream_id to {"header": {...}, "footer": {...}}
- header_info: dict with file-level header information

Note: Memory estimates for string-type streams (markers) are approximate since string lengths vary. The estimate uses 50 bytes per sample as a rough average.

Source code in biosigio/importers/xdf.py

def _parse_xdf_metadata_only(filepath: str) -> tuple[dict, dict]:
    """
    Parse XDF file metadata without loading signal data.

    This function reads only the structural chunks (FileHeader, StreamHeader,
    StreamFooter) and skips over Samples chunks entirely, resulting in minimal
    memory usage even for large files.

    Args:
        filepath: Path to the XDF file

    Returns:
        Tuple of (streams_data, header_info) where:
        - streams_data: dict mapping stream_id to {"header": {...}, "footer": {...}}
        - header_info: dict with file-level header information

    Note:
        Memory estimates for string-type streams (markers) are approximate since
        string lengths vary. The estimate uses 50 bytes per sample as a rough average.
    """
    import gzip
    import struct
    from xml.etree.ElementTree import fromstring

    def read_varlen_int(f):
        """Read a variable-length integer from the file.

        Raises EOFError if the file is truncated.
        """
        length_indicator = f.read(1)
        if not length_indicator:
            raise EOFError("Unexpected end of file while reading variable-length integer.")
        nbytes = length_indicator[0]
        if nbytes == 1:
            data = f.read(1)
            if len(data) != 1:
                raise EOFError("Unexpected end of file while reading 1-byte integer value.")
            return data[0]
        elif nbytes == 4:
            data = f.read(4)
            if len(data) != 4:
                raise EOFError("Unexpected end of file while reading 4-byte integer value.")
            return struct.unpack("<I", data)[0]
        elif nbytes == 8:
            data = f.read(8)
            if len(data) != 8:
                raise EOFError("Unexpected end of file while reading 8-byte integer value.")
            return struct.unpack("<Q", data)[0]
        else:
            raise ValueError(f"Invalid variable-length integer indicator: {nbytes}")

    def xml_to_dict(element):
        """Convert XML element to a dict, similar to pyxdf's _xml2dict."""
        result = {}
        for child in element:
            if len(child) == 0:
                result[child.tag] = child.text
            else:
                child_dict = xml_to_dict(child)
                if child.tag in result:
                    # If tag already exists, convert to list
                    if not isinstance(result[child.tag], list):
                        result[child.tag] = [result[child.tag]]
                    result[child.tag].append(child_dict)
                else:
                    result[child.tag] = child_dict
        return result

    def parse_stream_header_xml(xml_string: str) -> dict:
        """Parse StreamHeader XML into a dict with all metadata including desc."""
        root = fromstring(xml_string)
        result = {}
        for child in root:
            if child.tag == "desc":
                # Parse desc fully for channel info
                result["desc"] = xml_to_dict(child)
            elif len(child) == 0:
                result[child.tag] = child.text
            else:
                result[child.tag] = xml_to_dict(child)
        return result

    def parse_footer_xml(xml_string: str) -> dict:
        """Parse StreamFooter XML into a dict."""
        root = fromstring(xml_string)
        result = {}
        for child in root:
            if len(child) == 0:
                result[child.tag] = child.text
            else:
                result[child.tag] = xml_to_dict(child)
        return result

    streams_data: dict = {}
    header_info: dict = {}

    # Initialize file handle to None for safe cleanup
    f = None
    try:
        # Open file: plain .xdf, or gzip-compressed .xdfz / .xdf.gz (case-insensitive)
        if filepath.lower().endswith((".xdfz", ".xdf.gz")):
            f = gzip.open(filepath, "rb")
        else:
            f = open(filepath, "rb")

        # Read and verify magic bytes
        magic = f.read(4)
        if magic != b"XDF:":
            raise ValueError(f"Invalid XDF file: expected 'XDF:' magic bytes, got {magic!r}")

        # Process chunks
        while True:
            try:
                chunk_len = read_varlen_int(f)
            except EOFError:
                break

            tag_bytes = f.read(2)
            if len(tag_bytes) < 2:
                break
            tag = struct.unpack("<H", tag_bytes)[0]

            if tag == 1:
                # FileHeader chunk
                xml_bytes = f.read(chunk_len - 2)
                xml_string = xml_bytes.decode("utf-8", errors="replace")
                try:
                    root = fromstring(xml_string)
                    header_info = xml_to_dict(root)
                except Exception as e:
                    # If FileHeader XML is malformed, continue without file-level metadata
                    logger.warning("Failed to parse XDF FileHeader: %s", e)

            elif tag == 2:
                # StreamHeader chunk - parse fully including desc
                stream_id = struct.unpack("<I", f.read(4))[0]
                xml_bytes = f.read(chunk_len - 6)
                xml_string = xml_bytes.decode("utf-8", errors="replace")
                try:
                    header = parse_stream_header_xml(xml_string)
                    if stream_id not in streams_data:
                        streams_data[stream_id] = {"header": {}, "footer": {}}
                    streams_data[stream_id]["header"] = header
                except Exception as e:
                    # If StreamHeader XML is malformed, record the stream with empty header
                    # rather than failing the entire import
                    logger.warning(
                        "Failed to parse StreamHeader for stream %d: %s. "
                        "Channel metadata may be missing.",
                        stream_id,
                        e,
                    )
                    if stream_id not in streams_data:
                        streams_data[stream_id] = {"header": {}, "footer": {}}

            elif tag == 6:
                # StreamFooter chunk - contains sample_count, timestamps, etc.
                stream_id = struct.unpack("<I", f.read(4))[0]
                xml_bytes = f.read(chunk_len - 6)
                xml_string = xml_bytes.decode("utf-8", errors="replace")
                try:
                    footer = parse_footer_xml(xml_string)
                    if stream_id not in streams_data:
                        streams_data[stream_id] = {"header": {}, "footer": {}}
                    streams_data[stream_id]["footer"] = footer
                except Exception as e:
                    # If footer XML is malformed, ignore it and leave this stream
                    # without footer metadata rather than failing the entire import
                    logger.warning(
                        "Failed to parse StreamFooter for stream %d: %s. "
                        "Sample count and duration may be unavailable.",
                        stream_id,
                        e,
                    )

            elif tag in (3, 4, 5):
                # Samples (3), ClockOffset (4), Boundary (5) - skip entirely
                # This is the key optimization: we don't load any signal data
                f.seek(chunk_len - 2, 1)  # Seek relative to current position

            else:
                # Unknown chunk type - skip
                f.seek(chunk_len - 2, 1)

    finally:
        if f is not None:
            f.close()

    return streams_data, header_info
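
The parser relies on the XDF chunk layout: a one-byte length-size indicator (1, 4 or 8), the length itself in little-endian order, a two-byte tag, then the content, where the length counts the tag plus the content. The sketch below builds a tiny synthetic byte stream and walks it the same way, seeking past every payload. `make_chunk` and `list_chunks` are hypothetical helpers written for this illustration, and the synthetic data is not a valid recording; it only has enough structure to be walked chunk by chunk.

```python
import io
import struct

_LEN_FORMATS = {1: "<B", 4: "<I", 8: "<Q"}  # length-size byte -> struct format


def make_chunk(tag: int, content: bytes) -> bytes:
    """Encode one chunk: length-size byte, length, 2-byte tag, content."""
    length = 2 + len(content)  # the length counts the tag plus the content
    nbytes = 1 if length <= 0xFF else 4 if length <= 0xFFFFFFFF else 8
    return (
        bytes([nbytes])
        + struct.pack(_LEN_FORMATS[nbytes], length)
        + struct.pack("<H", tag)
        + content
    )


def list_chunks(stream):
    """Return (tag, payload_size) per chunk, seeking past every payload."""
    if stream.read(4) != b"XDF:":
        raise ValueError("missing 'XDF:' magic bytes")
    found = []
    while True:
        size_byte = stream.read(1)
        if not size_byte:
            break  # clean end of stream
        fmt = _LEN_FORMATS[size_byte[0]]
        length = struct.unpack(fmt, stream.read(size_byte[0]))[0]
        tag = struct.unpack("<H", stream.read(2))[0]
        stream.seek(length - 2, io.SEEK_CUR)  # skip the payload unread
        found.append((tag, length - 2))
    return found


# Synthetic stream: FileHeader (1), StreamHeader (2), Samples (3), StreamFooter (6).
stream_id = struct.pack("<I", 1)
data = b"XDF:" + b"".join([
    make_chunk(1, b"<info><version>1.0</version></info>"),
    make_chunk(2, stream_id + b"<info><name>Demo</name></info>"),
    make_chunk(3, stream_id + bytes(1000)),  # placeholder bytes, not real samples
    make_chunk(6, stream_id + b"<info><sample_count>0</sample_count></info>"),
])
chunks = list_chunks(io.BytesIO(data))
tags = [tag for tag, _ in chunks]
```

The Samples chunk here is longer than 255 bytes, so it exercises the four-byte length encoding, while the XML chunks use the one-byte form.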

`infer_modality_from_channel_type(channel_type)` ¶

Derive the coarse modality for a channel type.

The mapping is deterministic: neural types map to EEG/IEEG/MEG, EMG maps to EMG, and every other valid type (ECG, EOG, ACC, TRIG, ...) maps to MISC.

Args: channel_type: A channel type string (case-insensitive); validated first.

Returns: One of `VALID_MODALITIES`.

Raises: ValueError: If channel_type is not a known type.

Source code in biosigio/core/modality.py

def infer_modality_from_channel_type(channel_type: str) -> str:
    """Derive the coarse modality for a channel type.

    The mapping is deterministic: neural types map to EEG/IEEG/MEG, ``EMG`` maps
    to EMG, and every other valid type (ECG, EOG, ACC, TRIG, ...) maps to MISC.

    Args:
        channel_type: A channel type string (case-insensitive); validated first.

    Returns:
        One of :data:`VALID_MODALITIES`.

    Raises:
        ValueError: If ``channel_type`` is not a known type.
    """
    ct = validate_channel_type(channel_type)
    return _CHANNEL_TYPE_TO_MODALITY.get(ct, "MISC")
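
The lookup is a validate-then-map step with `"MISC"` as the default for any valid type that has no explicit entry. A minimal sketch of that pattern follows. The contents of the two tables below are assumptions made for illustration; the real `VALID_CHANNEL_TYPES` and `_CHANNEL_TYPE_TO_MODALITY` are defined in biosigio/core/modality.py and may contain different entries.

```python
# Illustrative tables only: the library's actual sets may differ.
VALID_CHANNEL_TYPES = {
    "EEG", "SEEG", "ECOG", "MEG", "EMG", "ECG", "EOG", "ACC", "TRIG", "MISC", "OTHER",
}
_CHANNEL_TYPE_TO_MODALITY = {
    "EEG": "EEG",
    "SEEG": "IEEG",
    "ECOG": "IEEG",
    "MEG": "MEG",
    "EMG": "EMG",
}


def validate_channel_type(channel_type: str) -> str:
    """Normalize to uppercase and reject empty, n/a, or unknown types."""
    ct = channel_type.strip().upper()
    if ct in ("", "N/A", "NA") or ct not in VALID_CHANNEL_TYPES:
        raise ValueError(f"Unknown channel_type {channel_type!r}")
    return ct


def infer_modality(channel_type: str) -> str:
    """Map a validated channel type to its modality, defaulting to MISC."""
    return _CHANNEL_TYPE_TO_MODALITY.get(validate_channel_type(channel_type), "MISC")


results = {ct: infer_modality(ct) for ct in ("emg", " EEG ", "ecog", "ACC")}

try:
    infer_modality("n/a")
    rejected = False
except ValueError:
    rejected = True  # invalid types raise instead of silently becoming MISC
```

Validation runs before the lookup, so an unknown or empty type raises `ValueError` rather than falling through to the `"MISC"` default.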

`summarize_xdf(filepath)` ¶

Summarize the contents of an XDF file without loading signal data.

This function parses XDF chunk headers and metadata only, skipping actual signal data. This enables memory-efficient exploration of large XDF files (even multi-GB files) with minimal RAM usage.

The function extracts metadata from StreamHeader and StreamFooter chunks, which contain all necessary information about streams without requiring the actual time series data to be loaded.

Args: filepath: Path to the XDF file

Returns: XDFSummary: Object containing information about all streams in the file

Example:

>>> summary = summarize_xdf("recording.xdf")
>>> print(summary)
>>> # Find EMG streams
>>> emg_streams = summary.get_streams_by_type("EMG")

Source code in biosigio/importers/xdf.py

def summarize_xdf(filepath: str | Path) -> XDFSummary:
    """
    Summarize the contents of an XDF file without loading signal data.

    This function parses XDF chunk headers and metadata only, skipping actual
    signal data. This enables memory-efficient exploration of large XDF files
    (even multi-GB files) with minimal RAM usage.

    The function extracts metadata from StreamHeader and StreamFooter chunks,
    which contain all necessary information about streams without requiring
    the actual time series data to be loaded.

    Args:
        filepath: Path to the XDF file

    Returns:
        XDFSummary: Object containing information about all streams in the file

    Example:
        >>> summary = summarize_xdf("recording.xdf")
        >>> print(summary)
        >>> # Find EMG streams
        >>> emg_streams = summary.get_streams_by_type("EMG")
    """
    filepath = str(filepath)
    streams_data, header_info = _parse_xdf_metadata_only(filepath)

    streams = []
    for stream_id, stream_data in streams_data.items():
        header = stream_data.get("header", {})
        footer = stream_data.get("footer", {})

        # Extract basic info from header. An empty XML element parses to None,
        # so fall back with "or" rather than relying on the .get() default.
        name = header.get("name") or "Unknown"
        stream_type = header.get("type") or "Unknown"
        channel_count = int(header.get("channel_count") or 0)
        nominal_srate = float(header.get("nominal_srate") or 0.0)
        channel_format = header.get("channel_format") or "unknown"
        source_id = header.get("source_id") or ""
        hostname = header.get("hostname") or ""

        # Get sample count and timing from footer (if available)
        sample_count = int(footer.get("sample_count") or 0)
        first_timestamp = footer.get("first_timestamp")
        last_timestamp = footer.get("last_timestamp")
        measured_srate = footer.get("measured_srate")

        # Calculate effective sample rate
        effective_srate = None
        if measured_srate is not None:
            effective_srate = float(measured_srate)
        elif sample_count > 0 and first_timestamp is not None and last_timestamp is not None:
            duration = float(last_timestamp) - float(first_timestamp)
            if duration > 0:
                effective_srate = sample_count / duration

        # Calculate duration
        if first_timestamp is not None and last_timestamp is not None:
            duration_seconds = float(last_timestamp) - float(first_timestamp)
        elif effective_srate and effective_srate > 0 and sample_count > 0:
            duration_seconds = sample_count / effective_srate
        elif nominal_srate > 0 and sample_count > 0:
            duration_seconds = sample_count / nominal_srate
        else:
            duration_seconds = 0.0

        # Extract channel info from desc
        channel_labels = []
        channel_types = []
        channel_units = []

        desc = header.get("desc", {})
        if isinstance(desc, dict) and "channels" in desc:
            channels_info = desc.get("channels", {})
            if isinstance(channels_info, dict) and "channel" in channels_info:
                channel_list = channels_info["channel"]
                # Handle single channel case (not a list)
                if isinstance(channel_list, dict):
                    channel_list = [channel_list]
                for ch in channel_list:
                    if isinstance(ch, dict):
                        label = ch.get("label", "")
                        ch_type = ch.get("type", "")
                        unit = ch.get("unit", "")
                        channel_labels.append(label)
                        channel_types.append(ch_type)
                        channel_units.append(unit)

        # If no channel info in desc, create default labels
        if not channel_labels:
            channel_labels = [f"Ch{i + 1}" for i in range(channel_count)]
            channel_types = [""] * channel_count
            channel_units = [""] * channel_count

        stream_info = XDFStreamInfo(
            stream_id=stream_id,
            name=name,
            stream_type=stream_type,
            channel_count=channel_count,
            nominal_srate=nominal_srate,
            effective_srate=effective_srate,
            channel_format=channel_format,
            source_id=source_id,
            hostname=hostname,
            sample_count=sample_count,
            duration_seconds=duration_seconds,
            channel_labels=channel_labels,
            channel_types=channel_types,
            channel_units=channel_units,
        )
        streams.append(stream_info)

    return XDFSummary(filepath=filepath, streams=streams, header_info=header_info)
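
The effective sample rate and the duration are each derived through a chain of fallbacks, depending on which footer fields are present. The sketch below isolates that logic in a hypothetical helper, `rate_and_duration`, so the fallback order can be checked on small inputs; footer values are strings because they come from XML text.

```python
from typing import Optional, Tuple


def rate_and_duration(footer: dict, nominal_srate: float) -> Tuple[Optional[float], float]:
    """Mirror the fallback order used for effective rate and duration."""
    sample_count = int(footer.get("sample_count", 0))
    first = footer.get("first_timestamp")
    last = footer.get("last_timestamp")
    measured = footer.get("measured_srate")

    # Effective rate: measured value first, else samples divided by time span.
    effective = None
    if measured is not None:
        effective = float(measured)
    elif sample_count > 0 and first is not None and last is not None:
        span = float(last) - float(first)
        if span > 0:
            effective = sample_count / span

    # Duration: timestamps first, then effective rate, then nominal rate.
    if first is not None and last is not None:
        duration = float(last) - float(first)
    elif effective and sample_count > 0:
        duration = sample_count / effective
    elif nominal_srate > 0 and sample_count > 0:
        duration = sample_count / nominal_srate
    else:
        duration = 0.0
    return effective, duration


with_timestamps = rate_and_duration(
    {"sample_count": "15360", "first_timestamp": "100.0", "last_timestamp": "160.0"},
    nominal_srate=256.0,
)
count_only = rate_and_duration({"sample_count": "1024"}, nominal_srate=512.0)
no_footer = rate_and_duration({}, nominal_srate=256.0)
```

When a stream has no footer at all, the sample count defaults to 0, so the summary reports no effective rate and a duration of 0.0 even though the file may contain samples.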

`validate_channel_type(channel_type)` ¶

Normalize and validate a channel type against `VALID_CHANNEL_TYPES`.

Args: channel_type: A channel type string (case-insensitive).

Returns: The canonical uppercase channel type.

Raises: ValueError: If `channel_type` is empty, `n/a`, or not a known type.

Source code in biosigio/core/modality.py

def validate_channel_type(channel_type: str) -> str:
    """Normalize and validate a channel type against :data:`VALID_CHANNEL_TYPES`.

    Args:
        channel_type: A channel type string (case-insensitive).

    Returns:
        The canonical uppercase channel type.

    Raises:
        ValueError: If ``channel_type`` is empty, ``n/a``, or not a known type.
    """
    ct = channel_type.strip().upper()
    if ct in ("", "N/A", "NA"):
        raise ValueError(
            "channel_type 'n/a' is not allowed; use 'OTHER' or 'MISC' for an unknown type."
        )
    if ct not in VALID_CHANNEL_TYPES:
        raise ValueError(
            f"Unknown channel_type {channel_type!r}. Valid channel types: "
            f"{sorted(VALID_CHANNEL_TYPES)}"
        )
    return ct

Usage Examples¶

Basic Loading¶

from biosigio import Recording
from biosigio.importers.xdf import XDFImporter

# Method 1: Using Recording.from_file (recommended)
rec = Recording.from_file('recording.xdf')

# Method 2: Using the importer directly
importer = XDFImporter()
rec = importer.load('recording.xdf')

Exploring File Contents¶

Before loading, explore what streams are available:

from biosigio.importers.xdf import summarize_xdf

summary = summarize_xdf('recording.xdf')
print(summary)

# Illustrative output (the exact per-stream layout comes from XDFStreamInfo.__str__):
# XDF File: recording.xdf
# Number of streams: 3
#
# Stream 1: MyEEG (EEG)
#   Channels: 8, Rate: 256.0 Hz
#   Samples: 15360, Duration: 60.0s
# Stream 2: MyEMG (EMG)
#   Channels: 2, Rate: 2048.0 Hz
#   Samples: 122880, Duration: 60.0s
# Stream 3: Markers (Markers)
#   Channels: 1, Rate: 0.0 Hz (irregular)
#   Samples: 10

Selective Stream Loading¶

# Load only specific stream types
rec = Recording.from_file('recording.xdf', stream_types=['EMG'])

# Load multiple types
rec = Recording.from_file('recording.xdf', stream_types=['EMG', 'EEG'])

# Load by stream name
rec = Recording.from_file('recording.xdf', stream_names=['MyEMGDevice'])

# Load by stream ID
rec = Recording.from_file('recording.xdf', stream_ids=[2])

Setting Default Channel Type¶

# For streams without explicit channel type metadata
rec = Recording.from_file('recording.xdf', default_channel_type='EMG')

Preserving LSL Timestamps¶

# Include original LSL timestamps as additional channels
rec = Recording.from_file('recording.xdf', include_timestamps=True)

# Each stream gets a "{stream_name}_LSL_timestamps" channel
# Useful for synchronization with other LSL-recorded data

File Format Support¶

The XDF importer supports:

Single-stream and multi-stream XDF files
Compressed XDF files (.xdfz, .xdf.gz)
Numeric data types: float32, float64, int8, int16, int32, int64
Different sampling rates across streams (with resampling)
Channel labels from stream descriptors

Stream Selection Parameters¶

Parameter	Type	Description
`stream_names`	`list[str]`	Filter by stream names (case-insensitive)
`stream_types`	`list[str]`	Filter by stream types (e.g., "EMG", "EEG")
`stream_ids`	`list[int]`	Filter by stream IDs
`default_channel_type`	`str`	Default type for channels without explicit type
`include_timestamps`	`bool`	If True, add LSL timestamp channels for each stream

Return Values¶

The load() method returns a Recording object with:

Signals (pandas.DataFrame)¶

Time-indexed signal data
Channels as columns
Resampled to common time base if multiple streams

Channels (dict)¶

For each channel:

channel_type: Inferred or default type
physical_dimension: Unit (default "a.u.")
sample_frequency: Effective sampling rate
prefilter: Pre-filtering string (default "n/a")
modality: Coarse modality inferred from the channel type

Metadata (dict)¶

device: "XDF"
source_file: Path to the XDF file
stream_count: Number of selected streams
srate: Sampling rate of the reference stream (base sample rate)

Helper Classes¶

XDFSummary¶

Provides an overview of the XDF file:

summary = summarize_xdf('recording.xdf')

# Access all streams
for stream in summary.streams:
    print(f"{stream.name}: {stream.channel_count} channels")

# Find streams by type
emg_streams = summary.get_streams_by_type('EMG')

# Find stream by name
stream = summary.get_stream_by_name('MyDevice')

XDFStreamInfo¶

Contains metadata for a single stream:

stream_id: Unique stream identifier
name: Stream name
stream_type: Stream type (EEG, EMG, etc.)
channel_count: Number of channels
nominal_srate: Declared sampling rate
effective_srate: Actual measured sampling rate
channel_format: Data format (float32, string, etc.)
source_id: Source identifier
hostname: Recording machine hostname
sample_count: Number of samples
duration_seconds: Recording duration
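channel_labels: List of channel names (defaults to "Ch1", "Ch2", ... when the stream descriptor has none)
channel_types: List of channel types (empty strings when not declared)
channel_units: List of channel units (empty strings when not declared)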

Implementation Notes¶

String/Marker Streams: Streams with channel_format='string' are excluded from signal loading but appear in summaries.
Time Alignment: When loading multiple streams, timestamps are aligned to start at 0.
Resampling: Multiple streams with different rates are resampled using linear interpolation to the highest rate.
Channel Naming: Channels are prefixed with stream name to avoid conflicts (e.g., "StreamName_ChannelLabel").
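
The alignment, resampling and naming notes above can be illustrated on two toy streams. This is a sketch of the general technique under stated assumptions, not the importer's code: `interp_linear` is a hypothetical helper, values outside the original time range are clamped to the end samples (the importer may treat edges differently), and the stream and channel names are made up.

```python
from bisect import bisect_right


def interp_linear(t_new, t_old, values):
    """Linearly interpolate values sampled at t_old onto t_new (ends clamped)."""
    out = []
    for t in t_new:
        if t <= t_old[0]:
            out.append(values[0])
        elif t >= t_old[-1]:
            out.append(values[-1])
        else:
            i = bisect_right(t_old, t)  # t_old[i - 1] <= t < t_old[i]
            t0, t1 = t_old[i - 1], t_old[i]
            w = (t - t0) / (t1 - t0)
            out.append(values[i - 1] + w * (values[i] - values[i - 1]))
    return out


# Two toy streams with LSL-style absolute timestamps: 4 Hz and 2 Hz.
fast_t = [100.0, 100.25, 100.5, 100.75, 101.0]
slow_t = [100.0, 100.5, 101.0]
slow_values = [0.0, 1.0, 2.0]

# Time alignment: shift both streams so the earliest sample sits at t = 0.
t_start = min(fast_t[0], slow_t[0])
fast_rel = [t - t_start for t in fast_t]
slow_rel = [t - t_start for t in slow_t]

# Resampling: bring the slower stream onto the faster stream's time base.
slow_on_fast = interp_linear(fast_rel, slow_rel, slow_values)

# Channel naming: prefix the channel label with its stream name.
column_name = f"{'SlowStream'}_{'Ch1'}"
```

Linear interpolation keeps the original samples unchanged where the two time bases coincide and fills the points in between, which is why the slower stream ends up with the same number of rows as the faster one.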
