Data preprocessing pipeline
This page explains what happens when generating features and building PyTorch DataLoaders.
The feature extraction stage simulates the filtering stage implemented on the H1 Neuronova chip. Once your data follows the format shown in the previous section, the entire preprocessing stage reduces to a single module call.
NWaveDataGen Module
When you create an NWaveDataGen and call .dataloaders(), it:
- Loads `.wav` files from `data_parent` (format depends on task: classification vs. regression).
- Pads or trims every audio clip to `recording_duration_s`.
- Applies a chip-like filterbank frontend.
- Time-bins features using `sim_time_s`.
- Splits the dataset into train/val/test and returns PyTorch `DataLoader`s.
Key parameters
recording_duration_s
Controls the fixed input waveform length (in seconds) used for all samples.
The waveform is padded with zeros or trimmed to:
target_len_samples = recording_duration_s * sample_rate
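The pad/trim rule above can be sketched in a few lines. This is a NumPy sketch of what the pipeline does internally; `pad_or_trim` is a hypothetical helper, not an SDK function:

```python
import numpy as np

def pad_or_trim(wav: np.ndarray, recording_duration_s: float, sample_rate: int) -> np.ndarray:
    """Zero-pad or trim a 1-D waveform to the fixed target length (illustrative helper)."""
    target_len_samples = int(recording_duration_s * sample_rate)
    if len(wav) >= target_len_samples:
        return wav[:target_len_samples]                 # trim the tail
    pad = target_len_samples - len(wav)
    return np.pad(wav, (0, pad))                        # zero-pad at the end

# A 0.75 s clip at 32 kHz is padded up to 1.0 s (32000 samples):
out = pad_or_trim(np.random.randn(24000), 1.0, 32000)
print(out.shape)  # (32000,)
```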
If zero-padding is not desired or not appropriate for your use case, it is your responsibility to pad the data by other means: NWAVE does not support "same" or other padding modes, since they are not realistic in Neuronova real-time use cases.
sim_time_s
Controls the time bin size used during feature extraction.
The number of samples per time bin is:
bin_samples = sim_time_s * sample_rate
The number of time frames (time bins) is:
T = floor(target_len_samples / bin_samples)
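The floor in this formula means any trailing samples that do not fill a whole bin are dropped. A small NumPy sketch of the binning arithmetic (illustrative only, not the SDK's internal code):

```python
import numpy as np

sample_rate = 32000
recording_duration_s = 1.0
sim_time_s = 1e-3                                             # 1 ms bins

target_len_samples = int(recording_duration_s * sample_rate)  # 32000
bin_samples = int(sim_time_s * sample_rate)                   # 32
T = target_len_samples // bin_samples                         # floor division -> 1000

# Group a filtered signal into T bins of bin_samples each; the slice drops
# any trailing remainder, mirroring the floor in the formula above.
x = np.arange(target_len_samples, dtype=float)
binned = x[: T * bin_samples].reshape(T, bin_samples)
print(binned.shape)  # (1000, 32)
```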
Examples: shapes from (recording_duration_s, sim_time_s)
Assuming the default frequency set (H1 chip) has 16 center frequencies (i.e., F = 16):
Example A
- `sample_rate = 32000`
- `recording_duration_s = 1.0`
- `sim_time_s = 1e-3` (1 ms)
Then:
- target_len_samples = 1.0 * 32000 = 32000
- bin_samples = 0.001 * 32000 = 32
- T = floor(32000 / 32) = 1000
The resulting feature shape per batch becomes `(B, 1000, 16)`, following the `[B, T, C]` format.
Example B
- `sample_rate = 16000`
- `recording_duration_s = 0.5`
- `sim_time_s = 2e-3` (2 ms)
Then:
- `target_len_samples = 0.5 * 16000 = 8000`
- `bin_samples = 0.002 * 16000 = 32`
- `T = floor(8000 / 32) = 250`
Feature shape per sample in a batch:
(250, 16)
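Both examples can be sanity-checked with a small helper that predicts the per-sample shape from the key parameters. `feature_shape` is a hypothetical convenience function, not part of the SDK:

```python
import math

def feature_shape(sample_rate, recording_duration_s, sim_time_s, num_freqs=16):
    """Predict the per-sample feature shape (T, F) from the key parameters."""
    target_len_samples = recording_duration_s * sample_rate
    bin_samples = sim_time_s * sample_rate
    T = math.floor(target_len_samples / bin_samples)
    return (T, num_freqs)

print(feature_shape(32000, 1.0, 1e-3))  # Example A -> (1000, 16)
print(feature_shape(16000, 0.5, 2e-3))  # Example B -> (250, 16)
```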
Nyquist constraints
The filterbank uses a set of predefined center frequencies. Any center frequency at or above Nyquist (sample_rate / 2) is invalid for digital filtering.
If your sample_rate makes some configured frequencies exceed Nyquist (for example, using a low sampling rate with a high-frequency set), NWAVE automatically drops out-of-range frequencies before building the filterbank and logs a warning indicating how many frequencies were kept versus dropped.
Practical implications:
- The number of frequency channels `F` can decrease if your sampling rate is too low for the configured frequency set.
- To preserve the full bank, your data needs a high enough sampling rate that all configured frequencies are strictly below `sample_rate / 2`.
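The pruning described above can be reproduced in a few lines. The center-frequency set below is made up for illustration; the real H1 set ships with the SDK:

```python
import numpy as np

# Hypothetical center-frequency set in Hz (illustrative only).
center_freqs = np.array([500, 1000, 2000, 4000, 6000, 8000, 10000, 12000])
sample_rate = 16000
nyquist = sample_rate / 2                       # 8000 Hz

# Keep only frequencies strictly below Nyquist, dropping the rest.
kept = center_freqs[center_freqs < nyquist]
print(f"kept {kept.size}/{center_freqs.size} frequencies")  # kept 5/8 frequencies
```

With this 16 kHz sampling rate, the 8, 10, and 12 kHz channels are dropped, so `F` shrinks from 8 to 5.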
Classification dataloaders
dm.dataloaders() returns a dictionary like:
{
"train": DataLoader,
"val": DataLoader,
"test": DataLoader, # only if test_split > 0
}
Each batch contains batched tensors:
- `inputs`: a `torch.Tensor` containing features
- `targets`: class labels (dataset-dependent, typically integer indices)
The input batch shape is `(B, T, C)`, where `B = batch_size`.
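Iterating over the returned loaders is standard PyTorch. The sketch below builds a stand-in loader from random tensors so the batch shapes are concrete; in practice the dict comes from `dm.dataloaders()`:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for dm.dataloaders(): 8 samples with T=1000 time bins, F=16 channels.
features = torch.randn(8, 1000, 16)
labels = torch.randint(0, 3, (8,))           # integer class indices
loaders = {"train": DataLoader(TensorDataset(features, labels), batch_size=4)}

for inputs, targets in loaders["train"]:
    print(inputs.shape, targets.shape)       # (B, T, C) and (B,)
    break
```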
return_filename
If return_filename=True, each sample additionally includes the original filename:
- For classification it is returned as `class_name/filename.wav` (relative to `data_parent`).
- The dataloader batch will include an extra filename field (e.g., a list/tuple of strings), useful for debugging and for analyzing results directly on the raw data.
Regression dataloaders
Regression uses pairs of:
- input features: a `torch.Tensor`
- target tensor: a `torch.Tensor` loaded from `target/<stem>.pt`
Target time dimension must match the feature time dimension T.
Batch contents:
- inputs: batched tensor of features
- targets: batched tensor of regression targets (shape depends on your target encoding)
If return_filename=True, each sample additionally includes the .wav filename (e.g. file_001.wav).
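The time-dimension constraint can be checked with a short sketch that saves and reloads a fake target the way the pipeline expects. The paths and target shape here are illustrative assumptions:

```python
import os
import tempfile
import torch

# Features for one sample: T=250 time bins, F=16 channels (Example B).
T, F = 250, 16
features = torch.randn(T, F)

with tempfile.TemporaryDirectory() as d:
    # A regression target saved as target/<stem>.pt would be loaded like this;
    # the (T, 1) shape is just one possible target encoding.
    path = os.path.join(d, "file_001.pt")
    torch.save(torch.randn(T, 1), path)
    target = torch.load(path)

# The target's time dimension must match the feature time dimension T.
assert target.shape[0] == features.shape[0]
```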
Minimal example
from nwavesdk.data import NWaveDataGen, NWaveDataloaderConfig
data_config = NWaveDataloaderConfig(
batch_size=4,
val_split=0.2,
test_split=0.0,
random_state=123,
num_workers=4,
shuffle_train=True,
)
dm = NWaveDataGen(
data_parent="data_for_nwave",
sample_rate=32000, # Hz
recording_duration_s=1.0, # seconds
sim_time_s=1e-3, # seconds (1 ms)
dataloader_config=data_config,
task="classification",
return_filename=True,
)
loaders = dm.dataloaders()