Data preprocessing pipeline
This page explains what happens when generating features and building PyTorch DataLoaders.
The feature extraction stage simulates the filtering stage implemented on the H1 Neuronova chip. Once your data follows the format shown in the previous section, the entire preprocessing stage reduces to a single module call.
NWaveDataGen Module
When you create an NWaveDataGen and call .dataloaders(), it:
- Loads `.wav` files from `data_parent` (format depends on task: classification vs. regression).
- Pads or trims every audio clip to `recording_duration_s`.
- Applies a chip-like filterbank frontend.
- Time-bins features using `sim_time_s`.
- Splits the dataset into train/val/test and returns PyTorch `DataLoader`s.
Key parameters
recording_duration_s
Controls the fixed input waveform length (in seconds) used for all samples.
The waveform is padded with zeros or trimmed to:
target_len_samples = recording_duration_s * sample_rate
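The pad/trim rule above can be sketched in a few lines. This is a NumPy sketch of what the pipeline does internally; `pad_or_trim` is a hypothetical helper, not an SDK function:

```python
import numpy as np

def pad_or_trim(wav: np.ndarray, recording_duration_s: float, sample_rate: int) -> np.ndarray:
    """Zero-pad or trim a 1-D waveform to the fixed target length (illustrative helper)."""
    target_len_samples = int(recording_duration_s * sample_rate)
    if len(wav) >= target_len_samples:
        return wav[:target_len_samples]                 # trim the tail
    pad = target_len_samples - len(wav)
    return np.pad(wav, (0, pad))                        # zero-pad at the end

# A 0.75 s clip at 32 kHz is padded up to 1.0 s (32000 samples):
out = pad_or_trim(np.random.randn(24000), 1.0, 32000)
print(out.shape)  # (32000,)
```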
If zero-padding is not desired or not appropriate for your use case, it is your responsibility to pad the data by other means: NWAVE does not support "same" or other padding modes, since they are not realistic in Neuronova real-time use cases.
sim_time_s
Controls the time bin size used during feature extraction.
The number of samples per time bin is:
bin_samples = sim_time_s * sample_rate
The number of time frames (time bins) is:
T = floor(target_len_samples / bin_samples)
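The floor in this formula means any trailing samples that do not fill a whole bin are dropped. A small NumPy sketch of the binning arithmetic (illustrative only, not the SDK's internal code):

```python
import numpy as np

sample_rate = 32000
recording_duration_s = 1.0
sim_time_s = 1e-3                                             # 1 ms bins

target_len_samples = int(recording_duration_s * sample_rate)  # 32000
bin_samples = int(sim_time_s * sample_rate)                   # 32
T = target_len_samples // bin_samples                         # floor division -> 1000

# Group a filtered signal into T bins of bin_samples each; the slice drops
# any trailing remainder, mirroring the floor in the formula above.
x = np.arange(target_len_samples, dtype=float)
binned = x[: T * bin_samples].reshape(T, bin_samples)
print(binned.shape)  # (1000, 32)
```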
Examples: shapes from (recording_duration_s, sim_time_s)
Assuming the default frequency set (H1 chip) has 16 center frequencies (i.e., F = 16):
Example A
- `sample_rate = 32000`
- `recording_duration_s = 1.0`
- `sim_time_s = 1e-3` (1 ms)
Then:
- target_len_samples = 1.0 * 32000 = 32000
- bin_samples = 0.001 * 32000 = 32
- T = floor(32000 / 32) = 1000
The resulting feature shape per batch becomes `(B, 1000, 16)`, following the `[B, T, C]` format.
Example B
- `sample_rate = 16000`
- `recording_duration_s = 0.5`
- `sim_time_s = 2e-3` (2 ms)
Then:
- `target_len_samples = 0.5 * 16000 = 8000`
- `bin_samples = 0.002 * 16000 = 32`
- `T = floor(8000 / 32) = 250`
Feature shape per sample in a batch:
(250, 16)
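Both examples can be sanity-checked with a small helper that predicts the per-sample shape from the key parameters. `feature_shape` is a hypothetical convenience function, not part of the SDK:

```python
import math

def feature_shape(sample_rate, recording_duration_s, sim_time_s, num_freqs=16):
    """Predict the per-sample feature shape (T, F) from the key parameters."""
    target_len_samples = recording_duration_s * sample_rate
    bin_samples = sim_time_s * sample_rate
    T = math.floor(target_len_samples / bin_samples)
    return (T, num_freqs)

print(feature_shape(32000, 1.0, 1e-3))  # Example A -> (1000, 16)
print(feature_shape(16000, 0.5, 2e-3))  # Example B -> (250, 16)
```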
Nyquist constraints
The filterbank uses a set of predefined center frequencies. Any center frequency at or above Nyquist (sample_rate / 2) is invalid for digital filtering.
If your sample_rate makes some configured frequencies exceed Nyquist (for example, using a low sampling rate with a high-frequency set), NWAVE automatically drops out-of-range frequencies before building the filterbank and logs a warning indicating how many frequencies were kept versus dropped.
Practical implications:
- The number of frequency channels `F` can decrease if your sampling rate is too low for the configured frequency set.
- To preserve the full bank, your data needs a high enough sampling rate that all configured frequencies are strictly below `sample_rate / 2`.
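The pruning described above can be reproduced in a few lines. The center-frequency set below is made up for illustration; the real H1 set ships with the SDK:

```python
import numpy as np

# Hypothetical center-frequency set in Hz (illustrative only).
center_freqs = np.array([500, 1000, 2000, 4000, 6000, 8000, 10000, 12000])
sample_rate = 16000
nyquist = sample_rate / 2                       # 8000 Hz

# Keep only frequencies strictly below Nyquist, dropping the rest.
kept = center_freqs[center_freqs < nyquist]
print(f"kept {kept.size}/{center_freqs.size} frequencies")  # kept 5/8 frequencies
```

With this 16 kHz sampling rate, the 8, 10, and 12 kHz channels are dropped, so `F` shrinks from 8 to 5.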
Classification dataloaders
dm.dataloaders() returns a dictionary like:
{
"train": DataLoader,
"val": DataLoader,
"test": DataLoader, # only if test_split > 0
}
Each batch contains batched tensors:
- `inputs`: a `torch.Tensor` containing features
- `targets`: class labels (dataset-dependent, typically integer indices)
The input batch shape is `(B, T, C)`, where `B = batch_size`.
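Iterating over the returned loaders is standard PyTorch. The sketch below builds a stand-in loader from random tensors so the batch shapes are concrete; in practice the dict comes from `dm.dataloaders()`:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for dm.dataloaders(): 8 samples with T=1000 time bins, F=16 channels.
features = torch.randn(8, 1000, 16)
labels = torch.randint(0, 3, (8,))           # integer class indices
loaders = {"train": DataLoader(TensorDataset(features, labels), batch_size=4)}

for inputs, targets in loaders["train"]:
    print(inputs.shape, targets.shape)       # (B, T, C) and (B,)
    break
```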
return_filename
If return_filename=True, each sample additionally includes the original filename:
- For classification it is returned as `class_name/filename.wav` (relative to `data_parent`).
- The dataloader batch will include an extra filename field (e.g., a list/tuple of strings), useful for debugging and for analyzing results directly on the raw data.
Regression dataloaders
Regression uses pairs of:
- input features: a `torch.Tensor`
- target tensor: a `torch.Tensor` loaded from `target/<stem>.pt`
Target time dimension must match the feature time dimension T.
Batch contents:
- inputs: batched tensor of features
- targets: batched tensor of regression targets (shape depends on your target encoding)
If return_filename=True, each sample additionally includes the .wav filename (e.g. file_001.wav).
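The time-dimension constraint can be checked with a short sketch that saves and reloads a fake target the way the pipeline expects. The paths and target shape here are illustrative assumptions:

```python
import os
import tempfile
import torch

# Features for one sample: T=250 time bins, F=16 channels (Example B).
T, F = 250, 16
features = torch.randn(T, F)

with tempfile.TemporaryDirectory() as d:
    # A regression target saved as target/<stem>.pt would be loaded like this;
    # the (T, 1) shape is just one possible target encoding.
    path = os.path.join(d, "file_001.pt")
    torch.save(torch.randn(T, 1), path)
    target = torch.load(path)

# The target's time dimension must match the feature time dimension T.
assert target.shape[0] == features.shape[0]
```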
Minimal example
from nwavesdk.data import NWaveDataGen, NWaveDataloaderConfig
data_config = NWaveDataloaderConfig(
batch_size=4,
val_split=0.2,
test_split=0.0,
random_state=123,
num_workers=4,
shuffle_train=True,
)
dm = NWaveDataGen(
data_parent="data_for_nwave",
sample_rate=32000, # Hz
recording_duration_s=1.0, # seconds
sim_time_s=1e-3, # seconds (1 ms)
dataloader_config=data_config,
task="classification",
return_filename=True,
)
loaders = dm.dataloaders()