Data formats
This section describes how NWAVE expects the input data to be organized on disk for each supported task.
Classification format
Classification datasets must be organized as a folder containing one subfolder per class, each one containing .wav files:
data_parent/
    class_a/
        sample_001.wav
        sample_002.wav
    class_b/
        example_001.wav
        example_002.wav
    ...
Audio requirements
- File type: .wav
- Channels: mono or stereo (stereo is automatically converted to mono by averaging channels)
- Sample rate: audio is resampled to the sample_rate you pass to NWaveDataGen
- Length: each waveform is padded or trimmed to exactly recording_duration_s
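The channel and length rules above can be illustrated with a small NumPy sketch. This is not NWAVE's implementation: the helper names to_mono and pad_or_trim are hypothetical, the (num_samples, num_channels) layout is an assumption, and so is the choice to zero-pad and trim at the end of the waveform. Resampling is left out.

```python
import numpy as np


def to_mono(waveform: np.ndarray) -> np.ndarray:
    """Average the channels of a (num_samples, num_channels) array.

    A 1-D (already mono) array is returned unchanged apart from the dtype.
    """
    waveform = waveform.astype(np.float32)
    if waveform.ndim == 1:
        return waveform
    return waveform.mean(axis=1)


def pad_or_trim(waveform: np.ndarray, sample_rate: int,
                recording_duration_s: float) -> np.ndarray:
    """Force a mono waveform to exactly sample_rate * recording_duration_s samples.

    Assumption: extra samples are cut from the end, and short waveforms are
    zero-padded at the end.
    """
    target_len = int(round(sample_rate * recording_duration_s))
    if len(waveform) >= target_len:
        return waveform[:target_len]
    return np.pad(waveform, (0, target_len - len(waveform)))
```

For example, a 0.8 s stereo clip at 16 kHz processed with recording_duration_s=1.0 would come out as a mono array of 16000 samples, the last 3200 of them zeros under the padding assumption above.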
Labels
Labels are derived from folder names (e.g., class_a, class_b).
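As a rough illustration of deriving labels from folder names, the sketch below scans a dataset folder with the standard library only. The function name list_classification_samples is hypothetical, and assigning integer indices in alphabetical order of the class folders is an assumption; NWAVE may map class names to indices differently.

```python
from pathlib import Path


def list_classification_samples(data_parent):
    """Return (samples, class_to_index) for a one-subfolder-per-class layout.

    samples is a list of (wav_path, class_index) tuples.
    Assumption: class indices follow the alphabetical order of folder names.
    """
    root = Path(data_parent)
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    class_to_index = {name: i for i, name in enumerate(class_names)}

    samples = []
    for name in class_names:
        # Only .wav files directly inside the class folder are collected.
        for wav in sorted((root / name).glob("*.wav")):
            samples.append((wav, class_to_index[name]))
    return samples, class_to_index
```

With the layout shown earlier, class_a would map to 0 and class_b to 1 under this assumption.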
Regression format
Regression datasets are organized with a wavs/ folder for inputs and a target/ folder for targets:
data_parent/
    wavs/
        file_001.wav
        file_002.wav
    target/
        file_001.pt
        file_002.pt
- Each .wav must have a corresponding target tensor stored as target/<stem>.pt, where <stem> is the wav filename without extension.
  - Example: wavs/file_001.wav → target/file_001.pt
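Before training, it can help to check that every input has its target. The sketch below pairs files by stem using only the standard library. The function name pair_wavs_with_targets is hypothetical and is not part of NWAVE.

```python
from pathlib import Path


def pair_wavs_with_targets(data_parent):
    """Match wavs/<stem>.wav with target/<stem>.pt.

    Returns (pairs, missing): pairs is a list of (wav_path, target_path),
    missing is a list of wav paths that have no target file.
    """
    root = Path(data_parent)
    pairs, missing = [], []
    for wav in sorted((root / "wavs").glob("*.wav")):
        target = root / "target" / f"{wav.stem}.pt"
        if target.is_file():
            pairs.append((wav, target))
        else:
            missing.append(wav)
    return pairs, missing
```

A non-empty missing list points to inputs whose target file is absent or misnamed.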
Target tensor requirements
Targets must be torch.Tensor objects saved with torch.save(...).
The target must match the time axis of the extracted features: it must have the same number of time bins T as produced by the filterbank plus time binning. Regression is performed per time bin, so the accepted shapes for the target y are (T,) for mono-channel regression or (T, C) for multi-channel regression.
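A minimal sketch of writing a target file that satisfies these requirements is shown below. The helper save_target and its num_time_bins argument are hypothetical. How to obtain T for a given recording depends on the filterbank and time-binning settings, which this sketch does not cover.

```python
import torch


def save_target(y, path, num_time_bins):
    """Validate a regression target and write it with torch.save.

    Accepted shapes: (T,) or (T, C), where T == num_time_bins.
    num_time_bins is assumed to be known from the feature extraction setup.
    """
    if not isinstance(y, torch.Tensor):
        raise TypeError("target must be a torch.Tensor")
    if y.ndim not in (1, 2):
        raise ValueError(f"expected shape (T,) or (T, C), got {tuple(y.shape)}")
    if y.shape[0] != num_time_bins:
        raise ValueError(
            f"target has {y.shape[0]} time bins, expected {num_time_bins}")
    torch.save(y, path)
```

For instance, save_target(torch.zeros(50, 3), "data_parent/target/file_001.pt", num_time_bins=50) would store a three-channel target over 50 time bins, where 50 and 3 are purely illustrative values.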
Hardware note: what is supported on-chip
At the moment, the Neuronova chip path is optimized for classification tasks.
Classification is fully supported and optimized. Regression on the membrane still has to be explored, and hardware-aware networks face two limiting factors:
- Regression on the membrane is hard to define because the chip constrains the membrane to positive values, which prevents representing or accumulating the values required by regression.
- The threshold is not tunable and the reset mechanism resets to zero: combined, these keep the membrane always below the threshold, so values above the fixed threshold cannot be expressed.
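The two constraints above can be made concrete with a toy simulation. The sketch below is a generic discrete-time leaky integrate-and-fire neuron, not a model of the Neuronova chip: the update rule, the leak factor, and the threshold value are all assumptions chosen for illustration.

```python
def simulate_lif(inputs, threshold=1.0, leak=0.9):
    """Toy LIF neuron with a non-negative membrane and reset-to-zero.

    Returns (membrane, spikes), where membrane[t] is the value left on the
    membrane at the end of step t. All dynamics here are hypothetical.
    """
    v = 0.0
    membrane, spikes = [], []
    for x in inputs:
        v = leak * v + x      # leaky integration of the input
        v = max(v, 0.0)       # positive-membrane constraint
        if v >= threshold:    # fixed, non-tunable threshold
            spikes.append(1)
            v = 0.0           # zero reset
        else:
            spikes.append(0)
        membrane.append(v)
    return membrane, spikes


membrane, spikes = simulate_lif([0.4, 0.5, 0.6, -2.0, 0.3, 1.5])
# Every stored membrane value lies in [0, threshold): negative inputs are
# clipped at zero and any crossing of the threshold is reset to zero.
print(all(0.0 <= v < 1.0 for v in membrane))
```

Under these assumptions a membrane readout can never represent a negative target or a target at or above the threshold, which is the limitation the two bullets describe.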
However, software regression is still provided for experimentation and dataset preparation, since it is possible to approach the problem with spikes or with a function of spikes.
At the same time, users can still test regression on the membrane, either with a LIF network or with a final layer of LIF neuron(s), to experiment with how the core network learns while using an appropriate membrane for the regression itself.