Data formats

This section describes how NWAVE expects the input data to be organized on disk for each supported task.


Classification format

Classification datasets must be organized as a folder containing one subfolder per class, each one containing .wav files:

data_parent/
  class_a/
    sample_001.wav
    sample_002.wav
  class_b/
    example_001.wav
    example_002.wav
  ...

Audio requirements

  • File type: .wav
  • Channels: mono or stereo (stereo is automatically converted to mono by averaging channels)
  • Sample rate: audio is resampled to the sample_rate you pass to NWaveDataGen
  • Length: each waveform is padded or trimmed to exactly recording_duration_s
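
As a rough illustration of the channel and length rules above, the following standard-library sketch averages stereo frames into mono and then trims or pads the result to a fixed number of samples. The function names `to_mono` and `fit_to_duration` are hypothetical, and padding with trailing zeros is an assumption: this page does not state where or with what value the padding is applied. Resampling is left out.

```python
def to_mono(frames):
    """Average the channels of every frame.

    `frames` is a sequence of frames, each frame a sequence of per-channel samples.
    """
    return [sum(frame) / len(frame) for frame in frames]


def fit_to_duration(samples, sample_rate, recording_duration_s):
    """Trim or pad a mono waveform to exactly sample_rate * recording_duration_s samples."""
    target_len = int(round(sample_rate * recording_duration_s))
    if len(samples) >= target_len:
        # Too long: keep only the first target_len samples.
        return list(samples[:target_len])
    # Too short: pad with trailing zeros (assumption, see the note above).
    return list(samples) + [0.0] * (target_len - len(samples))


# Three stereo frames, reduced to mono and padded to 1 second at 4 Hz (4 samples).
stereo = [(0.2, 0.4), (1.0, -1.0), (0.5, 0.5)]
mono = to_mono(stereo)
clip = fit_to_duration(mono, sample_rate=4, recording_duration_s=1.0)
```
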

Labels

Labels are derived from folder names (e.g., class_a, class_b).
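
The folder-to-label mapping can be sketched with the standard library as follows. This is only an illustration of the layout described above, not the loader NWAVE uses: the helper name `index_classification_dataset` is hypothetical, and assigning integer indices in alphabetical order of the folder names is an assumption.

```python
from pathlib import Path


def index_classification_dataset(data_parent):
    """Return (class_names, samples) for a folder-per-class dataset.

    `samples` is a list of (wav_path, class_index) pairs.
    """
    root = Path(data_parent)
    # One class per subfolder; sorting keeps the name -> index mapping reproducible.
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    class_to_index = {name: i for i, name in enumerate(class_names)}

    samples = []
    for name in class_names:
        # Only .wav files are picked up; anything else in the folder is ignored.
        for wav_path in sorted((root / name).glob("*.wav")):
            samples.append((wav_path, class_to_index[name]))
    return class_names, samples
```

Calling `index_classification_dataset("data_parent")` on the tree shown above would list `class_a` and `class_b` as the two classes, with every .wav file paired with the index of its parent folder.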


Regression format

Regression datasets are organized with a wavs/ folder for inputs and a target/ folder for targets:

data_parent/
  wavs/
    file_001.wav
    file_002.wav
  target/
    file_001.pt
    file_002.pt

  • Each .wav must have a corresponding target tensor stored as target/<stem>.pt, where <stem> is the .wav filename without its extension.
  • Example: wavs/file_001.wav → target/file_001.pt
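
Before training, it can be useful to verify that the pairing rule holds for the whole dataset. The sketch below compares file stems in wavs/ and target/ using only the standard library; the helper name `find_unpaired` is hypothetical and is not part of NWAVE.

```python
from pathlib import Path


def find_unpaired(data_parent):
    """Return (missing_targets, orphan_targets) as sorted lists of file stems."""
    root = Path(data_parent)
    wav_stems = {p.stem for p in (root / "wavs").glob("*.wav")}
    target_stems = {p.stem for p in (root / "target").glob("*.pt")}

    # .wav files that have no target/<stem>.pt counterpart.
    missing_targets = sorted(wav_stems - target_stems)
    # Target tensors that have no wavs/<stem>.wav counterpart.
    orphan_targets = sorted(target_stems - wav_stems)
    return missing_targets, orphan_targets
```

Both returned lists are empty when every input has exactly one matching target.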

Target tensor requirements

Targets must be torch.Tensor objects saved with torch.save(...).

The target must match the time axis of the extracted features: it must have the same number of time bins T as produced by the filterbank and the time binning. Regression is performed per time bin, and the accepted shapes for the target y are (T,) for single-channel regression or (T, C) for multi-channel regression with C channels.
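
The shape rule can be expressed as a small check. The sketch below works on a plain shape tuple so that it has no dependencies; the function name `check_target_shape` and the argument `num_time_bins` (the value of T for your feature configuration) are hypothetical names introduced for illustration.

```python
def check_target_shape(shape, num_time_bins):
    """Return the number of regression channels for a target of the given shape.

    Accepts (T,) or (T, C) with T == num_time_bins; raises ValueError otherwise.
    """
    shape = tuple(shape)
    if len(shape) == 1 and shape[0] == num_time_bins:
        return 1  # (T,): one value per time bin
    if len(shape) == 2 and shape[0] == num_time_bins:
        return shape[1]  # (T, C): C values per time bin
    raise ValueError(
        f"expected shape ({num_time_bins},) or ({num_time_bins}, C), got {shape}"
    )
```

With PyTorch, the shape of a stored target can be passed in directly, for example `check_target_shape(torch.load(path).shape, T)`.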


Hardware note: what is supported on-chip

At the moment, the Neuronova chip path is optimized for classification tasks.

Classification is fully supported and optimized. Regression on the membrane potential has yet to be explored, and hardware-aware networks face two limiting factors:

  • Positive membrane: regression on the membrane is hard to define because the chip constrains the membrane to positive values, which prevents representing or accumulating the values that regression requires.
  • Fixed threshold with zero reset: the threshold is not tunable and the membrane resets to zero. Combined, these keep the membrane below the threshold at all times, so it cannot express values above the fixed threshold.

However, software regression is still provided for experimentation and dataset preparation, since the problem can be approached with spikes or with a function of spikes.

Users can also test a regression task on the membrane with a LIF network, or with a final layer of LIF neuron(s), to experiment with how the core network learns while using an appropriate membrane for the regression itself.
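
To make the two limiting factors above concrete, here is a toy discrete-time LIF neuron with a non-negative membrane, a fixed threshold, and reset to zero. It is a simplified sketch and not a model of the chip's actual dynamics: the update rule and the parameters `beta` (leak factor) and `threshold` are assumptions chosen for illustration.

```python
def simulate_membrane(inputs, beta=0.9, threshold=1.0):
    """Simulate a toy LIF neuron and return (membrane_trace, spikes)."""
    v = 0.0
    trace, spikes = [], []
    for x in inputs:
        v = beta * v + x      # leaky integration of the input
        v = max(v, 0.0)       # positive-membrane constraint: no negative values
        fired = v >= threshold
        if fired:
            v = 0.0           # zero reset after a spike
        trace.append(v)       # membrane value observable at the end of the step
        spikes.append(int(fired))
    return trace, spikes


trace, spikes = simulate_membrane([0.5, 0.7, -2.0, 0.3, 1.5])
```

In this toy model every recorded membrane value lies in the range from 0 up to, but not including, the threshold, whatever the inputs are. A regression target outside that range cannot be read from the membrane directly, which is why a readout based on spikes or on a function of spikes is the alternative mentioned above.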