Surrogate Gradients
NWAVE uses spike surrogate gradients to train spiking neurons with backpropagation. All surrogates here implement:
- Forward: a hard threshold (Heaviside)
- Backward: a smooth, non-zero derivative used only for gradient flow
These surrogates are compatible with PyTorch autograd and can be passed to layers via `spike_grad=...`.
How to use
Neuron layers accept a spike_grad argument:
```python
from nwavesdk.surrogate import fast_sigmoid, atan, sigmoid
from nwavesdk.layers import HWLayer, LIFLayer

hw = HWLayer(n_neurons=64, taus=20e-3, dt=1e-3, spike_grad=fast_sigmoid())
lif = LIFLayer(n_neurons=64, taus=20e-3, thresholds=1.0, reset_mechanism="subtraction",
               dt=1e-3, spike_grad=atan(alpha=2.0))
```
Note
Surrogates affect training gradients, not forward spiking behavior (forward remains a hard threshold).
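The forward/backward split can be sketched as a custom `torch.autograd.Function`. This is an illustrative sketch of the mechanism, not SDK internals; the class name `HeavisideSurrogate` and the demo values are made up here, and the backward uses the fast-sigmoid shape described below with slope 25:

```python
import torch

class HeavisideSurrogate(torch.autograd.Function):
    """Hard threshold forward; smooth surrogate derivative backward.

    Illustrative only (not SDK code). Backward uses the fast-sigmoid
    shape 1 / (slope*|x| + 1)^2 with slope = 25.
    """

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return (x > 0).float()          # forward: hard Heaviside step

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        slope = 25.0
        surrogate = 1.0 / (slope * x.abs() + 1.0) ** 2
        return grad_output * surrogate  # backward: smooth, non-zero everywhere

x = torch.tensor([0.02], requires_grad=True)
spikes = HeavisideSurrogate.apply(x)    # hard 0/1 spikes in the forward pass
spikes.sum().backward()                 # x.grad follows the surrogate, not the step
```

The step function has zero derivative almost everywhere, so without the surrogate no gradient would reach the membrane potential.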
Available surrogates
ATan
Factory: atan(alpha: float = 2.0)
Forward spike with hard threshold
Backward \( \frac{\partial y}{\partial x} = \frac{\alpha/2}{1 + \left(\frac{\pi}{2}\alpha x\right)^2} \)
Key parameter
- alpha controls slope/sharpness. Higher values concentrate gradients near 0.
When to use - Often stable and smooth; can work well when you want a heavier tail than sigmoid-like surrogates.
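The backward formula can be evaluated directly to see the effect of alpha (a plain-Python sketch of the formula above; the function name `atan_surrogate_grad` is illustrative, not an SDK symbol):

```python
import math

def atan_surrogate_grad(x: float, alpha: float = 2.0) -> float:
    """ATan surrogate derivative: (alpha/2) / (1 + ((pi/2)*alpha*x)^2)."""
    return (alpha / 2) / (1 + (math.pi / 2 * alpha * x) ** 2)

# Peak at x = 0 is alpha/2: larger alpha -> taller, narrower gradient window.
peak_a2 = atan_surrogate_grad(0.0, alpha=2.0)  # 1.0
peak_a4 = atan_surrogate_grad(0.0, alpha=4.0)  # 2.0
```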
FastSigmoid
Factory: fast_sigmoid(slope: float = 25.0)
Forward spike with hard threshold
Backward \( \frac{\partial y}{\partial x} = \frac{1}{(\text{slope}\cdot|x| + 1)^2} \)
Key parameter
- slope controls steepness around 0 (default 25).
When to use - A robust default surrogate; commonly used for stable training.
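Evaluating the backward formula shows how quickly the gradient falls off away from threshold (plain-Python sketch; `fast_sigmoid_grad` is an illustrative name, not an SDK symbol):

```python
def fast_sigmoid_grad(x: float, slope: float = 25.0) -> float:
    """FastSigmoid surrogate derivative: 1 / (slope*|x| + 1)^2."""
    return 1.0 / (slope * abs(x) + 1.0) ** 2

g_at_threshold = fast_sigmoid_grad(0.0)   # 1.0 at x = 0
g_nearby = fast_sigmoid_grad(0.04)        # 0.25: one "slope width" away
```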
Sigmoid
Factory: sigmoid(slope: float = 25.0)
Forward spike with hard threshold
Backward \( \frac{\partial y}{\partial x} = \frac{k e^{-kx}}{(e^{-kx} + 1)^2} \quad\text{with}\quad k=\text{slope} \)
Key parameter
- slope controls steepness (default 25).
When to use - Useful if you prefer a classic logistic-gradient shape around threshold.
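The logistic-derivative shape can be checked numerically (plain-Python sketch of the formula above; `sigmoid_surrogate_grad` is an illustrative name, not an SDK symbol):

```python
import math

def sigmoid_surrogate_grad(x: float, slope: float = 25.0) -> float:
    """Sigmoid surrogate derivative: k*exp(-k*x) / (exp(-k*x) + 1)^2, k = slope."""
    k = slope
    return k * math.exp(-k * x) / (math.exp(-k * x) + 1.0) ** 2

# At x = 0 the logistic derivative peaks at k/4 (6.25 for slope = 25),
# and it decays exponentially away from threshold.
```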
Notes on GPU comPaSSo behavior (LIFLayer)
When LIFLayer(device="gpu") uses the comPaSSo path:
- Custom surrogate modules are bypassed
- Internally, a FastSigmoid-like surrogate with fixed slope = 25 is used for numerical stability
- This can lead to small training differences compared to CPU serial simulation
Warning
If you rely on a specific surrogate (e.g., ATan with tuned alpha), use the CPU path.
On GPU comPaSSo, surrogate choice is currently not configurable.
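To get a feel for why CPU and GPU runs can drift, compare a tuned ATan gradient with the fixed FastSigmoid(slope=25) gradient the GPU path substitutes. This is a plain-Python sketch using the formulas above, with no SDK calls:

```python
import math

def atan_grad(x: float, alpha: float = 2.0) -> float:
    """ATan surrogate derivative (the surrogate you configured)."""
    return (alpha / 2) / (1 + (math.pi / 2 * alpha * x) ** 2)

def fast_sigmoid_grad(x: float, slope: float = 25.0) -> float:
    """Fixed fast-sigmoid derivative used on the GPU comPaSSo path."""
    return 1.0 / (slope * abs(x) + 1.0) ** 2

# Slightly below threshold the two disagree substantially:
g_atan = atan_grad(0.05, alpha=2.0)     # ~0.98
g_fs = fast_sigmoid_grad(0.05)          # ~0.20
```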
Practical tips
- Start with `fast_sigmoid()` as a baseline.
- If gradients vanish too quickly, try lowering slope/alpha a bit; if training is too noisy, increase them.
- For reproducible comparisons across surrogates, keep everything else fixed (seed, mismatch settings, quantization).
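The slope/alpha tip can be verified numerically: a lower slope leaves more gradient for neurons sitting away from threshold, which counteracts vanishing gradients (plain-Python sketch; `fast_sigmoid_grad` is an illustrative name):

```python
def fast_sigmoid_grad(x: float, slope: float) -> float:
    """FastSigmoid surrogate derivative: 1 / (slope*|x| + 1)^2."""
    return 1.0 / (slope * abs(x) + 1.0) ** 2

# A membrane 0.2 below threshold receives ~4x more gradient at slope 10:
g_steep = fast_sigmoid_grad(0.2, 25.0)  # ~0.028
g_soft = fast_sigmoid_grad(0.2, 10.0)   # ~0.111
```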