
Surrogate Gradients


NWAVE uses surrogate gradients to train spiking neurons with backpropagation: the non-differentiable spike is given a smooth stand-in derivative on the backward pass. All surrogates here implement:

  • Forward: a hard threshold (Heaviside)
  • Backward: a smooth, non-zero derivative used only for gradient flow

These surrogates are compatible with PyTorch autograd and can be passed to layers via spike_grad=....
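Conceptually, a surrogate pairs a hard threshold on the forward pass with a smooth derivative on the backward pass. The sketch below illustrates this with a custom torch.autograd.Function using the ATan formula documented later on this page; it is an illustrative sketch, not NWAVE's internal implementation, and `ATanSpike` is a hypothetical name.

```python
import torch

class ATanSpike(torch.autograd.Function):
    """Heaviside forward, ATan surrogate backward (illustrative sketch)."""

    @staticmethod
    def forward(ctx, x, alpha=2.0):
        ctx.save_for_backward(x)
        ctx.alpha = alpha
        # Forward: hard threshold (Heaviside step)
        return (x > 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        a = ctx.alpha
        # Backward: dy/dx = (alpha/2) / (1 + ((pi/2) * alpha * x)^2)
        sg = (a / 2) / (1 + (torch.pi / 2 * a * x) ** 2)
        return grad_output * sg, None

x = torch.tensor([-0.5, 0.0, 0.5], requires_grad=True)
y = ATanSpike.apply(x)
y.sum().backward()
# y is a hard 0/1 spike train; x.grad is smooth and peaks at x = 0
```

The forward output is binary, yet gradients still flow because the backward pass substitutes the smooth surrogate.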


How to use

Neuron layers accept a spike_grad argument:

from nwavesdk.surrogate import fast_sigmoid, atan, sigmoid
from nwavesdk.layers import HWLayer, LIFLayer

hw = HWLayer(n_neurons=64, taus=20e-3, dt=1e-3, spike_grad=fast_sigmoid())
lif = LIFLayer(n_neurons=64, taus=20e-3, thresholds=1.0, reset_mechanism="subtraction",
               dt=1e-3, spike_grad=atan(alpha=2.0))

Note

Surrogates affect training gradients, not forward spiking behavior (forward remains a hard threshold).


Available surrogates

ATan

Factory: atan(alpha: float = 2.0)

Forward - spike with hard threshold

Backward - \( \frac{\partial y}{\partial x} = \frac{\alpha/2}{1 + \left(\frac{\pi}{2}\alpha x\right)^2} \)

Key parameter - alpha controls slope/sharpness. Higher values concentrate gradients near 0.

When to use - Often stable and smooth; can work well when you want a heavier tail than sigmoid-like surrogates.
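To see how alpha shapes the gradient, the backward formula above can be evaluated directly (plain-Python sketch; `atan_surrogate_grad` is an illustrative name, not part of the SDK):

```python
import math

def atan_surrogate_grad(x, alpha=2.0):
    # dy/dx = (alpha/2) / (1 + ((pi/2) * alpha * x)^2)
    return (alpha / 2) / (1 + (math.pi / 2 * alpha * x) ** 2)

# Higher alpha: larger gradient at the threshold, faster decay away from it
at_threshold_lo = atan_surrogate_grad(0.0, alpha=2.0)   # 1.0
at_threshold_hi = atan_surrogate_grad(0.0, alpha=8.0)   # 4.0
off_threshold_lo = atan_surrogate_grad(0.5, alpha=2.0)
off_threshold_hi = atan_surrogate_grad(0.5, alpha=8.0)
```

Note the quadratic (rather than exponential) decay in x, which is what gives ATan its comparatively heavy tail.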


FastSigmoid

Factory: fast_sigmoid(slope: float = 25.0)

Forward - spike with hard threshold

Backward - \( \frac{\partial y}{\partial x} = \frac{1}{(\text{slope}\cdot|x| + 1)^2} \)

Key parameter - slope controls steepness around 0 (default 25).

When to use - A robust default surrogate; commonly used for stable training.
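The same kind of check works for FastSigmoid (illustrative plain Python, not SDK code): by the formula above, the gradient is exactly 1 at the threshold regardless of slope, and falls off quadratically in slope * |x|.

```python
def fast_sigmoid_grad(x, slope=25.0):
    # dy/dx = 1 / (slope * |x| + 1)^2
    return 1.0 / (slope * abs(x) + 1.0) ** 2

peak = fast_sigmoid_grad(0.0)    # 1.0, independent of slope
near = fast_sigmoid_grad(0.04)   # 1/4 at |x| = 1/slope (default slope 25)
far = fast_sigmoid_grad(0.5)     # already very small
```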


Sigmoid

Factory: sigmoid(slope: float = 25.0)

Forward - spike with hard threshold

Backward - \( \frac{\partial y}{\partial x} = \frac{k e^{-kx}}{(e^{-kx} + 1)^2} \quad\text{with}\quad k=\text{slope} \)

Key parameter - slope controls steepness (default 25).

When to use - Useful if you prefer a classic logistic-gradient shape around threshold.
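The logistic backward formula can be checked the same way (plain-Python sketch, not SDK code). One consequence of the formula worth noting: at x = 0 the gradient equals slope/4 (6.25 at the default slope of 25), whereas FastSigmoid's gradient peaks at 1, so switching surrogates can also change the effective gradient scale.

```python
import math

def sigmoid_grad(x, slope=25.0):
    # dy/dx = k * exp(-k*x) / (exp(-k*x) + 1)^2, with k = slope
    e = math.exp(-slope * x)
    return slope * e / (e + 1.0) ** 2

peak = sigmoid_grad(0.0)   # slope / 4 = 6.25 at the default slope
tail = sigmoid_grad(0.5)   # decays exponentially, not polynomially
```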


Notes on GPU comPaSSo behavior (LIFLayer)

When LIFLayer(device="gpu") uses the comPaSSo path:

  • Custom surrogate modules are bypassed
  • Internally, a FastSigmoid-like surrogate with fixed slope = 25 is used for numerical stability
  • This can lead to small training differences compared to CPU serial simulation

Warning

If you rely on a specific surrogate (e.g., ATan with tuned alpha), use the CPU path. On GPU comPaSSo, surrogate choice is currently not configurable.
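To get a feel for the size of this mismatch, the two backward formulas from this page can be compared directly (illustrative plain Python): an ATan surrogate with alpha = 2 and a FastSigmoid with the fixed slope of 25 agree exactly at the threshold but diverge sharply a short distance away, which is one source of the CPU/GPU training differences noted above.

```python
import math

def atan_grad(x, alpha=2.0):
    return (alpha / 2) / (1 + (math.pi / 2 * alpha * x) ** 2)

def fast_sigmoid_grad(x, slope=25.0):
    return 1.0 / (slope * abs(x) + 1.0) ** 2

# Identical at the threshold...
g0_atan, g0_fs = atan_grad(0.0), fast_sigmoid_grad(0.0)   # both 1.0
# ...but very different just off it
g_atan, g_fs = atan_grad(0.1), fast_sigmoid_grad(0.1)
ratio = g_atan / g_fs   # ATan keeps over 10x more gradient at |x| = 0.1
```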


Practical tips

  • Start with fast_sigmoid() as a baseline.
  • If gradients vanish too quickly, try lowering slope/alpha a bit; if training is too noisy, increase them.
  • For reproducible comparisons across surrogates, keep everything else fixed (seed, mismatch settings, quantization).
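The second tip can be sanity-checked numerically (plain-Python sketch of the FastSigmoid backward formula): at a fixed distance from the threshold, a lower slope leaves noticeably more gradient, while a higher slope suppresses it.

```python
def fast_sigmoid_grad(x, slope=25.0):
    # dy/dx = 1 / (slope * |x| + 1)^2
    return 1.0 / (slope * abs(x) + 1.0) ** 2

# Gradient at |x| = 0.2 for a range of slopes: lower slope -> more gradient
grads = {s: fast_sigmoid_grad(0.2, slope=s) for s in (5.0, 25.0, 100.0)}
```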