SpikingBrain 7B – More efficient than classic LLMs (github.com)
54 points by somethingsome 7 hours ago | 13 comments
augment_me 2 hours ago [-]
To me it sounds like sparse matrix multiplication repackaged as "event-driven spiking computation", where the spikes are simply the non-zero elements that sparse GPU kernels have always been designed to process.

The supposedly dynamic/temporal nature of the model does not seem to be exercised during GPU execution; it collapses into a single static computation, equivalent to applying a pre-calculated sparsity mask.
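
To make that concrete, here is a toy sketch (my own made-up NumPy example, not code from the repo): once the "spikes" are materialised as a tensor on the GPU, the whole computation reduces to an elementwise mask applied before an ordinary dense matmul.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 16))          # activations for 4 tokens
    W = rng.standard_normal((16, 8))          # ordinary dense weight matrix

    # "Event-driven spiking": an activation only contributes if it crosses a threshold.
    threshold = 0.5
    spikes = (x > threshold).astype(x.dtype)  # 0/1 spike tensor, computable up front

    # What the GPU actually executes: a masked (sparse) input to a dense matmul...
    y_spiking = (x * spikes) @ W

    # ...which is identical to applying a pre-calculated sparsity mask.
    y_masked = np.where(x > threshold, x, 0.0) @ W
    assert np.allclose(y_spiking, y_masked)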

Perhaps a bit cynical of me, but it feels like wrapping standard sparse computing and operator fusion in complex, biological jargon...

GregarianChild 2 hours ago [-]
The 'brain-inspired' community has always been doing this, ever since Carver Mead introduced the term 'neuromorphic' in the late 1980s: reselling banalities as great new insights. My favourite is "Neuromorphic computing breakthrough could enable blockchain on Mars" [1]. What else can they do? After all, that community now has multiple decades of failure under its belt. Not a single success. Failure to make progress in AI and failure to say anything of interest about the brain. To paraphrase Benjamin Franklin: in this world nothing can be said to be certain, except death, taxes and neuromorphicists exaggerating. (Aside: I was told by someone who applied to YC with a 'neuromorphic' startup that YC said they don't fund 'neuromorphic'. I am not sure about the details...)

The whole 'brain talk' malarkey goes back much further. In particular psychology and related subjects have, since their origins as a specialty in the 19th century, made heavy use of brain-inspired metaphors that were intended to mislead, and this was already criticised in the 19th century. See [3] for an interesting discussion.

There is something interesting in this post, namely that it's based on non-Nvidia GPUs, in this case MetaX [2]. I don't know how competitive MetaX are today, but I would not bet against China in the longer term.

[1] https://cointelegraph.com/news/neuromorphic-computing-breakt...

[2] https://en.wikipedia.org/wiki/MetaX

[3] K. S. Kendler, A history of metaphorical brain talk in psychiatry. https://www.nature.com/articles/s41380-025-03053-6

cpldcpu 2 hours ago [-]
I believe the argument is that you can also encode information in the time domain.

If we just look at spikes as a different numerical representation, then they are clearly inferior. For example, encoding the number 7 as a spike count requires seven consecutive pulses on a single spiking line, whereas encoding it in binary requires just one time step with pulses on three parallel lines (0b111).

Binary encoding wins 7x in speed and 7/3=2.333x in power efficiency...

On the other hand, if we assume that we are able to encode information in the gaps between pulses, then things quickly change.
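
A back-of-the-envelope sketch of the three encodings (my own toy numbers, just to make the trade-off concrete):

    # Pulse and time-step counts for encoding the integer 7.
    value = 7

    # Rate/count coding: one line, one pulse per unit.
    rate_pulses, rate_steps = value, value            # 7 pulses, 7 steps

    # Binary coding: one pulse per set bit, all lines fire in the same step.
    bits = [int(b) for b in f"{value:b}"]             # 0b111 -> [1, 1, 1]
    binary_pulses, binary_steps = sum(bits), 1        # 3 pulses, 1 step

    # Inter-spike interval coding: two pulses, and the gap between them carries
    # the value, so the pulse count no longer grows with the magnitude
    # (the latency still does).
    isi_pulses, isi_steps = 2, value                  # 2 pulses, ~7 steps

    print(f"speed advantage of binary over rate coding: {rate_steps / binary_steps:.1f}x")    # 7.0x
    print(f"pulse (energy) advantage of binary:         {rate_pulses / binary_pulses:.2f}x")  # 2.33x
    print(f"pulses needed with interval coding:         {isi_pulses} (independent of value)")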

CuriouslyC 7 minutes ago [-]
https://en.wikipedia.org/wiki/Frequency-division_multiplexin...

The brain is doing shit like this.

dist-epoch 10 minutes ago [-]
> you can also encode information in the time domain.

Also known as serial interfaces. They are very successful: PCIe lanes, SATA, USB.

cpldcpu 4 hours ago [-]
>The current implementation adopts pseudo-spiking, where activations are approximated as spike-like signals at the tensor level, rather than true asynchronous event-driven spiking on neuromorphic hardware.

Isn't that in essence very similar to Quantization Aware Training (QAT)?

spwa4 3 hours ago [-]
Can you explain more? Why would that be the case? What is being passed from one layer to the next is not a linear value but the delay until the next spike, which is very different.
cpldcpu 2 hours ago [-]
It was also a question from my side. :)

But I understand that they simulate the spikes as integer events in the forward pass (as described here https://github.com/BICLab/Int2Spike) and calculate a continuous gradient based on high resolution weights for the backward pass.

This seems to be very similar to the straight-through estimator (STE) approach that is usually used for quantization-aware training. I may be wrong though.
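
Roughly what I have in mind with the STE analogy, as a minimal PyTorch sketch (the quantizer and its 0..15 range are made up by me; this is not the Int2Spike code):

    import torch

    class IntSpike(torch.autograd.Function):
        """Forward: round activations to integer 'spike counts'.
        Backward: straight-through, i.e. act as if the rounding were identity."""

        @staticmethod
        def forward(ctx, x, scale):
            ctx.save_for_backward(x)
            ctx.scale = scale
            # Discrete integer events in the forward pass (clamped to 0..15 here).
            return torch.clamp(torch.round(x / scale), 0, 15) * scale

        @staticmethod
        def backward(ctx, grad_out):
            (x,) = ctx.saved_tensors
            # Continuous gradient from the full-precision values; only zero it
            # outside the representable range, exactly as in STE-based QAT.
            inside = (x >= 0) & (x <= 15 * ctx.scale)
            return grad_out * inside.to(grad_out.dtype), None

    x = torch.randn(8, requires_grad=True)
    y = IntSpike.apply(x, 0.25)   # integer-like spikes in the forward pass
    y.sum().backward()            # smooth gradient flows back to the fp32 parameters
    print(x.grad)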

RLAIF 20 minutes ago [-]
SpikingBrain treats 'spikes' as little more than 1-bit quantization stickers. True neural-level sparsity should be input-dependent, time-resolved, and self-organized during learning. If the circuit diagram cannot 'grow' with every forward pass, then don't blame people for treating it as just another round of sparse marketing - oh wait, neuromorphic marketing.
asdfasdf1 4 hours ago [-]
SpikingBrain Technical Report: Spiking Brain-inspired Large Models https://arxiv.org/abs/2509.05276
bob1029 3 hours ago [-]
https://news.ycombinator.com/item?id=45206420
cpldcpu 3 hours ago [-]
Well, it would still allow deploying the trained model to SNN hardware, if such hardware existed.
imtringued 2 hours ago [-]
In a few years China will be completely independent from Nvidia.

https://en.wikipedia.org/wiki/MetaX

They have GPU manufacturers that nobody in the west has ever heard of.