How hierarchical softmax speeds up inference in language models: why it works, how it works, and everything in between.
The simplest explanation of hierarchical softmax