Hyena Hierarchy seems to aim to be a drop-in replacement for attention : https://arxiv.org/pdf/2302.10866.pdf
It looks good on paper, but I haven’t been able to find anybody using it in a model. Does anyone have an example of a code or implementation ? Is there really a big improvement on long context lengths ?
You must log in or # to comment.