Lingua Custodia (@linguacustodia)'s Twitter Profile
Lingua Custodia

@linguacustodia

Natural Language Processing (NLP) for Finance

ID: 1327550497

Link: http://www.linguacustodia.finance/
Joined: 04-04-2013 18:29:51

202 Tweets

185 Followers

202 Following

JundeWu (@jundemorsenwu)

Mamba3 just silently dropped on ICLR🤯

A faster, longer-context, and more scalable LLM architecture than Transformers 

A few years ago, some researchers started rethinking sequence modeling from a different angle.
Instead of stacking more attention layers, they went back to an

Alexandre TL (@alexandretl2)

RoPE vs NoPE in hybrid linear attention models like Kimi Linear / Qwen3Next is tricky.
For example, when using NoPE, we found that slowly expanding the window size of the attention layers during training (64 -> 4k) greatly helps convergence.
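
The post gives only the endpoints of the window expansion (64 to 4k), not the schedule itself. As a rough illustration, one possible schedule could look like the sketch below. Everything in it is an assumption rather than the author's method: the function name `window_size_at`, the `ramp_steps` parameter, the reading of "4k" as 4096, and the choice to grow the window geometrically (in log space) instead of linearly.

```python
import math


def window_size_at(step: int, ramp_steps: int, start: int = 64, end: int = 4096) -> int:
    """Return a sliding-window size for a given training step.

    Hypothetical schedule: the source only states that the window grows
    from 64 to 4k during training; the log-space ramp is an assumption.
    """
    # After the ramp (or if no ramp is configured), use the full window.
    if ramp_steps <= 0 or step >= ramp_steps:
        return end
    frac = max(step, 0) / ramp_steps
    # Interpolate between log2(start) and log2(end) so the window
    # doubles at a steady pace over the ramp.
    log_w = math.log2(start) + frac * (math.log2(end) - math.log2(start))
    return min(end, int(round(2 ** log_w)))


if __name__ == "__main__":
    for s in (0, 250, 500, 750, 1000):
        print(s, window_size_at(s, ramp_steps=1000))
```

The returned value would then be handed to whatever code builds the attention mask for the sliding-window layers; that part depends on the training stack and is not described in the post.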

Lingua Custodia (@linguacustodia)

🥳 #Adopt_AI - Grand Palais is rapidly approaching! Iconic event at an iconic venue! 😍 We'll be at booth F6—we'd love to see you there! 🤗 📅 25-26 November 🗺️ The Grand Palais, Paris 🥖 👉 adoptai.artefact.com/partner/ac083f…

Lingua Custodia (@linguacustodia)

We can only agree with Christine Lagarde's latest speech on the EU's risk of lagging on #AI. As Andreea Niculcea puts it (c), "We become users of someone else’s models, someone else’s compute, someone else’s innovation, with our margins squeezed and our clients drifting to whoever offers