Alvaro Arroyo (@arroyo_alvr)'s Twitter Profile
Alvaro Arroyo

@arroyo_alvr

PhD ML @UniofOxford; Sequence Modelling & Graph Representation Learning; Previously at @imperialcollege

ID: 1839292093493293058

Joined: 26-09-2024 13:13:12

59 Tweets

144 Followers

146 Following

Andrew Gordon Wilson (@andrewgwils)

The response has been surprisingly muted on GPT-5. My sense is that it’s an improvement, but not groundbreaking, unlike the delta for GPT-2 -> 3 -> 4. It may provide more evidence that scaling is exhausting itself as a paradigm. Time to go back to ideas.

Joey Bose (@bose_joey)

📢 Interested in doing a PhD in generative models 🤖, AI4Science 🧬, Sampling 🧑‍🔬, and beyond? I am hiring PhD students at Imperial College London (Imperial Computing) for the next application cycle. 🔗 See the call below: joeybose.github.io/phd-positions/ And a light expression of interest:

François Chollet (@fchollet)

The most important skill for a researcher is not technical ability. It's taste. The ability to identify interesting and tractable problems, and recognize important ideas when they show up. This can't be taught directly. It's cultivated through curiosity and broad reading.

charliebtan (@charliebtan)

Super excited to announce our recent work was accepted to NeurIPS 2025! 🌟 We introduce Prose, a 280M-parameter transferable normalizing flow proposal for efficient sampling of unseen peptide sequences 😮 Many thanks to the fantastic team!
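
The core idea behind a flow-based proposal can be sketched in a few lines: draw samples from a learned flow q and reweight them by the target density p. Below is a minimal, self-contained illustration using PyTorch's distributions API; the affine flow, Gaussian target, and sample count are toy stand-ins, not the Prose model itself.

    import torch
    from torch import distributions as D

    # Toy stand-ins: a simple affine flow as the proposal q, and a narrower
    # Gaussian as the target p (in Prose, q is a 280M-parameter flow and p
    # would be the density over peptide configurations).
    base = D.Normal(0.0, 1.0)
    flow = D.TransformedDistribution(base, [D.AffineTransform(loc=0.5, scale=1.5)])
    target = D.Normal(0.0, 0.7)

    x = flow.sample((10_000,))                     # proposals from the flow
    log_w = target.log_prob(x) - flow.log_prob(x)  # importance log-weights
    w = torch.softmax(log_w, dim=0)                # self-normalised weights
    ess = 1.0 / (w ** 2).sum()                     # effective sample size
    print(f"estimated E_p[x]: {(w * x).sum().item():.3f}, ESS: {ess.item():.0f}/10000")

The better the flow matches the target, the closer the effective sample size gets to the raw sample count, which is what makes a transferable proposal valuable on unseen sequences.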

Tanishq Mathew Abraham, Ph.D. (@iscienceluvr)

Attention Sinks and Compression Valleys in LLMs are Two Sides of the Same Coin "Through experiments across several models (410M–120B parameters), we confirm that when the beginning-of-sequence token develops extreme activation norms in the middle layers, both compression valleys …"

Sepp Hochreiter (@hochreitersepp)

gLSTM extends xLSTM to a graph neural network architecture: arxiv.org/abs/2510.08450 "gLSTM mitigates sensitivity over-squashing and capacity over-squashing." "gLSTM achieves comfortably state of the art results on the Diameter and Eccentricity Graph Property Prediction tasks"
