Marco Virgolin 🇺🇦 (@marcovirgolin)'s Twitter Profile
Marco Virgolin 🇺🇦

@marcovirgolin

❤️ Machine Learning 🤖, LLMs 🦙, Evolutionary Computation 🧬, eXplainable AI 🔍, Bouldering 🧗

Opinions are my own

ID: 580688158

Link: https://marcovirgolin.github.io · Joined: 15-05-2012 08:28:35

715 Tweets

309 Followers

335 Following

Amirhossein Kazemnejad (@a_kazemnejad)'s Twitter Profile Photo

When we plot the attentions we find the PEs exhibit different patterns. NoPE & T5's Relative PE show both short-range and long-range attention, ALiBi favors short-range, while Rotary & APE distribute attention more uniformly.🤯 [10/n]
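ALiBi's short-range preference in those plots follows directly from its definition: it subtracts a linear, distance-proportional penalty from the attention logits before the softmax, so far-away keys are systematically down-weighted. A minimal single-head PyTorch sketch of that mechanism is below; the function name, the example slope value, and the tensor layout are illustrative assumptions, not code from the thread.

```python
import torch

def alibi_attention(q, k, slope: float = 0.5):
    """Causal attention weights with an ALiBi distance penalty, single head.

    q, k: (seq_len, head_dim) query/key matrices.
    slope: per-head constant; the ALiBi paper draws these from a geometric
           sequence (e.g. 1/2, 1/4, ..., 1/256 for 8 heads). 0.5 is just an example.
    """
    seq_len = q.shape[0]
    logits = q @ k.T / q.shape[-1] ** 0.5               # standard scaled dot product
    pos = torch.arange(seq_len)
    # Linear distance penalty: the further a key lies behind the query,
    # the more its logit is pushed down -> attention favors nearby tokens.
    distance = (pos[:, None] - pos[None, :]).clamp(min=0)
    logits = logits - slope * distance
    # Causal mask so queries cannot attend to future positions.
    logits = logits.masked_fill(pos[None, :] > pos[:, None], float("-inf"))
    return torch.softmax(logits, dim=-1)
```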
Philipp Schmid (@_philschmid)'s Twitter Profile Photo

Do we need RL to align LLMs with Human feedback? 🔍👀 
Last week, Stanford University researchers unveiled a paper introducing Direct Preference Optimization (DPO) - a new algorithm that could change the way we align LLMs with Human Feedback

arxiv.org/abs/2305.18290

🧵 1/3
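For context on why DPO can skip the RL loop: its objective is an ordinary binary-classification-style loss over preference pairs, computed directly from policy and reference log-probabilities, with no reward model or sampling during training. A minimal PyTorch sketch of that loss follows; the function name and the way log-probabilities are passed in are illustrative assumptions, not code from the paper.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta: float = 0.1):
    """Direct Preference Optimization loss for a batch of preference pairs.

    Each argument is a tensor of summed per-sequence log-probabilities
    log pi(y|x) for the chosen/rejected completions, under the policy being
    trained and under the frozen reference model. beta controls how far the
    policy may drift from the reference.
    """
    chosen_ratio = policy_chosen_logps - ref_chosen_logps        # log pi/pi_ref for y_w
    rejected_ratio = policy_rejected_logps - ref_rejected_logps  # log pi/pi_ref for y_l
    # Maximize the margin between chosen and rejected log-ratios.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```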
Jason Wei (@_jasonwei)'s Twitter Profile Photo

Moving from Google Brain to OpenAI, one of the biggest changes for me was the shift from doing individual/small-group research to working on a team with several dozen people. Specifically, working on a bigger team has led me to think more about UX for researchers. Some examples:

Andrej Karpathy (@karpathy)'s Twitter Profile Photo

My fun weekend hack: llama2.c 🦙🤠
github.com/karpathy/llama…
Lets you train a baby Llama 2 model in PyTorch, then inference it with one 500-line file with no dependencies, in pure C. My pretrained model (on TinyStories) samples stories in fp32 at 18 tok/s on my MacBook Air M1 CPU.
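Conceptually, the inference side of llama2.c is just an autoregressive sampling loop: run the tokens generated so far through the Transformer, sample the next token from the output distribution, append it, and repeat. A rough Python sketch of that loop is below; `model` and `tokenizer` are stand-ins for whatever baby checkpoint and vocabulary were trained, and the code is an illustration rather than a port of the C file.

```python
import torch

@torch.no_grad()
def sample(model, tokenizer, prompt: str, max_new_tokens: int = 200,
           temperature: float = 1.0):
    """Temperature sampling loop, the gist of what llama2.c's C code does.

    Assumes model(x) returns logits of shape (1, seq_len, vocab_size) for a
    batch of token ids x, and tokenizer has encode/decode methods.
    """
    tokens = tokenizer.encode(prompt)              # list[int] of prompt token ids
    for _ in range(max_new_tokens):
        x = torch.tensor(tokens)[None, :]          # (1, seq_len) batch of one
        logits = model(x)[0, -1]                   # distribution over the next token
        probs = torch.softmax(logits / temperature, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1).item()
        tokens.append(next_token)
    return tokenizer.decode(tokens)
```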
Arvind Narayanan (@random_walker)'s Twitter Profile Photo

Prompt injection keeps getting wilder. Unlike text-based methods, you can't tell by looking at one of these images or audio clips that it's malicious. It can be embedded in a website/email attachment and will alter the model's behavior if it processes it.
arxiv.org/abs/2307.10490
Richard Song @ ICLR 2025 (@xingyousong)'s Twitter Profile Photo

Excited to announce our latest work in genetic programming: AutoRobotics-Zero (ARZ)!

Using AutoML-Zero’s search method, we’re able to build compact, interpretable robot policies which can quickly adapt to drastic in-episode environment changes, such as broken legs.

arXiv:
Hamel Husain (@hamelhusain)'s Twitter Profile Photo

Here is how you can obtain a massive speedup with llama-v2 models, much faster than anything else I tried.   

It's so fast that it's unreal. I made some additional notes on how to avoid a temporary foot gun as well

hamel.dev/notes/llm/infe…
Sakana AI (@sakanaailabs)'s Twitter Profile Photo

Introducing The AI Scientist: The world’s first AI system for automating scientific research and open-ended discovery!

sakana.ai/ai-scientist/

From ideation, writing code, running experiments and summarizing results, to writing entire papers and conducting peer-review, The AI
Giovanni Iacca (@gih82)'s Twitter Profile Photo

Why bother using deep learning on control tasks when you can use a sparse mixture of (very) shallow experts to get good performance *and* interpretability by design? How? Please see our recently accepted #AAAI2025 paper. Huge congratulations to the amazing team of authors! 🔥

Jeremy Howard (@jeremyphoward)'s Twitter Profile Photo

I'll get straight to the point.

We trained 2 new models. Like BERT, but modern. ModernBERT.

Not some hypey GenAI thing, but a proper workhorse model, for retrieval, classification, etc. Real practical stuff.

It's much faster, more accurate, longer context, and more useful. 🧵
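Since ModernBERT is positioned as an encoder workhorse, the usual recipe applies: attach a task head (here sequence classification) and fine-tune, just as with BERT. A hedged sketch with Hugging Face transformers follows; the `answerdotai/ModernBERT-base` checkpoint name is my assumption of where the weights live, and the architecture requires a recent transformers release.

```python
# pip install -U transformers torch   (ModernBERT needs a recent transformers version)
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "answerdotai/ModernBERT-base"   # assumed checkpoint name, not from the tweet
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# Score a single example; in practice you would fine-tune the freshly
# initialized classification head on your own labeled data first.
inputs = tokenizer("ModernBERT is a drop-in BERT replacement.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))
```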
DeepSeek (@deepseek_ai)'s Twitter Profile Photo

🚀 Introducing NSA: A Hardware-Aligned and Natively Trainable Sparse Attention mechanism for ultra-fast long-context training & inference!

Core components of NSA:
• Dynamic hierarchical sparse strategy
• Coarse-grained token compression
• Fine-grained token selection

💡 With
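Even without the paper's details, the coarse-to-fine pattern behind the components listed above can be sketched: compress keys into block-level summaries, use them to select a handful of promising blocks per query, then run dense attention only inside the selected blocks. The toy single-query function below illustrates that pattern only; the mean-pooled block summaries, the scoring rule, and every name in it are my simplifications, not DeepSeek's NSA kernels.

```python
import torch

def sparse_attention_sketch(q, k, v, block_size: int = 16, top_blocks: int = 4):
    """Toy coarse-to-fine sparse attention for one query vector.

    q: (head_dim,) single query; k, v: (seq_len, head_dim) keys/values.
    Trailing tokens that do not fill a whole block are ignored for simplicity.
    """
    seq_len, d = k.shape
    n_blocks = seq_len // block_size
    # Coarse-grained compression: mean-pool each block of keys into one summary.
    k_blocks = k[: n_blocks * block_size].reshape(n_blocks, block_size, d).mean(dim=1)
    block_scores = k_blocks @ q / d ** 0.5
    # Fine-grained selection: keep only the highest-scoring blocks.
    keep = block_scores.topk(min(top_blocks, n_blocks)).indices
    token_idx = (keep[:, None] * block_size + torch.arange(block_size)).reshape(-1)
    # Dense attention restricted to the tokens of the selected blocks.
    logits = k[token_idx] @ q / d ** 0.5
    weights = torch.softmax(logits, dim=-1)
    return weights @ v[token_idx]
```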
Miles Cranmer (@milescranmer)'s Twitter Profile Photo

Why 'I don’t know' is the true test for AGI—it’s a strictly harder problem than text generation!

This magnificent 62-page paper (arxiv.org/abs/2408.02357) formally proves AGI hallucinations are inevitable, with 50 pages (!!) of supplementary proofs.