Marco Virgolin 🇺🇦 (@marcovirgolin)'s Twitter Profile
Marco Virgolin 🇺🇦

@marcovirgolin

❤️ Machine Learning 🤖, LLMs 🦙, Evolutionary Computation 🧬, eXplainable AI 🔍, Bouldering 🧗

Opinions are my own

ID: 580688158

Link: https://marcovirgolin.github.io · Joined: 15-05-2012 08:28:35

715 Tweets

309 Followers

335 Following

Amirhossein Kazemnejad (@a_kazemnejad)'s Twitter Profile Photo

When we plot the attentions we find the PEs exhibit different patterns. NoPE & T5's Relative PE show both short-range and long-range attention, ALiBi favors short-range, while Rotary & APE distribute attention more uniformly.🤯 [10/n]
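ALiBi's short-range preference in those plots follows directly from its definition: it subtracts a linear, distance-proportional penalty from the attention logits before the softmax, so far-away keys are systematically down-weighted. A minimal single-head PyTorch sketch of that mechanism is below; the function name, the example slope value, and the tensor layout are illustrative assumptions, not code from the thread.

```python
import torch

def alibi_attention(q, k, slope: float = 0.5):
    """Causal attention weights with an ALiBi distance penalty, single head.

    q, k: (seq_len, head_dim) query/key matrices.
    slope: per-head constant; the ALiBi paper draws these from a geometric
           sequence (e.g. 1/2, 1/4, ..., 1/256 for 8 heads). 0.5 is just an example.
    """
    seq_len = q.shape[0]
    logits = q @ k.T / q.shape[-1] ** 0.5               # standard scaled dot product
    pos = torch.arange(seq_len)
    # Linear distance penalty: the further a key lies behind the query,
    # the more its logit is pushed down -> attention favors nearby tokens.
    distance = (pos[:, None] - pos[None, :]).clamp(min=0)
    logits = logits - slope * distance
    # Causal mask so queries cannot attend to future positions.
    logits = logits.masked_fill(pos[None, :] > pos[:, None], float("-inf"))
    return torch.softmax(logits, dim=-1)
```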
Philipp Schmid (@_philschmid)'s Twitter Profile Photo

Do we need RL to align LLMs with Human feedback? 🔍👀 
Last week, Stanford University researchers unveiled a paper introducing Direct Preference Optimization (DPO) - a new algorithm that could change the way we align LLMs with Human Feedback

arxiv.org/abs/2305.18290

🧵 1/3
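For context on why DPO can skip the RL loop: its objective is an ordinary binary-classification-style loss over preference pairs, computed directly from policy and reference log-probabilities, with no reward model or sampling during training. A minimal PyTorch sketch of that loss follows; the function name and the way log-probabilities are passed in are illustrative assumptions, not code from the paper.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta: float = 0.1):
    """Direct Preference Optimization loss for a batch of preference pairs.

    Each argument is a tensor of summed per-sequence log-probabilities
    log pi(y|x) for the chosen/rejected completions, under the policy being
    trained and under the frozen reference model. beta controls how far the
    policy may drift from the reference.
    """
    chosen_ratio = policy_chosen_logps - ref_chosen_logps        # log pi/pi_ref for y_w
    rejected_ratio = policy_rejected_logps - ref_rejected_logps  # log pi/pi_ref for y_l
    # Maximize the margin between chosen and rejected log-ratios.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```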
Jason Wei (@_jasonwei)'s Twitter Profile Photo

Moving from Google Brain to OpenAI, one of the biggest changes for me was the shift from doing individual/small-group research to working on a team with several dozen people. Specifically, working on a bigger team has led me to think more about UX for researchers. Some examples:

Andrej Karpathy (@karpathy)'s Twitter Profile Photo

My fun weekend hack: llama2.c 🦙🤠
github.com/karpathy/llama…
Lets you train a baby Llama 2 model in PyTorch, then inference it with one 500-line file with no dependencies, in pure C. My pretrained model (on TinyStories) samples stories in fp32 at 18 tok/s on my MacBook Air M1 CPU.
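Conceptually, the inference side of llama2.c is just an autoregressive sampling loop: run the tokens generated so far through the Transformer, sample the next token from the output distribution, append it, and repeat. A rough Python sketch of that loop is below; `model` and `tokenizer` are stand-ins for whatever baby checkpoint and vocabulary were trained, and the code is an illustration rather than a port of the C file.

```python
import torch

@torch.no_grad()
def sample(model, tokenizer, prompt: str, max_new_tokens: int = 200,
           temperature: float = 1.0):
    """Temperature sampling loop, the gist of what llama2.c's C code does.

    Assumes model(x) returns logits of shape (1, seq_len, vocab_size) for a
    batch of token ids x, and tokenizer has encode/decode methods.
    """
    tokens = tokenizer.encode(prompt)              # list[int] of prompt token ids
    for _ in range(max_new_tokens):
        x = torch.tensor(tokens)[None, :]          # (1, seq_len) batch of one
        logits = model(x)[0, -1]                   # distribution over the next token
        probs = torch.softmax(logits / temperature, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1).item()
        tokens.append(next_token)
    return tokenizer.decode(tokens)
```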
Arvind Narayanan (@random_walker)'s Twitter Profile Photo

Prompt injection keeps getting wilder. Unlike text-based methods, you can't tell by looking at one of these images or audio clips that it's malicious. It can be embedded in a website/email attachment and will alter the model's behavior if it processes it.
arxiv.org/abs/2307.10490
Richard Song @ ICLR 2025 (@xingyousong)'s Twitter Profile Photo

Excited to announce our latest work in genetic programming: AutoRobotics-Zero (ARZ)!

Using AutoML-Zero’s search method, we’re able to build compact, interpretable robot policies which can quickly adapt to drastic in-episode environment changes, such as broken legs.

arXiv:
Hamel Husain (@hamelhusain)'s Twitter Profile Photo

Here is how you can obtain a massive speedup with llama-v2 models, much faster than anything else I tried.   

It's so fast that it's unreal. I made some additional notes on how to avoid a temporary foot gun as well

hamel.dev/notes/llm/infe…
Sakana AI (@sakanaailabs)'s Twitter Profile Photo

Introducing The AI Scientist: The world’s first AI system for automating scientific research and open-ended discovery!

sakana.ai/ai-scientist/

From ideation, writing code, running experiments and summarizing results, to writing entire papers and conducting peer-review, The AI
Giovanni Iacca (@gih82)'s Twitter Profile Photo

Why bother using deep learning on control tasks when you can use a sparse mixture of (very) shallow experts to get good performance *and* interpretability by design? How? Please see our recently accepted #AAAI2025 paper. Huge congratulations to the amazing team of authors! 🔥

Jeremy Howard (@jeremyphoward)'s Twitter Profile Photo

I'll get straight to the point.

We trained 2 new models. Like BERT, but modern. ModernBERT.

Not some hypey GenAI thing, but a proper workhorse model, for retrieval, classification, etc. Real practical stuff.

It's much faster, more accurate, longer context, and more useful. 🧵
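Since ModernBERT is positioned as an encoder workhorse, the usual recipe applies: attach a task head (here sequence classification) and fine-tune, just as with BERT. A hedged sketch with Hugging Face transformers follows; the `answerdotai/ModernBERT-base` checkpoint name is my assumption of where the weights live, and the architecture requires a recent transformers release.

```python
# pip install -U transformers torch   (ModernBERT needs a recent transformers version)
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "answerdotai/ModernBERT-base"   # assumed checkpoint name, not from the tweet
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# Score a single example; in practice you would fine-tune the freshly
# initialized classification head on your own labeled data first.
inputs = tokenizer("ModernBERT is a drop-in BERT replacement.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))
```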
DeepSeek (@deepseek_ai)'s Twitter Profile Photo

🚀 Introducing NSA: A Hardware-Aligned and Natively Trainable Sparse Attention mechanism for ultra-fast long-context training & inference!

Core components of NSA:
• Dynamic hierarchical sparse strategy
• Coarse-grained token compression
• Fine-grained token selection

💡 With
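Even without the paper's details, the coarse-to-fine pattern behind the components listed above can be sketched: compress keys into block-level summaries, use them to select a handful of promising blocks per query, then run dense attention only inside the selected blocks. The toy single-query function below illustrates that pattern only; the mean-pooled block summaries, the scoring rule, and every name in it are my simplifications, not DeepSeek's NSA kernels.

```python
import torch

def sparse_attention_sketch(q, k, v, block_size: int = 16, top_blocks: int = 4):
    """Toy coarse-to-fine sparse attention for one query vector.

    q: (head_dim,) single query; k, v: (seq_len, head_dim) keys/values.
    Trailing tokens that do not fill a whole block are ignored for simplicity.
    """
    seq_len, d = k.shape
    n_blocks = seq_len // block_size
    # Coarse-grained compression: mean-pool each block of keys into one summary.
    k_blocks = k[: n_blocks * block_size].reshape(n_blocks, block_size, d).mean(dim=1)
    block_scores = k_blocks @ q / d ** 0.5
    # Fine-grained selection: keep only the highest-scoring blocks.
    keep = block_scores.topk(min(top_blocks, n_blocks)).indices
    token_idx = (keep[:, None] * block_size + torch.arange(block_size)).reshape(-1)
    # Dense attention restricted to the tokens of the selected blocks.
    logits = k[token_idx] @ q / d ** 0.5
    weights = torch.softmax(logits, dim=-1)
    return weights @ v[token_idx]
```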
Miles Cranmer (@milescranmer)'s Twitter Profile Photo

Why 'I don’t know' is the true test for AGI—it’s a strictly harder problem than text generation!

This magnificent 62-page paper (arxiv.org/abs/2408.02357) formally proves AGI hallucinations are inevitable, with 50 pages (!!) of supplementary proofs.