Ilia Karmanov (@ikdeepl) Twitter Tweets • TwiCopy

Philipp Schmid

9 months ago

SFT Memorizes, RL Generalizes. A new paper from Google DeepMind shows that reinforcement learning generalizes across domains on rule-based tasks, while SFT primarily memorizes the training rule. 👀 Experiments 1️⃣ Model & Tasks: Llama-3.2-Vision-11B;

Tanishq Mathew Abraham, Ph.D.

@iscienceluvr

9 months ago

Language Models Use Trigonometry to Do Addition. "We first discover that numbers are represented in these LLMs as a generalized helix, which is strongly causally implicated for the tasks of addition and subtraction, and is also causally relevant for integer division,

Subhash Kantamneni

@thesubhashk

9 months ago

(1/N) LLMs represent numbers on a helix? And use trigonometry to do addition? Answers below 🧵

Jeff Dean

@jeffdean

9 months ago

Delighted to be a minor co-author on this work, led by Pranav Nair: Combining losses for different Matryoshka-nested groups of bits in each weight within a neural network leads to an accuracy improvement for models, especially for low-bit-precision levels (e.g. 2-bit

Yash Bhalgat

@ysbhalgat

9 months ago

After the FineWeb blog post, Hugging Face 🤗 has dropped another must-read: The Ultra-Scale Playbook – Training LLMs on GPU Clusters. They ran 4000+ experiments across 512 GPUs to break down the real challenges of scaling LLM training -- memory bottlenecks, compute efficiency,

Angry Tom

@angrytomtweets

8 months ago

RIP Sora. Alibaba just dropped Wan 2.1, and it's absolutely insane. This is the next evolution of AI video generation. Here are 10 mind-blowing features and examples: 1. A ferret entering water 🔊

Wonder of Science

@wonderofscience

8 months ago

A baby platypus is called a puggle.

swissinfo.ch

@swissinfo_en

8 months ago

#Switzerland might be one of the worst places to be a working #woman. On #InternationalWomensDay, we look at a report by The Economist comparing working conditions for women across OECD countries. Read more about the #genderpaygap in Switzerland here 👉 buff.ly/L1X5G3k

unusual_whales

@unusual_whales

8 months ago

Uh, oh. "Trump’s first 50 days: Stocks heading toward worst start to a presidential term since 2009," per MW

Nature is Amazing ☘️

@amazlngnature

8 months ago

Man makes wheelchairs for crippled dogs! Thank you 🙏🏻

Kevin Meng

@mengk20

7 months ago

AI models are *not* solving problems the way we think. Using Docent, we find that Claude solves *broken* eval tasks - memorizing answers & hallucinating them! Details in 🧵. We really need to look at our data harder, and it's time to rethink how we do evals...

Jeff Dean

@jeffdean

7 months ago

We're using a ReLU to set tariffs?

Drew Pavlou 🇦🇺🇺🇦🇹🇼

@drewpavlou

7 months ago

The French overseas territory St. Pierre et Miquelon (population 5,800) now has the highest tariff rate in the world at 99%. Their exports are valued at just $3.5 million a year. My guess as to what happened here is that they likely export a tiny amount (like $100 k

Andrew Tao

@drewtao

5 months ago

Vision Language Models can be amazing at document understanding. Please check out our Nano sized model. More to come!

National Wildlife Federation

@nwf

5 months ago

A Sunset to Remember ☀️🌊 “The paddleboarder portrays the peaceful coexistence of people and wildlife,” as captured at sunset in August 2020. 🥇 People in Nature | 📷 Renee Capozzola 2024 National Wildlife Photo Contest Winners 📲: ow.ly/k1FK50UAyEr

Demis Hassabis

@demishassabis

3 months ago

Official results are in - Gemini achieved gold-medal level in the International Mathematical Olympiad! 🏆 An advanced version was able to solve 5 out of 6 problems. Incredible progress - huge congrats to Thang Luong and the team! deepmind.google/discover/blog/…

Grant Sanderson

@3blue1brown

3 months ago

New video on the details of diffusion models: youtu.be/iv-5mZ_9CPY Produced by Welch Labs, this is the first in a small series of 3b1b this summer. I enjoyed providing editorial feedback throughout the last several months, and couldn't be happier with the result.

Syeda Nahida Akter

@snat02792153

a month ago

Most LLMs learn to think only after pretraining—via SFT or RL. But what if they could learn to think during it? 🤔 Introducing RLP: Reinforcement Learning Pre-training—a verifier-free objective that teaches models to “think before predicting.” 🔥 Result: Massive reasoning
