Mario Tormo (@mt0rm0) Twitter Tweets • TwiCopy

Deedy

6 months ago

DeepSeek just dropped the single best end-to-end paper on large model training. It covers:
- Software (MLA, training in FP8, DeepEP, LogFMT)
- Hardware (Multi-Rail Fat Tree, Ethernet RoCE switches)
- Mix (IBGDA, 3FS filesystem)
DeepSeek's engineering depth is insane. Must read.

Likes: 4,4K · Replies: 42 · Reposts: 708

Kevin Patrick Murphy

@sirbayes

6 months ago

I am pleased to announce a new version of my RL tutorial. Major update to the LLM chapter (eg DPO, GRPO, thinking), minor updates to the MARL and MBRL chapters and various sections (eg offline RL, DPG, etc). Enjoy! arxiv.org/abs/2412.05265

Likes: 2,2K · Replies: 23 · Reposts: 445

Ethan Mollick

@emollick

5 months ago

Huh. Looks like Plato was right. A new paper shows all language models converge on the same "universal geometry" of meaning. Researchers can translate between ANY model's embeddings without seeing the original text. Implications for philosophy and vector databases alike.

Likes: 13,13K · Replies: 402 · Reposts: 1,1K

Shubham Saboo

@saboo_shubham_

5 months ago

This AI agent can think, code, reason, and browse in a single loop, just like humans. It outperforms other AI agent frameworks like Manus AI, Genspark AI, and OpenAI Deep Research. And it's 100% open source.

Likes: 1,1K · Replies: 42 · Reposts: 188

elvis

@omarsar0

5 months ago

New Lens on RAG Systems
RAG systems are more brittle than you think, even when provided sufficient context. Great work from Google and collaborators. Good tips for devs included. Here are my notes:

Likes: 1,1K · Replies: 33 · Reposts: 234

The AI Timeline

@theaitimeline

5 months ago

🚨 This week's top AI/ML research papers:
- Spurious Rewards
- FLUX.1 Kontext
- Learning to Reason without External Rewards
- Reasoning LLMs are Wandering Solution Explorers
- VLM-3R
- Silence is Not Consensus
- Beyond Markovian
- The Entropy Mechanism of RL for Reasoning LMs
-

Likes: 697 · Replies: 8 · Reposts: 74

Alfredo Canziani

@alfcnz

5 months ago

Releasing the Energy-Book 🔋 from its first appendix's chapter, where I explain how I create my figures. 🎨 Feel free to report errors via the issues' tracker, contribute to the exercises, and show me what you can draw, via the discussion section. 🥳 github.com/Atcold/Energy-…

Likes: 558 · Replies: 13 · Reposts: 79

Javi Lopez ⛩️

@javilopen

5 months ago

🔥 Midjourney video is almost here... And it has, somehow, that incredible artistic aesthetic that MJ is famous for. Yes, these are all MJ video generations!!! 🧵👇

Likes: 3,3K · Replies: 157 · Reposts: 257

Manuel Faysse

@manuelfaysse

4 months ago

🚨 Should We Still Pretrain Encoders with Masked Language Modeling? We have recently seen massively trained causal decoders take the lead in embedding benchmarks, surpassing encoders w/ bidirectional attention. We revisit whether BERT-style encoders are a thing of the past. (1/N)

Likes: 303 · Replies: 7 · Reposts: 38

Eugene Yan

@eugeneyan

4 months ago

How do you build an LLM-evaluator / LLM-as-Judge? The book for "AI Evals for PMs and Engineers" has a chapter devoted to it (35% discount: maven.com/parlance-labs/…) First, we need to define the right metrics. For example, we can start by listing the failure modes from our error

Likes: 1,1K · Replies: 19 · Reposts: 97

Diario Red

@diario_red_

4 months ago

🗣️ Manuel Vallecillo Galeano: a law enforcement officer who traffics in hate ✍🏼Romàn🔻 diario-red.com/articulo/espan…

Likes: 2,2K · Replies: 169 · Reposts: 1,1K

Alfredo Canziani

@alfcnz

4 months ago

My NYU Center for Data Science colleague, Carlos Fernandez-Granda, released the 700-page textbook «Probability and Statistics for Data Science», where he condenses 10 years of teaching experience at @NYUniversity. 200 exercises, 102 notebooks, 115 videos! 🥳🥳🥳 ps4ds.net


Likes: 607 · Replies: 11 · Reposts: 124

Gary Marcus

@garymarcus

3 months ago

My work here is truly done. Nobody with intellectual integrity can still believe that pure scaling will get us to AGI. GPT-5 may be a moderate quantitative improvement (and it may be cheaper) but it still fails in all the same qualitative ways as its predecessors, on chess, on

Likes: 1,1K · Replies: 200 · Reposts: 234

Rohan Paul

@rohanpaul_ai

2 months ago

Stanford Deep Learning for Computer Vision, taught by Professor Fei-Fei Li and Assistant Professor Ehsan Adeli. Such an enjoyable YT series. (link in comment)

Likes: 3,3K · Replies: 26 · Reposts: 491

Rohan Paul

@rohanpaul_ai

2 months ago

The paper turns LLM role play into a retrieval task so the agent stays in character. When the researchers tested their system against "jailbreak" attempts (where someone tries to push the model out of its assigned role), the model relied more heavily on its reference examples

Likes: 11 · Replies: 0 · Reposts: 3

Kevin Weil 🇺🇸

@kevinweil

2 months ago

I want to frame this whole article.

Likes: 5,5K · Replies: 75 · Reposts: 377

FacFisicaUV

@facfisicauv

a month ago

We regret to inform you of the passing of Professor José Bernabeu Alberola. On behalf of the entire community of the Faculty of Physics, we wish to convey our deepest condolences to his family, friends, and collaborators, and to stand with them in these difficult times. Rest in peace.

Likes: 35 · Replies: 4 · Reposts: 17

Charly Wargnier

@datachaz

a month ago

Microsoft just killed the GPU mafia! 🤯 They've open-sourced bitnet.cpp, a blazing-fast 1-bit LLM inference framework optimized for CPUs. This is a major step forward for running large models locally, without expensive GPUs or cloud costs. Demo app + repo + paper in 🧵 ↓