Victoria X Lin (@victorialinml)'s Twitter Profile
Victoria X Lin

@victorialinml

Research Scientist @AIatMeta | MoMa 🖼 • RA-DIT 🔍 • OPT-IML
Ex: @SFResearch • PhD @uwcse
📜 threads.net/@v.linspiration 🌴 Bay Area


Website: https://victorialin.org · Joined: 10-12-2010 15:16:16

1.1K Tweets

3.3K Followers

884 Following

Rulin Shao (@rulinshao):

New features added to MassiveDS-pipe to make it painless to build and serve trillion-token datastore:
1. Distributed API serving (<30ms latency);
2. Efficient indices: IVF-Flat, IVF-PQ (see the sketch after this tweet);
3. Memory-free fast passage loading.
It has been adopted by AI2 OpenScholar and Meta EWE 🥳
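
For context on the index types in item 2: IVF-Flat and IVF-PQ are standard FAISS approximate-nearest-neighbor indices. Below is a minimal FAISS sketch (not the MassiveDS-pipe API; the dimension, cluster count, and PQ settings are illustrative) of building and querying an IVF-PQ index:

```python
import numpy as np
import faiss  # pip install faiss-cpu

d = 768            # embedding dimension (illustrative)
nlist = 1024       # number of IVF clusters partitioning the vector space
m, nbits = 64, 8   # PQ: 64 sub-quantizers x 8 bits -> ~64 bytes per vector

# Stand-in for real passage embeddings.
xb = np.random.rand(100_000, d).astype("float32")

quantizer = faiss.IndexFlatL2(d)                  # coarse cluster assigner
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)
index.train(xb)                                   # learn clusters and PQ codebooks
index.add(xb)

# At query time, probe only a few clusters for low-latency approximate search.
index.nprobe = 16
xq = np.random.rand(1, d).astype("float32")
distances, ids = index.search(xq, k=10)           # top-10 nearest passages
```

IVF-Flat is the same scheme without the product-quantization step: exact distances inside the probed clusters but far more memory per vector, which is why PQ compression matters at trillion-token scale.
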
Xueguang Ma (@xueguang_ma):

Introducing DRAMA🎭: Diverse Augmentation from Large Language Models to Smaller Dense Retrievers.

We propose to train a smaller dense retriever using a pruned LLM as the backbone, fine-tuned with diverse LLM data augmentations.

With single-stage training, DRAMA achieves strong…
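
For readers outside IR: "training a dense retriever" usually means fine-tuning a bi-encoder with a contrastive objective over (query, positive passage) pairs. A generic sketch of the standard in-batch-negatives loss follows (this is the common recipe, not DRAMA's exact pruning-and-augmentation pipeline):

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(q_emb: torch.Tensor,
                              p_emb: torch.Tensor,
                              temperature: float = 0.02) -> torch.Tensor:
    """InfoNCE with in-batch negatives: each query's positive passage is the
    matching row; every other passage in the batch serves as a negative."""
    q = F.normalize(q_emb, dim=-1)                     # [batch, dim]
    p = F.normalize(p_emb, dim=-1)                     # [batch, dim]
    logits = q @ p.T / temperature                     # [batch, batch] similarities
    labels = torch.arange(q.size(0), device=q.device)  # diagonal = positives
    return F.cross_entropy(logits, labels)
```
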
Aston Zhang (@astonzhangaz):

Our Llama 4’s industry-leading 10M+ multimodal context length (20+ hours of video) has been a wild ride. The iRoPE architecture I’d been working on helped a bit with the long-term infinite context goal toward AGI. Huge thanks to my incredible teammates!

🚀Llama 4 Scout
🔹17B…
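
Public descriptions of iRoPE center on interleaving attention layers that apply rotary position embeddings (RoPE) with layers that use no positional encoding at all, which helps length generalization. The sketch below shows generic RoPE plus a hypothetical interleaving schedule; it is not Llama 4's actual implementation:

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Generic rotary position embedding over x: [batch, seq, heads, head_dim]."""
    b, s, h, d = x.shape
    half = d // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)  # [half]
    angles = torch.arange(s, dtype=torch.float32)[:, None] * freqs[None, :]
    cos = angles.cos()[None, :, None, :]          # broadcast over batch, heads
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

def layer_uses_rope(layer_idx: int, every: int = 4) -> bool:
    """Hypothetical interleaving: every 4th attention layer skips positional
    encoding entirely (NoPE); the rest use RoPE."""
    return layer_idx % every != 0
```
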
Ahmad Al-Dahle (@ahmad_al_dahle):

Introducing our first set of Llama 4 models!

We’ve been hard at work doing a complete re-design of the Llama series. I’m so excited to share it with the world today and mark another major milestone for the Llama herd as we release the *first* open-source models in the Llama 4…
Weijia Shi (@weijiashi2):

Our previous work showed that 𝐜𝐫𝐞𝐚𝐭𝐢𝐧𝐠 𝐯𝐢𝐬𝐮𝐚𝐥 𝐜𝐡𝐚𝐢𝐧‑𝐨𝐟‑𝐭𝐡𝐨𝐮𝐠𝐡𝐭𝐬 𝐯𝐢𝐚 𝐭𝐨𝐨𝐥 𝐮𝐬𝐞 significantly boosts GPT‑4o’s visual reasoning performance. Excited to see this idea incorporated into OpenAI’s o3 and o4‑mini models (openai.com/index/thinking…).

Rulin Shao (@rulinshao):

Meet ReasonIR-8B ✨ the first retriever specifically trained for reasoning tasks! Our challenging synthetic training data unlocks SOTA scores on reasoning IR and RAG benchmarks. ReasonIR-8B ranks 1st on BRIGHT and outperforms search-engine and retriever baselines on MMLU and GPQA 🔥
Weixin Liang (@liang_weixin):

🎉 Excited to share: "𝐌𝐢𝐱𝐭𝐮𝐫𝐞-𝐨𝐟-𝐓𝐫𝐚𝐧𝐬𝐟𝐨𝐫𝐦𝐞𝐫𝐬 (𝐌𝐨𝐓)" has been officially accepted to TMLR (March 2025) and the code is now open-sourced!

📌 GitHub repo: github.com/facebookresear…
📄 Paper: arxiv.org/abs/2411.04996

How can we reduce pretraining costs for…
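
As I understand the paper, MoT decouples the non-embedding transformer parameters (feed-forward networks, attention projections, layer norms) by modality while keeping global self-attention over the full mixed sequence. A rough illustration of the feed-forward part, with names and shapes that are mine rather than the released code:

```python
import torch
import torch.nn as nn

class ModalitySplitFFN(nn.Module):
    """Each modality gets its own feed-forward block; tokens are dispatched
    by a known modality tag rather than a learned router."""
    def __init__(self, d_model: int, d_ff: int, n_modalities: int):
        super().__init__()
        self.ffns = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_modalities)
        )

    def forward(self, x: torch.Tensor, modality_ids: torch.Tensor) -> torch.Tensor:
        # x: [tokens, d_model]; modality_ids: [tokens] (0 = text, 1 = image, ...)
        out = torch.empty_like(x)
        for m, ffn in enumerate(self.ffns):
            mask = modality_ids == m
            if mask.any():
                out[mask] = ffn(x[mask])
        return out
```

Because the dispatch key is the token's modality (known at data-loading time), there is no routing loss or load-balancing machinery as in a learned MoE.
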
rohan anil (@_arohan_):

This is really cool work! I wonder if we could generalize even better by introducing modality as a feature embedding to the router instead. That is, the router gets privileged information.
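
A minimal sketch of what that suggestion might look like: a gate that receives a learned modality embedding as an extra (privileged) input feature, instead of hard-routing by modality. All names here are hypothetical:

```python
import torch
import torch.nn as nn

class ModalityAwareRouter(nn.Module):
    """MoE gate whose input is the token state plus a learned modality embedding,
    so the router conditions on modality without being hard-coded to it."""
    def __init__(self, d_model: int, n_experts: int, n_modalities: int):
        super().__init__()
        self.modality_emb = nn.Embedding(n_modalities, d_model)
        self.gate = nn.Linear(d_model, n_experts)

    def forward(self, x: torch.Tensor, modality_ids: torch.Tensor) -> torch.Tensor:
        # x: [tokens, d_model]; modality_ids: [tokens]
        routed_in = x + self.modality_emb(modality_ids)
        return torch.softmax(self.gate(routed_in), dim=-1)  # expert probabilities
```
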

Kevin Patrick Murphy (@sirbayes):

I am pleased to announce a new version of my RL tutorial. Major update to the LLM chapter (e.g., DPO, GRPO, thinking), minor updates to the MARL and MBRL chapters and various sections (e.g., offline RL, DPG, etc.). Enjoy!
arxiv.org/abs/2412.05265
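
For reference, DPO, one of the additions to the LLM chapter, reduces preference tuning to a single closed-form loss over log-probabilities from the policy and a frozen reference model. A minimal sketch:

```python
import torch
import torch.nn.functional as F

def dpo_loss(pi_chosen_logp: torch.Tensor, pi_rejected_logp: torch.Tensor,
             ref_chosen_logp: torch.Tensor, ref_rejected_logp: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO (Rafailov et al., 2023). Inputs are summed log-probs of the chosen
    and rejected responses under the policy (pi) and the reference model."""
    margin = (pi_chosen_logp - ref_chosen_logp) - (pi_rejected_logp - ref_rejected_logp)
    return -F.logsigmoid(beta * margin).mean()
```
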
Kevin Chih-Yao Ma (@chihyaoma):

ByteDance | Seed has been consistently impressive over the past few months, publishing some truly insightful papers. BAGEL is one of them. I learned a lot from reading it. A few key takeaways:
- Embedded "thinking" directly into native media generation, proving its effectiveness…

jack morris (@jxmnop):

the scale of data collection in the AI labs pales in comparison to 2010s google. it’s mostly web scraping and data-labeling. compare that to diligently photographing the streets of every country, mapping earth via satellite, scanning every book known to man.. now *that* was ambitious

Hanna Hajishirzi (@hannahajishirzi):

Surprising result! Spurious rewards (even random rewards) boost RLVR performance on Qwen models, but not on OLMo or others. The paper explores some hypotheses, but it’s still unclear why. Key takeaway: always validate across base models when probing reasoning with RLVR.

Victoria X Lin (@victorialinml):

We don't often see a prep thread for a paper announcement on X, but this mini crash course on the information capacity of LLMs is well worth checking out.

Victoria X Lin (@victorialinml):

Let's talk about Mixture-of-Transformers (MoT) and heterogeneous omni-model training.
1. Inspired by prior architectures consisting of modality-specific parameters, such as Flamingo, CogVLM, BEIT-3, and MoMA, MoT (arxiv.org/abs/2411.04996) pushes this idea further by using…

Lili Yu (Neurips24) (@liliyu_lili):

Victoria X Lin splitting transformer parameters by ⭐ Understanding (X→text) vs. 📷 Generation (X→image) functionality. We already did that in LMFusion.

Infini-AI-Lab (@infiniailab):

🔥 We introduce Multiverse, a new generative modeling framework for adaptive and lossless parallel generation.
🚀 Multiverse is the first open-source non-AR model to achieve AIME24 and AIME25 scores of 54% and 46%.
🌐 Website: multiverse4fm.github.io
🧵 1/n

Saining Xie (@sainingxie):

So this is not a benchmark for software engineering agents. It’s meant to test core reasoning and intelligence through coding, backed by 71 pages of deep analysis from some of the best competitive programmers out there. This effort was carried out by students across multiple…