Victoria X Lin (@victorialinml)'s Twitter Profile
Victoria X Lin

@victorialinml

Research Scientist @AIatMeta | MoMa 🖼 • RA-DIT 🔍 • OPT-IML
Ex: @SFResearch • PhD @uwcse
📜 threads.net/@v.linspiration 🌴 Bay Area


Website: https://victorialin.org · Joined: 10-12-2010 15:16:16

1.1K Tweets

3.3K Followers

884 Following

Rulin Shao (@rulinshao):

New features added to MassiveDS-pipe to make it painless to build and serve trillion-token datastore:
1. Distributed API serving (<30ms latency);
2. Efficient indices: IVF-Flat, IVF-PQ (see the sketch after this tweet);
3. Memory-free fast passage loading.
It has been adopted by AI2 OpenScholar and Meta EWE 🥳
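
For context on the index types in item 2: IVF-Flat and IVF-PQ are standard FAISS approximate-nearest-neighbor indices. Below is a minimal FAISS sketch (not the MassiveDS-pipe API; the dimension, cluster count, and PQ settings are illustrative) of building and querying an IVF-PQ index:

```python
import numpy as np
import faiss  # pip install faiss-cpu

d = 768            # embedding dimension (illustrative)
nlist = 1024       # number of IVF clusters partitioning the vector space
m, nbits = 64, 8   # PQ: 64 sub-quantizers x 8 bits -> ~64 bytes per vector

# Stand-in for real passage embeddings.
xb = np.random.rand(100_000, d).astype("float32")

quantizer = faiss.IndexFlatL2(d)                  # coarse cluster assigner
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)
index.train(xb)                                   # learn clusters and PQ codebooks
index.add(xb)

# At query time, probe only a few clusters for low-latency approximate search.
index.nprobe = 16
xq = np.random.rand(1, d).astype("float32")
distances, ids = index.search(xq, k=10)           # top-10 nearest passages
```

IVF-Flat is the same scheme without the product-quantization step: exact distances inside the probed clusters but far more memory per vector, which is why PQ compression matters at trillion-token scale.
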
Xueguang Ma (@xueguang_ma):

Introducing DRAMA🎭: Diverse Augmentation from Large Language Models to Smaller Dense Retrievers.

We propose to train a smaller dense retriever using a pruned LLM as the backbone, fine-tuned with diverse LLM data augmentations.

With single-stage training, DRAMA achieves strong…
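
For readers outside IR: "training a dense retriever" usually means fine-tuning a bi-encoder with a contrastive objective over (query, positive passage) pairs. A generic sketch of the standard in-batch-negatives loss follows (this is the common recipe, not DRAMA's exact pruning-and-augmentation pipeline):

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(q_emb: torch.Tensor,
                              p_emb: torch.Tensor,
                              temperature: float = 0.02) -> torch.Tensor:
    """InfoNCE with in-batch negatives: each query's positive passage is the
    matching row; every other passage in the batch serves as a negative."""
    q = F.normalize(q_emb, dim=-1)                     # [batch, dim]
    p = F.normalize(p_emb, dim=-1)                     # [batch, dim]
    logits = q @ p.T / temperature                     # [batch, batch] similarities
    labels = torch.arange(q.size(0), device=q.device)  # diagonal = positives
    return F.cross_entropy(logits, labels)
```
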
Aston Zhang (@astonzhangaz):

Our Llama 4’s industry-leading 10M+ multimodal context length (20+ hours of video) has been a wild ride. The iRoPE architecture I’d been working on helped a bit with the long-term infinite context goal toward AGI. Huge thanks to my incredible teammates!

🚀Llama 4 Scout
🔹17B…
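
Public descriptions of iRoPE center on interleaving attention layers that apply rotary position embeddings (RoPE) with layers that use no positional encoding at all, which helps length generalization. The sketch below shows generic RoPE plus a hypothetical interleaving schedule; it is not Llama 4's actual implementation:

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Generic rotary position embedding over x: [batch, seq, heads, head_dim]."""
    b, s, h, d = x.shape
    half = d // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)  # [half]
    angles = torch.arange(s, dtype=torch.float32)[:, None] * freqs[None, :]
    cos = angles.cos()[None, :, None, :]          # broadcast over batch, heads
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

def layer_uses_rope(layer_idx: int, every: int = 4) -> bool:
    """Hypothetical interleaving: every 4th attention layer skips positional
    encoding entirely (NoPE); the rest use RoPE."""
    return layer_idx % every != 0
```
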
Ahmad Al-Dahle (@ahmad_al_dahle):

Introducing our first set of Llama 4 models!

We’ve been hard at work doing a complete re-design of the Llama series. I’m so excited to share it with the world today and mark another major milestone for the Llama herd as we release the *first* open-source models in the Llama 4…
Weijia Shi (@weijiashi2):

Our previous work showed that 𝐜𝐫𝐞𝐚𝐭𝐢𝐧𝐠 𝐯𝐢𝐬𝐮𝐚𝐥 𝐜𝐡𝐚𝐢𝐧‑𝐨𝐟‑𝐭𝐡𝐨𝐮𝐠𝐡𝐭𝐬 𝐯𝐢𝐚 𝐭𝐨𝐨𝐥 𝐮𝐬𝐞 significantly boosts GPT‑4o’s visual reasoning performance. Excited to see this idea incorporated into OpenAI’s o3 and o4‑mini models (openai.com/index/thinking…).

Rulin Shao (@rulinshao):

Meet ReasonIR-8B ✨ the first retriever specifically trained for reasoning tasks! Our challenging synthetic training data unlocks SOTA scores on reasoning IR and RAG benchmarks. ReasonIR-8B ranks 1st on BRIGHT and outperforms search-engine and retriever baselines on MMLU and GPQA 🔥
Weixin Liang (@liang_weixin):

🎉 Excited to share: "𝐌𝐢𝐱𝐭𝐮𝐫𝐞-𝐨𝐟-𝐓𝐫𝐚𝐧𝐬𝐟𝐨𝐫𝐦𝐞𝐫𝐬 (𝐌𝐨𝐓)" has been officially accepted to TMLR (March 2025) and the code is now open-sourced!

📌 GitHub repo: github.com/facebookresear…
📄 Paper: arxiv.org/abs/2411.04996

How can we reduce pretraining costs for…
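
As I understand the paper, MoT decouples the non-embedding transformer parameters (feed-forward networks, attention projections, layer norms) by modality while keeping global self-attention over the full mixed sequence. A rough illustration of the feed-forward part, with names and shapes that are mine rather than the released code:

```python
import torch
import torch.nn as nn

class ModalitySplitFFN(nn.Module):
    """Each modality gets its own feed-forward block; tokens are dispatched
    by a known modality tag rather than a learned router."""
    def __init__(self, d_model: int, d_ff: int, n_modalities: int):
        super().__init__()
        self.ffns = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_modalities)
        )

    def forward(self, x: torch.Tensor, modality_ids: torch.Tensor) -> torch.Tensor:
        # x: [tokens, d_model]; modality_ids: [tokens] (0 = text, 1 = image, ...)
        out = torch.empty_like(x)
        for m, ffn in enumerate(self.ffns):
            mask = modality_ids == m
            if mask.any():
                out[mask] = ffn(x[mask])
        return out
```

Because the dispatch key is the token's modality (known at data-loading time), there is no routing loss or load-balancing machinery as in a learned MoE.
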
rohan anil (@_arohan_):

This is really cool work! I wonder if we could generalize even better by introducing modality as a feature embedding to the router instead. That is, the router gets privileged information.
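
A minimal sketch of what that suggestion might look like: a gate that receives a learned modality embedding as an extra (privileged) input feature, instead of hard-routing by modality. All names here are hypothetical:

```python
import torch
import torch.nn as nn

class ModalityAwareRouter(nn.Module):
    """MoE gate whose input is the token state plus a learned modality embedding,
    so the router conditions on modality without being hard-coded to it."""
    def __init__(self, d_model: int, n_experts: int, n_modalities: int):
        super().__init__()
        self.modality_emb = nn.Embedding(n_modalities, d_model)
        self.gate = nn.Linear(d_model, n_experts)

    def forward(self, x: torch.Tensor, modality_ids: torch.Tensor) -> torch.Tensor:
        # x: [tokens, d_model]; modality_ids: [tokens]
        routed_in = x + self.modality_emb(modality_ids)
        return torch.softmax(self.gate(routed_in), dim=-1)  # expert probabilities
```
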

Kevin Patrick Murphy (@sirbayes):

I am pleased to announce a new version of my RL tutorial. Major update to the LLM chapter (e.g., DPO, GRPO, thinking), minor updates to the MARL and MBRL chapters and various sections (e.g., offline RL, DPG, etc.). Enjoy!
arxiv.org/abs/2412.05265
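
For reference, DPO, one of the additions to the LLM chapter, reduces preference tuning to a single closed-form loss over log-probabilities from the policy and a frozen reference model. A minimal sketch:

```python
import torch
import torch.nn.functional as F

def dpo_loss(pi_chosen_logp: torch.Tensor, pi_rejected_logp: torch.Tensor,
             ref_chosen_logp: torch.Tensor, ref_rejected_logp: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO (Rafailov et al., 2023). Inputs are summed log-probs of the chosen
    and rejected responses under the policy (pi) and the reference model."""
    margin = (pi_chosen_logp - ref_chosen_logp) - (pi_rejected_logp - ref_rejected_logp)
    return -F.logsigmoid(beta * margin).mean()
```
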
Kevin Chih-Yao Ma (@chihyaoma):

ByteDance | Seed has been consistently impressive over the past few months, publishing some truly insightful papers. BAGEL is one of them. I learned a lot from reading it. A few key takeaways:
- Embedded "thinking" directly into native media generation, proving its effectiveness…

jack morris (@jxmnop):

the scale of data collection in the AI labs pales in comparison to 2010s google. it’s mostly web scraping and data-labeling. compare that to diligently photographing the streets of every country, mapping earth via satellite, scanning every book known to man.. now *that* was ambitious

Hanna Hajishirzi (@hannahajishirzi):

Surprising result! Spurious rewards (even random rewards) boost RLVR performance on Qwen models, but not on OLMo or others. The paper explores some hypotheses, but it’s still unclear why. Key takeaway: always validate across base models when probing reasoning with RLVR.

Victoria X Lin (@victorialinml):

We don't often see a prep thread for a paper announcement on X, but this mini crash course on the information capacity of LLMs is well worth checking out.

Victoria X Lin (@victorialinml):

Let's talk about Mixture-of-Transformers (MoT) and heterogeneous omni-model training.
1. Inspired by prior architectures consisting of modality-specific parameters, such as Flamingo, CogVLM, BEIT-3, and MoMA, MoT (arxiv.org/abs/2411.04996) pushes this idea further by using…

Lili Yu (Neurips24) (@liliyu_lili):

Victoria X Lin splitting transformer parameters by ⭐ Understanding (X→text) vs. 📷 Generation (X→image) functionality. We already did that in LMFusion.

Infini-AI-Lab (@infiniailab):

🔥 We introduce Multiverse, a new generative modeling framework for adaptive and lossless parallel generation.
🚀 Multiverse is the first open-source non-AR model to achieve AIME24 and AIME25 scores of 54% and 46%.
🌐 Website: multiverse4fm.github.io
🧵 1/n

Saining Xie (@sainingxie):

So this is not a benchmark for software engineering agents. It’s meant to test core reasoning and intelligence through coding, backed by 71 pages of deep analysis from some of the best competitive programmers out there. This effort was carried out by students across multiple…