Victoria X Lin (@victorialinml) 's Twitter Profile
Victoria X Lin

@victorialinml

Research Scientist @AIatMeta | MoMa🖼 • RA-DIT🔍 • OPT-IML
Ex: @SFResearch • PhD @uwcse
📜 threads.net/@v.linspiration 🌴 Bay Area

ID: 225054090

Link: https://victorialin.org · Joined: 10-12-2010 15:16:16

1.1K Tweets

3.3K Followers

884 Following

Rulin Shao (@rulinshao) 's Twitter Profile Photo

New features added to MassiveDS-pipe to make it painless to build and serve a trillion-token datastore:
1. Distributed API serving (<30ms latency);
2. Efficient indices: IVF-Flat, IVF-PQ;
3. Memory-free fast passage loading.
It has been adopted by AI2 OpenScholar and Meta EWE 🥳
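The IVF indices named above (IVF-Flat, IVF-PQ, as in FAISS) speed up retrieval by scanning only a few clusters per query rather than the whole datastore. A minimal pure-Python sketch of the IVF-Flat idea; the class and variable names (`ToyIVFFlat`, `nprobe` default) are illustrative, not part of MassiveDS-pipe or FAISS:

```python
def l2(a, b):
    # squared Euclidean distance between two equal-length vectors
    return sum((x - y) ** 2 for x, y in zip(a, b))

class ToyIVFFlat:
    """Inverted-file index: each vector is bucketed under its nearest
    centroid; a query scans only the `nprobe` closest buckets instead of
    the whole collection (FAISS's IVF-Flat does this at scale)."""

    def __init__(self, centroids, nprobe=1):
        self.centroids = centroids
        self.nprobe = nprobe
        self.lists = {i: [] for i in range(len(centroids))}

    def add(self, vec_id, vec):
        nearest = min(range(len(self.centroids)),
                      key=lambda i: l2(vec, self.centroids[i]))
        self.lists[nearest].append((vec_id, vec))

    def search(self, query, k=1):
        # probe the nprobe buckets whose centroids are closest to the query
        probes = sorted(range(len(self.centroids)),
                        key=lambda i: l2(query, self.centroids[i]))[:self.nprobe]
        candidates = [item for i in probes for item in self.lists[i]]
        candidates.sort(key=lambda pair: l2(query, pair[1]))
        return [vec_id for vec_id, _ in candidates[:k]]

index = ToyIVFFlat([(0.0, 0.0), (10.0, 10.0)], nprobe=1)
index.add(1, (1.0, 1.0))
index.add(2, (9.0, 9.0))
nearest_ids = index.search((8.0, 8.0), k=1)
```

IVF-PQ adds product quantization on top of this bucketing to compress the stored vectors, trading a little recall for far less memory.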
Xueguang Ma (@xueguang_ma) 's Twitter Profile Photo

Introducing DRAMA🎭: Diverse Augmentation from Large Language Models to Smaller Dense Retrievers.

We propose to train a smaller dense retriever using a pruned LLM as the backbone, fine-tuned with diverse LLM data augmentations.

With single-stage training, DRAMA achieves strong
Aston Zhang (@astonzhangaz) 's Twitter Profile Photo

Our Llama 4's industry-leading 10M+ multimodal context length (20+ hours of video) has been a wild ride. The iRoPE architecture I'd been working on helped a bit with the long-term infinite-context goal toward AGI. Huge thanks to my incredible teammates!

🚀Llama 4 Scout
🔹17B
Ahmad Al-Dahle (@ahmad_al_dahle) 's Twitter Profile Photo

Introducing our first set of Llama 4 models!

We've been hard at work doing a complete re-design of the Llama series. I'm so excited to share it with the world today and mark another major milestone for the Llama herd as we release the *first* open source models in the Llama 4
Weijia Shi (@weijiashi2) 's Twitter Profile Photo

Our previous work showed that creating visual chain-of-thoughts via tool use significantly boosts GPT-4o's visual reasoning performance. Excited to see this idea incorporated into OpenAI's o3 and o4-mini models (openai.com/index/thinking…).

Rulin Shao (@rulinshao) 's Twitter Profile Photo

Meet ReasonIR-8B✨ the first retriever specifically trained for reasoning tasks! Our challenging synthetic training data unlocks SOTA scores on reasoning IR and RAG benchmarks. ReasonIR-8B ranks 1st on BRIGHT and outperforms search engine and retriever baselines on MMLU and GPQA🔥

Weixin Liang (@liang_weixin) 's Twitter Profile Photo

🎉 Excited to share: "Mixture-of-Transformers (MoT)" has been officially accepted to TMLR (March 2025) and the code is now open-sourced!

📌 GitHub repo: github.com/facebookresear…
📄 Paper: arxiv.org/abs/2411.04996

How can we reduce pretraining costs for
rohan anil (@_arohan_) 's Twitter Profile Photo

This is really cool work! I wonder if we could generalize even better by introducing modality as a feature embedding to the router instead. That is, the router gets privileged information.

Kevin Patrick Murphy (@sirbayes) 's Twitter Profile Photo

I am pleased to announce a new version of my RL tutorial. Major update to the LLM chapter (e.g., DPO, GRPO, thinking), minor updates to the MARL and MBRL chapters and various sections (e.g., offline RL, DPG, etc.). Enjoy!
arxiv.org/abs/2412.05265
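For readers new to the DPO method mentioned above, the objective fits in a few lines. A hedged sketch assuming summed token log-probabilities as inputs; the function name and `beta` default are illustrative, not taken from the tutorial:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair: -log sigmoid(beta * margin), where
    the margin compares the policy's log-prob gap (chosen vs. rejected)
    against the frozen reference model's gap."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# With no shift relative to the reference, the loss is log 2:
loss_equal = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# Raising the chosen (and lowering the rejected) log-prob shrinks the loss:
loss_better = dpo_loss(-9.0, -13.0, -10.0, -12.0)
```

The appeal of DPO is that this supervised-looking loss replaces the full RLHF loop: no reward model and no on-policy sampling are needed.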
Kevin Chih-Yao Ma (@chihyaoma) 's Twitter Profile Photo

ByteDance | Seed has been consistently impressive over the past few months, publishing some truly insightful papers. BAGEL is one of them. I learned a lot from reading it. A few key takeaways:
- Embedded "thinking" directly into native media generation, proving its effectiveness

jack morris (@jxmnop) 's Twitter Profile Photo

the scale of data collection in the AI labs pales in comparison to 2010s google. it's mostly web scraping and data-labeling. compare that to diligently photographing the streets of every country, mapping earth via satellite, scanning every book known to man.. now *that* was ambitious

Hanna Hajishirzi (@hannahajishirzi) 's Twitter Profile Photo

Surprising result! Spurious rewards -- even random rewards -- boost RLVR performance on Qwen models, but not on OLMo or others. The paper explores some hypotheses, but it's still unclear why. Key takeaway: always validate across base models when probing reasoning with RLVR.
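To make the contrast concrete: RLVR normally scores a completion with a verifiable reward (e.g., exact match on the final answer), while a "random reward" ignores correctness entirely. A toy sketch with hypothetical function names, not the paper's actual code:

```python
import random
import re

def verifiable_reward(completion: str, gold_answer: str) -> float:
    # RLVR-style reward: 1.0 iff the last number in the completion
    # matches the gold answer, else 0.0.
    numbers = re.findall(r"-?\d+\.?\d*", completion)
    return 1.0 if numbers and numbers[-1] == gold_answer else 0.0

def random_reward(completion: str, gold_answer: str, p: float = 0.5) -> float:
    # Spurious baseline: a coin flip that never looks at correctness.
    return 1.0 if random.random() < p else 0.0
```

The surprise in the result above is that optimizing against the second function still improves Qwen's benchmark scores, which is why cross-model validation matters.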

Victoria X Lin (@victorialinml) 's Twitter Profile Photo

We don't often see prep threads for paper announcements on X, but this mini crash course on the information capacity of LLMs is well worth checking out

Victoria X Lin (@victorialinml) 's Twitter Profile Photo

Let's talk about Mixture-of-Transformers (MoT) and heterogeneous omni-modal training. 1. Inspired by prior architectures consisting of modality-specific parameters, such as Flamingo, CogVLM, BEIT-3, and MoMa, MoT (arxiv.org/abs/2411.04996) pushes this idea further by using

Lili Yu (Neurips24) (@liliyu_lili) 's Twitter Profile Photo

Victoria X Lin splitting transformer parameters by ⭐Understanding (X→text) vs. 📷Generation (X→image) functionality. We already did that in LMFusion
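The parameter-splitting idea behind MoT and LMFusion, routing each token through modality- or functionality-specific feed-forward weights while sharing attention, can be sketched in a few lines. A toy illustration with hypothetical stand-in functions; real FFNs are learned weight matrices:

```python
def routed_ffn(tokens, ffn_by_route):
    """Route each (route, vector) token through that route's own FFN
    parameters; in MoT/LMFusion-style models the attention layers stay
    shared while feed-forward (and norm) weights are route-specific."""
    return [ffn_by_route[route](vec) for route, vec in tokens]

# Hypothetical stand-in "FFNs" (simple scalar maps for illustration):
ffns = {
    "understanding": lambda v: [2.0 * x for x in v],
    "generation": lambda v: [x + 3.0 for x in v],
}
routed = routed_ffn([("understanding", [1.0]), ("generation", [1.0])], ffns)
```

Unlike a learned mixture-of-experts router, the routing here is deterministic: the token's modality (or its understanding-vs-generation role) decides which parameters fire.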

Infini-AI-Lab (@infiniailab) 's Twitter Profile Photo

🔥 We introduce Multiverse, a new generative modeling framework for adaptive and lossless parallel generation. 🚀 Multiverse is the first open-source non-AR model to achieve AIME24 and AIME25 scores of 54% and 46%. 🌐 Website: multiverse4fm.github.io 🧵 1/n

Saining Xie (@sainingxie) 's Twitter Profile Photo

So this is not a benchmark for software engineering agents. It's meant to test core reasoning and intelligence through coding, backed by 71 pages of deep analysis from some of the best competitive programmers out there. This effort was carried out by students across multiple