Bhargavi Paranjape (@bvp22294) 's Twitter Profile
Bhargavi Paranjape

@bvp22294

Research @ Meta

ID: 2268381841

Link: https://bhargaviparanjape.github.io/ · Joined: 30-12-2013 05:32:11

73 Tweets

604 Followers

498 Following

Aran Komatsuzaki (@arankomatsuzaki) 's Twitter Profile Photo

ART: Automatic multi-step reasoning and tool-use for large language models

Achieves a substantial improvement over few-shot prompting and automatic CoT on unseen tasks in the BigBench and MMLU benchmarks.

arxiv.org/abs/2303.09014
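The generate-call-resume pattern that ART automates can be sketched as a minimal loop. Everything here is illustrative: the `calc` tool, the bracketed call syntax, and `run_with_tools` are hypothetical stand-ins, not ART's actual task library or interface.

```python
import re

# Hypothetical tool registry; ART itself retrieves multi-step
# demonstrations from a task library and calls external tools
# such as code execution or search.
TOOLS = {"calc": lambda expr: str(eval(expr, {"__builtins__": {}}))}

def run_with_tools(generate, prompt, max_steps=5):
    """Minimal tool-use loop: whenever the model's output ends with a
    [calc(...)] marker, run the tool, append its result, and let the
    model continue generating from the extended context."""
    text = prompt
    for _ in range(max_steps):
        out = generate(text)
        text += out
        m = re.search(r"\[calc\((.+?)\)\]$", out)
        if not m:
            return text          # no tool call: generation is done
        text += " -> " + TOOLS["calc"](m.group(1)) + "\n"
    return text
```

With a real LM, `generate` would be a call to the model's completion API; the loop simply interleaves model output with tool results until the model stops emitting tool calls.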
John Nay (@johnjnay) 's Twitter Profile Photo

Whose Opinions Do LLMs Reflect?

- Eval of alignment of LM opinions w/ 60 US demographics
- Major mismatch of LLM views & groups (esp. 65+ & widowed)
- RLHF'ed LLMs converge to modal views of liberals & moderates; RL causes caricatures (99% Biden approval)

arxiv.org/abs/2303.17548
Alisa Liu (@alisawuffles) 's Twitter Profile Photo

Ambiguity is an intrinsic feature of language; as LMs are increasingly deployed to interface with human communication, handling ambiguity is critical🧐. So, we collect 🎉AmbiEnt, a benchmark with diverse ambiguities, challenging even GPT-4 🤖.

arxiv.org/abs/2304.14399 🧵
Terra Blevins (@terrablvns) 's Twitter Profile Photo

New paper alert!! ✨ Translate to Disambiguate: Zero-shot Multilingual Word Sense Disambiguation with Pretrained Language Models (PLMs) ✨ We evaluate how well PLMs translate words in context and then leverage this prompting setup to perform zero-shot WSD on 18 languages! 1/n

Gabriel Ilharco (@gabriel_ilharco) 's Twitter Profile Photo

Today we are releasing a CLIP ViT-L/14 model with 79.2% zero-shot accuracy on ImageNet. Our model outperforms OpenAI's CLIP by a large margin, and outperforms even bigger models (ViT-g/14) trained on LAION-2B Check it out at huggingface.co/laion/CLIP-ViT…!

Mandar Joshi (@mandarjoshi_) 's Twitter Profile Photo

Excited to present Pix2Act! An agent that can interact with GUIs using the same conceptual interface that humans commonly use — via pixel-based screenshots and generic keyboard and mouse actions -- arxiv.org/abs/2306.00245 (1/4)

Tim Dettmers (@tim_dettmers) 's Twitter Profile Photo

We present SpQR, which allows lossless LLM inference at 4.75 bits with a 15% speedup. You can run a 33B LLM on a single 24GB GPU fully lossless. SpQR works by isolating sensitive weights with higher precision and roughly doubles improvements from GPTQ: arxiv.org/abs/2306.03078🧵

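The core idea (keep the most sensitive weights at full precision, quantize the rest to a low-bit grid) can be illustrated with a toy round-to-nearest sketch. This is not the actual SpQR algorithm, which uses grouped quantization and a sparse outlier format; the `outlier_frac` heuristic below is a simplification for illustration only.

```python
def quantize_with_outliers(w, bits=4, outlier_frac=0.25):
    """Toy sketch of outlier-aware quantization: keep the k largest-
    magnitude weights exact, round the remaining weights to a
    symmetric low-bit grid, and return the dequantized weights plus
    the set of outlier indices."""
    k = max(1, int(len(w) * outlier_frac))
    order = sorted(range(len(w)), key=lambda i: abs(w[i]))
    outliers = set(order[-k:])                 # largest-magnitude weights
    inliers = [w[i] for i in range(len(w)) if i not in outliers]
    qmax = 2 ** (bits - 1) - 1                 # e.g. 7 for 4-bit symmetric
    scale = (max(abs(x) for x in inliers) / qmax) if inliers else 1.0
    out = []
    for i, x in enumerate(w):
        if i in outliers or scale == 0:
            out.append(x)                      # kept at full precision
        else:
            q = max(-qmax - 1, min(qmax, round(x / scale)))
            out.append(q * scale)              # dequantized inlier
    return out, outliers
```

Because the outliers are stored exactly, the quantization error is confined to the low-magnitude inliers, which is the intuition behind isolating sensitive weights at higher precision.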
Yizhong Wang (@yizhongwyz) 's Twitter Profile Photo

🦙🐪🐫 So many instruction tuning datasets came out recently! How valuable are they, and how far are open models really from proprietary ones like ChatGPT?

🧐We did a systematic exploration, and built Tülu---a suite of LLaMa-tuned models up to 65B!

📜arxiv.org/abs/2306.04751
Weijia Shi (@weijiashi2) 's Twitter Profile Photo

Ever wondered which data black-box LLMs like GPT are pretrained on? 🤔

We build a benchmark WikiMIA and develop Min-K% Prob 🕵️, a method for detecting undisclosed pretraining data from LLMs (relying solely on output probs).

Check out our project: swj0419.github.io/detect-pretrai…
[1/n]
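A minimal sketch of the Min-K% Prob score, assuming you already have per-token log-probabilities from the model's scoring API (the numbers in the example are made up for illustration):

```python
def min_k_prob(token_logprobs, k=0.2):
    """Min-K% Prob membership score: the average log-probability of
    the k fraction of least-likely tokens in a candidate text. Higher
    (less negative) scores suggest the text may have appeared in the
    model's pretraining data."""
    n = max(1, int(len(token_logprobs) * k))
    lowest = sorted(token_logprobs)[:n]        # the n smallest log-probs
    return sum(lowest) / n

# Hypothetical per-token log-probs for a candidate passage
lp = [-0.1, -0.2, -3.5, -0.05, -2.8, -0.3]
score = min_k_prob(lp, k=0.5)                  # mean of the 3 lowest
```

In the benchmark setting, a threshold on this score separates member from non-member texts; the intuition is that memorized text rarely contains tokens the model finds very surprising.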
Victor Zhong (@hllo_wrld) 's Twitter Profile Photo

I am hiring strong PhD students in ML and NLP at the University of Waterloo to start in 2024. This is an excellent opportunity to be a part of a vibrant new NLP group w/ 5 professors. Please see more details here: victorzhong.com. Deadline is Dec 1!

Mike Lewis (@ml_perception) 's Twitter Profile Photo

Excited to share a preview of Llama3, including the release of an 8B and 70B (82 MMLU, should be the best open weights model!), and preliminary results for a 405B model (still training, but already competitive with GPT4). Lots more still to come... ai.meta.com/blog/meta-llam…

Bhargavi Paranjape (@bvp22294) 's Twitter Profile Photo

Check out ✨Husky✨, Joongwon Kim's new work on open-source LM agents for multi-step reasoning + tool-use! 📄 Paper: arxiv.org/abs/2406.06469 📷 Code: github.com/agent-husky/Hu…

AI at Meta (@aiatmeta) 's Twitter Profile Photo

Starting today, open source is leading the way. Introducing Llama 3.1: Our most capable models yet. Today we’re releasing a collection of new Llama 3.1 models including our long awaited 405B. These models deliver improved reasoning capabilities, a larger 128K token context

Gagan Bansal (@bansalg_) 's Twitter Profile Photo

Excited to finally release Magentic-One!

The thing I love about this multi-agent team is that the same implementation achieves very strong performance across three challenging agentic benchmarks. If you are someone working on agentic systems, you know how challenging this can
Will Held (@williambarrheld) 's Twitter Profile Photo

Balancing data across domains is key to training the best generalist LLMs! In my summer work @MetaAI, we introduce UtiliMax and MEDU, new methods to estimate data utility and optimize data mixes efficiently. HF Blog: huggingface.co/blog/WillHeld/… ArXiv: arxiv.org/abs/2501.11747

Ahmad Al-Dahle (@ahmad_al_dahle) 's Twitter Profile Photo

Introducing our first set of Llama 4 models!

We’ve been hard at work doing a complete re-design of the Llama series. I’m so excited to share it with the world today and mark another major milestone for the Llama herd as we release the *first* open source models in the Llama 4
Dieuwke Hupkes (@_dieuwke_) 's Twitter Profile Photo

So happy our new multilingual benchmark MultiLoKo is finally out (after some sweat and tears!)

arxiv.org/abs/2504.10356

Multilingual eval for LLMs... could be better, and I hope MultiLoKo will help fill some gaps in it + help study design choices in benchmark design

AI at Meta
AI at Meta (@aiatmeta) 's Twitter Profile Photo

🚀 Meta FAIR is releasing several new research artifacts on our road to advanced machine intelligence (AMI). These latest advancements are transforming our understanding of perception. 1️⃣ Meta Perception Encoder: A large-scale vision encoder that excels across several image &

Yen-Ju Lu (@yen_ju_lu) 's Twitter Profile Photo

🚀 Introducing the Latent Speech-Text Transformer (LST) — a speech-text model that organizes speech tokens into latent patches for better text→speech transfer, enabling steeper scaling laws and more efficient multimodal training ⚡️

Paper 📄 arxiv.org/pdf/2510.06195
Bhargavi Paranjape (@bvp22294) 's Twitter Profile Photo

📢 PhD Students in GenAI/RL! Our team at FAIR is hiring a Research Intern for Summer 2026 to push the boundaries of multimodal multi-agent social interaction. Learn more and apply: metacareers.com/jobs/182171308…