Bhargavi Paranjape (@bvp22294) 's Twitter Profile
Bhargavi Paranjape

@bvp22294

Research @ Meta

ID: 2268381841

Link: https://bhargaviparanjape.github.io/ · Joined: 30-12-2013 05:32:11

73 Tweets

604 Followers

498 Following

Aran Komatsuzaki (@arankomatsuzaki) 's Twitter Profile Photo

ART: Automatic multi-step reasoning and tool-use for large language models

Achieves a substantial improvement over few-shot prompting and automatic CoT on unseen tasks in the BigBench and MMLU benchmarks.

arxiv.org/abs/2303.09014
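The generate-call-resume pattern that ART automates can be sketched as a minimal loop. Everything here is illustrative: the `calc` tool, the bracketed call syntax, and `run_with_tools` are hypothetical stand-ins, not ART's actual task library or interface.

```python
import re

# Hypothetical tool registry; ART itself retrieves multi-step
# demonstrations from a task library and calls external tools
# such as code execution or search.
TOOLS = {"calc": lambda expr: str(eval(expr, {"__builtins__": {}}))}

def run_with_tools(generate, prompt, max_steps=5):
    """Minimal tool-use loop: whenever the model's output ends with a
    [calc(...)] marker, run the tool, append its result, and let the
    model continue generating from the extended context."""
    text = prompt
    for _ in range(max_steps):
        out = generate(text)
        text += out
        m = re.search(r"\[calc\((.+?)\)\]$", out)
        if not m:
            return text          # no tool call: generation is done
        text += " -> " + TOOLS["calc"](m.group(1)) + "\n"
    return text
```

With a real LM, `generate` would be a call to the model's completion API; the loop simply interleaves model output with tool results until the model stops emitting tool calls.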
John Nay (@johnjnay) 's Twitter Profile Photo

Whose Opinions Do LLMs Reflect?

- Eval of alignment of LM opinions w/ 60 US demographics
- Major mismatch of LLM views & groups (esp. 65+ & widowed)
- RLHF'ed LLMs converge to modal views of liberals & moderates; RL causes caricatures (99% Biden approval)

arxiv.org/abs/2303.17548
Alisa Liu (@alisawuffles) 's Twitter Profile Photo

Ambiguity is an intrinsic feature of language; as LMs are increasingly deployed to interface with human communication, handling ambiguity is critical🧐. So, we collect 🎉AmbiEnt, a benchmark with diverse ambiguities, challenging even GPT-4 🤖.

arxiv.org/abs/2304.14399 🧵
Terra Blevins (@terrablvns) 's Twitter Profile Photo

New paper alert!! ✨ Translate to Disambiguate: Zero-shot Multilingual Word Sense Disambiguation with Pretrained Language Models (PLMs) ✨ We evaluate how well PLMs translate words in context and then leverage this prompting setup to perform zero-shot WSD on 18 languages! 1/n

Gabriel Ilharco (@gabriel_ilharco) 's Twitter Profile Photo

Today we are releasing a CLIP ViT-L/14 model with 79.2% zero-shot accuracy on ImageNet. Our model outperforms OpenAI's CLIP by a large margin, and outperforms even bigger models (ViT-g/14) trained on LAION-2B Check it out at huggingface.co/laion/CLIP-ViT…!

Mandar Joshi (@mandarjoshi_) 's Twitter Profile Photo

Excited to present Pix2Act! An agent that can interact with GUIs using the same conceptual interface that humans commonly use — via pixel-based screenshots and generic keyboard and mouse actions -- arxiv.org/abs/2306.00245 (1/4)

Tim Dettmers (@tim_dettmers) 's Twitter Profile Photo

We present SpQR, which allows lossless LLM inference at 4.75 bits with a 15% speedup. You can run a 33B LLM on a single 24GB GPU fully lossless. SpQR works by isolating sensitive weights with higher precision and roughly doubles improvements from GPTQ: arxiv.org/abs/2306.03078🧵

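The core idea (keep the most sensitive weights at full precision, quantize the rest to a low-bit grid) can be illustrated with a toy round-to-nearest sketch. This is not the actual SpQR algorithm, which uses grouped quantization and a sparse outlier format; the `outlier_frac` heuristic below is a simplification for illustration only.

```python
def quantize_with_outliers(w, bits=4, outlier_frac=0.25):
    """Toy sketch of outlier-aware quantization: keep the k largest-
    magnitude weights exact, round the remaining weights to a
    symmetric low-bit grid, and return the dequantized weights plus
    the set of outlier indices."""
    k = max(1, int(len(w) * outlier_frac))
    order = sorted(range(len(w)), key=lambda i: abs(w[i]))
    outliers = set(order[-k:])                 # largest-magnitude weights
    inliers = [w[i] for i in range(len(w)) if i not in outliers]
    qmax = 2 ** (bits - 1) - 1                 # e.g. 7 for 4-bit symmetric
    scale = (max(abs(x) for x in inliers) / qmax) if inliers else 1.0
    out = []
    for i, x in enumerate(w):
        if i in outliers or scale == 0:
            out.append(x)                      # kept at full precision
        else:
            q = max(-qmax - 1, min(qmax, round(x / scale)))
            out.append(q * scale)              # dequantized inlier
    return out, outliers
```

Because the outliers are stored exactly, the quantization error is confined to the low-magnitude inliers, which is the intuition behind isolating sensitive weights at higher precision.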
Yizhong Wang (@yizhongwyz) 's Twitter Profile Photo

🦙🐪🐫 So many instruction tuning datasets came out recently! How valuable are they, and how far are open models really from proprietary ones like ChatGPT?

🧐We did a systematic exploration, and built Tülu---a suite of LLaMa-tuned models up to 65B!

📜arxiv.org/abs/2306.04751
Weijia Shi (@weijiashi2) 's Twitter Profile Photo

Ever wondered which data black-box LLMs like GPT are pretrained on? 🤔

We build a benchmark WikiMIA and develop Min-K% Prob 🕵️, a method for detecting undisclosed pretraining data from LLMs (relying solely on output probs).

Check out our project: swj0419.github.io/detect-pretrai…
[1/n]
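A minimal sketch of the Min-K% Prob score, assuming you already have per-token log-probabilities from the model's scoring API (the numbers in the example are made up for illustration):

```python
def min_k_prob(token_logprobs, k=0.2):
    """Min-K% Prob membership score: the average log-probability of
    the k fraction of least-likely tokens in a candidate text. Higher
    (less negative) scores suggest the text may have appeared in the
    model's pretraining data."""
    n = max(1, int(len(token_logprobs) * k))
    lowest = sorted(token_logprobs)[:n]        # the n smallest log-probs
    return sum(lowest) / n

# Hypothetical per-token log-probs for a candidate passage
lp = [-0.1, -0.2, -3.5, -0.05, -2.8, -0.3]
score = min_k_prob(lp, k=0.5)                  # mean of the 3 lowest
```

In the benchmark setting, a threshold on this score separates member from non-member texts; the intuition is that memorized text rarely contains tokens the model finds very surprising.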
Victor Zhong (@hllo_wrld) 's Twitter Profile Photo

I am hiring strong PhD students in ML and NLP at the University of Waterloo to start in 2024. This is an excellent opportunity to be a part of a vibrant new NLP group w/ 5 professors. Please see more details here: victorzhong.com. Deadline is Dec 1!

Mike Lewis (@ml_perception) 's Twitter Profile Photo

Excited to share a preview of Llama3, including the release of an 8B and 70B (82 MMLU, should be the best open weights model!), and preliminary results for a 405B model (still training, but already competitive with GPT4). Lots more still to come... ai.meta.com/blog/meta-llam…

Bhargavi Paranjape (@bvp22294) 's Twitter Profile Photo

Check out ✨Husky✨, Joongwon Kim's new work on open-source LM agents for multi-step reasoning + tool-use! 📄 Paper: arxiv.org/abs/2406.06469 📷 Code: github.com/agent-husky/Hu…

AI at Meta (@aiatmeta) 's Twitter Profile Photo

Starting today, open source is leading the way. Introducing Llama 3.1: Our most capable models yet. Today we’re releasing a collection of new Llama 3.1 models including our long awaited 405B. These models deliver improved reasoning capabilities, a larger 128K token context

Gagan Bansal (@bansalg_) 's Twitter Profile Photo

Excited to finally release Magentic-One!

The thing I love about this multi-agent team is that the same implementation achieves very strong performance across three challenging agentic benchmarks. If you are someone working on agentic systems, you know how challenging this can
Will Held (@williambarrheld) 's Twitter Profile Photo

Balancing data across domains is key to training the best generalist LLMs! In my summer work @MetaAI, we introduce UtiliMax and MEDU, new methods to estimate data utility and optimize data mixes efficiently. HF Blog: huggingface.co/blog/WillHeld/… ArXiv: arxiv.org/abs/2501.11747

Ahmad Al-Dahle (@ahmad_al_dahle) 's Twitter Profile Photo

Introducing our first set of Llama 4 models!

We’ve been hard at work doing a complete re-design of the Llama series. I’m so excited to share it with the world today and mark another major milestone for the Llama herd as we release the *first* open source models in the Llama 4
Dieuwke Hupkes (@_dieuwke_) 's Twitter Profile Photo

So happy our new multilingual benchmark MultiLoKo is finally out (after some sweat and tears!)

arxiv.org/abs/2504.10356

Multilingual eval for LLMs... could be better, and I hope MultiLoKo will help fill some gaps in it + help study design choices in benchmark design

AI at Meta
AI at Meta (@aiatmeta) 's Twitter Profile Photo

🚀 Meta FAIR is releasing several new research artifacts on our road to advanced machine intelligence (AMI). These latest advancements are transforming our understanding of perception. 1️⃣ Meta Perception Encoder: A large-scale vision encoder that excels across several image &

Yen-Ju Lu (@yen_ju_lu) 's Twitter Profile Photo

🚀 Introducing the Latent Speech-Text Transformer (LST) — a speech-text model that organizes speech tokens into latent patches for better text→speech transfer, enabling steeper scaling laws and more efficient multimodal training ⚡️

Paper 📄 arxiv.org/pdf/2510.06195
Bhargavi Paranjape (@bvp22294) 's Twitter Profile Photo

📢 PhD Students in GenAI/RL! Our team at FAIR is hiring a Research Intern for Summer 2026 to push the boundaries of multimodal multi-agent social interaction. Learn more and apply: metacareers.com/jobs/182171308…