Shachar Don-Yehiya (@shachar_don)'s Twitter Profile
Shachar Don-Yehiya

@shachar_don

PhD student @CseHuji @ibmresearch #NLProc

ID: 1514160551395569669

Website: https://shachardon.github.io/ | Joined: 13-04-2022 08:36:36

166 Tweets

269 Followers

270 Following

Michael Hassid (@michaelhassid)'s Twitter Profile Photo

Which is better, running a 70B model once, or a 7B model 10 times? The answer might be surprising!

Presenting our new <a href="/COLM_conf/">Conference on Language Modeling</a>  paper: "The Larger the Better? Improved LLM Code-Generation via Budget Reallocation"

arxiv.org/abs/2404.00725

1/n
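The budget-reallocation idea in the thread can be sketched as best-of-k sampling: under a fixed compute budget, draw several candidates from a smaller model and keep one that passes the unit tests, instead of a single shot from a larger model. `generate` and `passes_tests` below are illustrative stand-ins, not the paper's code.

```python
import random

def best_of_k(generate, passes_tests, k, seed=0):
    """Draw up to k candidate solutions from a (smaller) model and
    return the first one that passes the unit tests, else None.
    `generate` and `passes_tests` stand in for a real model call
    and a real test harness."""
    rng = random.Random(seed)
    for _ in range(k):
        candidate = generate(rng)
        if passes_tests(candidate):
            return candidate
    return None

# Toy numbers: a small model that solves a task 30% of the time
# succeeds in at least one of 10 tries with probability
# 1 - 0.7**10 ~ 0.97, beating a single 70%-accurate shot.
def toy_generate(rng):
    return "correct" if rng.random() < 0.3 else "wrong"

solution = best_of_k(toy_generate, lambda c: c == "correct", k=10)
```

Whether the many-small-samples regime actually wins depends on the per-sample success rate and the relative cost of the two models, which is what the paper measures.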
nitzan guetta (@nitzanguetta)'s Twitter Profile Photo

Can you answer these riddles?

We are happy to present our new paper “Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models”.
Paper:
Website: visual-riddles.github.io
🧵
Gili Lior (@gililior)'s Twitter Profile Photo

Exciting news! I'll present my poster at #ACL2024 about unsupervised document structure extraction tomorrow (Aug. 12th) at 12:45 PM 🕒 Come say hi and let's chat over the paper! arxiv.org/pdf/2402.13906 More details below ⬇️
w/ <a href="/GabiStanovsky/">Gabriel Stanovsky</a> <a href="/yoavgo/">(((ل()(ل() 'yoav))))👾</a> <a href="/allen_ai/">Ai2</a> <a href="/nlphuji/">HUJI NLP</a>
Daniel Marczak (@danie1marczak)'s Twitter Profile Photo

It would be nice not to worry about catastrophic forgetting while continually training your models, wouldn’t it?

Check out our new #ECCV2024 paper to see how you can do that.

“MagMax: Leveraging Model Merging for Seamless Continual Learning”
📜arxiv.org/pdf/2407.06322
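A minimal sketch of the magnitude-based merging the title alludes to, assuming the merge keeps, for each parameter, the task-vector entry (fine-tuned weight minus base weight) with the largest magnitude; the flat weight lists here are purely illustrative, not the paper's implementation.

```python
def magmax_merge(base, task_weights):
    """Merge several fine-tuned checkpoints that share one base model.
    For each parameter, keep the task-vector delta with the largest
    absolute magnitude across tasks, then add it back to the base.
    Weights are flat lists of floats for illustration only."""
    # Task vectors: per-parameter deltas relative to the base model.
    deltas = [[w - b for w, b in zip(tw, base)] for tw in task_weights]
    merged = []
    for i, b in enumerate(base):
        # Select the delta with maximum magnitude for this parameter.
        best = max((d[i] for d in deltas), key=abs)
        merged.append(b + best)
    return merged
```

In a real setting the same per-parameter selection would run over tensors of model weights rather than Python lists.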
Leshem Choshen C U @ ICLR 🤖🤗 (@lchoshen)'s Twitter Profile Photo

Human feedback is critical for aligning LLMs, so why don’t we collect it in the open ecosystem?🧐
We (15 orgs) gathered the key issues and next steps.
Envisioning a community-driven feedback platform, like Wikipedia

alphaxiv.org/abs/2408.16961
🧵
Human Feedback Foundation (@humanfeedbackio)'s Twitter Profile Photo

This is the one paper to read if you care about human input into AI systems. You know who else agrees we have to collect and leverage human feedback in open-source AI? That's right: Yann LeCun, and definitely Soumith Chintala

Shachar Don-Yehiya (@shachar_don)'s Twitter Profile Photo

Human feedback is a valuable resource for model development and research 🤖💬 While for-profit companies collect user data through their APIs to improve their own models, the open-source community lags behind What's missing?🧐 We tackle it here⬇️

idan shenfeld (@idanshenfeld)'s Twitter Profile Photo

Human feedback is critical for aligning LLMs, but it’s often locked away in proprietary datasets. In our new paper, we explore scalable methods to collect and share open-source human feedback data for LLM alignment. Let’s democratize this process and push the field forward! 🚀👇

Uri Berger (@uriberger88)'s Twitter Profile Photo

1/ Into Image Captioning? Don’t miss this!
Struggling to keep up with the influx of new metrics but still see the same 5 (BLEU, METEOR, ROUGE, CIDEr, SPICE) leading?
Read our recent Captioning evaluation survey!

arxiv.org/abs/2408.04909
w/
<a href="/GabiStanovsky/">Gabriel Stanovsky</a>
<a href="/AbendOmri/">Omri Abend</a>
<a href="/leafrermann/">Lea Frermann</a>
>
AI Tinkerers (@aitinkerers)'s Twitter Profile Photo

🚀 AI Tinkerers & Human Feedback Foundation present Paper Club! Premiere: "Learning from Naturally Occurring Feedback" Sept 17 at 12pm ET. #PaperClub #BuildingAI Shachar Don-Yehiya Leshem (Legend) Choshen 🤖🤗 @ACL Omri Abend Elena Yunusov Human Feedback Foundation - link in first comment

Yotam Perlitz 👾 (@yotamperlitz)'s Twitter Profile Photo

✨ Developed a new benchmark or dataset for language models? ✨ Want the community to trust and adopt it? 🤔 So, demonstrate its validity by comparing it to established benchmarks! BenchBench makes it easy. Check it out: 👉 huggingface.co/spaces/ibm/ben…
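One simple way to "compare against established benchmarks" is rank agreement: score the same models on both benchmarks and compute a rank correlation such as Kendall's tau. This generic sketch illustrates the idea and is not the tool's actual implementation.

```python
def kendall_tau(scores_a, scores_b):
    """Rank agreement between two benchmarks that score the same
    models: +1 means identical rankings, -1 means fully reversed.
    Assumes no tied scores, for simplicity."""
    n = len(scores_a)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            # Pairs ordered the same way on both benchmarks are
            # concordant; pairs ordered oppositely are discordant.
            s = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

A new benchmark whose model ranking correlates strongly with established ones gives the community some evidence of its validity.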

Eitan Wagner (@eitanwagner)'s Twitter Profile Photo

Language Models output probabilities for tokens. So probabilities of spans must follow a valid joint, right? Check out our new #EMNLP2024 paper -- "CONTESTS: a Framework for Consistency Testing of Span Probabilities in Language Models". w/ Yuli Slavutsky and Omri Abend (1/5)
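The consistency requirement can be illustrated with a toy check: if token-level probabilities define a valid joint, then two-token span probabilities P(w1, w2) = P(w1) * P(w2 | w1) must marginalize back to P(w1). The dictionaries below are hypothetical and are not the paper's framework.

```python
def joint_is_consistent(p_first, p_second_given_first, tol=1e-9):
    """Check that token probabilities define a valid joint over
    two-token spans. `p_first` maps first tokens to P(w1);
    `p_second_given_first` maps each first token to a distribution
    P(w2 | w1). Both are toy stand-ins for model outputs."""
    # The first-token distribution must itself sum to 1 ...
    if abs(sum(p_first.values()) - 1.0) > tol:
        return False
    # ... and summing span probabilities over the second token
    # must recover P(w1) for every prefix.
    for w1, p1 in p_first.items():
        marginal = sum(p1 * p2 for p2 in p_second_given_first[w1].values())
        if abs(marginal - p1) > tol:
            return False
    return True
```

For a real language model the analogous test compares span probabilities computed under different conditioning choices, which need not agree even though a valid joint would force them to.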

Prateek Yadav (@prateeky2806)'s Twitter Profile Photo

Ever wondered if model merging works at scale? Maybe the benefits wear off for bigger models?

Maybe you considered using model merging for post-training of your large model but not sure if it  generalizes well?

cc: <a href="/GoogleAI/">Google AI</a> <a href="/GoogleDeepMind/">Google DeepMind</a> <a href="/uncnlp/">UNC NLP</a>
🧵👇

Excited to announce my
Esther Shizgal (@esthershizgal)'s Twitter Profile Photo

1/n
Excited to present our work at #EMNLP2024’s Workshop on Narrative Understanding! Testimonies reveal personal stories, making them ideal for studying character development. We use NLP to analyze character arcs in 1,000 Holocaust testimonies.
<a href="/AbendOmri/">Omri Abend</a> <a href="/RKeydar/">Renana Keydar</a> <a href="/EitanWagner/">Eitan Wagner</a>
Noam Dahan (@dahan_noam)'s Twitter Profile Photo

Look at the CRAZY domain gap we found in summarization datasets: while English resources are diverse, other languages are mostly restricted to news.

Presenting our survey following 130+ datasets in 100+ languages!

Explore: github.com/edahanoam/Awes…

<a href="/GabiStanovsky/">Gabriel Stanovsky</a>, <a href="/nlphuji/">HUJI NLP</a>
1/6
Ariel Gera (@arielgera2)'s Twitter Profile Photo

Say I want to compare system qualities - pick between 2 configurations, or rank a whole bunch of models. I'll use LLM-as-a-judge, right? 🧑🏻‍⚖️ But how do I know the LLM judge is up to the task? Who is a good judge for ranking systems? Enter our new paper!✨🧵 arxiv.org/abs/2412.09569

Asaf Yehudai (@asafyehudai)'s Twitter Profile Photo

New preprint! ✨ Interested in LLM-as-a-Judge? Want to get the best judge for ranking your system? our new work is just for you: "JuStRank: Benchmarking LLM Judges for System Ranking" 🕺💃 arxiv.org/abs/2412.09569

Esther Shizgal (@esthershizgal)'s Twitter Profile Photo

Happy to share that our paper is now on Arxiv 🤗 🚞 Applying NLP for analyzing character development 👣 ✡️ Examining Religious Trajectories in 1000 Holocaust testimonies 🕎 🍂🌱🍂🌱🌱 Sequence Clustering 🍂🌱🍂🌿🌱 arxiv.org/abs/2412.17063

Noy Sternlicht (@noysternlicht)'s Twitter Profile Photo

🔔 New Paper! We propose a challenging new benchmark for LLM judges: Evaluating debate speeches. Are they comparable to humans? Well... it’s debatable. 🤔 noy-sternlicht.github.io/Debatable-Inte… 👇 Here are our findings:

Niv Eckhaus (@niveckhaus)'s Twitter Profile Photo

🚨 New Paper: "Time to Talk"! 🕵️ We built an LLM agent that doesn't just decide WHAT to say, but also WHEN to say it! Introducing "Time to Talk" - LLM agents for asynchronous group communication, tested in real Mafia games with human players. 🌐niveck.github.io/Time-to-Talk 🧵1/7