Shachar Don-Yehiya (@shachar_don)'s Twitter Profile
Shachar Don-Yehiya

@shachar_don

PhD student @CseHuji @ibmresearch #NLProc

ID: 1514160551395569669

Website: https://shachardon.github.io/ | Joined: 13-04-2022 08:36:36

166 Tweets

269 Followers

270 Following

Michael Hassid (@michaelhassid)'s Twitter Profile Photo

Which is better, running a 70B model once, or a 7B model 10 times? The answer might be surprising!

Presenting our new <a href="/COLM_conf/">Conference on Language Modeling</a>  paper: "The Larger the Better? Improved LLM Code-Generation via Budget Reallocation"

arxiv.org/abs/2404.00725

1/n
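The budget-reallocation idea in the thread can be sketched as best-of-k sampling: under a fixed compute budget, draw several candidates from a smaller model and keep one that passes the unit tests, instead of a single shot from a larger model. `generate` and `passes_tests` below are illustrative stand-ins, not the paper's code.

```python
import random

def best_of_k(generate, passes_tests, k, seed=0):
    """Draw up to k candidate solutions from a (smaller) model and
    return the first one that passes the unit tests, else None.
    `generate` and `passes_tests` stand in for a real model call
    and a real test harness."""
    rng = random.Random(seed)
    for _ in range(k):
        candidate = generate(rng)
        if passes_tests(candidate):
            return candidate
    return None

# Toy numbers: a small model that solves a task 30% of the time
# succeeds in at least one of 10 tries with probability
# 1 - 0.7**10 ~ 0.97, beating a single 70%-accurate shot.
def toy_generate(rng):
    return "correct" if rng.random() < 0.3 else "wrong"

solution = best_of_k(toy_generate, lambda c: c == "correct", k=10)
```

Whether the many-small-samples regime actually wins depends on the per-sample success rate and the relative cost of the two models, which is what the paper measures.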
nitzan guetta (@nitzanguetta)'s Twitter Profile Photo

Can you answer these riddles?

We are happy to present our new paper “Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models”.
Paper:
Website: visual-riddles.github.io
🧵
Gili Lior (@gililior)'s Twitter Profile Photo

Exciting news! I'll present my poster at #ACL2024 about unsupervised document structure extraction tomorrow (Aug. 12th) at 12:45 PM 🕒 Come say hi and let's chat over the paper! arxiv.org/pdf/2402.13906 More details below ⬇️
w/ <a href="/GabiStanovsky/">Gabriel Stanovsky</a> <a href="/yoavgo/">(((ل()(ل() 'yoav))))👾</a> <a href="/allen_ai/">Ai2</a> <a href="/nlphuji/">HUJI NLP</a>
Daniel Marczak (@danie1marczak)'s Twitter Profile Photo

It would be nice not to worry about catastrophic forgetting while continually training your models, wouldn’t it?

Check out our new #ECCV2024 paper to see how you can do that.

“MagMax: Leveraging Model Merging for Seamless Continual Learning”
📜arxiv.org/pdf/2407.06322
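A minimal sketch of the magnitude-based merging the title alludes to, assuming the merge keeps, for each parameter, the task-vector entry (fine-tuned weight minus base weight) with the largest magnitude; the flat weight lists here are purely illustrative, not the paper's implementation.

```python
def magmax_merge(base, task_weights):
    """Merge several fine-tuned checkpoints that share one base model.
    For each parameter, keep the task-vector delta with the largest
    absolute magnitude across tasks, then add it back to the base.
    Weights are flat lists of floats for illustration only."""
    # Task vectors: per-parameter deltas relative to the base model.
    deltas = [[w - b for w, b in zip(tw, base)] for tw in task_weights]
    merged = []
    for i, b in enumerate(base):
        # Select the delta with maximum magnitude for this parameter.
        best = max((d[i] for d in deltas), key=abs)
        merged.append(b + best)
    return merged
```

In a real setting the same per-parameter selection would run over tensors of model weights rather than Python lists.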
Leshem Choshen C U @ ICLR 🤖🤗 (@lchoshen)'s Twitter Profile Photo

Human feedback is critical for aligning LLMs, so why don’t we collect it in the open ecosystem?🧐
We (15 orgs) gathered the key issues and next steps.
Envisioning a community-driven feedback platform, like Wikipedia

alphaxiv.org/abs/2408.16961
🧵
Human Feedback Foundation (@humanfeedbackio)'s Twitter Profile Photo

This is the one paper to read if you care about human input into AI systems. You know who else agrees we have to collect and leverage human feedback in open-source AI? That's right: Yann LeCun, and definitely Soumith Chintala

Shachar Don-Yehiya (@shachar_don)'s Twitter Profile Photo

Human feedback is a valuable resource for model development and research 🤖💬 While for-profit companies collect user data through their APIs to improve their own models, the open-source community lags behind What's missing?🧐 We tackle it here⬇️

idan shenfeld (@idanshenfeld)'s Twitter Profile Photo

Human feedback is critical for aligning LLMs, but it’s often locked away in proprietary datasets. In our new paper, we explore scalable methods to collect and share open-source human feedback data for LLM alignment. Let’s democratize this process and push the field forward! 🚀👇

Uri Berger (@uriberger88)'s Twitter Profile Photo

1/ Into Image Captioning? Don’t miss this!
Struggling to keep up with the influx of new metrics but still see the same 5 (BLEU, METEOR, ROUGE, CIDEr, SPICE) leading?
Read our recent Captioning evaluation survey!

arxiv.org/abs/2408.04909
w/
<a href="/GabiStanovsky/">Gabriel Stanovsky</a>
<a href="/AbendOmri/">Omri Abend</a>
<a href="/leafrermann/">Lea Frermann</a>
>
AI Tinkerers (@aitinkerers)'s Twitter Profile Photo

🚀 AI Tinkerers & Human Feedback Foundation present Paper Club! Premiere: "Learning from Naturally Occurring Feedback" Sept 17 at 12pm ET. #PaperClub #BuildingAI Shachar Don-Yehiya Leshem (Legend) Choshen 🤖🤗 @ACL Omri Abend Elena Yunusov Human Feedback Foundation - link in first comment

Yotam Perlitz 👾 (@yotamperlitz)'s Twitter Profile Photo

✨ Developed a new benchmark or dataset for language models? ✨ Want the community to trust and adopt it? 🤔 So, demonstrate its validity by comparing it to established benchmarks! BenchBench makes it easy. Check it out: 👉 huggingface.co/spaces/ibm/ben…
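One simple way to "compare against established benchmarks" is rank agreement: score the same models on both benchmarks and compute a rank correlation such as Kendall's tau. This generic sketch illustrates the idea and is not the tool's actual implementation.

```python
def kendall_tau(scores_a, scores_b):
    """Rank agreement between two benchmarks that score the same
    models: +1 means identical rankings, -1 means fully reversed.
    Assumes no tied scores, for simplicity."""
    n = len(scores_a)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            # Pairs ordered the same way on both benchmarks are
            # concordant; pairs ordered oppositely are discordant.
            s = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

A new benchmark whose model ranking correlates strongly with established ones gives the community some evidence of its validity.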

Eitan Wagner (@eitanwagner)'s Twitter Profile Photo

Language Models output probabilities for tokens. So probabilities of spans must follow a valid joint, right? Check out our new #EMNLP2024 paper -- "CONTESTS: a Framework for Consistency Testing of Span Probabilities in Language Models". w/ Yuli Slavutsky and Omri Abend (1/5)
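The consistency requirement can be illustrated with a toy check: if token-level probabilities define a valid joint, then two-token span probabilities P(w1, w2) = P(w1) * P(w2 | w1) must marginalize back to P(w1). The dictionaries below are hypothetical and are not the paper's framework.

```python
def joint_is_consistent(p_first, p_second_given_first, tol=1e-9):
    """Check that token probabilities define a valid joint over
    two-token spans. `p_first` maps first tokens to P(w1);
    `p_second_given_first` maps each first token to a distribution
    P(w2 | w1). Both are toy stand-ins for model outputs."""
    # The first-token distribution must itself sum to 1 ...
    if abs(sum(p_first.values()) - 1.0) > tol:
        return False
    # ... and summing span probabilities over the second token
    # must recover P(w1) for every prefix.
    for w1, p1 in p_first.items():
        marginal = sum(p1 * p2 for p2 in p_second_given_first[w1].values())
        if abs(marginal - p1) > tol:
            return False
    return True
```

For a real language model the analogous test compares span probabilities computed under different conditioning choices, which need not agree even though a valid joint would force them to.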

Prateek Yadav (@prateeky2806)'s Twitter Profile Photo

Ever wondered if model merging works at scale? Maybe the benefits wear off for bigger models?

Maybe you considered using model merging for post-training of your large model but not sure if it  generalizes well?

cc: <a href="/GoogleAI/">Google AI</a> <a href="/GoogleDeepMind/">Google DeepMind</a> <a href="/uncnlp/">UNC NLP</a>
🧵👇

Excited to announce my
Esther Shizgal (@esthershizgal)'s Twitter Profile Photo

1/n
Excited to present our work at #EMNLP2024’s Workshop on Narrative Understanding! Testimonies reveal personal stories, making them ideal for studying character development. We use NLP to analyze character arcs in 1,000 Holocaust testimonies.
<a href="/AbendOmri/">Omri Abend</a> <a href="/RKeydar/">Renana Keydar</a> <a href="/EitanWagner/">Eitan Wagner</a>
Noam Dahan (@dahan_noam)'s Twitter Profile Photo

Look at the CRAZY domain gap we found in summarization datasets: while English resources are diverse, other languages are mostly restricted to news.

Presenting our survey following 130+ datasets in 100+ languages!

Explore: github.com/edahanoam/Awes…

<a href="/GabiStanovsky/">Gabriel Stanovsky</a>, <a href="/nlphuji/">HUJI NLP</a>
1/6
Ariel Gera (@arielgera2)'s Twitter Profile Photo

Say I want to compare system qualities - pick between 2 configurations, or rank a whole bunch of models. I'll use LLM-as-a-judge, right? 🧑🏻‍⚖️ But how do I know the LLM judge is up to the task? Who is a good judge for ranking systems? Enter our new paper!✨🧵 arxiv.org/abs/2412.09569

Asaf Yehudai (@asafyehudai)'s Twitter Profile Photo

New preprint! ✨ Interested in LLM-as-a-Judge? Want to get the best judge for ranking your system? our new work is just for you: "JuStRank: Benchmarking LLM Judges for System Ranking" 🕺💃 arxiv.org/abs/2412.09569

Esther Shizgal (@esthershizgal)'s Twitter Profile Photo

Happy to share that our paper is now on Arxiv 🤗 🚞 Applying NLP for analyzing character development 👣 ✡️ Examining Religious Trajectories in 1000 Holocaust testimonies 🕎 🍂🌱🍂🌱🌱 Sequence Clustering 🍂🌱🍂🌿🌱 arxiv.org/abs/2412.17063

Noy Sternlicht (@noysternlicht)'s Twitter Profile Photo

🔔 New Paper! We propose a challenging new benchmark for LLM judges: Evaluating debate speeches. Are they comparable to humans? Well... it’s debatable. 🤔 noy-sternlicht.github.io/Debatable-Inte… 👇 Here are our findings:

Niv Eckhaus (@niveckhaus)'s Twitter Profile Photo

🚨 New Paper: "Time to Talk"! 🕵️ We built an LLM agent that doesn't just decide WHAT to say, but also WHEN to say it! Introducing "Time to Talk" - LLM agents for asynchronous group communication, tested in real Mafia games with human players. 🌐niveck.github.io/Time-to-Talk 🧵1/7