Alon Jacovi (@alon_jacovi)'s Twitter Profile
Alon Jacovi

@alon_jacovi

ML/NLP, XAI @google. previously: @biunlp @allen_ai @IBMResearch @RIKEN_AIP

ID: 1101220947607146497

Link: https://alonjacovi.github.io/ · Joined: 28-02-2019 20:41:48

666 Tweets

1.1K Followers

433 Following

Mor Geva (@megamor2) 's Twitter Profile Photo

Do you have a "tell" when you are about to lie? We find that LLMs have “tells” in their internal representations which allow estimating how knowledgeable a model is about an entity 𝘣𝘦𝘧𝘰𝘳𝘦 it generates even a single token. Paper: arxiv.org/abs/2406.12673… 🧵 Daniela Gottesman

Do you have a "tell" when you are about to lie?

We find that LLMs have “tells” in their internal representations which allow estimating how knowledgeable a model is about an entity 𝘣𝘦𝘧𝘰𝘳𝘦 it generates even a single token.

Paper: arxiv.org/abs/2406.12673… 🧵

<a href="/dhgottesman/">Daniela Gottesman</a>
omer goldman (@omernlp) 's Twitter Profile Photo

new models have an amazingly long context. but can we actually tell how well they deal with it?
🚨🚨NEW PAPER ALERT🚨🚨
with <a href="/alon_jacovi/">Alon Jacovi</a> <a href="/lovodkin93/">lovodkin93</a> Aviya Maimon, Ido Dagan and <a href="/rtsarfaty/">Reut Tsarfaty</a> 
arXiv: arxiv.org/abs/2407.00402…

1/🧵
lovodkin93 (@lovodkin93) 's Twitter Profile Photo

What constitutes a really challenging long-context task? And do the current works addressing long context actually test those aspects? Check out our new opinion paper 👇

Maor Ivgi (@maorivg) 's Twitter Profile Photo

1/7 🚨 What do LLMs do when they are uncertain? We found that the stronger the LLM, the more it hallucinates and the less it loops! This pattern extends to sampling methods and instruction tuning. 🧵👇
<a href="/megamor2/">Mor Geva</a> <a href="/JonathanBerant/">Jonathan Berant</a> <a href="/OriYoran/">Ori Yoran</a>
Alon Jacovi (@alon_jacovi) 's Twitter Profile Photo

Long Context task design is hard. When the "goalpost" keeps moving, relying on distractors+filler helps, but makes us predisposed to retrieval pipelines. How can we design "retrieval-resistant" long-context tasks? New paper! ⬇️ arxiv.org/abs/2407.00402 x.com/omerNLP/status…

Dieuwke Hupkes (@_dieuwke_) 's Twitter Profile Photo

Last but not least: contamination analysis! What exactly is the best way of measuring contamination will prob. remain an open question for a bit (I'll talk about that soon in a keynote at the ACL 2024 workshop CONDA!), but *doing* contamination analysis should be the standard!
Orion Weller (@orionweller) 's Twitter Profile Photo

🚨 We all complain a lot about reviewers/ACs/SACs in the ML/NLP community.   But why not look at the data to see what’s going on? I found some crazy statistics about who is doing/not doing this service in the *CL community. 😱 orionweller.github.io/blog/2024/revi… 🧵

Oscar Sainz (@osainz59) 's Twitter Profile Photo

Thank you to all the contributors! As part of the CONDA Workshop, we have created a report with all the contributions. It is already available on arXiv: arxiv.org/abs/2407.21530

Yanai Elazar (@yanaiela) 's Twitter Profile Photo

Concerned about data contamination?
We asked the community for known contamination in different datasets and models, and summarized these findings in this report.
arxiv.org/pdf/2407.21530
Aran Komatsuzaki (@arankomatsuzaki) 's Twitter Profile Photo

Google presents CoverBench: A Challenging Benchmark for Complex Claim Verification

Provides a significant challenge to current models with large headroom

arxiv.org/abs/2408.03325
AK (@_akhaliq) 's Twitter Profile Photo

Google announces CoverBench

A Challenging Benchmark for Complex Claim Verification

discuss: huggingface.co/papers/2408.03…

There is a growing line of research on verifying the correctness of language models' outputs. At the same time, LMs are being used to tackle complex queries that…
Alon Jacovi (@alon_jacovi) 's Twitter Profile Photo

New complex reasoning eval set!

CoverBench: Verify whether a claim is correct given a rich context. It requires implicit complex reasoning.

It's efficient (&lt;1k ex), convenient (binary classification), and hard. Take a look!

arxiv.org/abs/2408.03325
huggingface.co/datasets/googl…
Jack Hessel (@jmhessel) 's Twitter Profile Photo

Cool dataset from Alon Jacovi et al.! Similar to NLI: you're given a context and a claim (and need to confirm/deny the claim). But the contexts are ~3.5K tokens + claims require different skills like multi-hop/table reasoning, etc. 70B+ models are close to random performance

Pratyush Maini (@pratyushmaini) 's Twitter Profile Photo

Our work on ACR memorization won the Best Paper Award at CONDA @ #ACL2024 🎉🎉
I will be giving a talk on it on August 16th.

Over the last few months, I have increasingly come to realize how impactful ACR will be in the GenAI copyright discourse. Thoughts 🧵 1/n
MMitchell (@mmitchell_ai) 's Twitter Profile Photo

What does a 1911 electroscope have to do with testing on your training data? Find out at my (remote) invited talk tomorrow @ CONDA #ACL2024 conda-workshop.github.io

Iker García-Ferrero (@iker_garciaf) 's Twitter Profile Photo

The main conference has ended, but ACL 2024 is not over. Tomorrow, Friday the 16th, come to the CONDA Workshop (Lotus Suite 4) for amazing talks by Anna Rogers, Jesse Dodge, Dieuwke Hupkes, and MMitchell.

Schedule: conda-workshop.github.io
Alon Jacovi (@alon_jacovi) 's Twitter Profile Photo

The Data Contamination Workshop is tomorrow at #ACL2024NLP! Come listen to MMitchell, Dieuwke Hupkes, Jesse Dodge, Anna Rogers, and a collection of amazing data contamination papers at Lotus Suite 4! Program: conda-workshop.github.io

Anna Rogers (@annargrs) 's Twitter Profile Photo

I'll be discussing 'emergent properties' at 11am in this lovely workshop tomorrow. I found even more definitions for what this means during this ACL and also ICML!