Alon Jacovi (@alon_jacovi)'s Twitter Profile
Alon Jacovi

@alon_jacovi

ML/NLP, XAI @google. previously: @biunlp @allen_ai @IBMResearch @RIKEN_AIP

ID: 1101220947607146497

https://alonjacovi.github.io/ · Joined: 28-02-2019 20:41:48

666 Tweets

1.1K Followers

433 Following

Mor Geva (@megamor2) 's Twitter Profile Photo

Do you have a "tell" when you are about to lie? We find that LLMs have “tells” in their internal representations which allow estimating how knowledgeable a model is about an entity 𝘣𝘦𝘧𝘰𝘳𝘦 it generates even a single token. Paper: arxiv.org/abs/2406.12673… 🧵 Daniela Gottesman

Do you have a "tell" when you are about to lie?

We find that LLMs have “tells” in their internal representations which allow estimating how knowledgeable a model is about an entity 𝘣𝘦𝘧𝘰𝘳𝘦 it generates even a single token.

Paper: arxiv.org/abs/2406.12673… 🧵

<a href="/dhgottesman/">Daniela Gottesman</a>
omer goldman (@omernlp) 's Twitter Profile Photo

new models have an amazingly long context. but can we actually tell how well they deal with it?
🚨🚨NEW PAPER ALERT🚨🚨
with Alon Jacovi lovodkin93 Aviya Maimon, Ido Dagan and Reut Tsarfaty
arXiv: arxiv.org/abs/2407.00402…

1/🧵
lovodkin93 (@lovodkin93) 's Twitter Profile Photo

What constitutes a really challenging long-context task? And do the current works addressing long context actually test those aspects? Check out our new opinion paper 👇

Maor Ivgi (@maorivg) 's Twitter Profile Photo

1/7 🚨 What do LLMs do when they are uncertain? We found that the stronger the LLM, the more it hallucinates and the less it loops! This pattern extends to sampling methods and instruction tuning. 🧵👇
Mor Geva Jonathan Berant Ori Yoran
Alon Jacovi (@alon_jacovi) 's Twitter Profile Photo

Long Context task design is hard. When the "goalpost" keeps moving, relying on distractors+filler helps, but makes us predisposed to retrieval pipelines. How can we design "retrieval-resistant" long-context tasks? New paper! ⬇️ arxiv.org/abs/2407.00402 x.com/omerNLP/status…

Dieuwke Hupkes (@_dieuwke_) 's Twitter Profile Photo

Last but not least: contamination analysis! What exactly is the best way of measuring contamination will prob. remain an open question for a bit (I'll talk about that soon in a keynote in the ACL 2024 workshop CONDA!), but *doing* contamination analysis should be the standard!
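One common (and debated) way to run such an analysis is exact n-gram overlap between benchmark examples and training documents. The sketch below assumes that simple criterion; the 13-token n-gram size is an illustrative choice borrowed from earlier contamination reports, not an agreed-upon standard:

```python
# Minimal sketch of an exact n-gram overlap contamination check: flag a benchmark
# example as possibly contaminated if it shares a long n-gram with a training document.
def ngrams(text, n=13):
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def possibly_contaminated(example, training_docs, n=13):
    ex_grams = ngrams(example, n)
    return any(ex_grams & ngrams(doc, n) for doc in training_docs)

# In practice this scans the full pretraining corpus (or an index over it), not a toy list.
training_docs = ["some crawled web page text that may quote benchmark items verbatim"]
example = "a benchmark question whose text might already appear in the pretraining data"
print(possibly_contaminated(example, training_docs))  # False for this toy pair
```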
Orion Weller (@orionweller) 's Twitter Profile Photo

🚨 We all complain a lot about reviewers/ACs/SACs in the ML/NLP community.   But why not look at the data to see what’s going on? I found some crazy statistics about who is doing/not doing this service in the *CL community. 😱 orionweller.github.io/blog/2024/revi… 🧵

Oscar Sainz (@osainz59) 's Twitter Profile Photo

Thank you to all the contributors! As part of the CONDA Workshop, we have created a report with all the contributions. It is already available on arXiv: arxiv.org/abs/2407.21530

Yanai Elazar (@yanaiela) 's Twitter Profile Photo

Concerned about data contamination?
We asked the community for known contamination in different datasets and models, and summarized these findings in this report.
arxiv.org/pdf/2407.21530
Aran Komatsuzaki (@arankomatsuzaki) 's Twitter Profile Photo

Google presents CoverBench: A Challenging Benchmark for Complex Claim Verification

Provides a significant challenge to current models with large headroom

arxiv.org/abs/2408.03325
AK (@_akhaliq) 's Twitter Profile Photo

Google announces CoverBench

A Challenging Benchmark for Complex Claim Verification

discuss: huggingface.co/papers/2408.03…

There is a growing line of research on verifying the correctness of language models' outputs. At the same time, LMs are being used to tackle complex queries that…
Alon Jacovi (@alon_jacovi) 's Twitter Profile Photo

New complex reasoning eval set!

CoverBench: Verify whether a claim is correct given a rich context. It requires implicit complex reasoning.

It's efficient (<1k ex), convenient (binary classification), and hard. Take a look!

arxiv.org/abs/2408.03325
huggingface.co/datasets/googl…
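Because the task is binary classification over (context, claim) pairs, an evaluation loop is simple to set up. The sketch below uses hypothetical field names and a keyword-based placeholder verifier, since the dataset link above is truncated; a real run would load the actual dataset from Hugging Face and swap in an LLM-based verifier:

```python
# Sketch of scoring a binary claim-verification benchmark like CoverBench.
# Field names ("context", "claim", "label") and the placeholder verifier are
# hypothetical stand-ins; check the real dataset card for the actual schema.
from sklearn.metrics import accuracy_score, f1_score

def verify_claim(context: str, claim: str) -> int:
    """Placeholder verifier: in practice, prompt an LLM with the context and claim
    and map its answer to 1 (supported) or 0 (not supported)."""
    return int("grew" in claim)  # trivial keyword heuristic, for illustration only

examples = [
    {"context": "Revenue grew from $2M in 2022 to $3M in 2023.",
     "claim": "Revenue grew by 50% between 2022 and 2023.", "label": 1},
    {"context": "Revenue grew from $2M in 2022 to $3M in 2023.",
     "claim": "Revenue more than doubled between 2022 and 2023.", "label": 0},
]

preds = [verify_claim(ex["context"], ex["claim"]) for ex in examples]
gold = [ex["label"] for ex in examples]
print("accuracy:", accuracy_score(gold, preds))
print("macro-F1:", f1_score(gold, preds, average="macro"))
```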
Jack Hessel (@jmhessel) 's Twitter Profile Photo

Cool dataset from Alon Jacovi et al.! Similar to NLI: you're given a context and a claim (and need to confirm/deny the claim). But the contexts are ~3.5K tokens + claims require different skills like multi-hop/table reasoning, etc. 70B+ models are close to random performance

Pratyush Maini (@pratyushmaini) 's Twitter Profile Photo

Our work on ACR memorization won the Best Paper Award at CONDA @ #ACL2024 🎉🎉
I will be giving a talk on the same on August 16th

Over the last few months, I have increasingly come to realize how impactful ACR will be in the GenAI copyright discourse. Thoughts🧵1/n
MMitchell (@mmitchell_ai) 's Twitter Profile Photo

What does a 1911 electroscope have to do with testing on your training data? Find out at my (remote) invited talk tomorrow @ CONDA #ACL2024 conda-workshop.github.io

Iker García-Ferrero (@iker_garciaf) 's Twitter Profile Photo

The main conference has ended, but ACL 2024 is not over. Tomorrow, Friday the 16th, come to the CONDA Workshop (Lotus Suite 4) for amazing talks by Anna Rogers, Jesse Dodge, @dieuwke, and MMitchell.

Schedule: conda-workshop.github.io
Alon Jacovi (@alon_jacovi) 's Twitter Profile Photo

The Data Contamination Workshop is tomorrow at #ACL2024NLP ! Come listen to MMitchell Dieuwke Hupkes Jesse Dodge Anna Rogers and a collection of amazing data contamination papers at Lotus Suite 4! Program: conda-workshop.github.io

Anna Rogers (@annargrs) 's Twitter Profile Photo

I'll be discussing 'emergent properties' at 11am in this lovely workshop tomorrow. I found even more definitions for what this means during this ACL and also ICML!