Eve Fleisig (@enfleisig)'s Twitter Profile
Eve Fleisig

@enfleisig

PhD student @Berkeley_EECS | Princeton ‘21 | NLP, ethical + equitable AI, and linguistics enthusiast

ID: 1289061357007110144

Website: http://efleisig.com · Joined: 31-07-2020 04:52:38

192 Tweets

623 Followers

372 Following

Arduin Findeis @ ICLR2025 (@arduinfindeis)'s Twitter Profile Photo

🕵🏻💬 Introducing Feedback Forensics: a new tool to investigate pairwise preference data. Feedback data is notoriously difficult to interpret and has many known issues – our app aims to help! Try it at app.feedbackforensics.com Three example use-cases 👇🧵

Yoo Yeon Sung (@yooyeonsung1)'s Twitter Profile Photo


🏆ADVSCORE won an Outstanding Paper Award at #NAACL2025 NAACL HLT 2025!!

If you want to learn how to make your benchmark *actually* adversarial, come find me:
📍Poster Session 5 - HC: Human-centered NLP
📅May 1 @ 2PM

Hiring for human-focused AI dev/LLM eval? Let’s talk! 💼
Yoo Yeon Sung (@yooyeonsung1)'s Twitter Profile Photo


I feel so honored to win this award at #naaclmeeting #naacl2025 🥹

I cannot say how grateful I am to my wonderful advisor Jordan Boyd-Graber, and I could not have done it without Maharshi Gor, Eve Fleisig, Ishani Mondal 🙏
Myra Cheng (@chengmyra1) 's Twitter Profile Photo

Do people actually like human-like LLMs? In our #ACL2025 paper HumT DumT, we find a kind of uncanny valley effect: users dislike LLM outputs that are *too human-like*. We thus develop methods to reduce human-likeness without sacrificing performance.

CLS (@chengleisi)'s Twitter Profile Photo


Are AI scientists already better than human researchers?

We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts.

Main finding: LLM ideas result in worse projects than human ideas.
Omar Shaikh (@oshaikh13)'s Twitter Profile Photo


BREAKING NEWS! Most people aren’t prompting models with IMO problems :)

They’re prompting with tasks that need more context, like “plz make talk slides.”

In an ACL oral, I'll cover challenges in human-LM grounding (in 60K+ real interactions) & introduce a benchmark: RIFTS.

🧵