dilara (@dilarafsoylu) 's Twitter Profile
dilara

@dilarafsoylu

member of cooking staff @StanfordNLP

ID: 1485454823281463297

Joined: 24-01-2022 03:33:37

63 Tweets

288 Followers

1.1K Following

John Hewitt (@johnhewtt) 's Twitter Profile Photo

I’m joining the Columbia Computer Science faculty as an assistant professor in fall 2025, and hiring my first students this upcoming cycle!!

There’s so much to understand and improve in neural systems that learn from language — come tackle this with me!
Rose (@rose_e_wang) 's Twitter Profile Photo

We talk a lot about the potential of AI for applications, like AI for Education. But actual progress requires that we hill-climb on realistic, hard tasks. Are there any? 🔽 Bridge, Backtracing, and Teacher Coach are 3 real-world AI for Education datasets that are far from

John Thickstun (@jwthickstun) 's Twitter Profile Photo

I'm joining Cornell University this fall as an Assistant Professor of Computer Science! Looking forward to working with students and colleagues at Cornell Computer Science (@cornellCIS) on generative models, controllable generation, and creative applications like #musictechnology

Christopher Potts (@chrisgpotts) 's Twitter Profile Photo

An LLM memorization riddle: A Pythia-6.9B checkpoint generates the following Output, which occurs only 1 time in the Pile. Is this a verbatim memorization?

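As background for the riddle, "occurs only 1 time in the Pile" usually means counting exact matches of the generated string across the corpus documents. A toy sketch of such a check is below; the corpus, the generated text, and the helper name are illustrative assumptions, not the actual Pile or Pythia-6.9B output.

```python
def count_verbatim_occurrences(generated: str, corpus: list[str]) -> int:
    # Count exact (verbatim) occurrences of the generated text across corpus documents.
    return sum(doc.count(generated) for doc in corpus)

corpus = [
    "the quick brown fox jumps over the lazy dog",
    "a completely unrelated document",
]
generated = "quick brown fox"

# A count of 1 is what makes the riddle interesting: the model reproduced a
# string it could have seen at most once during training.
print(count_verbatim_occurrences(generated, corpus))
```
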
Jordan Juravsky (@jordanjuravsky) 's Twitter Profile Photo

Do you like LLMs? Do you also like for loops? Then you’ll love our new paper!

We scale inference compute through repeated sampling: we let models make hundreds or thousands of attempts when solving a problem, rather than just one. By simply sampling more, we can boost LLM
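
Below is a minimal, self-contained Python sketch of the repeated-sampling idea, assuming access to a stochastic generator and an automatic verifier; `sample_solution` and `is_correct` are hypothetical stand-ins, not the paper's actual models or checkers.

```python
import random

def sample_solution(problem: str) -> str:
    # Toy stand-in for one stochastic LLM generation (assumption, not the paper's model).
    return random.choice(["wrong answer"] * 9 + ["correct answer"])

def is_correct(problem: str, candidate: str) -> bool:
    # Toy stand-in for an automatic verifier, e.g. unit tests or an answer checker.
    return candidate == "correct answer"

def solve_with_repeated_sampling(problem: str, num_samples: int = 100):
    # Draw many independent samples and return the first one the verifier accepts.
    for _ in range(num_samples):
        candidate = sample_solution(problem)
        if is_correct(problem, candidate):
            return candidate
    return None

# Coverage (the chance that at least one of N samples is correct) is
# 1 - (1 - p)**N for a per-sample success probability p, so it grows quickly with N.
print(solve_with_repeated_sampling("toy problem", num_samples=100))
```
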
Leonie Weissweiler (@laweissweiler) 's Twitter Profile Photo

Come see #ACL2024's most beautiful poster, being presented by Julie Kallini ✨ right now at poster 7!
Refuting Chomsky's NYT Op-Ed with experimental work as an added bonus ✨🌈
Karel D’Oosterlinck (@kareldoostrlnck) 's Twitter Profile Photo

Aligning Language Models with preferences leads to stronger and safer models (GPT3 → ChatGPT). However, preferences (RLHF) contain irrelevant signals, and alignment objectives (e.g. DPO) can actually hurt model performance.

We tackle both, leading to a ~2x performance boost.
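
For context on the alignment objective named above: a standard formulation of the DPO loss compares policy and reference log-probabilities for a preferred ("chosen") and a dispreferred ("rejected") response. The sketch below illustrates that generic loss only; it is not the method the tweet announces, and the tensor values and β are made-up assumptions.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Implicit rewards: how far the policy has moved from the frozen reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Push the chosen response to be relatively more likely than the rejected one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with made-up summed log-probabilities for two preference pairs.
loss = dpo_loss(torch.tensor([-10.0, -12.0]), torch.tensor([-15.0, -14.0]),
                torch.tensor([-11.0, -12.5]), torch.tensor([-14.0, -13.5]))
print(loss.item())
```
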
Ahmet Üstün (@ahmetustun89) 's Twitter Profile Photo

I'm incredibly proud that Aya received #ACL2024 Best Paper Award 🥹. 

Huge congratulations to the Aya team and the Cohere For AI community who made this possible by extending the frontiers of LLMs to the multilingual setting, building the Aya Model and Aya Dataset 🌿🌏
Weiyan Shi (@shi_weiyan) 's Twitter Profile Photo

🤩So honored to receive TWO paper awards at @aclmeeting, huge shoutout to my amazing collaborators🤩!!!

🏆Best Social Impact Paper🏆: persuasive jailbreaker arxiv.org/abs/2401.06373

🏆Outstanding Paper🏆: persuasive misinformation arxiv.org/abs/2312.09085

#ACL2024
Stanford NLP Group (@stanfordnlp) 's Twitter Profile Photo

.Stanford NLP Group awards at #ACL2024
▸ Best paper award: Julie Kallini ✨ et al.
▸ Outstanding paper award: Aryaman Arora et al.
▸ Outstanding paper award: Weiyan Shi et al.
▸ Best societal impact award: Weiyan Shi et al.
▸ 10-year test of time award: Christopher Manning et al.
Congratulations! 🥂
Omar Khattab (@lateinteraction) 's Twitter Profile Photo

Some personal news: I'm thrilled to have joined Databricks Mosaic Research (@Databricks) as a Research Scientist last month, before I start as MIT faculty in July 2025! Expect increased investment into the open-source DSPy community, new research, & strong emphasis on production concerns 🧵.

Christopher Potts (@chrisgpotts) 's Twitter Profile Photo

The Linear Representation Hypothesis is now widely adopted despite its highly restrictive nature. Here, Csordás Róbert, Atticus Geiger, Christopher Manning & I present a counterexample to the LRH and argue for more expressive theories of interpretability: arxiv.org/abs/2408.10920

Christopher Potts (@chrisgpotts) 's Twitter Profile Photo

A short story of fast progress: NVIDIA released an ≈8B parameter model they called Megatron in 2019, and five years later they have released an ≈8B model they call Minitron. (I did round off an entire BERT-large for the 2019 model.)
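
The parenthetical is simple arithmetic, assuming the commonly quoted sizes: the 2019 Megatron-LM release was about 8.3B parameters and BERT-large is about 0.34B, so 8.3B − 0.34B ≈ 8.0B; calling the 2019 model "≈8B" rounds away roughly one whole BERT-large.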

Tristan Thrush (@tristanthrush) 's Twitter Profile Photo

Do you want to select great LLM pretraining data but don’t have 1000 H100s for a ton of mixture experiments?

What about a method that requires none of your own training, matches the best known existing method, and has some nice theory?

New preprint: Perplexity Correlations
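
A rough sketch of the idea as described in the tweet, requiring no training of one's own: take per-domain losses from many existing public models, correlate each domain's loss with the models' benchmark scores, and prefer domains where lower loss predicts better downstream performance. The array shapes, variable names, and selection rule below are illustrative assumptions, not the preprint's exact estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
n_models, n_domains = 20, 5

# Toy data: per-domain log-losses for each existing public model,
# plus one downstream benchmark score per model (no new training needed).
log_losses = rng.normal(loc=2.0, scale=0.3, size=(n_models, n_domains))
benchmark = -0.8 * log_losses[:, 0] - 0.5 * log_losses[:, 2] + rng.normal(0, 0.05, n_models)

# For each domain, correlate the benchmark score with negative loss across models.
correlations = np.array([
    np.corrcoef(-log_losses[:, d], benchmark)[0, 1] for d in range(n_domains)
])

# Keep the domains whose loss is most predictive of downstream performance.
selected = np.argsort(correlations)[::-1][:2]
print("per-domain correlations:", np.round(correlations, 2))
print("selected pretraining domains:", selected)
```
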
John Hewitt (@johnhewtt) 's Twitter Profile Photo

If I finetune my LM just on responses, without conditioning on instructions, what happens when I test it with an instruction?

Or if I finetune my LM just to generate poems from poem titles?

Either way, the LM will roughly follow new instructions!

Paper: arxiv.org/pdf/2409.14254
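
A tiny sketch of the two data setups the tweet contrasts; the field names and example pair are illustrative assumptions, not the paper's actual data format.

```python
pair = {
    "instruction": "Write a poem about the ocean.",
    "response": "The tide returns, the tide withdraws...",
}

# Standard instruction tuning: the model conditions on the instruction,
# with the training loss (typically) applied to the response tokens.
instruction_tuning_example = {"input": pair["instruction"], "target": pair["response"]}

# Response-only tuning, as described in the tweet: the instruction is dropped,
# so the model is trained to produce responses without ever seeing instructions.
response_only_example = {"input": "", "target": pair["response"]}

print(instruction_tuning_example)
print(response_only_example)
```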