Gal Yona (@_galyo) Twitter Tweets • TwiCopy

Mor Geva

a year ago

Excited to attend EMNLP 2025 in Miami next week 🤩 DM me if you'd like to grab a coffee and chat about interpretability, knowledge, or reasoning in LLMs! Our group/collabs will be presenting a bunch of cool works, come check them out! 🧵

75 likes · 1 reply · 7 retweets

Yoav Wald

@wald_yoav

a year ago

What prompt generated the image on the right? Come find out today at our tutorial on OOD generalization: Shortcuts, Spuriousness, and Stability, with @Maggiemakar and Aahlad Puli. Panel: Elan Rosenfeld, Aditi Raghunathan, Danica Sutherland

19 likes · 0 replies · 5 retweets

Chip Huyen

@chipro

a year ago

During the process of writing AI Engineering, I went through so many papers, case studies, blog posts, repos, tools, etc. This repo contains ~100 resources that really helped me understand various aspects of building with foundation models. github.com/chiphuyen/aie-… Here are the…

1.1K likes · 31 replies · 241 retweets

François Chollet

@fchollet

a year ago

Today OpenAI announced o3, its next-gen reasoning model. We've worked with OpenAI to test it on ARC-AGI, and we believe it represents a significant breakthrough in getting AI to adapt to novel tasks. It scores 75.7% on the semi-private eval in low-compute mode (for $20 per task…

8.8K likes · 204 replies · 1.1K retweets

Rafael Rafailov @ NeurIPS

@rm_rafailov

a year ago

We have a new position paper on "inference time compute" and what we have been working on in the last few months! We present some theory on why it is necessary, how it works, why we need it, and what it means for "super" intelligence.

1.1K likes · 24 replies · 230 retweets

Sasha Rush

@srush_nlp

8 months ago

Simons Institute Workshop: "Future of LLMs and Transformers": 21 talks Monday - Friday next week. simons.berkeley.edu/workshops/futu…

528 likes · 4 replies · 92 retweets

Zorik Gekhman

@zorikgekhman

8 months ago

🚨 It's often claimed that LLMs know more facts than they show in their outputs, but what does this actually mean, and how can we measure this “hidden knowledge”? In our new paper, we clearly define this concept and design controlled experiments to test it. 1/🧵

221 likes · 4 replies · 59 retweets

Stanford NLP Group

@stanfordnlp

8 months ago

Percy Liang & Tatsunori Hashimoto start the 2nd offering of CS336 Language Modeling from Scratch at Stanford NLP Group. The class philosophy is Understanding by Building. We need many people who understand the detailed design of modern LLMs, not just a few at “frontier” 🤭 AI companies.

243 likes · 9 replies · 32 retweets

Gal Yona

@_galyo

7 months ago

This was a great 30-minute conceptual read. It neatly ties together classic RL, LLMs of the past few years, and where agents are headed next. Honestly, I find the future of agents interacting with the world with less human mediation ("experiencing") both exciting and terrifying

3 likes · 0 replies · 0 retweets

Jeffrey Emanuel

@doodlestein

7 months ago

Sam Altman, the single biggest thing you could do for safety/alignment is to put a massive emphasis in the RL feedback loop on basic HONESTY and never misleading, tricking, overstating, exaggerating, etc. It should be like touching a hot stove to the model. Just like how you raise kids

174 likes · 9 replies · 4 retweets

(((ل()(ل() 'yoav))))👾

@yoavgo

7 months ago

we write too much. more than we can read, and many small incremental things. i think there should be some mechanism to restrict paper submissions and acceptances per person per year, to force people to prioritize their best work, and invest more in it.

617 likes · 28 replies · 31 retweets

Josh Breiner

@joshbreiner

6 months ago

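The state of the police: it used ChatGPT, which invented a new law for it, in order to win a proceeding to confiscate a mobile phone in a hearing at the Hadera Magistrate's Court. The judge was stunned when this came to light: "I've been a judge for 30 years and thought I had seen everything. Apparently I was wrong."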

2.2K likes · 101 replies · 195 retweets

Gal Yona

@_galyo

6 months ago

new work by Gabrielle Kaili-May Liu shows that LLMs still struggle to faithfully express their uncertainty in words, but cool to see that metacognition-inspired prompting can go a long way. looking forward to seeing more positive results on this fundamental problem!

2 likes · 0 replies · 1 retweet