Eran Hirsch (@hirscheran) 's Twitter Profile
Eran Hirsch

@hirscheran

PhD candidate @biunlp; tweets about NLP, ML and research

ID: 2796071287

Link: https://eranhirs.github.io/ | Joined: 07-09-2014 14:32:29

830 Tweets

288 Followers

636 Following

Google Labs (@googlelabs) 's Twitter Profile Photo

We just discovered the 🔥 COOLEST 🔥 trick in Flow that we have to share: Instead of wordsmithing the perfect prompt, you can just... draw it. Take the image of your scene, doodle what you'd like on it (through any editing app), and then briefly describe what needs to happen

Denny Zhou (@denny_zhou) 's Twitter Profile Photo

Slides for my lecture "LLM Reasoning" at Stanford CS 25: dennyzhou.github.io/LLM-Reasoning-… Key points: 1. Reasoning in LLMs simply means generating a sequence of intermediate tokens before producing the final answer. Whether this resembles human reasoning is irrelevant. The crucial
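To make that first point concrete, here is a minimal sketch (my own illustration, assuming only a generic call_llm stand-in rather than any particular API): "reasoning" is just the intermediate tokens the model emits before a marked final answer, which we parse out afterwards.

```python
# Minimal sketch of "reasoning as intermediate tokens". `call_llm` is a
# stand-in for whatever LLM API or local model you actually use.

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (API client, local model, etc.)."""
    raise NotImplementedError

def answer_with_reasoning(question: str) -> tuple[str, str]:
    prompt = (
        f"Question: {question}\n"
        "Think step by step, then give the result on a line starting with "
        "'Final answer:'."
    )
    completion = call_llm(prompt)
    # Everything before the marker is the intermediate "reasoning" tokens;
    # everything after it is the answer we actually return or score.
    reasoning, _, final = completion.partition("Final answer:")
    return reasoning.strip(), final.strip()
```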

Yumo Xu (@yumo_xu) 's Twitter Profile Photo

Excited to share our #ACL2025NLP paper, "CiteEval: Principle-Driven Citation Evaluation for Source Attribution"! 📜 If you're working on RAG, Deep Research and Trustworthy AI, this is for you. Why? Citation quality is

Vishakh Padmakumar (@vishakh_pk) 's Twitter Profile Photo

Maybe don't use an LLM for _everything_?

Last summer, I got to fiddle again with content diversity @AdobeResearch @Adobe and we showed that agentic pipelines that mix LLM-prompt steps with principled techniques can yield better, more personalized summaries
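One hedged illustration of mixing a principled step with an LLM step (a sketch in the spirit of the tweet, not the paper's actual pipeline; embed and call_llm are placeholder functions): select diverse content with maximal marginal relevance, then let the LLM summarize only what was selected.

```python
# Sketch: maximal marginal relevance (a "principled" diversity step) followed
# by an LLM summarization step. `embed` and `call_llm` are placeholders.

import numpy as np

def embed(texts: list[str]) -> np.ndarray:
    """Placeholder: return one embedding vector per text."""
    raise NotImplementedError

def call_llm(prompt: str) -> str:
    """Placeholder for an LLM call."""
    raise NotImplementedError

def cos(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def mmr_select(query: str, sentences: list[str], k: int = 5, lam: float = 0.7) -> list[str]:
    q, S = embed([query])[0], embed(sentences)
    chosen: list[int] = []
    while len(chosen) < min(k, len(sentences)):
        def score(i: int) -> float:
            # Trade off relevance to the query against redundancy with picks so far.
            redundancy = max((cos(S[i], S[j]) for j in chosen), default=0.0)
            return lam * cos(S[i], q) - (1 - lam) * redundancy
        chosen.append(max((i for i in range(len(sentences)) if i not in chosen), key=score))
    return [sentences[i] for i in chosen]

def diverse_summary(query: str, sentences: list[str]) -> str:
    selected = mmr_select(query, sentences)
    prompt = "Summarize the following points for the user:\n" + "\n".join(f"- {s}" for s in selected)
    return call_llm(prompt)
```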
Hunyuan (@tencenthunyuan) 's Twitter Profile Photo

We're thrilled to release & open-source Hunyuan3D World Model 1.0! This model enables you to generate immersive, explorable, and interactive 3D worlds from just a sentence or an image. It's the industry's first open-source 3D world generation model, compatible with CG pipelines

Aviya Maimon (@aviyamaimon) 's Twitter Profile Photo

🚨 New paper alert! 🚨 We propose an IQ Test for LLMs, a new way to evaluate models that goes beyond benchmarks and uncovers their core skills. Think: 🧠🤖 psychometrics for LLMs. 👇 (1/6)
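To unpack the psychometrics analogy with a toy example (my own illustration, not the paper's method; the score matrix is made up): treat models as test takers and benchmarks as test items, and read the first principal component of the score matrix as a rough general-ability factor.

```python
# Toy "g factor" extraction over a made-up model-by-benchmark score matrix.

import numpy as np

# rows = models, columns = benchmarks (illustrative numbers)
scores = np.array([
    [0.81, 0.74, 0.66, 0.59],
    [0.77, 0.70, 0.61, 0.55],
    [0.65, 0.58, 0.52, 0.40],
])

z = (scores - scores.mean(axis=0)) / (scores.std(axis=0) + 1e-9)  # standardize per benchmark
corr = np.corrcoef(z, rowvar=False)                               # benchmark-benchmark correlations
eigvals, eigvecs = np.linalg.eigh(corr)

loading = eigvecs[:, -1]        # loadings on the top factor
g_scores = z @ loading          # each model's rough "general ability" score
print("variance explained:", eigvals[-1] / eigvals.sum())
print("per-model factor scores:", g_scores)
```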

Salesforce AI Research (@sfresearch) 's Twitter Profile Photo

Can LLM agents really understand us? 🤔

Paper: arxiv.org/html/2507.2203…
Code/Data: github.com/SalesforceAIRe…

We introduce UserBench: a gym environment testing how well agents align with nuanced human intent, not just follow commands. Users are messy: vague, evolving, indirect.
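A rough sketch of what a gym-style simulated-user environment can look like (illustrative only; UserBench's real API, user simulator, and scoring differ): preferences are hidden, the agent may ask or recommend, and the reward depends on matching the hidden intent.

```python
# Illustrative gym-style environment with a vague simulated user. The hidden
# preferences stand in for "nuanced human intent"; names and the reward rule
# are made up for this sketch.

class VagueUserEnv:
    def __init__(self, preferences: dict[str, str]):
        self._prefs = preferences  # hidden ground-truth intent

    def reset(self) -> str:
        return "I need to book a trip, nothing too crazy."  # deliberately vague

    def step(self, action: dict) -> tuple[str, float, bool]:
        if action["type"] == "ask" and action["topic"] in self._prefs:
            # The simulated user reveals intent only when asked, one aspect at a time.
            return f"About {action['topic']}: {self._prefs[action['topic']]}", 0.0, False
        if action["type"] == "recommend":
            matched = sum(v in action["text"] for v in self._prefs.values())
            return "Thanks!", matched / len(self._prefs), True
        return "Not sure what you mean.", 0.0, False

# An agent that never asks clarifying questions will usually score poorly here.
env = VagueUserEnv({"budget": "under $800", "dates": "mid October", "style": "quiet beach"})
obs = env.reset()
obs, _, _ = env.step({"type": "ask", "topic": "budget"})
obs, reward, done = env.step({"type": "recommend",
                              "text": "A quiet beach trip in mid October, under $800."})
print(reward, done)  # 1.0 True
```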
Banghua Zhu (@banghuaz) 's Twitter Profile Photo

Hearing from a lot of folks that they still fine-tune Qwen2.5 instead of Qwen3, simply because "it's easier to tune." Qwen2.5 models seem more steerable: easier to adapt for new behaviors or boost specific capabilities, which means more downstream work builds on them. People
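For context, the "easy to tune" workflow usually looks something like this minimal LoRA sketch with transformers + peft (the dataset and hyperparameters below are illustrative assumptions, not a recipe):

```python
# Minimal LoRA fine-tuning sketch for a Qwen2.5 instruct model.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "Qwen/Qwen2.5-7B-Instruct"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

# Freeze the base weights and attach small LoRA adapters instead.
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                                         target_modules=["q_proj", "v_proj"],
                                         task_type="CAUSAL_LM"))

# Toy training data; in practice this would be your SFT dataset.
examples = ["User: Summarize LoRA in one line.\nAssistant: LoRA adds trainable low-rank adapters to frozen weights."]
device = next(model.parameters()).device
opt = torch.optim.AdamW((p for p in model.parameters() if p.requires_grad), lr=1e-4)

model.train()
for text in examples:
    batch = tok(text, return_tensors="pt").to(device)
    out = model(**batch, labels=batch["input_ids"])  # causal LM loss on the whole sequence
    out.loss.backward()
    opt.step()
    opt.zero_grad()
```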

Mosh Levy (@mosh_levy) 's Twitter Profile Photo

Producing reasoning texts boosts the capabilities of AI models, but do we humans correctly understand these texts? Our latest research suggests that we do not.
This highlights a new angle on the "Are they transparent?" debate: they might be, but we misinterpret them. 🧵
Paul Couvert (@itspaulai) 's Twitter Profile Photo

Wow! Chinese lab Tencent Hunyuan has released an open source alternative to Genie 3 🔥 You can generate realistic videos that you can control in real time.
- Long-term consistency
- No need for expensive rendering
- Trained on 1M+ gameplay recordings
Already available ↓

Tomer Ashuach (@tomerashuach) 's Twitter Profile Photo

🚨 New preprint out!

CRISP: Persistent Concept Unlearning via SAEs
LLMs often encode knowledge we want to remove.

CRISP enables persistent, interpretable, precise unlearning while keeping models useful & coherent, tested on bio & cyber safety tasks 🧵👇
📄 arxiv.org/abs/2508.13650
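As a toy illustration of the general SAE-feature-suppression idea (not the CRISP algorithm itself; shapes and feature indices are made up): encode a hidden state with a sparse autoencoder, zero the latents linked to the concept, and decode back before the model continues.

```python
# Toy sketch only: a tiny SAE, a list of concept-linked latent indices, and a
# forward hook that swaps in the feature-suppressed reconstruction.

import torch
import torch.nn as nn

class TinySAE(nn.Module):
    def __init__(self, d_model: int, d_latent: int):
        super().__init__()
        self.enc = nn.Linear(d_model, d_latent)
        self.dec = nn.Linear(d_latent, d_model)

    def forward(self, h: torch.Tensor, suppress: list[int]) -> torch.Tensor:
        z = torch.relu(self.enc(h))      # sparse latent features
        mask = torch.ones_like(z)
        mask[..., suppress] = 0.0        # knock out the concept-linked features
        return self.dec(z * mask)        # reconstructed hidden state

sae = TinySAE(d_model=768, d_latent=4096)
concept_features = [12, 890, 3031]       # illustrative indices said to track the concept

def hook(module, inputs, output):
    # Returning a tensor from a forward hook replaces the layer's output.
    return sae(output, suppress=concept_features)

# e.g. some_transformer_layer.register_forward_hook(hook)  # hypothetical layer handle
```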
Eviatar Nachshoni (@enachshoni) 's Twitter Profile Photo

🚨 New paper out! 📄
What happens when LLMs & RLMs face conflicting answers to a question? 🤔
They often ignore disagreement and confidently pick one "correct" answer. 🤯
📄 arxiv.org/pdf/2508.12355
#AI #LLM #NLP #MachineLearning
Fan Nie (@fannie1208) 's Twitter Profile Photo

How does it work? [4/n]

🔹 UQ-Dataset: 500 challenging, diverse unsolved questions sourced from Stack Exchange. They are super hard and arise naturally when humans seek answers.
🔹 UQ-Validator: No ground truth → traditional metrics fail. Instead, we designed compound validation
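A hedged sketch of what compound validation without ground truth can look like (the actual UQ-Validator pipeline is more involved; call_llm is a placeholder judge): an answer only counts if several independent checks all accept it.

```python
# Sketch of a "compound validator": conjunction of noisy checks, no gold answers.

def call_llm(prompt: str) -> str:
    """Placeholder for an LLM judge that answers 'yes' or 'no'."""
    raise NotImplementedError

def check_self_consistent(question: str, answer: str) -> bool:
    prompt = f"Does this answer contradict itself?\nQ: {question}\nA: {answer}\nAnswer yes or no."
    return call_llm(prompt).strip().lower() == "no"

def check_addresses_question(question: str, answer: str) -> bool:
    prompt = f"Does the answer actually address the question?\nQ: {question}\nA: {answer}\nAnswer yes or no."
    return call_llm(prompt).strip().lower() == "yes"

def check_supports_claims(question: str, answer: str) -> bool:
    prompt = f"Does the answer back its claims with verifiable reasoning or references?\nQ: {question}\nA: {answer}\nAnswer yes or no."
    return call_llm(prompt).strip().lower() == "yes"

CHECKS = [check_self_consistent, check_addresses_question, check_supports_claims]

def compound_validate(question: str, answer: str) -> bool:
    # Each individual check is weak and noisy, but an answer has to clear all
    # of them to count as "plausibly correct".
    return all(check(question, answer) for check in CHECKS)
```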