Jay Huang✈️ICLR2025🇸🇬 (@jentsehuang)'s Twitter Profile
Jay Huang✈️ICLR2025🇸🇬

@jentsehuang

#NLProc. Postdoc @JohnsHopkins. PhD @CUHKofficial. BS @PKU1898. Previous: @USC @TencentGlobal. LLM + Social Science, Multi-Agent, AI Fairness.

ID: 1664096530415124481

Link: https://penguinnnnn.github.io/
Joined: 01-06-2023 02:29:18

237 Tweets

586 Followers

830 Following

Irene Li (@irenelizihui)'s Twitter Profile Photo

📢 Today, we release #MMLUProX, which upgrades MMLU-Pro to 29 languages across 14 disciplines—11,829 reasoning-heavy Qs per language (≈342k total). The toughest multilingual stress test for today’s LLMs! 🌐🧠 Heartfelt thanks to everyone who contributed.🤝

Jiahao Xu (@jiahaox82739261) 's Twitter Profile Photo

🚨 Announcing DeepTheorem: Revolutionizing LLM Mathematical Reasoning! 🚀

𝕋𝕃𝔻ℝ:
- 🌟 Learning by exploration is the key lesson from recent RL-zero training: self-exploration significantly boosts how well LLMs use their pre-training knowledge;

-
Murat Kocaoglu (@murat_kocaoglu_)'s Twitter Profile Photo

I am pleased to announce that I will be joining the Johns Hopkins University Computer Science Department (JHU Computer Science) as an Assistant Professor in Fall 2025. I am grateful to my mentors for their unwavering support and to my exceptional PhD students for advancing our lab's vision.

Zhaopeng Tu (@tuzhaopeng)'s Twitter Profile Photo

When eyes and memory clash, who wins? 👁️🧠

Introducing a comprehensive study on vision-knowledge conflicts in MLLMs, where visual input contradicts the model's internal commonsense knowledge—and the results might surprise you. #ACL2025NLP 

📈 We developed an automated framework
Zhaopeng Tu (@tuzhaopeng)'s Twitter Profile Photo

Can MLLMs truly "see" safety risks in image-text combinations? 🌲🖼️

Introducing MMSafetyAwareness, the first comprehensive benchmark for multimodal safety awareness in MLLMs, featuring 1,500 image-prompt pairs across 29 safety scenarios to evaluate whether models correctly
Kai Chen (@kaichen23)'s Twitter Profile Photo

🤔How well do LLMs adapt to different norms?
🧵We introduce STEER-BENCH, a benchmark for assessing steerability in LLMs.
📉 Human: 81% | Top LLM: ~65%
🚨 Norm alignment ≠ solved.
📄 Paper: arxiv.org/abs/2505.20645
Zihao He (@ZihaoHe95), Taiwei Shi (@taiwei_shi), 🇺🇦 Kristina Lerman 🇺🇦 (@KristinaLerman)
Omar Shaikh (@oshaikh13)'s Twitter Profile Photo

What if LLMs could learn your habits and preferences well enough (across any context!) to anticipate your needs? In a new paper, we present the General User Model (GUM): a model of you built from just your everyday computer use. 🧵

Pan Lu (@lupantech)'s Twitter Profile Photo

Do LLMs truly understand math proofs, or just guess? 🤔Our new study on #IneqMath dives deep into Olympiad-level inequality proofs & reveals a critical gap: LLMs are often good at finding answers, but struggle with rigorous, sound proofs.

➡️ ineqmath.github.io

To tackle
Mark Dredze (@mdredze)'s Twitter Profile Photo

Our new paper explores knowledge conflict in LLMs. It also issues a word of warning to those using LLMs as a Judge: the model can't help but inject its own knowledge into its decisions.

Yueqi Song (@yueqi_song)'s Twitter Profile Photo

We have a long way to go on visual reasoning.
Our VisualPuzzles benchmark🧩shows similar findings, where the best models still can’t beat the bottom 5% of humans.
👉Check out our threads:
x.com/yueqi_song/sta…