Dylan X. Hou (@xinminghou) 's Twitter Profile
Dylan X. Hou

@xinminghou

undergrad studying AI at Renmin Univ. of China, NLP researcher, intelligence explorer & trainer, interned @Tencent AI Lab. Carpe Diem🍀

ID: 1549288486435684353

Link: https://dxhou.github.io/ · Joined: 19-07-2022 07:02:19

486 Tweets

496 Followers

2.2K Following

Khanh Nguyen (on job market) (@khanhxuannguyen) 's Twitter Profile Photo

📢 Excited to announce our new paper 

Language-guided world models: A model-based approach to AI control

• We develop LWMs: world models that can read texts to capture new environment dynamics
• These models enable humans to efficiently control agents by providing language
Weijie Su (@weijie444) 's Twitter Profile Photo

New Research (w/ amazing Hangfeng He (@hangfeng_he))

"A Law of Next-Token Prediction in Large Language Models"

LLMs rely on NTP, but their internal mechanisms seem chaotic. It's difficult to discern how each layer processes data for NTP. Surprisingly, we discover a physics-like law on NTP:
Kristina Gligorić (@krisgligoric) 's Twitter Profile Photo

LLMs have been proposed for annotation tasks. But, LLMs are biased and make errors. Can we draw * valid * conclusions from LLM annotations? arxiv.org/abs/2408.15204
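One widely used recipe for the question this tweet raises is prediction-powered inference: annotate the full corpus with the LLM, have humans audit a small random subset, and correct the LLM estimate by the bias measured on that audit set. A minimal sketch under that assumption (the function and variable names are illustrative, not from the paper):

```python
import numpy as np

def ppi_mean(llm_all, llm_audited, human_audited):
    """Debiased estimate of a mean label from LLM annotations.

    llm_all:       LLM annotations on the full corpus
    llm_audited:   LLM annotations on a small human-audited subset
    human_audited: gold human labels on that same subset
    """
    # The LLM estimate on everything is cheap but possibly biased;
    # the human-vs-LLM gap on the audit set estimates that bias.
    rectifier = np.mean(np.asarray(human_audited) - np.asarray(llm_audited))
    return float(np.mean(llm_all) + rectifier)
```

The correction keeps the estimate valid even when the LLM labels are systematically off, at the cost of wider confidence intervals when the audit set is small.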

AI at Meta (@aiatmeta) 's Twitter Profile Photo

New research paper from Meta FAIR – Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model.

Chunting Zhou (@violet_zct), Lili Yu (ICLR2025) (@liliyu_lili) and team introduce this recipe for training a multi-modal model over discrete and continuous data. Transfusion combines next token
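The training recipe described here pairs next-token cross-entropy on the discrete (text) stream with a DDPM-style noise-prediction loss on the continuous (image) stream, summed into one objective. A minimal numpy sketch (argument names and the weight `lam` are illustrative, not the paper's exact notation):

```python
import numpy as np

def transfusion_loss(text_logits, text_targets, noise_pred, noise, lam=1.0):
    """One combined training objective over both modalities.

    text_logits:  (N, V) next-token logits for the text positions
    text_targets: (N,) gold next-token ids
    noise_pred, noise: predicted vs. true diffusion noise for image patches
    lam: relative weight of the diffusion term (a hyperparameter)
    """
    # Numerically stable log-softmax, then next-token cross-entropy.
    z = text_logits - text_logits.max(axis=-1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    lm_loss = -logp[np.arange(len(text_targets)), text_targets].mean()
    # Standard epsilon-prediction MSE for the diffusion stream.
    diff_loss = np.mean((noise_pred - noise) ** 2)
    return float(lm_loss + lam * diff_loss)
```

Because both terms backpropagate through one shared transformer, the same weights learn to predict tokens and denoise patches.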
xuan (ɕɥɛn / sh-yen) (@xuanalogue) 's Twitter Profile Photo

Should AI be aligned with human preferences, rewards, or utility functions?

Excited to finally share a preprint that Micah Carroll (@MicahCarroll), Matija (@FranklinMatija), Hal Ashton (@hal_ashton) & I have worked on for almost 2 years, arguing that AI alignment has to move beyond the preference-reward-utility nexus!
Omar Khattab (@lateinteraction) 's Twitter Profile Photo

🔗 Thoughts on Research Impact in AI.

Grad students often ask: how do I do research that makes a difference in the current, crowded AI space?

This is a blogpost that summarizes my perspective in six guidelines for making research impact via open-source artifacts. Link below.
Jillian Fisher (@jrfisher552) 's Twitter Profile Photo

How do biased AI models affect human decision-making? 🤔
Our latest paper, “Biased AI can Influence Political Decision-Making”, uses two interactive tasks which show that exposure to partisan AI can sway opinions—no matter your political stance! 🗳️

Paper: arxiv.org/abs/2410.06415
Rohan Pandey (@rohan99pandey) 's Twitter Profile Photo

1/7 With all the buzz around PhD applications, I've felt that one thing missing from the narrative is the experience of PhDing itself. There's great advice on the application process, but little talk about how it really is.

Nathan Lambert (@natolambert) 's Twitter Profile Photo

One of the first papers studying inference-time personalization. One of the great ways we can make open models better suited to your needs than APIs.

PAD: Personalized Alignment at Decoding-Time
(similar ideas to our social choice position paper from earlier in the year)
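A generic way to read "alignment at decoding time" (a hedged sketch of guided decoding in general, not necessarily PAD's exact algorithm; the per-user reward scores here are hypothetical) is to steer a frozen model's next-token logits with a user-specific score at every step:

```python
import numpy as np

def personalized_step(base_logits, preference_scores, beta=1.0):
    """One greedy decoding step with a user-specific bias.

    base_logits: the frozen model's next-token logits
    preference_scores: per-token scores from a (hypothetical) lightweight
        personalized reward model for this user
    beta: how strongly personalization steers decoding
    """
    # Steer at decode time instead of fine-tuning: shift the logits by a
    # scaled reward signal, then pick the next token as usual.
    steered = base_logits + beta * preference_scores
    return int(np.argmax(steered))
```

Because the base model stays frozen, one checkpoint can serve many users with different reward signals, which is the advantage over per-user fine-tuning.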
Yixin Liu (@yixinliu17) 's Twitter Profile Photo

LLMs are often used to evaluate the instruction-following capabilities of other LLMs – but which LLM should we choose, and how should we use it? 🤔

We're excited to share "ReIFE: Re-evaluating Instruction-Following Evaluation"!  Preprint: arxiv.org/abs/2410.07069

📊 Our study is
Nathan Lambert (@natolambert) 's Twitter Profile Photo

Newest *PO preference tuning paper at least feels substantially different from a lot of the others from earlier in 2024.

TPO: Tree Preference Optimization 
Liao et al

It creates a latent space to rank many options between steps. Like DPO meets tree search / PRMs. With some
Tao Yu (@taoyds) 's Twitter Profile Photo

🍅Excited to see Anthropic (@AnthropicAI) using 🚀our OSWorld🚀 (NeurIPS'24) to benchmark computer use!

🍋OSWorld will soon support parallel cloud running, much faster!

🍓More multimodal agent open-source big projects coming soon from XLANG NLP Lab (@XLangNLP) in Nov - stay tuned!

👇os-world.github.io
Ilia Sucholutsky (@sucholutsky) 's Twitter Profile Photo

So excited to share that this was published in Nature Human Behaviour! 🥳 It's time to build AI thought partners that learn & think *with* people rather than *instead of* people. 🧠🤝🤖 We lay out what that means, why it matters, and how it can be done! nature.com/articles/s4156…

LLM360 (@llm360) 's Twitter Profile Photo

📣Proud to share Web2Code: a Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs was accepted to NeurIPS Conference (@NeurIPSConf) 2024!

About Web2Code:
📸 novel image + html dataset
📈webpage code gen benchmark
🧠CrystalChat-7B-Web2Code

Blog: mbzuai-llm.github.io/webpage2code/
Jing-Jing Li (@drjingjing2026) 's Twitter Profile Photo

🚨 New preprint from my internship at Ai2 (@allen_ai)!

Introducing SafetyAnalyst, an LLM content moderation framework that
📌 builds structured “harm-benefit trees” given a prompt
📌 weighs harms against benefits
📌 delivers interpretable, transparent, and steerable safety decisions
Tao Yu (@taoyds) 's Twitter Profile Photo

🍅Surprising finding: Basic adversarial pop-ups trick state-of-the-art VLMs (e.g., the Anthropic (@AnthropicAI) computer use agent) into clicking 🚩>90%🚩 of the time in OSWorld!

🥝Clear signal: We need more robust safety measures before deploying computer use agents at scale.
Tao Yu (@taoyds) 's Twitter Profile Photo

🤔Static CUA benchmarks enable fast model dev but lack task variety and risk overfitting. 

Computer Agent Arena tests crowdsourced real-world tasks.

OSWorld: 🥇UI-Tars1.5🥈Operator🥉Claude 3.7
CUA Arena: 🥇Claude 3.7🥈Operator🥉UI-Tars1.5

🚀Rankings likely to evolve quickly
DeepSeek (@deepseek_ai) 's Twitter Profile Photo

🚀 DeepSeek-R1-0528 is here!

🔹 Improved benchmark performance
🔹 Enhanced front-end capabilities
🔹 Reduced hallucinations
🔹 Supports JSON output & function calling

✅ Try it now: chat.deepseek.com
🔌 No change to API usage — docs here: api-docs.deepseek.com/guides/reasoni… 🔗

Yongyi Zang (@yongyi_zang) 's Twitter Profile Photo

🚨New Audio Benchmark 🚨We find standard LLMs can solve Music-QA benchmarks by just guessing from text only, + LALMs can still answer well when given noise instead of music!

Presenting RUListening: A fully automated pipeline for making Audio-QA benchmarks *actually* assess