Tong Yang (@tongyang_666)'s Twitter Profile
Tong Yang

@tongyang_666

I'm a PhD student at CMU in the ECE department. My research focuses on machine learning, especially theory and optimization.

ID: 1686642135855136768

Joined: 02-08-2023 07:36:50

6 Tweets

71 Followers

51 Following

Andy Zou (@andyzou_jiaming)'s Twitter Profile Photo

No LLM is secure! A year ago, we unveiled the first of many automated jailbreaks capable of cracking all major LLMs. 🚨

But there is hope?!

We introduce Short Circuiting: the first alignment technique that is adversarially robust. 🧵

📄 Paper: arxiv.org/abs/2406.04313
Gray Swan AI (@grayswanai)'s Twitter Profile Photo

🚨Ultimate Jailbreaking Championship 2024 🚨

Hackers vs. AI in the arena. Let the battle begin!

🏆 $40,000 in Bounties
🗓️ Sept 7, 2024 @ 10AM PDT

🔗Register Now: app.grayswan.ai/arena
Tong Yang (@tongyang_666)'s Twitter Profile Photo

🚨 🔥 Multi-step reasoning is key to solving complex problems — and Transformers with Chain-of-Thought can do it surprisingly well.

🤔 But how does CoT function as a learned scratchpad that lets even shallow Transformers run sequential algorithms that would otherwise require …
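
The question the tweet raises has a simple mechanical core. As a minimal toy sketch of the CoT-as-scratchpad idea (my own illustration under assumed simplifications, not code from the paper): a step function that can only fold in one token per call, i.e. a "shallow" computation, still runs a full sequential algorithm if each intermediate state is written back into the context and re-read on the next step.

```python
# Toy sketch of CoT as a learned scratchpad (hypothetical illustration,
# not from the paper). A "shallow" computation that performs only ONE
# update per call can still run a sequential algorithm if it writes each
# intermediate state back into its own context, chain-of-thought style.

def shallow_step(state: int, token: int) -> int:
    """One constant-depth step: fold a single token into the state.
    The sequential algorithm here is parity (XOR) over the input."""
    return state ^ token

def solve_with_cot(tokens: list[int]) -> list[int]:
    """Emit the state after every token -- the scratchpad. Iterating a
    depth-1 step n times simulates a depth-n computation."""
    scratchpad = [0]                      # initial state in the context
    for t in tokens:
        scratchpad.append(shallow_step(scratchpad[-1], t))
    return scratchpad

if __name__ == "__main__":
    bits = [1, 0, 1, 1, 0, 1]
    trace = solve_with_cot(bits)
    print("CoT trace:", trace)            # all intermediate parities
    print("answer   :", trace[-1])        # parity of the full sequence
```

Without the scratchpad, a fixed shallow model would have to map the whole sequence to the answer in one shot, which is exactly the depth barrier the thread asks about.
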
Yu Huang (@yuhuang42)'s Twitter Profile Photo

Excited to share our recent work! We provide a mechanistic understanding of long CoT reasoning in state-tracking: when transformers length-generalize strongly, when they stall, and how recursive self-training pushes the boundary. 🧵(1/8)
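
For concreteness, here is a hypothetical toy version of the state-tracking task the thread refers to (the task only; the authors' model, training recipe, and self-training loop are not reproduced here): a hidden state is updated by a stream of group actions, and a step-by-step trace extends mechanically to sequences far longer than any fixed training length, which is the length-generalization behavior in question.

```python
# Hypothetical toy version of a state-tracking task (illustration only,
# not the paper's setup). The state is a permutation of 3 items, updated
# by a stream of group actions; a step-by-step (long-CoT-style) trace
# applies one action at a time, so it runs at any sequence length.

import itertools
import random

PERMS = list(itertools.permutations(range(3)))  # the group S3

def apply(perm: tuple[int, ...], state: tuple[int, ...]) -> tuple[int, ...]:
    """Compose one group action into the current state."""
    return tuple(state[i] for i in perm)

def track(actions: list[tuple[int, ...]]) -> list[tuple[int, ...]]:
    """The CoT-style trace: the state after every action."""
    states = [tuple(range(3))]            # start from the identity
    for a in actions:
        states.append(apply(a, states[-1]))
    return states

if __name__ == "__main__":
    random.seed(0)
    for length in (8, 64, 512):           # well past any short train length
        actions = [random.choice(PERMS) for _ in range(length)]
        print(f"len={length:4d}  final state = {track(actions)[-1]}")
```
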