Tong Yang (@tongyang_666)'s Twitter Profile
Tong Yang

@tongyang_666

I'm a PhD student at CMU in the ECE department. My research focuses on machine learning, especially theory and optimization.

ID: 1686642135855136768

Joined: 02-08-2023 07:36:50

6 Tweets

71 Followers

51 Following

Andy Zou (@andyzou_jiaming)'s Twitter Profile Photo

No LLM is secure! A year ago, we unveiled the first of many automated jailbreaks capable of cracking all major LLMs. 🚨

But there is hope?!

We introduce Short Circuiting: the first alignment technique that is adversarially robust. 🧵

📄 Paper: arxiv.org/abs/2406.04313
Gray Swan AI (@grayswanai)'s Twitter Profile Photo

🚨Ultimate Jailbreaking Championship 2024 🚨

Hackers vs. AI in the arena. Let the battle begin!

🏆 $40,000 in Bounties
🗓️ Sept 7, 2024 @ 10AM PDT

🔗Register Now: app.grayswan.ai/arena
Tong Yang (@tongyang_666)'s Twitter Profile Photo

🚨 🔥 Multi-step reasoning is key to solving complex problems — and Transformers with Chain-of-Thought can do it surprisingly well.

🤔 But how does CoT function as a learned scratchpad that lets even shallow Transformers run sequential algorithms that would otherwise require …
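
The question the tweet raises has a simple mechanical core. As a minimal toy sketch of the CoT-as-scratchpad idea (my own illustration under assumed simplifications, not code from the paper): a step function that can only fold in one token per call, i.e. a "shallow" computation, still runs a full sequential algorithm if each intermediate state is written back into the context and re-read on the next step.

```python
# Toy sketch of CoT as a learned scratchpad (hypothetical illustration,
# not from the paper). A "shallow" computation that performs only ONE
# update per call can still run a sequential algorithm if it writes each
# intermediate state back into its own context, chain-of-thought style.

def shallow_step(state: int, token: int) -> int:
    """One constant-depth step: fold a single token into the state.
    The sequential algorithm here is parity (XOR) over the input."""
    return state ^ token

def solve_with_cot(tokens: list[int]) -> list[int]:
    """Emit the state after every token -- the scratchpad. Iterating a
    depth-1 step n times simulates a depth-n computation."""
    scratchpad = [0]                      # initial state in the context
    for t in tokens:
        scratchpad.append(shallow_step(scratchpad[-1], t))
    return scratchpad

if __name__ == "__main__":
    bits = [1, 0, 1, 1, 0, 1]
    trace = solve_with_cot(bits)
    print("CoT trace:", trace)            # all intermediate parities
    print("answer   :", trace[-1])        # parity of the full sequence
```

Without the scratchpad, a fixed shallow model would have to map the whole sequence to the answer in one shot, which is exactly the depth barrier the thread asks about.
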
Yu Huang (@yuhuang42)'s Twitter Profile Photo

Excited to share our recent work! We provide a mechanistic understanding of long CoT reasoning in state-tracking: when transformers length-generalize strongly, when they stall, and how recursive self-training pushes the boundary. 🧵(1/8)
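
For concreteness, here is a hypothetical toy version of the state-tracking task the thread refers to (the task only; the authors' model, training recipe, and self-training loop are not reproduced here): a hidden state is updated by a stream of group actions, and a step-by-step trace extends mechanically to sequences far longer than any fixed training length, which is the length-generalization behavior in question.

```python
# Hypothetical toy version of a state-tracking task (illustration only,
# not the paper's setup). The state is a permutation of 3 items, updated
# by a stream of group actions; a step-by-step (long-CoT-style) trace
# applies one action at a time, so it runs at any sequence length.

import itertools
import random

PERMS = list(itertools.permutations(range(3)))  # the group S3

def apply(perm: tuple[int, ...], state: tuple[int, ...]) -> tuple[int, ...]:
    """Compose one group action into the current state."""
    return tuple(state[i] for i in perm)

def track(actions: list[tuple[int, ...]]) -> list[tuple[int, ...]]:
    """The CoT-style trace: the state after every action."""
    states = [tuple(range(3))]            # start from the identity
    for a in actions:
        states.append(apply(a, states[-1]))
    return states

if __name__ == "__main__":
    random.seed(0)
    for length in (8, 64, 512):           # well past any short train length
        actions = [random.choice(PERMS) for _ in range(length)]
        print(f"len={length:4d}  final state = {track(actions)[-1]}")
```
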