Yi Zeng 曾祎(@EasonZeng623) 's Twitter Profileg
Yi Zeng 曾祎

@EasonZeng623

probe to improve | Ph.D. @VTEngineering | Amazon Research Fellow | #AI_security 🛡 #Adversarial ⚔️ #Backdoors 🎠 I deal with the dark side of machine learning.

ID:901961911448723456

linkhttps://www.yi-zeng.com calendar_today28-08-2017 00:17:32

460 Tweets

1,0K Followers

1,0K Following

Grant Sanderson(@3blue1brown) 's Twitter Profile Photo

The next chapter about transformers is up on YouTube, digging into the attention mechanism: youtu.be/eMlx5fFNoYc

The model works with vectors representing tokens (think words), and this is the mechanism that allows those vectors to take in meaning from context.

account_circle
Tianyu Pang(@TianyuPang1) 's Twitter Profile Photo

😯The 100% jailbreaking of GPT3.5/4, Llama-2, Gemma, and Claude2/3 is truly impressive! Smart ideas for leveraging API access.

Congrats to Maksym Andriushchenko 🇺🇦 francesco croce TML Lab (EPFL) !

arxiv.org/abs/2404.02151

account_circle
Vanshaj Khattar(@VanshajKhattar) 's Twitter Profile Photo

Thrilled to announce the Trustworthy Interactive Decision-Making with Foundation Models workshop at ! 🌟 A great opportunity for dialogue on ethical AI decision-making, blending human-centric methods & foundation models.

Thrilled to announce the Trustworthy Interactive Decision-Making with Foundation Models workshop at #IJCAI2024! 🌟 A great opportunity for dialogue on ethical AI decision-making, blending human-centric methods & foundation models.
account_circle
Haize Labs(@haizelabs) 's Twitter Profile Photo

‼️⚠️bad day to be a LLM⚠️‼️

Haize Labs took one of our favorite adversarial attack algorithms, GCG, and made it *38x* faster

‼️⚠️bad day to be a LLM⚠️‼️ @haizelabs took one of our favorite adversarial attack algorithms, GCG, and made it *38x* faster
account_circle
Ethan Mollick(@emollick) 's Twitter Profile Photo

Two of the best elaborate parody sites in recent memory are on exactly the opposite sides of the AI safety debate.

Open Asteroid Impact (openasteroidimpact.org) and Goody 2 (goody2.ai) which seems to have an actual LLM.

We need more debates via maximalist parody.

Two of the best elaborate parody sites in recent memory are on exactly the opposite sides of the AI safety debate. Open Asteroid Impact (openasteroidimpact.org) and Goody 2 (goody2.ai) which seems to have an actual LLM. We need more debates via maximalist parody.
account_circle
Bindu Reddy(@bindureddy) 's Twitter Profile Photo

Mix and Match Your LLMS To Avoid A High Latency Hellscape

Most LLM apps and AI agents need multiple calls to an LLM. As many as 5-10 calls are required for a moderately complex LLM app or AI agent. Calling GPT-4 or Claude is highly impractical, and you will soon be in

Mix and Match Your LLMS To Avoid A High Latency Hellscape Most LLM apps and AI agents need multiple calls to an LLM. As many as 5-10 calls are required for a moderately complex LLM app or AI agent. Calling GPT-4 or Claude is highly impractical, and you will soon be in
account_circle
Pietro Schirano(@skirano) 's Twitter Profile Photo

Introducing Maestro ✨

A framework for Claude Opus to orchestrate subagents.

Simply ask for a goal, and Opus will break it down and intelligently orchestrate instances of Haiku to execute subtasks, which Opus will review at the end. 🧙‍♂️

The output is saved as markdown 👇

account_circle
Chulin Xie(@ChulinXie) 's Twitter Profile Photo

Some text data is private & cannot be shared... Can we generate synthetic replicas with privacy guarantees?🤔

Instead of DP-SGD finetuning, use Aug-PE with inference APIs! Compatible with strong LLMs (GPT-3.5, Mistral), where DP-SGD is infeasible.
🔗alphapav.github.io/augpe-dpapitext [1/n]

Some text data is private & cannot be shared... Can we generate synthetic replicas with privacy guarantees?🤔 Instead of DP-SGD finetuning, use Aug-PE with inference APIs! Compatible with strong LLMs (GPT-3.5, Mistral), where DP-SGD is infeasible. 🔗alphapav.github.io/augpe-dpapitext [1/n]
account_circle
Google DeepMind(@GoogleDeepMind) 's Twitter Profile Photo

Introducing SIMA: the first generalist AI agent to follow natural-language instructions in a broad range of 3D virtual environments and video games. 🕹️

It can complete tasks similar to a human, and outperforms an agent trained in just one setting. 🧵 dpmd.ai/3TiYV7d

account_circle
Dan Hendrycks(@DanHendrycks) 's Twitter Profile Photo

Can hazardous knowledge be unlearned from LLMs without harming other capabilities?

We’re releasing the Weapons of Mass Destruction Proxy (WMDP), a dataset about weaponization, and we create a way to unlearn this knowledge.

📝arxiv.org/abs/2403.03218
🔗wmdp.ai

Can hazardous knowledge be unlearned from LLMs without harming other capabilities? We’re releasing the Weapons of Mass Destruction Proxy (WMDP), a dataset about weaponization, and we create a way to unlearn this knowledge. 📝arxiv.org/abs/2403.03218 🔗wmdp.ai
account_circle
Shayne Longpre(@ShayneRedford) 's Twitter Profile Photo

Independent AI research should be valued and protected.

In an open letter signed by over a 100 researchers, journalists, and advocates, we explain how AI companies should support it going forward.

sites.mit.edu/ai-safe-harbor/

1/

Independent AI research should be valued and protected. In an open letter signed by over a 100 researchers, journalists, and advocates, we explain how AI companies should support it going forward. sites.mit.edu/ai-safe-harbor/ 1/
account_circle
Yihe Deng(@Yihe__Deng) 's Twitter Profile Photo

Large Vision Language Models are prone to object hallucinations – how to cost-efficiently address this issue? 🚀 Introducing MARINE: a training-free, API-free framework to tackle object hallucinations.

Joint work with an amazing team Linxi Zhao Weitong ZHANG and Quanquan Gu!

Large Vision Language Models are prone to object hallucinations – how to cost-efficiently address this issue? 🚀 Introducing MARINE: a training-free, API-free framework to tackle object hallucinations. Joint work with an amazing team @linxizhao4 @WeitongZhang and @QuanquanGu!
account_circle
Boyi Wei(@wei_boyi) 's Twitter Profile Photo

Wondering why LLM safety mechanisms are fragile? 🤔
😯 We found safety-critical regions in aligned LLMs are sparse: ~3% of neurons/ranks
⚠️Sparsity makes safety easy to undo. Even freezing these regions during fine-tuning still leads to jailbreaks
🔗 boyiwei.com/alignment-attr…
[1/n]

Wondering why LLM safety mechanisms are fragile? 🤔 😯 We found safety-critical regions in aligned LLMs are sparse: ~3% of neurons/ranks ⚠️Sparsity makes safety easy to undo. Even freezing these regions during fine-tuning still leads to jailbreaks 🔗 boyiwei.com/alignment-attr… [1/n]
account_circle
Ethan Mollick(@emollick) 's Twitter Profile Photo

Increasingly finding that the one thing that most makes managers panic about AI is showing them, not the advanced features of GPT-4, but rather the fact that Copilot for Office can create an OK PowerPoint with speaker notes from a document in 47 seconds.

This is all real time.

account_circle
Yu Yang(@YUYANG_UCLA) 's Twitter Profile Photo

🎉 Two of my papers have been accepted this week at & !
Big thanks and congrats to co-authors Xuxi Chen & Eric Gan, mentors Atlas Wang & Gintare Karolina Dziugaite, and especially my advisor Baharan Mirzasoleiman! 🙏
More details on both papers after the ICML deadline!

🎉 Two of my papers have been accepted this week at #ICLR2024 & #AISTATS! Big thanks and congrats to co-authors @xxchenxx_ut & Eric Gan, mentors Atlas Wang & @gkdziugaite, and especially my advisor @baharanm! 🙏 More details on both papers after the ICML deadline!
account_circle
Secure Learning Lab (SLL)(@uiuc_aisecure) 's Twitter Profile Photo

LLMs can be backdoored easily, even with the use of chain of thought, which we show can be turned into another weakness.
Our paper BadChain accepted by ICLR provides the first backdoor attack against LLMs with COT prompting. openreview.net/pdf?id=S4cYxIN…

account_circle
Yi Zeng 曾祎(@EasonZeng623) 's Twitter Profile Photo

🌶️ Uncovering the potential risks of LLM customization via finetuning, accepted as an oral (1.2%) at ICLR 2024’24

Couldn't be prouder of my team!
🌐: llm-tuning-safety.github.io.

(we also studied LLM backdoors and its’ implications, Evan Hubinger Anthropic, consider citing us 😕)

account_circle