Yi Zeng 曾祎 (@EasonZeng623) Twitter Tweets • TwiCopy

Yi Zeng 曾祎

@EasonZeng623

+ Follow

probe to improve | Ph.D. @VTEngineering | Amazon Research Fellow | #AI_security 🛡 #Adversarial ⚔️ #Backdoors 🎠 I deal with the dark side of machine learning.

ID:901961911448723456

linkhttps://www.yi-zeng.com calendar_today28-08-2017 00:17:32

460 Tweets

1,0K Followers

1,0K Following

Grant Sanderson

2 weeks ago

The next chapter about transformers is up on YouTube, digging into the attention mechanism: youtu.be/eMlx5fFNoYc

The model works with vectors representing tokens (think words), and this is the mechanism that allows those vectors to take in meaning from context.

thumb_up_off_alt5,2K

chat_bubble_outline0

account_circle

Tianyu Pang

3 weeks ago

😯The 100% jailbreaking of GPT3.5/4, Llama-2, Gemma, and Claude2/3 is truly impressive! Smart ideas for leveraging API access.

Congrats to Maksym Andriushchenko 🇺🇦 francesco croce TML Lab (EPFL) !

arxiv.org/abs/2404.02151

thumb_up_off_alt103

chat_bubble_outline0

account_circle

Vanshaj Khattar

@VanshajKhattar

3 weeks ago

Thrilled to announce the Trustworthy Interactive Decision-Making with Foundation Models workshop at #IJCAI2024 ! 🌟 A great opportunity for dialogue on ethical AI decision-making, blending human-centric methods & foundation models.

Thrilled to announce the Trustworthy Interactive Decision-Making with Foundation Models workshop at #IJCAI2024! 🌟 A great opportunity for dialogue on ethical AI decision-making, blending human-centric methods & foundation models.

thumb_up_off_alt9

chat_bubble_outline0

account_circle

Haize Labs

4 weeks ago

‼️⚠️bad day to be a LLM⚠️‼️

Haize Labs took one of our favorite adversarial attack algorithms, GCG, and made it *38x* faster

‼️⚠️bad day to be a LLM⚠️‼️ @haizelabs took one of our favorite adversarial attack algorithms, GCG, and made it *38x* faster

thumb_up_off_alt69

chat_bubble_outline0

account_circle

Ethan Mollick

3 weeks ago

Two of the best elaborate parody sites in recent memory are on exactly the opposite sides of the AI safety debate.

Open Asteroid Impact (openasteroidimpact.org) and Goody 2 (goody2.ai) which seems to have an actual LLM.

We need more debates via maximalist parody.

Two of the best elaborate parody sites in recent memory are on exactly the opposite sides of the AI safety debate. Open Asteroid Impact (openasteroidimpact.org) and Goody 2 (goody2.ai) which seems to have an actual LLM. We need more debates via maximalist parody.

thumb_up_off_alt136

chat_bubble_outline0

account_circle

Bindu Reddy

4 weeks ago

Mix and Match Your LLMS To Avoid A High Latency Hellscape

Most LLM apps and AI agents need multiple calls to an LLM. As many as 5-10 calls are required for a moderately complex LLM app or AI agent. Calling GPT-4 or Claude is highly impractical, and you will soon be in

Mix and Match Your LLMS To Avoid A High Latency Hellscape Most LLM apps and AI agents need multiple calls to an LLM. As many as 5-10 calls are required for a moderately complex LLM app or AI agent. Calling GPT-4 or Claude is highly impractical, and you will soon be in

thumb_up_off_alt268

chat_bubble_outline0

account_circle

Pietro Schirano

1 month ago

Introducing Maestro ✨

A framework for Claude Opus to orchestrate subagents.

Simply ask for a goal, and Opus will break it down and intelligently orchestrate instances of Haiku to execute subtasks, which Opus will review at the end. 🧙‍♂️

The output is saved as markdown 👇

thumb_up_off_alt1,0K

chat_bubble_outline0

account_circle

Chulin Xie

1 month ago

Some text data is private & cannot be shared... Can we generate synthetic replicas with privacy guarantees?🤔

Instead of DP-SGD finetuning, use Aug-PE with inference APIs! Compatible with strong LLMs (GPT-3.5, Mistral), where DP-SGD is infeasible.
🔗alphapav.github.io/augpe-dpapitext [1/n]

Some text data is private & cannot be shared... Can we generate synthetic replicas with privacy guarantees?🤔 Instead of DP-SGD finetuning, use Aug-PE with inference APIs! Compatible with strong LLMs (GPT-3.5, Mistral), where DP-SGD is infeasible. 🔗alphapav.github.io/augpe-dpapitext [1/n]

thumb_up_off_alt81

chat_bubble_outline0

account_circle

Google DeepMind

@GoogleDeepMind

1 month ago

Introducing SIMA: the first generalist AI agent to follow natural-language instructions in a broad range of 3D virtual environments and video games. 🕹️

It can complete tasks similar to a human, and outperforms an agent trained in just one setting. 🧵 dpmd.ai/3TiYV7d

thumb_up_off_alt3,9K

chat_bubble_outline0

account_circle

Dan Hendrycks

1 month ago

Can hazardous knowledge be unlearned from LLMs without harming other capabilities?

We’re releasing the Weapons of Mass Destruction Proxy (WMDP), a dataset about weaponization, and we create a way to unlearn this knowledge.

📝arxiv.org/abs/2403.03218
🔗wmdp.ai

Can hazardous knowledge be unlearned from LLMs without harming other capabilities? We’re releasing the Weapons of Mass Destruction Proxy (WMDP), a dataset about weaponization, and we create a way to unlearn this knowledge. 📝arxiv.org/abs/2403.03218 🔗wmdp.ai

thumb_up_off_alt238

chat_bubble_outline0

account_circle

Shayne Longpre

1 month ago

Independent AI research should be valued and protected.

In an open letter signed by over a 100 researchers, journalists, and advocates, we explain how AI companies should support it going forward.

sites.mit.edu/ai-safe-harbor/

1/

Independent AI research should be valued and protected. In an open letter signed by over a 100 researchers, journalists, and advocates, we explain how AI companies should support it going forward. sites.mit.edu/ai-safe-harbor/ 1/

thumb_up_off_alt230

chat_bubble_outline0

account_circle

Minzhou Pan

2 months ago

💯💯💯🎇

thumb_up_off_alt3

chat_bubble_outline0

account_circle

Yihe Deng

2 months ago

Large Vision Language Models are prone to object hallucinations – how to cost-efficiently address this issue? 🚀 Introducing MARINE: a training-free, API-free framework to tackle object hallucinations.

Joint work with an amazing team Linxi Zhao Weitong ZHANG and Quanquan Gu!

Large Vision Language Models are prone to object hallucinations – how to cost-efficiently address this issue? 🚀 Introducing MARINE: a training-free, API-free framework to tackle object hallucinations. Joint work with an amazing team @linxizhao4 @WeitongZhang and @QuanquanGu!

thumb_up_off_alt171

chat_bubble_outline0

account_circle

Boyi Wei

2 months ago

Wondering why LLM safety mechanisms are fragile? 🤔
😯 We found safety-critical regions in aligned LLMs are sparse: ~3% of neurons/ranks
⚠️Sparsity makes safety easy to undo. Even freezing these regions during fine-tuning still leads to jailbreaks
🔗 boyiwei.com/alignment-attr…
[1/n]

Wondering why LLM safety mechanisms are fragile? 🤔 😯 We found safety-critical regions in aligned LLMs are sparse: ~3% of neurons/ranks ⚠️Sparsity makes safety easy to undo. Even freezing these regions during fine-tuning still leads to jailbreaks 🔗 boyiwei.com/alignment-attr… [1/n]

thumb_up_off_alt158

chat_bubble_outline0

account_circle

qnguyen3

3 months ago

just found out this library. incredibly easy to get Mixtral ready
github.com/hiyouga/LLaMA-…

thumb_up_off_alt557

chat_bubble_outline0

account_circle

Ethan Mollick

3 months ago

Increasingly finding that the one thing that most makes managers panic about AI is showing them, not the advanced features of GPT-4, but rather the fact that Copilot for Office can create an OK PowerPoint with speaker notes from a document in 47 seconds.

This is all real time.

thumb_up_off_alt2,6K

chat_bubble_outline0

account_circle

Yu Yang

3 months ago

🎉 Two of my papers have been accepted this week at #ICLR2024 & #AISTATS !
Big thanks and congrats to co-authors Xuxi Chen & Eric Gan, mentors Atlas Wang & Gintare Karolina Dziugaite, and especially my advisor Baharan Mirzasoleiman! 🙏
More details on both papers after the ICML deadline!

🎉 Two of my papers have been accepted this week at #ICLR2024 & #AISTATS! Big thanks and congrats to co-authors @xxchenxx_ut & Eric Gan, mentors Atlas Wang & @gkdziugaite, and especially my advisor @baharanm! 🙏 More details on both papers after the ICML deadline!

thumb_up_off_alt200

chat_bubble_outline0

account_circle

Secure Learning Lab (SLL)

3 months ago

LLMs can be backdoored easily, even with the use of chain of thought, which we show can be turned into another weakness.
Our paper BadChain accepted by ICLR provides the first backdoor attack against LLMs with COT prompting. openreview.net/pdf?id=S4cYxIN…

thumb_up_off_alt42

chat_bubble_outline0

account_circle

Javier Rando

3 months ago

Our paper on ✨ universal LLM backdoors ✨ through RLHF poisoning has been accepted at ICLR 2024 2024. See you in Vienna! #ICLR2024

thumb_up_off_alt54

chat_bubble_outline0

account_circle

Yi Zeng 曾祎

3 months ago

🌶️ Uncovering the potential risks of LLM customization via finetuning, accepted as an oral (1.2%) at ICLR 2024’24

Couldn't be prouder of my team!
🌐: llm-tuning-safety.github.io.

(we also studied LLM backdoors and its’ implications, Evan Hubinger Anthropic, consider citing us 😕)

thumb_up_off_alt69

chat_bubble_outline0

account_circle

fpc ok :)