
Yiwei Wang
@wangyiw33973985
Postdoc at UCLA CS @UCLA, @UCLANLP
ID: 1555047236240752640
https://wangywust.github.io/ 04-08-2022 04:25:36
39 Tweets
118 Followers
70 Following







Jailbreaking through asking for coherent outputs 🧵 Read of the day, day 49: Frustratingly Easy Jailbreak of Large Language Models via Output Prefix Attacks, by Yiwei Wang et al. from UCLA. Yet another way of breaking through LLM's defense systems found. This idea is





Multimodal DPO. DPO over-prioritizes language-only preference. Introducing mDPO: optimizes image-conditioned preference. Best 3B MLLM with reduced hallucination, beats LLaVA 7/13B with DPO. Collaboration with Microsoft Research. huggingface.co/papers/2406.11…





Vision-Language Models (VLMs) are truly amazing. Ever wonder if their visual and textual "brains" always agree? I am excited to share our latest paper, where we tackle a critical challenge in VLMs, dubbed the cross-modality parametric knowledge


