Luca Beurer-Kellner (@lbeurerkellner) 's Twitter Profile
Luca Beurer-Kellner

@lbeurerkellner

working on secure agentic AI @invariantlabsai

PhD @the_sri_lab, ETH Zürich. Also: @lmqllang and @projectlve.

ID: 67704155

linkhttps://lbeurerkellner.github.io calendar_today21-08-2009 20:05:29

285 Tweet

1,1K Takipçi

274 Takip Edilen

Luca Beurer-Kellner (@lbeurerkellner) 's Twitter Profile Photo

I haven't checked the experiments, but my experience aligns with this. Constraining a model on a token level can affect task accuracy.

Luca Beurer-Kellner (@lbeurerkellner) 's Twitter Profile Photo

Cool work about computer use agents. Even worse: Safeguarding CUAs is extremely challenging, as the scaffolding does not have high-level understanding of what’s going on, i.e. how do you know which mouse click is “open link” vs “delete the database x.com/aichberger/sta…

Johann Rehberger (@wunderwuzzi23) 's Twitter Profile Photo

The ZombAIs have arrived in Codex! Prompt injection to C2. Be careful out there! This PoC uses a domain from the Common Dependencies allowlist when enabling restricted Internet access That allowed for a compromise via indirect prompt injection and getting command and control

The ZombAIs have arrived in Codex!

Prompt injection to C2. Be careful out there!

This PoC uses a domain from the Common Dependencies allowlist when enabling restricted Internet access

That allowed for a compromise via indirect prompt injection and getting command and control
Critical Thinking - Bug Bounty Podcast (@ctbbpodcast) 's Twitter Profile Photo

Advanced prompt injection targets the LLM’s core logic, aiming to not only make it output some weird things but also to manipulate how the model interprets complex data and use its internal tools.

John(Yueh-Han) Chen (@jcyhc_ai) 's Twitter Profile Photo

LLMs won’t tell you how to make fake IDs—but will reveal the layouts/materials of IDs and make realistic photos if asked separately. 💥Such decomposition attacks reach 87% success across QA, text-to-image, and agent settings! 🛡️Our monitoring method defends with 93% success! 🧵

LLMs won’t tell you how to make fake IDs—but will reveal the layouts/materials of IDs and make realistic photos if asked separately.

💥Such decomposition attacks reach 87% success across QA, text-to-image, and agent settings!
🛡️Our monitoring method defends with 93% success! 🧵
Luca Beurer-Kellner (@lbeurerkellner) 's Twitter Profile Photo

There’s incredible potential in combining LLM-based code generation (“vibe coding”) with e.g. model-driven SWE. Also promising: programming languages and libraries designed specifically for both LLM generation and human readability. Still a lot of greenfield here.

Luca Beurer-Kellner (@lbeurerkellner) 's Twitter Profile Photo

🌊Honored to announce that I was invited to present a talk at the European Lighthouse on Secure and Safe AI, GA 2025 ELSA - European Lighthouse on Secure and Safe AI If you are in and around Brussels this week and you want to meet and discuss AI agent safety, let me know. The talk will be on Wed morning.

🌊Honored to announce that I was invited to present a talk at the European Lighthouse on Secure and Safe AI, GA 2025 <a href="/elsa_lighthouse/">ELSA - European Lighthouse on Secure and Safe AI</a>

If you are in and around Brussels this week and you want to meet and discuss AI agent safety, let me know.

The talk will be on Wed morning.
Snyk (@snyksec) 's Twitter Profile Photo

We’re proud to announce we’ve acquired Invariant Labs to deepen our defense against agentic AI threats. Invariant Labs joins Snyk Labs to advance real-time protection for AI-native apps. One platform. Full AI security. Learn more: bit.ly/3GdBTMk

We’re proud to announce we’ve acquired <a href="/InvariantLabsAI/">Invariant Labs</a> to deepen our defense against agentic AI threats. Invariant Labs joins Snyk Labs to advance real-time protection for AI-native apps. One platform. Full AI security. Learn more: bit.ly/3GdBTMk
Johann Rehberger (@wunderwuzzi23) 's Twitter Profile Photo

🚨 Security Advisory: Anthropic's Slack MCP Server leaks data via link unfurling ☠️ See a demo exploit with Claude Code connected to the MCP server, and how a prompt injection attack can leak developer secrets. Watch and learn!