Giuseppe `N3mes1s` (@gn3mes1s) Twitter Tweets • TwiCopy

Mario Zechner

6 months ago

A new entry to my popular series "LLM tools for plebs": claude-trace
- Injects itself into Claude Code
- Logs all traffic
- Reconstructs conversations and shows what's going on behind the scenes (system prompts, all tool inputs/outputs, and more)

Some observations. 🧵

Likes: 995 · Replies: 27 · Reposts: 74

Mario Zechner

@badlogicgames

6 months ago

Haiku is also part of their "Make Bash Safe Again" protection layer. Before running a Bash command, Haiku is asked to determine malicious command injection. I'm not sure I'd trust an LLM with this task. YOLO I guess.

Likes: 35 · Replies: 2 · Reposts: 4

Ilia Shumailov🦔

@iliaishacked

6 months ago

Our new Google DeepMind paper, "Lessons from Defending Gemini Against Indirect Prompt Injections," details our framework for evaluating and improving robustness to prompt injection attacks.

Likes: 169 · Replies: 4 · Reposts: 35

Adam Chester 🏴‍☠️

@_xpn_

6 months ago

New blog post is up! Stepping out of my comfort zone (be kind), looking at Meta's Prompt Guard 2 model, how to misclassify prompts using the Unigram tokenizer and hopefully demonstrate why we should invest time looking beyond the API at how LLMs function. specterops.io/blog/2025/06/0…

Likes: 123 · Replies: 5 · Reposts: 47

sarah guo // conviction

@saranormous

6 months ago

x.com/i/article/1929…

Likes: 1.1K · Replies: 76 · Reposts: 164

Mario Zechner

@badlogicgames

6 months ago

From the maker of claude-trace now comes claude-bridge. Use Claude Code with Google models, OpenAI models, or really any other provider with OpenAI endpoints. Including Ollama. Not sure why you would, cause Opus/Sonnet and a Max plan are AMAZING. But now you can.

Likes: 384 · Replies: 13 · Reposts: 34

VirusTotal

@virustotal

6 months ago

What 17,845 GitHub Repos Taught Us About Malicious MCP Servers blog.virustotal.com/2025/06/what-1…

Likes: 63 · Replies: 0 · Reposts: 23

VirusTotal

@virustotal

6 months ago

YARA-X 1.0.0: The Stable Release and Its Advantages blog.virustotal.com/2025/06/yara-x…

Likes: 35 · Replies: 0 · Reposts: 15

Giuseppe `N3mes1s`

@gn3mes1s

6 months ago

Intel® Virtualization Technology - Redirect Protection (Intel® VT-rp): interesting approach to make EPT faster and smarter against www attacks community.intel.com/t5/Blogs/Tech-…

Likes: 7 · Replies: 0 · Reposts: 1

METR

@metr_evals

6 months ago

At METR, we’ve seen increasingly sophisticated examples of “reward hacking” on our tasks: models trying to subvert or exploit the environment or scoring code to obtain a higher score. In a new post, we discuss this phenomenon and share some especially crafty instances we’ve seen.

Likes: 221 · Replies: 3 · Reposts: 37

Caiming Xiong

@caimingxiong

6 months ago

🎉 Excited to share our new work on AI Agent and LLM judge safety "Helpful Agent Meets Deceptive Judge: Understanding Vulnerabilities in Agentic Workflows" As AI agents become increasingly autonomous, they often rely on feedback from judges (evaluators). These judges evaluate,

Likes: 79 · Replies: 3 · Reposts: 31

Giuseppe `N3mes1s`

@gn3mes1s

6 months ago

Just love the failure modes:
• Resource Overallocation
• Infinite Loop Behavior
• Communication Spam
• Work Duplication
• Query Verbosity
• Source Quality Bias
• Premature Continuation
• Error Cascading
• Emergent Drift
• Synchronous Blocking

Likes: 2 · Replies: 0 · Reposts: 0

Giuseppe `N3mes1s`

@gn3mes1s

6 months ago

This. I think one thing we should have learnt from the past: everything needs to be observed, especially non-deterministic actions.

Likes: 1 · Replies: 0 · Reposts: 0

Simon Willison

@simonw

6 months ago

If you use "AI agents" (LLMs that call tools) you need to be aware of the Lethal Trifecta.

Any time you combine access to private data with exposure to untrusted content and the ability to externally communicate, an attacker can trick the system into stealing your data!

Likes: 2.2K · Replies: 70 · Reposts: 474

Giuseppe `N3mes1s`

@gn3mes1s

6 months ago

> Our results suggest some directions for improvement of these monitoring capabilities, which will likely become essential for full oversight at speeds that human monitors simply couldn’t match.

Sometimes what you say is not what you do!

Likes: 0 · Replies: 0 · Reposts: 1

Liv Matan

@terminatorlm

6 months ago

👻 This is GerriScary: a vulnerability I discovered in Google's Gerrit that made it possible to hack several projects and affected 18 Google projects including ChromiumOS (CVE-2025-1568), Chromium, Bazel, and Dart. Dive into the full details here: tenable.com/blog/gerriscar…

Likes: 76 · Replies: 6 · Reposts: 24

dreadnode

@dreadnode

6 months ago

Introducing AIRTBench, an AI red teaming benchmark for evaluating language models’ ability to autonomously discover and exploit AI/ML security vulnerabilities. Read the paper on arXiv: arxiv.org/abs/2506.14682 Open-source dataset and benchmark eval code repo: