Giuseppe `N3mes1s` (@gn3mes1s)'s Twitter Profile
Giuseppe `N3mes1s`

@gn3mes1s

windows, macos, linux, android && lowlevel && ring-1 lover; EDR chef; malware hunter; purple team💜

ID: 53158658

Link: http://quequero.org/
Joined: 02-07-2009 19:17:52

24,24K Tweet

12,12K Followers

316 Following

Mario Zechner (@badlogicgames)

A new entry to my popular series "LLM tools for plebs": claude-trace
- Injects itself into Claude Code
- Logs all traffic
- Reconstructs conversations and shows what's going on behind the scenes (system prompts, all tool inputs/outputs, and more)

Some observations. 🧵

Mario Zechner (@badlogicgames)

Haiku is also part of their "Make Bash Safe Again" protection layer. Before running a Bash command, Haiku is asked to determine malicious command injection. I'm not sure I'd trust an LLM with this task. YOLO I guess.

Ilia Shumailov🦔 (@iliaishacked)

Our new Google DeepMind paper, "Lessons from Defending Gemini Against Indirect Prompt Injections," details our framework for evaluating and improving robustness to prompt injection attacks.

Adam Chester 🏴‍☠️ (@_xpn_)

New blog post is up! Stepping out of my comfort zone (be kind), looking at Meta's Prompt Guard 2 model, how to misclassify prompts using the Unigram tokenizer and hopefully demonstrate why we should invest time looking beyond the API at how LLMs function. specterops.io/blog/2025/06/0…

Mario Zechner (@badlogicgames)

From the maker of claude-trace now comes claude-bridge. Use Claude Code with Google models, OpenAI models, or really any other provider with OpenAI endpoints. Including Ollama. Not sure why you would, cause Opus/Sonnet and a Max plan are AMAZING. But now you can.

Giuseppe `N3mes1s` (@gn3mes1s)

Intel® Virtualization Technology - Redirect Protection (Intel® VT-rp): an interesting approach to make EPT faster and smarter against WWW attacks community.intel.com/t5/Blogs/Tech-…

METR (@metr_evals)

At METR, we’ve seen increasingly sophisticated examples of “reward hacking” on our tasks: models trying to subvert or exploit the environment or scoring code to obtain a higher score. In a new post, we discuss this phenomenon and share some especially crafty instances we’ve seen.

Caiming Xiong (@caimingxiong)

🎉 Excited to share our new work on AI Agent and LLM judge safety "Helpful Agent Meets Deceptive Judge: Understanding Vulnerabilities in Agentic Workflows" As AI agents become increasingly autonomous, they often rely on feedback from judges (evaluators). These judges evaluate,

Giuseppe `N3mes1s` (@gn3mes1s)

Just love the failure modes:
• Resource Overallocation
• Infinite Loop Behavior
• Communication Spam
• Work Duplication
• Query Verbosity
• Source Quality Bias
• Premature Continuation
• Error Cascading
• Emergent Drift
• Synchronous Blocking

Giuseppe `N3mes1s` (@gn3mes1s)

This. I think one thing we should have learnt from the past: everything needs to be observed, especially non-deterministic actions.

Simon Willison (@simonw)

If you use "AI agents" (LLMs that call tools) you need to be aware of the Lethal Trifecta. Any time you combine access to private data with exposure to untrusted content and the ability to externally communicate, an attacker can trick the system into stealing your data!
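
The three-way combination described in the tweet can be sketched as a simple check over an agent's tool set. This is only an illustration: the `Tool` class, its three flags, and the example tools are hypothetical names made up here, not part of any real agent framework, and a real audit would need far more than boolean labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """Hypothetical description of one tool an agent can call."""
    name: str
    reads_private_data: bool = False
    sees_untrusted_content: bool = False
    can_communicate_externally: bool = False


def has_lethal_trifecta(tools: list[Tool]) -> bool:
    """Return True when the combined tool set covers all three capabilities."""
    return (
        any(t.reads_private_data for t in tools)
        and any(t.sees_untrusted_content for t in tools)
        and any(t.can_communicate_externally for t in tools)
    )


# Hypothetical tool set, for illustration only.
mail_reader = Tool("read_inbox", reads_private_data=True, sees_untrusted_content=True)
web_fetch = Tool("fetch_url", sees_untrusted_content=True, can_communicate_externally=True)
calculator = Tool("calculator")

print(has_lethal_trifecta([mail_reader, calculator]))  # False: nothing can send data out
print(has_lethal_trifecta([mail_reader, web_fetch]))   # True: all three are present
```

Note that the check is over the whole tool set, not per tool: no single tool above has all three capabilities, yet the combination does.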

Giuseppe `N3mes1s` (@gn3mes1s)

> Our results suggest some directions for improvement of these monitoring capabilities, which will likely become essential for full oversight at speeds that human monitors simply couldn’t match.

Sometimes what you say is not what you do!

Liv Matan (@terminatorlm)

👻 This is GerriScary: a vulnerability I discovered in Google's Gerrit that made it possible to hack several projects and affected 18 Google projects including ChromiumOS (CVE-2025-1568), Chromium, Bazel, and Dart. Dive into the full details here: tenable.com/blog/gerriscar…

dreadnode (@dreadnode)

Introducing AIRTBench, an AI red teaming benchmark for evaluating language models’ ability to autonomously discover and exploit AI/ML security vulnerabilities. Read the paper on arXiv: arxiv.org/abs/2506.14682 Open-source dataset and benchmark eval code repo:

Cua (@trycua)

Introducing Windows Sandbox support - run computer-use agents on Windows business apps without VMs or cloud costs. 1/6
