juliette pluto 🌌 ICLR 2025 (@foundjuliette)'s Twitter Profile
juliette pluto 🌌 ICLR 2025

@foundjuliette

cyclist, shapeshifter, typo-generator. ML security @GoogleDeepMind. views mine.

ID: 2604325361

Link: https://jul.sh · Joined: 12-06-2014 18:10:45

2.2K Tweets

5.5K Followers

608 Following

juliette pluto 🌌 ICLR 2025 (@foundjuliette)

Prediction: unless patched, GPT's tendency to say "delve" will be assimilated into North American-style English. Soon it won't stand out as odd anymore. More broadly, RLHF'ed LLMs will shape cultural norms in unexpected ways.

Jacob Austin (@jacobaustin132)

This is something I've worked on for a while! You can save the activations of one LLM call and reuse them for a follow-up that overlaps with the first. This means asking a question about a big codebase can take 30 seconds the first time and 1s after that!
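The reuse described above resembles what is commonly called prefix (KV) caching. Below is a toy sketch of the idea, not the author's implementation. Assumptions: a "state" is a single integer standing in for a transformer's per-layer key/value tensors, the hypothetical `_step` function stands in for one forward step, and `PrefixCache` and its methods are illustrative names. Because each state depends only on the tokens before it (as with causal attention), the states for a shared prefix can be copied and only the new suffix has to be processed.

```python
from typing import List, Tuple


class PrefixCache:
    """Toy prefix cache (hypothetical): reuses per-token states for shared prompt prefixes."""

    def __init__(self) -> None:
        # Each entry pairs a previously processed prompt with its per-token states.
        self._entries: List[Tuple[Tuple[int, ...], List[int]]] = []
        self.tokens_computed = 0  # tokens that actually went through _step

    @staticmethod
    def _step(prev_state: int, token: int) -> int:
        # Stand-in for one forward step: the new state depends on the previous
        # state and the current token, so it is valid only for this exact prefix.
        return (prev_state * 31 + token) % 1_000_003

    def _longest_cached_prefix(self, tokens: List[int]) -> Tuple[int, List[int]]:
        best_len, best_states = 0, []
        for cached_tokens, cached_states in self._entries:
            n = 0
            limit = min(len(tokens), len(cached_tokens))
            while n < limit and tokens[n] == cached_tokens[n]:
                n += 1
            if n > best_len:
                best_len, best_states = n, cached_states[:n]
        return best_len, best_states

    def run(self, tokens: List[int]) -> List[int]:
        reuse, cached = self._longest_cached_prefix(tokens)
        states = list(cached)  # copy the reused prefix states
        prev = states[-1] if states else 0
        # Only the tokens after the shared prefix are processed.
        for tok in tokens[reuse:]:
            prev = self._step(prev, tok)
            states.append(prev)
            self.tokens_computed += 1
        self._entries.append((tuple(tokens), states))
        return states


cache = PrefixCache()
context = list(range(1000))       # stand-in for a long shared context, e.g. a codebase
cache.run(context + [1, 2, 3])    # first question: all 1003 tokens are processed
cache.run(context + [7, 8])       # follow-up: only the 2 new tokens are processed
print(cache.tokens_computed)      # prints 1005
```

The linear scan over entries keeps the sketch short; a real serving system would more likely index cached blocks (for example with a trie or hashes of fixed-size token blocks) and evict old entries under memory pressure.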

juliette pluto 🌌 ICLR 2025 (@foundjuliette)

prediction: when all is said and done, Elon Musk's $42B purchase of Twitter will be seen as a bargain. Not because of the platform's economic success, but because of its outsize sociopolitical influence.

Colin McCarthy (@us_stormwatch)

12-hour timelapse of American Airlines, Delta, and United plane traffic after what was likely the biggest IT outage in history forced a nationwide ground stop of the three airlines.

juliette pluto 🌌 ICLR 2025 (@foundjuliette)

Bearish on Cursor (closed source, sends your code to their servers, forced subscription). Bullish on Zed (open source, fast, private, existing Anthropic partnership, supports API keys).

Josh Engels (@joshaengels)

1/6: A recent paper shows that LLMs are "self-aware": when trained to exhibit a behavior like "risk-taking", LLMs self-report being risky. In a recent blog post, we explore what's happening here: some self-awareness behaviors are caused by a simple learned steering vector!🧵

Dan Allison (@danallison)

One failure mode that I’ve repeatedly fallen into is thinking “surely someone smarter than me must have already figured this out” when in fact no one has.

Ilia Shumailov🦔 (@iliaishacked)

Our new Google DeepMind paper, "Lessons from Defending Gemini Against Indirect Prompt Injections," details our framework for evaluating and improving robustness to prompt injection attacks.

juliette pluto 🌌 ICLR 2025 (@foundjuliette)

even if model "neutrality" meant perfectly representing the views & biases of the population reflected in the model, the result would be to cement the status quo.

François Chollet (@fchollet)

Officially validated IMO gold medal, purely via search in token space, achieved in 4.5 hrs (unclear at what compute cost). The solutions read nicely as well: deepmind.google/discover/blog/…

roon (@tszzl)

correct me if I'm wrong, but it seems like: - the theme of the Dan Wang book, and the general elite consensus now, is that "industrial process" is a technology that lives in the heads of people and that it was a mistake to let so much "low value" industry be offshored due to the

juliette pluto 🌌 ICLR 2025 (@foundjuliette)

This chart is potentially misleading. It compares the latest Sonnet model to older models (from March/April). Also, the attacks in this dataset were optimized against those other models, but not against Sonnet 4.5 (!). It would likely do worse against tailored attacks.