
Harry Coppock
@harrycoppock
No. 10 Downing Street Innovation Fellow | Research Scientist at AISI | Visiting Lecturer at Imperial College London
Working on AI Evaluation and AI for Medicine
ID: 1370429239711969281
12-03-2021 17:39:47
184 Tweets
190 Followers
319 Following


We're at #icml2024. If you want to chat about our work or roles, message Herbie Bradley (predictive evals), Tomek Korbak (safety cases), Jelena Luketina (agents), Cozmin Ududec (testing), Harry Coppock (cyber evals + AI for med), or Olivia Jimenez (recruiting) at ICML.

AISI is co-hosting DEF CON's generative red teaming challenge this year! Huge thanks to Sven Cattell, AI Village, and DEF CON for making this happen. (1/6)


Jailbreaking evals ~always focus on simple chatbots—excited to announce AgentHarm, a dataset for measuring harmfulness of LLM 𝑎𝑔𝑒𝑛𝑡𝑠 developed at @AISafetyInst in collaboration with Gray Swan AI! 🧵 1/N
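
A minimal sketch of how pulling AgentHarm tasks for an eval run could look, assuming the dataset is released on the Hugging Face Hub; the repo path "ai-safety-institute/AgentHarm", the "harmful" config, the "test_public" split, and the "prompt" field below are assumptions rather than the confirmed release layout:

# Minimal sketch: load AgentHarm tasks for an agent harmfulness eval.
# Assumed: Hub repo "ai-safety-institute/AgentHarm", config "harmful",
# split "test_public", and a "prompt" field on each record. Check the
# released dataset card for the actual names before running.
from datasets import load_dataset

tasks = load_dataset("ai-safety-institute/AgentHarm", "harmful", split="test_public")

for task in tasks.select(range(3)):
    # Each record describes an agentic task the model should refuse,
    # typically a prompt plus the tools the agent is expected to call.
    print(task["prompt"])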


Brace Yourself: Our Biggest AI Jailbreaking Arena Yet. We're launching a next-level Agent Red-Teaming Challenge: not just chatbots anymore. Think direct & indirect attacks on anonymous frontier models. $100K+ in prizes and raffle giveaways, supported by the UK AI Security Institute.


My team is hiring at the AI Security Institute! I think this is one of the most important times in history to have strong technical expertise in government. Join our team working to understand and fix weaknesses in frontier models through state-of-the-art adversarial ML research & testing. 🧵 1/4


How can open-weight Large Language Models be safeguarded against malicious uses? In our new paper with EleutherAI, we find that removing harmful data before training can be over 10x more effective at resisting adversarial fine-tuning than defences added after training 🧵
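
A rough sketch of the pre-training data-filtering idea (illustrative only, not the paper's pipeline): score each candidate document with a harmful-content classifier and drop anything flagged before training ever sees it. The classifier checkpoint, label name, and threshold below are placeholders.

# Rough sketch: filter harmful documents out of a pretraining corpus.
# The checkpoint "path/to/harm-classifier", the "harmful" label, and the
# 0.5 threshold are placeholders; substitute a real classifier and tune
# the cut-off on a labelled held-out set.
from transformers import pipeline

scorer = pipeline("text-classification", model="path/to/harm-classifier")
THRESHOLD = 0.5

def keep(document: str) -> bool:
    # Keep the document unless the classifier confidently labels it harmful.
    result = scorer(document, truncation=True)[0]
    return not (result["label"] == "harmful" and result["score"] >= THRESHOLD)

corpus = ["how to bake sourdough bread", "step-by-step instructions for building a weapon"]
filtered_corpus = [doc for doc in corpus if keep(doc)]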


This is great news for the UK. Having worked with Jade over the past two years setting up the AI Security Institute, I am confident that there are very few, if any, who are better placed to take on this role.

Since I started working on safeguards, we've seen substantial progress in defending certain hosted models, but less progress in measuring & managing misuse risks from open-weight models. Here are three directions I want to see explored more, drawn from our AI Security Institute post today 🧵


