Trustworthy ML Initiative (TrustML) (@trustworthy_ml)'s Twitter Profile
Trustworthy ML Initiative (TrustML)

@trustworthy_ml

Latest research in Trustworthy ML. Organizers: @JaydeepBorkar @sbmisi @hima_lakkaraju @sarahookr Sarah Tan @chhaviyadav_ @_cagarwal @m_lemanczyk @HaohanWang

ID: 1262375165490540549

Website: https://www.trustworthyml.org · Joined: 18-05-2020 13:31:24

1.7K Tweets

6.0K Followers

64 Following

Yixin Wan (@yixin_wan_)

How to identify bias in language agency? E.g., in texts describing White men as “leading” & Black women as “helping”? 🧐
🔎 String matching? ❌ No!
🔎 Sentiment classifier? ❌ No!
✅ Our agency classifier CAN! It reveals gender, racial, and intersectional bias 🤯
🔗: arxiv.org/abs/2404.10508

Hima Lakkaraju (@hima_lakkaraju)

As we increasingly rely on #LLMs for product recommendations and searches, can companies game these models to enhance the visibility of their products?

Our latest work provides answers to this question & demonstrates that LLMs can be manipulated to boost product visibility!…

Giang Nguyen (@giangnguyen2412)

🚀 Exciting news! Our latest work, CHM-Corr++, has been accepted for presentation at a workshop at CVPR 2024! 🎉

The work lies at the intersection of interactive XAI and human-AI collaboration.

Demo: http://137.184.82.109:7080/
Paper: arxiv.org/abs/2404.05238

Maksym Andriushchenko 🇺🇦 (@maksym_andr)

🚨 Are leading safety-aligned LLMs adversarially robust? 🚨

❗In our new work, we jailbreak basically all of them with ≈100% success rate (according to GPT-4 as a semantic judge):
- Claude 1.2 / 2.0 / 2.1 / 3 Haiku / 3 Sonnet / 3 Opus,
- GPT-3.5 / GPT-4,
- R2D2-7B from…

Canyu Chen (@CanyuChen3)

Thanks to LLM Security for sharing our new work 'Can LLM-Generated Misinformation Be Detected?'

🔗Project website (paper, dataset, and code): llm-misinformation.github.io

🚨 LLM-generated misinformation is one of the most critical risks to AI safety. One fundamental…

Patrick Chao (@patrickrchao)

Are you interested in jailbreaking LLMs? Have you ever wished that jailbreaking research was more standardized, reproducible, or transparent?

Check out JailbreakBench, an open benchmark and leaderboard for jailbreak attacks and defenses on LLMs!

jailbreakbench.github.io
🧵1/n

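For anyone who wants to poke at the benchmark, here is a minimal loading sketch. The Hugging Face dataset name ("JailbreakBench/JBB-Behaviors") and the "behaviors" config are assumptions on my part; jailbreakbench.github.io has the authoritative instructions.

```python
from datasets import load_dataset

# Assumed repo/config names; see jailbreakbench.github.io for the project's
# own loading instructions and the official evaluation harness.
behaviors = load_dataset("JailbreakBench/JBB-Behaviors", "behaviors")

# Inspect whatever splits the benchmark ships with.
for split_name, split in behaviors.items():
    print(split_name, len(split), split.column_names)
```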
SAIL @ Imperial College London (@SAILImperial)

We're recruiting two Research Assistants to join us and work on the security of ML-based personal assistants at Imperial College London. The role will focus on verification, robustification, and adversarial attacks for AI assistants: rb.gy/mcxvob

Jaemin Cho (@jmin__cho)

Can we adaptively generate training environments with LLMs to help small embodied RL game agents learn useful skills that they are weak at? 🤔

👉 Check out EnvGen, an effective+efficient framework in which an LLM progressively generates and adapts training environments based on…

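A rough sketch of the loop the tweet describes, under my own reading: evaluate the agent, find weak skills, ask an LLM for environments targeting them, retrain, repeat. Every helper below (llm_generate_envs, train_agent, evaluate_skills) is a hypothetical placeholder, not EnvGen's actual code.

```python
# Conceptual sketch of an EnvGen-style adaptive curriculum; all helpers are placeholders.

def llm_generate_envs(weak_skills):
    """Ask an LLM for environment configs that exercise the skills the agent is weak at."""
    return [{"target_skill": s, "difficulty": "easy"} for s in weak_skills]

def train_agent(agent, env_configs, steps=1000):
    """Run RL training of the small agent in the generated environments (placeholder)."""
    return agent

def evaluate_skills(agent, skills):
    """Return a per-skill success rate (placeholder)."""
    return {s: 0.5 for s in skills}

agent = object()
skills = ["collect_wood", "make_tool", "defeat_monster"]
for cycle in range(4):
    scores = evaluate_skills(agent, skills)
    weak = [s for s, v in scores.items() if v < 0.6]   # skills the agent struggles with
    envs = llm_generate_envs(weak)                     # LLM adapts the curriculum
    agent = train_agent(agent, envs)                   # train on the new environments
```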
Przemyslaw Grabowicz (@przemyslslaw)

The U.S. Supreme Court has ended the use of race in college admissions. Fortunately, there exists a path to fair algorithmic decision-making that differs from the invalidated affirmative action measures, as we discuss in our recent Uncommon Good post:
uncommongood.substack.com/p/fair-machine…

Matthew Finlayson (@mattf1n)

Wanna know gpt-3.5-turbo's embed size? We find a way to extract info from LLM APIs and estimate gpt-3.5-turbo’s embed size to be 4096. With the same trick we also develop 25x faster logprob extraction, audits for LLM APIs, and more!
📄 arxiv.org/abs/2403.09539
Here’s how 1/🧵

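The core of the trick, in a toy form: the final layer maps a d-dimensional hidden state to vocab-size logits, so full logit vectors collected from the API lie in a d-dimensional subspace and their numerical rank reveals the embed size. The sketch below simulates this with small random weights instead of real API calls; it illustrates the principle, not the paper's method for extracting the vectors.

```python
import numpy as np

# Toy stand-in for an API model: logits = W @ h with a (vocab x hidden) output layer.
vocab, hidden = 2000, 128          # small sizes for speed; same logic at gpt-3.5 scale
rng = np.random.default_rng(0)
W = rng.standard_normal((vocab, hidden))

# "Query the API" more times than the hidden size and stack the full logit vectors.
n_queries = hidden + 64
H = rng.standard_normal((hidden, n_queries))   # unknown hidden states behind each query
logits = W @ H                                 # shape (vocab, n_queries), rank <= hidden

# The numerical rank of the stacked logits estimates the hidden (embed) size.
s = np.linalg.svd(logits, compute_uv=False)
print(int((s > s[0] * 1e-8).sum()))            # prints 128
```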
Canyu Chen (@CanyuChen3)

🤔Can LLM agents really simulate human behaviors?

🌟Our new paper 'Can Large Language Model Agents Simulate Human Trust Behaviors?' (Project website: camel-ai.org/research/agent…) provides some new insights into this fundamental problem.

✨TLDR: We discover the trust behaviors of…

Sharon Levy (@sharonlevy21)

🧐Are LLM responses to public health questions biased toward specific demographic groups?

In our new interdisciplinary collaboration, we find that disparities exist among model answers for different groups across ages, U.S. locations, and sexes.

Paper: arxiv.org/pdf/2403.04858…

Eric Wallace (@Eric_Wallace_)

The final layer of an LLM up-projects from hidden dim -> vocab size. The logprobs are thus low rank, and with some clever API queries, you can recover an LLM’s hidden dimension (or even the exact layer’s weights).

Our new paper is out, a collaboration between a lot of friends!
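A toy numpy check of that claim (the low-rank part and weight recovery up to a linear map), not the paper's actual API attack: the column space of many stacked logit vectors equals the column space of the final projection, so an SVD recovers that layer's weights up to an unknown hidden-by-hidden transform.

```python
import numpy as np

vocab, hidden, n_queries = 2000, 128, 256
rng = np.random.default_rng(1)
W = rng.standard_normal((vocab, hidden))        # the "secret" final-layer weights
H = rng.standard_normal((hidden, n_queries))    # unknown hidden states behind each query
logits = W @ H                                  # low rank: at most `hidden`

# Top-`hidden` left singular vectors span the same column space as W,
# i.e. they give W up to an (unknown) invertible hidden x hidden transform.
U, S, Vt = np.linalg.svd(logits, full_matrices=False)
W_hat = U[:, :hidden]

# Check: W is reproduced (numerically) by a linear map of W_hat.
M, *_ = np.linalg.lstsq(W_hat, W, rcond=None)
print(np.allclose(W_hat @ M, W))                # True
```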

Nicolas Papernot (@NicolasPapernot)

Just one month left before the SaTML Conference, April 9-11 in Toronto! I am excited to hear from Somesh Jha, Deb Raji, Yves-A. de Montjoye, and Sheila McIlraith, as well as the authors of accepted papers and the competition organizing teams!

There's still time to register! satml.org

Przemyslaw Grabowicz (@przemyslslaw)

Our first Uncommon Good post (with Nick Perello) discusses how to train AI systems that do not propagate discrimination, in compliance with legal provisions, based on our research published at ACM FAccT, the AAAI/ACM Conference on AI, Ethics, and Society (AIES), and ICML. Stay tuned!
open.substack.com/pub/uncommongo…

David Wan (@meetdavidwan)

Pointing to an image region should help models focus, but standard VLMs fail to understand visual markers/prompts (e.g., boxes/masks).

🚨Contrastive Region Guidance: Training-free method that increases focus on visual prompts by reducing model priors.

arxiv.org/abs/2403.02325
🧵

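A minimal sketch of the general contrastive-guidance idea behind a method like this: score the next token with the visual prompt present and with the highlighted region removed, then amplify the difference. The vlm_logits helper and the alpha weight are hypothetical placeholders, not the paper's interface; see the arXiv link for the actual formulation.

```python
import numpy as np

def vlm_logits(image_path: str, text: str) -> np.ndarray:
    """Hypothetical stand-in for a VLM's next-token logits; swap in a real model."""
    rng = np.random.default_rng(len(image_path) + len(text))
    return rng.standard_normal(32000)

def contrastive_region_guidance(img_with_prompt, img_region_removed, text, alpha=1.0):
    # Contrast the prediction conditioned on the visual prompt against one where the
    # highlighted region is blacked out, amplifying what the region contributes.
    with_region = vlm_logits(img_with_prompt, text)
    without_region = vlm_logits(img_region_removed, text)
    return (1 + alpha) * with_region - alpha * without_region

adjusted = contrastive_region_guidance("img_boxed.png", "img_masked.png", "What is the man holding?")
print(adjusted.shape)  # (32000,) adjusted next-token logits
```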
Javier Rando (@javirandor)

We are announcing the winners of our Trojan Detection Competition on Aligned LLMs!!

🥇 TML Lab (EPFL) (@fra__31, Maksym Andriushchenko 🇺🇦 and Nicolas Flammarion)
🥈 Krystof Mitka
🥉 nev

🧵 With some of the main findings!

Zhuang Liu (@liuzhuang1234)

LLMs are great, but their internals are less explored. I'm excited to share very interesting findings in our paper

“Massive Activations in Large Language Models”

LLMs have very few internal activations with drastically outsized magnitudes, e.g., 100,000x larger than others. (1/n)

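An easy way to look for this yourself with the Hugging Face transformers library; gpt2 is used here only as a small, locally runnable stand-in for the larger LLMs studied in the paper. The ratio of the largest hidden-state magnitude to the median one is a crude proxy for the "massive activation" effect.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # small stand-in; the paper studies much larger models
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

inputs = tok("Trustworthy machine learning is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# For each layer, compare the single largest activation magnitude to the median one.
for i, h in enumerate(out.hidden_states):
    a = h.abs()
    print(f"layer {i:2d}: max/median = {(a.max() / a.median()).item():.0f}")
```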
A. Feder Cooper (@afedercooper)

Thrilled to be recognized with best paper honorable mention at AAAI!

Our paper raises serious questions re: reproducibility + reliability in fairness

We define + mitigate arbitrariness, & find that most fairness benchmarks are actually close-to-fair

This is a BIG 🚩🚩

1/
