Saachi Jain (@saachi_jain_) 's Twitter Profile
Saachi Jain

@saachi_jain_

Safety @ OpenAI

ID: 2889468091

Website: http://saachij.com/ · Joined: 04-11-2014 06:33:50

34 Tweets

738 Followers

395 Following

Aleksander Madry (@aleks_madry) 's Twitter Profile Photo

What's the right way to remove part of an image? We show that typical strategies distort model predictions and introduce bias when debugging models. Good news: leveraging ViTs enables a way to side-step this bias. Paper: arxiv.org/abs/2204.08945 Blog post: gradientscience.org/missingness
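
A rough sketch of the ViT-based idea (illustrative code, not the paper's implementation): instead of graying out or blurring the removed region, which feeds the model out-of-distribution pixels, the corresponding patch tokens are simply dropped before the transformer layers.

import torch

def drop_patch_tokens(tokens: torch.Tensor, keep_mask: torch.Tensor) -> torch.Tensor:
    # tokens: (num_patches, dim) patch embeddings; keep_mask: (num_patches,) bool.
    # Return only the tokens for patches the model should still see.
    return tokens[keep_mask]

# Toy example: a 14x14 patch grid (196 tokens) with a 4x4 block "removed".
num_patches, dim = 196, 768
tokens = torch.randn(num_patches, dim)
keep = torch.ones(num_patches, dtype=torch.bool)
keep.view(14, 14)[5:9, 5:9] = False       # the ablated image region
kept = drop_patch_tokens(tokens, keep)
print(kept.shape)                          # torch.Size([180, 768]); a ViT can attend over these directly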

Hadi Salman (@hadisalmanx) 's Twitter Profile Photo

If you are attending CVPR and would like to learn about our work on certified patch defenses, pass by our poster (#178) this Thursday 2:30-5pm CDT in Hall B2-C! Saachi Jain Eric Wong and I will be there!

Aleksander Madry (@aleks_madry) 's Twitter Profile Photo

What kinds of fish are hard for a model to classify? Our new method (arxiv.org/abs/2206.14754) automatically identifies + captions model error patterns. The key? Distill failure modes as directions in latent space. Saachi Jain Hannah Lawrence A. Moitra Blog: gradientscience.org/failure-direct…
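
A minimal sketch of the failure-direction idea with synthetic data (not the authors' code): embed examples in a shared image/text latent space, fit a linear SVM separating correctly from incorrectly classified examples, and treat the SVM's normal vector as the failure direction; the lowest-scoring examples along it form a candidate hard subpopulation, and comparing the direction to text embeddings yields a caption.

import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
dim, n = 512, 1000
embeddings = rng.normal(size=(n, dim))      # stand-in for CLIP-style image embeddings
is_correct = rng.random(n) < 0.8            # stand-in for whether the classifier got each example right

svm = LinearSVC(C=0.1).fit(embeddings, is_correct.astype(int))
failure_direction = svm.coef_[0] / np.linalg.norm(svm.coef_[0])

# Examples scoring lowest along the direction are candidate hard subpopulations;
# captioning comes from comparing the direction to text embeddings (not shown).
scores = embeddings @ failure_direction
hardest = np.argsort(scores)[:10]
print(hardest)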

Aleksander Madry (@aleks_madry) 's Twitter Profile Photo

Does transfer learning = free accuracy? W/ Hadi Salman Saachi Jain Andrew Ilyas Logan Engstrom Eric Wong we identify one potential drawback: *bias transfer*, where biases in pre-trained models can persist after fine-tuning arxiv.org/abs/2207.02842 gradientscience.org/bias-transfer
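
An illustrative way to probe for bias transfer (toy stand-ins, not the paper's protocol): plant a spurious cue the pretrained model relied on, then measure how often adding that cue flips the fine-tuned model's downstream predictions.

import numpy as np

def bias_sensitivity(predict, images, add_cue):
    # Fraction of examples whose predicted label flips when the cue is added.
    clean = predict(images)
    cued = predict(add_cue(images))
    return float(np.mean(clean != cued))

# Toy stand-ins: a "fine-tuned model" that still keys on the top-left pixel,
# i.e., the cue it inherited from pretraining.
rng = np.random.default_rng(0)
images = rng.normal(size=(200, 32, 32))
predict = lambda x: (x[:, 0, 0] > 0.5).astype(int)

def add_cue(x):
    x = x.copy()
    x[:, 0, 0] = 3.0              # plant the spurious cue
    return x

print(bias_sensitivity(predict, images, add_cue))  # a large flip rate => the pretraining bias persisted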

Aleksander Madry (@aleks_madry) 's Twitter Profile Photo

Does your pretrained model think planes are flying lawnmowers? W/ Saachi Jain Hadi Salman Alaa Khaddaj Eric Wong Sam Park we build a framework for pinpointing the impact of pretraining data on transfer learning. Paper: arxiv.org/abs/2207.05739 Blog: gradientscience.org/data-transfer/
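
A rough sketch of the counterfactual framing (synthetic numbers, not the paper's estimator): quantify a pretraining class's effect on transfer by comparing downstream accuracy across models pretrained with and without that class.

import numpy as np

rng = np.random.default_rng(0)
num_pretrain_classes, num_models = 1000, 50

# For each pretrained model: which classes its pretraining set included, and
# its downstream accuracy after fine-tuning (both synthetic here).
included = rng.random((num_models, num_pretrain_classes)) < 0.5
transfer_acc = rng.normal(0.8, 0.02, size=num_models)

# Influence of class j: mean accuracy of models that saw j minus those that didn't.
with_j = np.array([transfer_acc[included[:, j]].mean() for j in range(num_pretrain_classes)])
without_j = np.array([transfer_acc[~included[:, j]].mean() for j in range(num_pretrain_classes)])
influence = with_j - without_j
print(influence[:5])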

Saachi Jain (@saachi_jain_) 's Twitter Profile Photo

If you are attending ICML, Dimitris Tsipras and I are presenting our paper Combining Diverse Feature Priors (arxiv.org/abs/2110.08220) on Tuesday. Come say hi! Talk: Tues. 7/19, 5:40 PM EDT Room 327-329 (DL Sequential Models) Poster: Tues. 7/19, 6:30-8:30 PM EDT Hall E, Poster #508

Andrew Ilyas (@andrew_ilyas) 's Twitter Profile Photo

Come hear about work on datamodels (arxiv.org/abs/2202.00622) at ICML *tomorrow* in the Deep Learning/Optimization track (Rm 309)! The presentation is at 4:50 with a poster session at 6:30. Joint work with Sam Park Logan Engstrom Guillaume Leclerc Aleksander Madry
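
For context, a minimal illustration of a datamodel with synthetic data (the paper fits sparse linear models at far larger scale): regress a test example's output, e.g. its correct-class margin, onto the 0/1 indicator of which training examples were in the training subset.

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
num_train, num_subsets = 500, 2000

masks = (rng.random((num_subsets, num_train)) < 0.5).astype(float)        # which train points each subset used
true_influence = rng.normal(0, 0.05, size=num_train)                      # unknown per-example effect
margins = masks @ true_influence + rng.normal(0, 0.1, size=num_subsets)   # stand-in for measured margins

datamodel = Lasso(alpha=0.01).fit(masks, margins)
most_helpful = np.argsort(datamodel.coef_)[-5:]     # train points that raise this example's margin the most
print(most_helpful)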

Hannah Lawrence (@hlawrencecs) 's Twitter Profile Photo

Group CNNs are used for their explicit inductive bias towards symmetry, but what about the implicit bias from training? With Bobak Kiani Kristian Georgiev A. Dienes, we answer this question for linear group CNNs. Check out today's talk + Poster 520 at ICML! proceedings.mlr.press/v162/lawrence2…

Sarah Cen (@cen_sarah) 's Twitter Profile Photo

Got the chance to talk at the Simons Institute's AI & Humanity Workshop last week! Presented two ongoing works with Andrew Ilyas Aleksander Madry Manish Raghavan on building trust in AI & the governance of data-driven algorithms. Check out the video here youtube.com/watch?v=OpFY9D…

Aleksander Madry (@aleks_madry) 's Twitter Profile Photo

Stable diffusion can visualize + improve model failure modes! Leveraging our method, we can generate examples of hard subpopulations, which can then be used for targeted data augmentation to improve reliability. Blog: gradientscience.org/failure-direct… Saachi Jain Hannah Lawrence A.Moitra
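
A sketch of the augmentation step (assumes the Hugging Face diffusers library and a GPU; the model id and caption below are illustrative, not from the paper): synthesize images matching a failure-mode caption and fold them back into training.

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

failure_caption = "a photo of a fish held by a person"  # e.g., a caption recovered for a hard subpopulation
augmentation_images = pipe([failure_caption] * 8, num_inference_steps=30).images

# These generated images can be labeled with the original class and appended to
# the training set before re-training, as targeted augmentation for the failure mode.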

Aleksander Madry (@aleks_madry) 's Twitter Profile Photo

Will your model identify a polar bear on the moon? How would you know? Dataset Interfaces let you generate images from your dataset under whatever distribution shift you desire! arxiv.org/abs/2302.07865 gradientscience.org/dataset-interf… W/ Josh Vendrow Saachi Jain Logan Engstrom
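
A rough sketch of querying such an interface (the per-class token learning via textual inversion is omitted; prompts and model id are illustrative): generate class images under a chosen shift and score your classifier on them.

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

shift = "on the moon"
classes = ["polar bear", "airplane", "goldfish"]
shifted = {c: pipe(f"a photo of a {c} {shift}", num_images_per_prompt=4).images for c in classes}

# Running `shifted[c]` through your trained classifier estimates its accuracy
# under this distribution shift, without collecting real photos of the shift.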

Saachi Jain (@saachi_jain_) 's Twitter Profile Photo

If you are at #ICLR2023, Hannah Lawrence and I are presenting our work on distilling model failures as directions in latent space on *Wednesday*. Come say hi! Talk: 10:30, AD12 Poster: 11:30-1:30 # 59 arxiv.org/abs/2206.14754

Saachi Jain (@saachi_jain_) 's Twitter Profile Photo

Excited to be in Vancouver for #CVPR2023! Hadi Salman and I will be presenting our poster on a data-based perspective on transfer learning on Tuesday (10:30-12). If you're around, drop by and say hi! arxiv.org/abs/2207.05739

Andrew Ilyas (@andrew_ilyas) 's Twitter Profile Photo

Any burning ML questions? The ATTRIB workshop is hosting a panel on "The Future of Attribution in ML" tomorrow at 11AM and is soliciting questions! Submit them by TODAY 11:59PM to hear answers at the panel tomorrow! forms.gle/Yd5N3Ti6kKfqij… More info: attrib-workshop.cc

Lilian Weng (@lilianweng) 's Twitter Profile Photo

🍓 Finally o1 is out - our first model with general reasoning capabilities. Not only does it achieve impressive results on hard scientific tasks, but it is also significantly improved on safety and robustness. openai.com/index/learning… We found reasoning in context about safety…

Lilian Weng (@lilianweng) 's Twitter Profile Photo

📢 We are hiring Research Scientists and Engineers for safety research at OpenAI, spanning safe model behavior training, adversarial robustness, AI in healthcare, frontier risk evaluation, and more. Please fill in this form if you are interested: jobs.ashbyhq.com/openai/form/oa…

Johannes Heidecke (@joheidecke) 's Twitter Profile Photo

Proud to share our work on Deliberative Alignment openai.com/index/delibera… with a special shoutout to Melody Guan ʕᵔᴥᵔʔ who led this work. Deliberative Alignment trains models to reason over relevant safety and alignment policies to formulate their responses.