
Mert Yuksekgonul
@mertyuksekgonul
(he/him) Computer Science PhD Candidate @Stanford @StanfordAILab
ID: 175309364
https://cs.stanford.edu/~merty
06-08-2010 07:11:25
2.2K Tweets
4.4K Followers
721 Following










Model developers try to train “safe” models that refuse to help with malicious tasks like hacking... but in new work with Jacob Steinhardt and Anca Dragan, we show that such models still enable misuse: adversaries can combine multiple safe models to bypass safeguards. 1/n



*TextGrad: Automatic “Differentiation” via Text* by Mert Yuksekgonul, Federico Bianchi, Sheng Liu, Zhi Huang, James Zou. PyTorch syntax for optimizing graphs of LLM calls, where "gradients" and "optimization" are computed by additional LLM instances. arxiv.org/abs/2406.07496



🔥 #TextGrad is now multi-modal! TextGrad boosts GPT-4o's visual reasoning ability: 📊 MathVista score 63.8 ➡️ 66.1 w/ TextGrad. 🧬 Reduces ScienceQA error rate by 20%. Best reported 0-shot score. Tutorial: colab.research.google.com/github/zou-gro… Great work Pan Lu, Mert Yuksekgonul + team! Works

Work co-supervised with Hidenori Tanaka, led by Udith Haputhanthri, has now been accepted to the ICML Mechanistic Interpretability workshop. TLDR: High-dimensional bifurcations underlie skill acquisition in task-trained RNNs. Link: openreview.net/forum?id=njmXd… A tweetprint🧵




A while ago I wrote a thread about #TextGrad, which is an alternative prompt optimization method, based on "natural language gradients". Cool! Since we are still waiting for Andrej Karpathy's video reimplementing this from scratch... I thought I had to make my own... So here is the
