Teddi Worledge (@teddiworledge)'s Twitter Profile
Teddi Worledge

@teddiworledge

(she/her) Computer Science PhD Student @Stanford. Formerly @Berkeley.

ID: 1720864110483853312

Joined: 04-11-2023 18:03:39

21 Tweets

103 Followers

108 Following

Alexander Wan (@alexwan55)'s Twitter Profile Photo

What happens when RAG models are provided with documents that have conflicting information? In our new paper, we study how LLMs answer subjective, contentious, and conflicting queries in real-world retrieval-augmented situations.

Krista Opsahl-Ong (@kristahopsalong)'s Twitter Profile Photo

Got a pipeline with **multiple prompts**, like a DSPy program? What's the right way to jointly optimize these prompts? Introducing MIPRO, a Multi-prompt Instruction Proposal Optimizer. We integrated MIPRO into DSPy. It can deliver +11% gains over existing DSPy optimizers! 🧵👇

Kushal Tirumala (@kushal_tirumala)'s Twitter Profile Photo

If you care about pruning LLMs, you should check out our new paper!! This was a fun project, and I'm grateful to have gotten the chance to work with this fantastic group of people. See the thread below for more 👇

Judy Shen (@judyhshen)'s Twitter Profile Photo

I’m fighting… against vague notions of LLM attributions. 😤 Check out our paper (w. Teddi Worledge, Nicole, Caleb and Carlos) here: arxiv.org/abs/2311.12233

CLS (@chengleisi)'s Twitter Profile Photo

Automating AI research is exciting! But can LLMs actually produce novel, expert-level research ideas? After a year-long study, we obtained the first statistically significant conclusion: LLM-generated ideas are more novel than ideas written by expert human researchers.

Logan Engstrom (@logan_engstrom)'s Twitter Profile Photo

Announcing a deadline extension for the ATTRIB workshop! Submissions are now due September 25th, with an option to submit October 4th if at least one paper author volunteers to be an emergency reviewer. More info here: attrib-workshop.cc

Irena Gao (@irena_gao)'s Twitter Profile Photo

Many providers offer inference APIs for the same models: for example, there were over nine Llama-3 8B APIs in Summer 2024. Do all of these APIs serve the same completion distribution as the original model? In our new paper, ✨Model Equality Testing: Which Model is This API …

Nicole Meister (@nicole__meister)'s Twitter Profile Photo

Prior work has used LLMs to simulate survey responses, yet their ability to match the distribution of views remains uncertain. Our new paper [arxiv.org/pdf/2411.05403] introduces a benchmark to evaluate how distributionally aligned LLMs are with human opinions. 🧵

Luke Bailey (@lukebailey181)'s Twitter Profile Photo

Can interpretability help defend LLMs? We find we can reshape activations while preserving a model’s behavior. This lets us attack latent-space defenses, from SAEs and probes to Circuit Breakers. We can attack so precisely that we make a harmfulness probe output this QR code. 🧵

Liana (@lianapatel_)'s Twitter Profile Photo

🚀 Thrilled to launch DeepScholar, an openly-accessible DeepResearch system we've been building at Berkeley & Stanford. DeepScholar efficiently processes 100s of articles, demonstrating strong long-form research synthesis capabilities, competitive with OpenAI's DR, while running …
