Doug Downey (@_dougdowney)'s Twitter Profile
Doug Downey

@_dougdowney

Research Manager at @allen_ai, Prof at @northwesterncs

ID: 1258151736105041920

Joined 06-05-2020 21:48:55

90 Tweets

342 Followers

195 Following

Doug Downey (@_dougdowney):

New scientific QA system led by Sergey Feldman, Amanpreet Singh, Joseph Chee Chang, Aakanksha Naik and team from AI2 & UW! Following AI2/UW's prior open QA system (x.com/AkariAsai/stat…) by Akari Asai, this adds thematic clustering, tables, and the latest proprietary models.

Ai2 (@allen_ai):

Here is Tülu 3 405B 🐫 our open-source post-training model that surpasses the performance of DeepSeek-V3! The last member of the Tülu 3 family demonstrates that our recipe, which includes Reinforcement Learning from Verifiable Rewards (RLVR), scales to 405B - with performance on

Ai2 (@allen_ai):

We took our most efficient model and made an open-source iOS app📱but why? As phones get faster, more AI will happen on device. With OLMoE, researchers, developers, and users can get a feel for this future: fully private LLMs, available anytime. Learn more from Luca Soldaini 🎀👇

Ai2 (@allen_ai):

Introducing olmOCR, our open-source tool to extract clean plain text from PDFs! Built for scale, olmOCR handles many document types with high throughput. Run it on your own GPU for free, at over 3,000 tokens/s, equivalent to $190 per million pages, or 1/32 the cost of GPT-4o!

Ai2 (@allen_ai):

We’re excited to share some updates to Ai2 ScholarQA: 🗂️ You can now sign in via Google to save your query history across devices and browsers. 📚 We added 108M+ paper abstracts to our corpus - expect to get even better responses! ✨ The backbone model has been updated to the

Ai2 (@allen_ai):

Announcing OLMo 2 32B: the first fully open model to beat GPT-3.5 & GPT-4o mini on a suite of popular, multi-skill benchmarks. Comparable to the best open-weight models, at a fraction of the training compute. When you have a good recipe, ✨ magical things happen when you scale it up!

Ai2 (@allen_ai):

Meet Ai2 Paper Finder, an LLM-powered literature search system. Searching for relevant work is a multi-step process that requires iteration. Paper Finder mimics this workflow and helps researchers find more papers than ever. 🔍

Ai2 (@allen_ai):

Imagine AI doing science: reading papers, generating ideas, designing and running experiments, analyzing results… How many more discoveries can we reveal? 🧐 Meet CodeScientist, a promising next step toward autonomous scientific discovery. 🧵

Ai2 (@allen_ai):

For years it’s been an open question: how much is a language model learning and synthesizing information, and how much is it just memorizing and reciting? Introducing OLMoTrace, a new feature in the Ai2 Playground that begins to shed some light. 🔦

Ai2 (@allen_ai):

Ever wonder how LLM developers choose their pretraining data? It’s not guesswork: all AI labs create small-scale models as experiments, but the models and their data are rarely shared. DataDecide opens up the process: 1,050 models, 30k checkpoints, 25 datasets & 10 benchmarks 🧵

Semantic Scholar Research @ AI2 (@ai2_s2research):

Ai2 Semantic Scholar is hiring an #ml #nlp #ai reasoning researcher for a Research Scientist, Agents for Science position with target start dates in 2025. Excited about developing AI systems with deep reasoning capabilities for science? Send an application our way!

Ai2 (@allen_ai):

Introducing SciArena, a platform for benchmarking models across scientific literature tasks. Inspired by Chatbot Arena, SciArena applies a crowdsourced LLM evaluation approach to the scientific domain. 🧵

Doug Downey (@_dougdowney):

This was a fun collaboration led by Yilun Zhao and Kaiyan Zhang from Arman Cohan's lab at Yale University. Annotators in our study preferred o3, which was found to give more detailed and technical answers. Curious to see if community voting changes the picture!

Ai2 (@allen_ai):

We’ve upgraded ScholarQA, our agent that helps researchers conduct literature reviews efficiently by providing detailed answers. Now, when ScholarQA cites a source, it won’t just tell you which paper it came from: you’ll see the exact quote, highlighted in the original PDF. 🧵

Ai2 (@allen_ai):

Great science starts with great questions. 🤔✨ Meet AutoDS, an AI that doesn’t just hunt for answers; it decides which questions are worth asking. 🧵

Hita K (@_hitakam):

Are you a researcher in CS or a CS-adjacent field who could use help in refining your research ideas? Want to try our new AI-powered tool that helps with just that in a paid user study? Details and sign up here! forms.gle/UPFjyJ59uuZ5Zb…

Ai2 (@allen_ai):

With fresh support of $75M from the U.S. National Science Foundation and $77M from @NVIDIA, we’re set to scale our open model ecosystem, bolster the infrastructure behind it, and fast‑track reproducible AI research to unlock the next wave of scientific discovery. 💡

Ai2 (@allen_ai):

🚀 In March, we launched Paper Finder, an LLM-powered literature search agent that surfaces papers other tools miss. Now, we’re releasing an open-source snapshot to enable others to inspect & build on it, and to reproduce the results. 🧵

Ai2 (@allen_ai):

🚨 SciArena update + evaluation of new models including GPT-5! 🚨 With thousands of new votes, new LLMs are reshaping our leaderboard for scientific literature tasks. o3 still leads, but GPT-5, Claude Opus 4.1, & more are closing the gap.