Aristo Team at AI2 (@ai2_aristo) 's Twitter Profile
Aristo Team at AI2

@ai2_aristo

Building machines that can read, learn and reason at @allen_ai
Join us: allenai.org/careers?team=a…

ID: 1516911351867736064

Link: https://allenai.org/aristo
Joined: 20-04-2022 22:48:15

74 Tweets

914 Followers

10 Following

Iz Beltagy (@i_beltagy) 's Twitter Profile Photo

OLMo-7b is finally out 🎉, and we are releasing everything; weights, intermediate checkpoints, training code and logs, training data and toolkit, evaluation and adaptation code and data. Most of it has been released, and the rest is coming soon. OLMo-65b and Adapted OLMo-7b are

Aristo Team at AI2 (@ai2_aristo) 's Twitter Profile Photo

📢New #ICLR2024 paper with Stanford NLP Group, Princeton NLP Group We find pervasive stereotypical biases in persona-assigned LLMs and show that they can covertly degrade LLM’s reasoning skills (coding, MMLU, etc). We also release a dataset of 1.5M model outputs to enable future research.

Aristo Team at AI2 (@ai2_aristo) 's Twitter Profile Photo

New work from Aristo Team at AI2 is live on arXiv! Congrats to all involved -- Kolby Nottingham, Bodhisattwa Majumder, bhavana dalvi, Sameer Singh, Peter Clark, and Roy Fox 🙌 Learn more ▶️ Project Site: allenai.github.io/sso Paper: arxiv.org/abs/2402.03244 Code: github.com/allenai/sso

Yuling Gu (@gu_yuling) 's Twitter Profile Photo

Looking for an ✨⚙️ interpretable explanation evaluation tool 🔧💫 that can 🤩🔎 automatically characterize the explanation capabilities of modern LLMs 🔬🤩? Check out 🤖 “Digital Socrates: Evaluating LLMs through Explanation Critiques” 🤖 ! arxiv.org/abs/2311.09613 1/

Bodhisattwa Majumder (@mbodhisattwa) 's Twitter Profile Photo

Is it possible to build end-to-end autonomous discovery systems using Large Generative Models (LGMs)? 🧬 In this position paper, we argue: arxiv.org/pdf/2402.13610… 🧵 (1/n) Ai2 Aristo Team at AI2 Harshit Surana UMass Amherst University of Utah

Archiki Prasad (@archikiprasad) 's Twitter Profile Photo

🎉Our work ADaPT on enabling LLM agents to dynamically “adapt” to task complexity & LLM capabilities via recursive decomposition is accepted as #NAACL2024 findings!😄 Many thanks to Alexander Koller M Hartmann, P Clark, Ashish Sabharwal Mohit Bansal tusharkhot Aristo Team at AI2 Ai2 UNC NLP

Aristo Team at AI2 (@ai2_aristo) 's Twitter Profile Photo

Wondering why Chain-of-Thought appears to make Transformers more powerful? Find out from Ben Brubaker's elegant and broad overview📜 in Quanta Magazine, covering an upcoming ICLR-2024 paper by William Merrill, Ashish Sabharwal on precisely this topic!

Aristo Team at AI2 (@ai2_aristo) 's Twitter Profile Photo

"The Illusion of State in State Space Models" -- William Merrill, Jackson Petty, and Ashish Sabharwal find that newly popular "state" space models (SSMs) are surprisingly as limited as Transformers when it comes to tracking state.

Sanchaita Hazra (@hsanchaita) 's Twitter Profile Photo

🌻 Super excited about my first Computer Science publication at NAACL HLT 2025 (main)! Bodhisattwa Majumder and I study the language of deception and how language models fare at detecting them. And guess what we've found: arxiv.org/pdf/2311.07092… (1/n) 🧵 @EconUofU Ai2

Clémentine Fourrier 🍊 (@clefourrier) 's Twitter Profile Photo

Is chain of thought actually helping your model? 🤔 According to the CoT Leaderboard, it seems more useful for the smaller models! Really looking forward to seeing more prompting strategies tested :) Congrats to the Logikon-AI + Ai2 teams! huggingface.co/blog/leaderboa…

Kolby Nottingham (@kolbytn) 's Twitter Profile Photo

Skill Set Optimization was accepted to ICML Conference 2024! I'm proud of this work and everything we learned about in-context policy improvement. Big thanks to my collaborators at Ai2. Way to go team!

Bodhisattwa Majumder (@mbodhisattwa) 's Twitter Profile Photo

Incredibly proud of our teamwork, now in ICML Conference! This position starts a series of work on data-driven scientific discovery w generative models. Follow-ups coming soon on benchmarks, systems, & accessibility in science! arxiv.org/abs/2402.13610 #ICML2024 Ai2 Aristo Team at AI2

Yuling Gu (@gu_yuling) 's Twitter Profile Photo

Our paper 🤖 “Digital Socrates: Evaluating LLMs through Explanation Critiques” 🤖 has been accepted to the #ACL2024NLP main conference! 🎉 w/ my collaborators Oyvind Tafjord and Peter Clark Ai2 Aristo Team at AI2 Try out Digital Socrates for your model evaluations! #NLProc

Aristo Team at AI2 (@ai2_aristo) 's Twitter Profile Photo

NLRSE workshop @ ACL 2024: Deadline extended to May 21 AoE! Also note that non-archival cross-submissions (papers accepted to other venues, such as ACL Findings) can be submitted on the Google Form here: docs.google.com/forms/d/1OAzZE…

Aristo Team at AI2 (@ai2_aristo) 's Twitter Profile Photo

Want to build or test Interactive Coding Agents? Check out AppWorld, an exciting new multi-app simulated environment and benchmark from Stony Brook University and Ai2 !

Aristo Team at AI2 (@ai2_aristo) 's Twitter Profile Photo

AppWorld (appworld.dev) recognized at #ACL2024nlp with a Best Resource Paper award! Congratulations to Harsh Trivedi and collaborators from Stony Brook University and Ai2 for this exciting new environment for interactive coding agents!

Aristo Team at AI2 (@ai2_aristo) 's Twitter Profile Photo

📢The countdown continues: Only one month left to submit your papers to the AI & Scientific Discovery Workshop@NAACL 2025⌛️

Ai2 (@allen_ai) 's Twitter Profile Photo

Imagine AI doing science: reading papers, generating ideas, designing and running experiments, analyzing results… How many more discoveries can we reveal? 🧐 Meet CodeScientist, a promising next step toward autonomous scientific discovery. 🧵
