tusharkhot (@tusharkhot) Twitter Tweets • TwiCopy

Archiki Prasad

a year ago

I'll be presenting my #NAACL2024 work: ✨ADaPT✨ in-person 🇲🇽 tomorrow (June 19) at 11 AM in poster session 7! ADaPT enables LLMs to "adapt" to task complexity & execution failures by decomposing recursively. w/ Alexander Koller, M Hartmann, P Clark, Ashish Sabharwal, Mohit Bansal, tusharkhot

55 likes · 0 replies · 17 retweets

Bodhisattwa Majumder

@mbodhisattwa

a year ago

Can LLMs help accelerate the discovery of data-driven scientific hypotheses? 🧬📊 We benchmark this in DiscoveryBench: 264 discovery tasks from 6 scientific domains, from humanities to biology: arxiv.org/pdf/2407.01725… Ai2 Aristo Team at AI2 Harshit Surana UMass Amherst

171 likes · 4 replies · 46 retweets

Bodhisattwa Majumder

@mbodhisattwa

a year ago

To y'all attending #icml24, gearing up to argue what general-purpose generative models/agents can and cannot do for automated scientific discovery. Would love to hear your thoughts and counter-arguments on Wed (7/24) at 1:30, Hall C 4-9 #117. + discuss what's cooking Ai2

36 likes · 6 replies · 11 retweets

Archiki Prasad

@archikiprasad

a year ago

Symbolic planners do explicit search for most steps. However, LLMs can solve easy sub-goals directly (Sys-1) and verbalize search for harder ones (Sys-2). In✨System-1.x✨, we train LLMs to efficiently balance Sys-1 & 2. Users can simply control hybridization via the dial x! 🎛️

57 likes · 0 replies · 14 retweets

AK

@_akhaliq

a year ago

AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents

Autonomous agents that address day-to-day digital tasks (e.g., ordering groceries for a household), must not only operate multiple apps (e.g., notes, messaging, shopping app) via

148 likes · 4 replies · 24 retweets

Harsh Trivedi

@harsh3vedi

a year ago

🔥 Autonomous AI Assistants (e.g., #googleio2024, #WWDC24) and coding agents (e.g., #Devin, #SWEAgent) have garnered a lot of attention recently. We can envision coding agents autonomously completing complex day-to-day tasks across apps using APIs on our behalf. But how can we

81 likes · 4 replies · 28 retweets

LUNR

@stonybrooknlp

a year ago

AppWorld won the (one of the) best resource paper award(s) at #ACL2024 Outstanding resource and great work by Harsh Trivedi Niranjan at Stony Brook tusharkhot Ashish Sabharwal Aristo Team at AI2 and collaborators 🧵👇

16 likes · 0 replies · 8 retweets

ACL 2025

@aclmeeting

a year ago

🏆 ACL Best Theme Paper Award: - OLMo: Accelerating the Science of Language Models by Groeneveld et al. #NLProc #ACL2024NLP

113 likes · 0 replies · 10 retweets

Niranjan

@b_niranjan

a year ago

🏆 AppWorld won a #ACL2024NLP Best Resource Paper Award. 🥳 Congrats team! I'm so happy for Harsh Trivedi. The time & care he put in is inspiring. #proudadvisor 🚨He is on the job market.🚨 Hire him! 🌐Check out appworld.dev Stony Brook University Dept. of Computer Science @AI_SBU #NLProc Ai2

38 likes · 1 reply · 7 retweets

Ben Bogin

@ben_bogin

a year ago

📢 New Benchmark: SUPER for Setting UP and Executing tasks from Research repositories

Reproducibility is crucial in science. We introduce SUPER to evaluate LLMs' capabilities in autonomously running experiments from research repositories. ⬇️ arxiv.org/pdf/2409.07440

72 likes · 5 replies · 19 retweets

Ai2

@allen_ai

a year ago

Meet Molmo: a family of open, state-of-the-art multimodal AI models. Our best model outperforms proprietary systems, using 1000x less data. Molmo doesn't just understand multimodal data—it acts on it, enabling rich interactions in both the physical and virtual worlds. Try it

1.1K likes · 55 replies · 285 retweets

Harsh Trivedi

@harsh3vedi

a year ago

📢 I am giving a talk on 🌎 AppWorld & its future works at 13+ universities (USC, UCI, Stanford, Berkeley, Princeton, JHU, ..) and companies (Ai2, Google, Apple, Semantic Machines, ..) in the next 1-2 months 📅 Schedule+Details: appworld.dev/talks x.com/harsh3vedi/sta…

29 likes · 1 reply · 6 retweets

Peter Jansen ( @peterjansen-ai.bsky.social )

@peterjansen_ai

a year ago

Can language models perform end-to-end scientific discovery? In our NeurIPS Spotlight paper, we show: very rarely. Our best model found <20% of discoveries, our best PhDs found nearly all. Paper: arxiv.org/pdf/2406.06769 Code/Web: allenai.github.io/discoveryworld Ai2 Microsoft Research

497 likes · 7 replies · 104 retweets

Stanford NLP Group

@stanfordnlp

a year ago

For this week’s NLP Seminar, we are thrilled to host Harsh Trivedi to talk about AppWorld: Reliable Evaluation of Interactive Agents in a World of Apps and People! When: 10/10 Thurs 11am PT Non-Stanford affiliates registration form: forms.gle/UjWyX6dn7mQafj… (closed at 9am PT on

18 likes · 4 replies · 6 retweets

Archiki Prasad

@archikiprasad

a year ago

🚨Looking to self-align models on complex problem-solving tasks without gold answers or labels? Check out my internship work on ✨Self-consistency Preference Optimization (ScPO)✨, where we use the self-consistency concept to help train models by iteratively training consistent

105 likes · 0 replies · 33 retweets

Google Gemini App

@geminiapp

a year ago

Introducing Deep Research, your personal agentic AI research assistant. Rolling out starting today in Gemini Advanced. With Deep Research, you can create in-depth research reports on complex topics, complete with source links, giving you hours of research at your fingertips in

2.2K likes · 183 replies · 481 retweets

tusharkhot

@tusharkhot

a year ago

Working on Scientific discovery? Submit your papers to the AI & Scientific Discovery workshop. Due: Jan 30!

4 likes · 0 replies · 1 retweet

Peter Jansen ( @peterjansen-ai.bsky.social )

@peterjansen_ai

10 months ago

AI & Scientific Discovery Workshop deadline extended, with a few more days to submit! Submit your archival & non-archival papers broadly in the AI & Scientific Discovery space, for a great opportunity to attend a workshop with like-minded folks. ai-and-scientific-discovery.github.io

1 like · 0 replies · 2 retweets

Bill Yuchen Lin

@billyuchenlin

10 months ago

If you're interested in LLMs like o1 and R1 for complex reasoning, check out this paper — we show that logical reasoning tasks are ideal for evaluating and understanding their scaling limits. 🦓 ZebraLogic-Bench is a dataset of 1K constraint satisfaction problems (CSPs)

647 likes · 15 replies · 115 retweets

Google Gemini App

@geminiapp

10 months ago

We’re also rolling out a version of 2.0 Flash Thinking that can interact with apps like YouTube, Google Search and @GoogleMaps. These connected apps already make the Gemini app a uniquely helpful AI-powered assistant, and we’re exploring how new reasoning capabilities can

232 likes · 8 replies · 7 retweets