Vardaan Pahuja (@vardaanpahuja) Twitter Tweets • TwiCopy

❓Wondering how to scale inference-time compute with advanced planning for language agents? 🙋‍♂️Short answer: Using your LLM as a world model 💡More detailed answer: Using GPT-4o to predict the outcome of actions on a website can deliver strong performance with improved safety and

415 likes, 16 replies, 89 retweets

Huan Sun (OSU)

@hhsun1

8 months ago

Very excited to learn that our 2023 paper, "G2Retro as a two-step graph generative models for retrosynthesis prediction (nature.com/articles/s4200……)," (led by ziqiChen and ningx005) has been selected into Nature's special collection, "Nobel Prize in Physics 2024,

51 likes, 2 replies, 14 retweets

Percy Liang

@percyliang

8 months ago

I miss the days when we evaluated algorithms rather than models. Rather than "how well does model M do?", it should be "given data D and compute C, how well does running algorithm A on D with C do?" I don't think we can get scientific clarity unless we do the latter.

793 likes, 20 replies, 92 retweets

Bernal Jiménez

@bernaaaljg

8 months ago

On my way to #NeurIPS2024 to present HippoRAG! Please don't hesitate to reach out and/or check out our poster if you're interested in: - LLM or human long-term memory, - the limitations of current RAG systems, - the role of knowledge graphs in modern AI or - neuro-inspired AI in

33 likes, 7 replies, 10 retweets

Lingbo Mo

@lingbomo

8 months ago

🚀 Excited to announce the release of our Agent Safety Resources Repository! 📚🔍 This GitHub repo curates existing papers, benchmarks, and resources to advance research on the safety, trustworthiness, and robustness of autonomous agents driven by LLMs/LMMs. These resources

28 likes, 1 reply, 16 retweets

Boyu Gou

@boyugounlp

8 months ago

With recent advancements like Claude 3.5 Computer Use and Gemini 2.0, the field of GUI Agents is rapidly evolving. 🚀 Excited to introduce GUI Agent Paper List, your go-to repo for the latest in GUI Agent research! 🌟 ✨ Key Features: - 170+ Papers grouped by environments,

64 likes, 2 replies, 19 retweets

Huan Sun (OSU)

@hhsun1

7 months ago

Evaluating on our ScienceAgentBench (Coding tasks in Bioinformatics/Chemistry/Geo info science/Cognitive science) just got much easier and faster! Check out our update on containerized evaluation: (1) Task environments are set up in independent docker containers, which

49 likes, 2 replies, 11 retweets

Boyu Gou

@boyugounlp

6 months ago

🚀 UGround accepted to #ICLR2025 [scores=10/8/8/5]! 🎉 We’re also thrilled to share some exciting updates: ✨ UGround is SOTA—again! Using the exact same training data, our latest model achieved 89.4% accuracy on ScreenSpot, outperforming models from Google, Anthropic, Apple,

84 likes, 2 replies, 25 retweets

Yu Su @#ICLR2025

@ysu_nlp

6 months ago

Such an honor to be part of the 2025 Sloan Research Fellow cohort #SloanFellow! Excited to represent LLM + agent research and Ohio State. Grateful for the support from my family, all the great colleagues and students at OSU NLP Group, and my mentors and collaborators! Thx

206 likes, 36 replies, 14 retweets

Vardaan Pahuja

@vardaanpahuja

5 months ago

I love this interactive demo; it's a great way to see how SAEs can help interpret vision models!

2 likes, 0 replies, 1 retweet

Bernal Jiménez

@bernaaaljg

5 months ago

Introducing ✨HippoRAG 2 ✨ 📣 📣 “From RAG to Memory: Non-Parametric Continual Learning for Large Language Models” HippoRAG 2 is a memory framework for LLMs that elevates our brain-inspired HippoRAG system to new levels of performance and robustness. 🔓 Unlocks Memory

132 likes, 3 replies, 44 retweets

Vardaan Pahuja

@vardaanpahuja

5 months ago

A Visual Guide to LLM Agents open.substack.com/pub/maartengro…

2 likes, 0 replies, 0 retweets

Vardaan Pahuja

@vardaanpahuja

4 months ago

Realistic and reliable evaluation of web agents is critical for measuring true progress. Online-Mind2Web represents a significant step forward, offering a more comprehensive benchmark with improved diversity.

1 like, 0 replies, 0 retweets

Chan Hee (Luke) Song

@luke_ch_song

4 months ago

🔥 VLMs aren’t built for spatial reasoning — yet. They hallucinate free space. Misjudge object fit. Can’t tell below from behind. We built RoboSpatial to tackle that — a dataset for teaching spatial understanding to 2D/3D VLMs for robotics. 📝 Perfect review scores #CVPR2025 2025

391 likes, 5 replies, 73 retweets

Boyuan Zheng

@boyuan__zheng

4 months ago

🚀 Excited to co-organize the Workshop on Computer Use Agents (CUA) at #ICML2025 in Vancouver! This workshop takes a comprehensive look at computer use agents—covering learning algorithms, orchestration, interfaces, safety, benchmarking, applications, and more. We’re also

24 likes, 0 replies, 13 retweets

Boshi Wang

@boshiwang2

4 months ago

LLMs exhibit the Reversal Curse, a basic generalization failure where they struggle to learn reversible factual associations (e.g., "A is B" -> "B is A"). But why? Our new work uncovers that it's a symptom of the long-standing binding problem in AI, and shows that a model design

872 likes, 24 replies, 127 retweets

Boyuan Zheng

@boyuan__zheng

4 months ago

🔧What if your web agent could abstract its experience into programmatic skills—and improve itself autonomously? 🌟 Introducing SkillWeaver: a framework to enable self-improvement through autonomous exploration and constructing an ever-growing library of programmatic skills. 🧠

65 likes, 4 replies, 25 retweets

Huan Sun (OSU)

@hhsun1

4 months ago

It's a great honor to give a keynote at the Molecule Maker Lab Institute symposium at UIUC! Many thanks to Prof. Heng Ji and Prof. Jiawei Han for the invitation. The symposium’s theme this year is “AI scientist? What would it take?”, which I hold close to heart, and I gave a talk titled “Language

70 likes, 2 replies, 19 retweets

Yu Gu @ICLR 2025

@yugu_nlp

4 months ago

“What's the role of NLP/LLM researchers in agent research?” “Natural language is merely a tool for communication.” … These doubts and criticisms have circulated widely over the past two years. In my PhD dissertation, I want to provide a perspective that addresses these doubts

76 likes, 4 replies, 22 retweets

Chan Hee (Luke) Song

@luke_ch_song

3 months ago

🚨We just released the data generation code for RoboSpatial! 💾 github.com/NVlabs/RoboSpa… 📢 And yes, RoboSpatial is a #CVPR2025 Oral 🏆🔥

11 likes, 0 replies, 9 retweets