Vardaan Pahuja (@vardaanpahuja)'s Twitter Profile
Vardaan Pahuja

@vardaanpahuja

Ph.D. student in CSE @osunlp.
Research Interests: Multimodal FMs, KG reasoning, NLP
Ex-intern @MSFTResearch @GoogleAI
Prev @Mila_Quebec @IBMResearch @IITKgp

ID: 2609572508

Link: https://vardaanpahuja.github.io

Joined: 07-07-2014 12:52:37

110 Tweets

200 Followers

425 Following

Yu Gu @ICLR 2025 (@yugu_nlp)

❓Wondering how to scale inference-time compute with advanced planning for language agents?

🙋‍♂️Short answer: Using your LLM as a world model
💡More detailed answer: Using GPT-4o to predict the outcome of actions on a website can deliver strong performance with improved safety and

Huan Sun (OSU) (@hhsun1)

Very excited to learn that our 2023 paper, "G2Retro as a two-step graph generative models for retrosynthesis prediction (nature.com/articles/s4200……)," (led by ziqiChen and ningx005) has been selected into Nature's special collection, "Nobel Prize in Physics 2024,

Percy Liang (@percyliang)

I miss the days when we evaluated algorithms rather than models. Rather than "how well does model M do?", it should be "given data D and compute C, how well does running algorithm A on D with C do?" I don't think we can get scientific clarity unless we do the latter.

Bernal Jiménez (@bernaaaljg)

On my way to #NeurIPS2024 to present HippoRAG! Please don't hesitate to reach out and/or check out our poster if you're interested in:
- LLM or human long-term memory,
- the limitations of current RAG systems,
- the role of knowledge graphs in modern AI or
- neuro-inspired AI in

Lingbo Mo (@lingbomo)

🚀 Excited to announce the release of our Agent Safety Resources Repository! 📚🔍 This GitHub repo curates existing papers, benchmarks, and resources to advance research on the safety, trustworthiness, and robustness of autonomous agents driven by LLMs/LMMs. These resources

Boyu Gou (@boyugounlp)

With recent advancements like Claude 3.5 Computer Use and Gemini 2.0, the field of GUI Agents is rapidly evolving.

🚀 Excited to introduce GUI Agent Paper List, your go-to repo for the latest in GUI Agent research! 🌟

✨ Key Features:
- 170+ Papers grouped by environments,

Huan Sun (OSU) (@hhsun1)

Evaluating on our ScienceAgentBench (Coding tasks in Bioinformatics/Chemistry/Geo info science/Cognitive science) just got much easier and faster! Check out our update on containerized evaluation: (1) Task environments are set up in independent docker containers, which

Boyu Gou (@boyugounlp)

🚀 UGround accepted to #ICLR2025 [scores=10/8/8/5]! 🎉
We’re also thrilled to share some exciting updates:

✨ UGround is SOTA—again!
Using the exact same training data, our latest model achieved 89.4% accuracy on ScreenSpot, outperforming models from Google, Anthropic, Apple,

Yu Su @#ICLR2025 (@ysu_nlp)

Such an honor to be part of the 2025 Sloan Research Fellow cohort #SloanFellow! Excited to represent LLM + agent research and Ohio State. Grateful for the support from my family, all the great colleagues and students at OSU NLP Group, and my mentors and collaborators! Thx

Bernal Jiménez (@bernaaaljg)

Introducing ✨HippoRAG 2 ✨

📣 📣 “From RAG to Memory: Non-Parametric Continual Learning for Large Language Models”

HippoRAG 2 is a memory framework for LLMs that elevates our brain-inspired HippoRAG system to new levels of performance and robustness.

🔓 Unlocks Memory

Vardaan Pahuja (@vardaanpahuja)

Realistic and reliable evaluation of web agents is critical for measuring true progress. Online-Mind2Web represents a significant step forward, offering a more comprehensive benchmark with improved diversity.

Chan Hee (Luke) Song (@luke_ch_song)

🔥 VLMs aren’t built for spatial reasoning — yet. They hallucinate free space. Misjudge object fit. Can’t tell below from behind We built RoboSpatial to tackle that — a dataset for teaching spatial understanding to 2D/3D VLMs for robotics. 📝 Perfect review scores #CVPR2025 2025

Boyuan Zheng (@boyuan__zheng)

🚀 Excited to co-organize the Workshop on Computer Use Agents (CUA) at #ICML2025 in Vancouver! This workshop takes a comprehensive look at computer use agents—covering learning algorithms, orchestration, interfaces, safety, benchmarking, applications, and more. We’re also

Boshi Wang (@boshiwang2)

LLMs exhibit the Reversal Curse, a basic generalization failure where they struggle to learn reversible factual associations (e.g., "A is B" -> "B is A"). But why?

Our new work uncovers that it's a symptom of the long-standing binding problem in AI, and shows that a model design

Boyuan Zheng (@boyuan__zheng)

🔧What if your web agent could abstract its experience into programmatic skills—and improve itself autonomously? 🌟 Introducing SkillWeaver: a framework to enable self-improvement through autonomous exploration and constructing an ever-growing library of programmatic skills. 🧠

Huan Sun (OSU) (@hhsun1)

It's a great honor to give a keynote at the Molecule Maker Lab Institute symposium at UIUC! Many thanks to Prof. Heng Ji and Prof. Jiawei Han for invitation. The symposium’s theme this year is “AI scientist? What would it take?”, which I hold close to heart and made a talk titled “Language

Yu Gu @ICLR 2025 (@yugu_nlp)

“What's the role of NLP/LLM researchers in agent research?” “Natural language is merely a tool for communication.” … These doubts and criticisms have circulated widely over the past two years. In my PhD dissertation, I want to provide a perspective that addresses these doubts

Chan Hee (Luke) Song (@luke_ch_song)

🚨We just released the data generation code for RoboSpatial! 💾 github.com/NVlabs/RoboSpa… 📢 And yes, RoboSpatial is a #CVPR2025 Oral 🏆🔥