Jianyang Gu (@vimar_gu) Twitter Tweets • TwiCopy

We're already using AI search systems every day for more and more complex tasks, but how good are they really? Challenge: evaluation is hard with no fixed ground truth! In Mind2Web 2, we use agents to evaluate agents. Really excited! Thanks to everyone who made this possible!

Huan Sun (OSU)

@hhsun1

5 months ago

🚨 Postdoc Hiring: I am looking for a postdoc to work on rigorously evaluating and advancing the capabilities and safety of computer-use agents (CUAs), co-advised with Yu Su (OSU NLP Group). We welcome strong applicants with experience in CUAs, long-horizon reasoning/planning,

Sam Stevens

@samstevens6860

5 months ago

I'm excited to bring the Imageomics workshop to NeurIPS 2025! Consider submitting your work on ai4ecology, ai4conservation and general ai4science--if you're using images to learn something about the natural world, chances are it's a good fit for the imageomics workshop!

Vardaan Pahuja

@vardaanpahuja

5 months ago

🚀 Excited to share our #ACL2025 Findings paper: Explorer — a scalable pipeline that generates diverse web trajectories via exploration, powering generalist GUI agents with strong performance! 📄 arxiv.org/pdf/2502.11357 🌐 osu-nlp-group.github.io/Explorer/ #WebAgents #SyntheticData #LLM

Yu Su @#ICLR2025

@ysu_nlp

5 months ago

Safety is one of the biggest blockers for computer use agents: how can I trust an agent won’t accidentally do something consequential without my permission? We collect and release the first large-scale dataset for detecting consequential actions on the web, and train the best

Boyuan Zheng

@boyuan__zheng

5 months ago

Remember “Son of Anton” from the Silicon Valley show (@SiliconHBO)? The experimental AI that “efficiently” orders 4,000 lbs of meat while looking for a cheap burger and “fixes” a bug by deleting all the code? It’s starting to look a lot like reality. Even 18 months ago, my own

Lauren Gillespie

@leg2015

5 months ago

Excited to be giving a keynote at the NeurIPS version of the Imageomics workshop! 🎉 It's also not too late to submit your own work to the workshop! (Aug. 22nd deadline, details below 👇)

Yu Su @#ICLR2025

@ysu_nlp

4 months ago

Excited to receive the NSF CAREER Award! Grateful for all the support and encouragement I've received in the 6 years of faculty life so far, especially for my extremely supportive family and for the amazing students in the OSU NLP Group I have had the privilege to work with!!

Tianshu Zhang

@tianshu_osu

3 months ago

🎉 Excited to share that our paper EvoSchema: Towards Text-to-SQL Robustness Against Schema Evolution was accepted at VLDB 2025! 🚀 📢 Reminder: join us at VLDB 2025 in London! 🗓️ Sept 2 (Tue), 10:45 AM – 12:15 PM 📍 Room Wordsworth 4F 📄 vldb.org/pvldb/vol18/p3… #VLDB2025 #LLMs

Huan Sun (OSU)

@hhsun1

3 months ago

🧪 Chemists spend many hours planning and replanning synthetic routes for a target molecule to avoid dangerous reactants and intermediates.☠️🚫 🤔 What if an AI agent could plan around them automatically—better and faster than human experts? 🔬 Constrained retrosynthesis

Huan Sun (OSU)

@hhsun1

3 months ago

I am humbled and grateful to receive two grants from Open Philanthropy to advance the safety of AI systems, co-led with my colleague Yu Su (hiring postdoc). I'm also honored to be the first at Ohio State to receive Open Philanthropy funding. Most credit goes to the amazing students

Yu Su @#ICLR2025

@ysu_nlp

3 months ago

Computer Use: Modern Moravec's Paradox A new blog post arguing why computer-use agents may be the biggest opportunity and challenge for AGI. tinyurl.com/computer-use-a… Table of Contents > Moravec’s Paradox > Moravec's Paradox in 2025 > Computer use may be the biggest opportunity

Jianyang Gu

@vimar_gu

3 months ago

We are continuously pushing the frontier of biological AI with BioCLIP. Stay tuned for more updates!

Yu Su @#ICLR2025

@ysu_nlp

3 months ago

> working on semantic parsing in PhD
> didn't even have its own track at ACL
> it's a dead area, people say
> had ~100 citations when graduating
> but natural language programming is always the dream
> 'Let machines understand human thinking. Don’t let humans think like machines'

Huan Sun (OSU)

@hhsun1

2 months ago

While at #COLM2025, sharing a bit of COLMy news related to yesterday's keynote by Dr. Shirley Ho at the Conference on Language Modeling: So, you know frontier LLMs have achieved gold medal level performance for IMO, IPhO, and IOI; what about astronomy and astrophysics? Our recent evaluation shows that

Hanane Moussa

@hananenmoussa

2 months ago

📢 As AI becomes increasingly explored for research idea generation, how can we rigorously evaluate the ideas it generates before committing time and resources to them? We introduce ScholarEval, a literature-grounded framework for research idea evaluation across disciplines 👇!
