Jean de Nyandwi (@jeande_d)'s Twitter Profile
Jean de Nyandwi

@jeande_d

Visiting Researcher @LTIatCMU • Vision 🤍 Language, Multimodal Research • CMU

Research blog: deeprevision.github.io
ML: nyandwi.com/machine_learni…

ID: 836904555728289793

Link: https://nyandwi.com • Joined: 01-03-2017 11:42:49

5.5K Tweets

43.43K Followers

980 Following

Graham Neubig (@gneubig)'s Twitter Profile Photo

I created a Python project starter repo for students that helps maintain good code quality while doing research projects: github.com/neubig/starter… I was opinionated and made only one choice for each tool, but there are other options too!

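The "one choice for each tool" idea can be mirrored in a small quality-gate script that runs each tool and reports failures. A minimal sketch — the tool names and commands below are illustrative guesses at the kind of stack such a starter repo pins, not the actual contents of the linked repo:

```python
import subprocess

# Hypothetical quality gate: one opinionated tool per concern,
# mirroring the "one choice for each tool" philosophy.
CHECKS = [
    ("format", ["ruff", "format", "--check", "."]),
    ("lint",   ["ruff", "check", "."]),
    ("types",  ["mypy", "src"]),
    ("tests",  ["pytest", "-q"]),
]

def run_checks(checks):
    """Run each named check; return the names of the ones that failed."""
    failed = []
    for name, cmd in checks:
        try:
            result = subprocess.run(cmd, capture_output=True)
            if result.returncode != 0:
                failed.append(name)
        except FileNotFoundError:
            # A missing tool counts as a failure too.
            failed.append(name)
    return failed
```

Wired into CI, an empty return value gates the merge; locally the same list tells you which stage to fix first.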
Simran Khanuja (@simi_97k)'s Twitter Profile Photo

We've talked a lot about evaluation in MT, but how do we evaluate related creative tasks like transcreation? 🔍💡 For eg. in Zootopia (2016), while the newscaster was a moose in the US, it was transcreated to a panda in China, a koala in Australia and a jaguar in Brazil. How

Seungone Kim @ NAACL2025 (@seungonekim)'s Twitter Profile Photo

#NLProc New paper on "evaluation-time scaling", a new dimension for leveraging test-time compute! We replicate the test-time scaling behaviors observed in generators (e.g., o1, r1, s1) with evaluators by forcing them to generate additional reasoning tokens. arxiv.org/abs/2503.19877

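The core idea, forcing an evaluator to keep reasoning before it commits to a score, can be sketched with a stub in place of a real model call. Nothing here reflects the paper's actual implementation; `generate` is a stand-in for an LLM API:

```python
# Illustrative sketch of "evaluation-time scaling": an LLM judge is
# made to spend at least a budget of reasoning tokens before scoring.

def generate(prompt: str, min_reasoning_tokens: int) -> str:
    # Stand-in for a model call. A real implementation would resume
    # decoding (e.g., by appending a continuation cue) whenever the
    # model tries to stop before the reasoning budget is met.
    reasoning = ["step"] * min_reasoning_tokens
    return " ".join(reasoning) + " FINAL: 7/10"

def evaluate(response: str, budget: int) -> int:
    """Score a response, spending at least `budget` reasoning tokens."""
    out = generate(f"Grade this response:\n{response}", budget)
    # Parse the judge's final verdict out of its reasoning trace.
    verdict = out.rsplit("FINAL:", 1)[-1].strip()
    return int(verdict.split("/")[0])
```

Raising `budget` is the evaluation-time analogue of test-time scaling in generators: the hypothesis is that longer judge traces yield more reliable scores.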
Graham Neubig (@gneubig)'s Twitter Profile Photo

Today's a big day! Months of work went into both of these releases, so we hope people enjoy them. OpenHands is now a great coding agent that you can run entirely locally (w/ OpenHands LM), and a great coding agent that you can run anywhere (w/ OpenHands Cloud).

Jean de Nyandwi (@jeande_d)'s Twitter Profile Photo

The people at Artificial Analysis are doing a good job comparing models across important axes: intelligence, speed, and price, and across all modalities and tasks: language, speech, video, image, and code. Benchmarks are not perfect but there is no other way to know a model is better than
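A multi-axis comparison like this boils down to a Pareto check: a model is worth considering only if no other model beats or matches it on every axis at once. A toy sketch with made-up numbers (none of these figures come from Artificial Analysis):

```python
# Hypothetical scores: higher intelligence/speed is better, lower price is better.
MODELS = {
    "model_a": {"intelligence": 80, "speed": 50, "price": 10.0},
    "model_b": {"intelligence": 75, "speed": 90, "price": 4.0},
    "model_c": {"intelligence": 60, "speed": 40, "price": 8.0},
}

def dominates(x, y):
    """True if x is no worse than y on every axis and strictly better on one."""
    no_worse = (x["intelligence"] >= y["intelligence"]
                and x["speed"] >= y["speed"]
                and x["price"] <= y["price"])
    strictly_better = (x["intelligence"] > y["intelligence"]
                       or x["speed"] > y["speed"]
                       or x["price"] < y["price"])
    return no_worse and strictly_better

def pareto_front(models):
    """Names of models not dominated by any other model, sorted."""
    return sorted(
        name
        for name, m in models.items()
        if not any(dominates(o, m) for other, o in models.items() if other != name)
    )
```

Here `model_c` drops out (dominated by `model_b` on all three axes), while `model_a` and `model_b` survive as different trade-offs — which is exactly why a single leaderboard number can't replace per-axis comparison.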

Graham Neubig (@gneubig)'s Twitter Profile Photo

A big two days of agents starts tomorrow at CMU (and then two days of agent hackathon after that!). Registration is still open, so if you're in or around Pittsburgh, come one, come all: cmu-agent-workshop.github.io We also plan to livestream for participants who can't make it in person.

Yueqi Song (@yueqi_song)'s Twitter Profile Photo

Humans can perform complex reasoning without relying on specific domain knowledge, but can multimodal models truly do that as well? Short answer: No. Even the best models perform below the 5th-percentile human on our VisualPuzzles tasks. 🚀 Introducing VisualPuzzles🧩: a new

Zhiqiu Lin (@zhiqiulin)'s Twitter Profile Photo

Fresh GPT‑o3 results on our vision‑centric #NaturalBench (NeurIPS’24) benchmark! 🎯 Its new visual chain‑of‑thought—by “zooming in” on details—cracks questions that still stump GPT‑4o. Yet vision reasoning isn’t solved: o3 can still hallucinate even after a full minute of

Zhiqiu Lin (@zhiqiulin)'s Twitter Profile Photo

📷 Can AI understand camera motion like a cinematographer? Meet CameraBench: a large-scale, expert-annotated dataset for understanding camera motion geometry (e.g., trajectories) and semantics (e.g., scene contexts) in any video – films, games, drone shots, vlogs, etc. Links

Graham Neubig (@gneubig)'s Twitter Profile Photo

How can we vibe code while still maintaining code quality? Over the past year, I've shifted 95% of my development from manually writing code to using coding agents. I wrote this blog on some tricks I learned to work successfully with agents: all-hands.dev/blog/vibe-codi…
