Jean de Nyandwi (@jeande_d)'s Twitter Profile
Jean de Nyandwi

@jeande_d

Visiting Researcher @LTIatCMU • Vision 🤍 Language, Multimodal Research • CMU

Research blog: deeprevision.github.io
ML: nyandwi.com/machine_learni…

ID: 836904555728289793

Link: https://nyandwi.com • Joined: 01-03-2017 11:42:49

5.5K Tweets

43.43K Followers

980 Following

Graham Neubig (@gneubig)'s Twitter Profile Photo

I created a Python project starter repo for students that helps maintain good code quality while doing research projects: github.com/neubig/starter… I was opinionated and made only one choice for each tool, but there are other options too!

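The "one choice for each tool" idea can be mirrored in a small quality-gate script that runs each tool and reports failures. A minimal sketch — the tool names and commands below are illustrative guesses at the kind of stack such a starter repo pins, not the actual contents of the linked repo:

```python
import subprocess

# Hypothetical quality gate: one opinionated tool per concern,
# mirroring the "one choice for each tool" philosophy.
CHECKS = [
    ("format", ["ruff", "format", "--check", "."]),
    ("lint",   ["ruff", "check", "."]),
    ("types",  ["mypy", "src"]),
    ("tests",  ["pytest", "-q"]),
]

def run_checks(checks):
    """Run each named check; return the names of the ones that failed."""
    failed = []
    for name, cmd in checks:
        try:
            result = subprocess.run(cmd, capture_output=True)
            if result.returncode != 0:
                failed.append(name)
        except FileNotFoundError:
            # A missing tool counts as a failure too.
            failed.append(name)
    return failed
```

Wired into CI, an empty return value gates the merge; locally the same list tells you which stage to fix first.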
Simran Khanuja (@simi_97k)'s Twitter Profile Photo

We've talked a lot about evaluation in MT, but how do we evaluate related creative tasks like transcreation? 🔍💡 For eg. in Zootopia (2016), while the newscaster was a moose in the US, it was transcreated to a panda in China, a koala in Australia and a jaguar in Brazil. How

Seungone Kim @ NAACL2025 (@seungonekim)'s Twitter Profile Photo

#NLProc New paper on "evaluation-time scaling", a new dimension for leveraging test-time compute! We replicate the test-time scaling behaviors observed in generators (e.g., o1, r1, s1) with evaluators by forcing them to generate additional reasoning tokens. arxiv.org/abs/2503.19877

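The core idea, forcing an evaluator to keep reasoning before it commits to a score, can be sketched with a stub in place of a real model call. Nothing here reflects the paper's actual implementation; `generate` is a stand-in for an LLM API:

```python
# Illustrative sketch of "evaluation-time scaling": an LLM judge is
# made to spend at least a budget of reasoning tokens before scoring.

def generate(prompt: str, min_reasoning_tokens: int) -> str:
    # Stand-in for a model call. A real implementation would resume
    # decoding (e.g., by appending a continuation cue) whenever the
    # model tries to stop before the reasoning budget is met.
    reasoning = ["step"] * min_reasoning_tokens
    return " ".join(reasoning) + " FINAL: 7/10"

def evaluate(response: str, budget: int) -> int:
    """Score a response, spending at least `budget` reasoning tokens."""
    out = generate(f"Grade this response:\n{response}", budget)
    # Parse the judge's final verdict out of its reasoning trace.
    verdict = out.rsplit("FINAL:", 1)[-1].strip()
    return int(verdict.split("/")[0])
```

Raising `budget` is the evaluation-time analogue of test-time scaling in generators: the hypothesis is that longer judge traces yield more reliable scores.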
Graham Neubig (@gneubig)'s Twitter Profile Photo

Today's a big day! Months of work went into both of these releases, so we hope people enjoy them. OpenHands is now a great coding agent that you can run entirely locally (w/ OpenHands LM), and a great coding agent that you can run anywhere (w/ OpenHands Cloud).

Jean de Nyandwi (@jeande_d)'s Twitter Profile Photo

The people at Artificial Analysis are doing a good job comparing models across important axes: intelligence, speed, and price, and across all modalities and tasks: language, speech, video, image, and code. Benchmarks are not perfect but there is no other way to know a model is better than
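A multi-axis comparison like this boils down to a Pareto check: a model is worth considering only if no other model beats or matches it on every axis at once. A toy sketch with made-up numbers (none of these figures come from Artificial Analysis):

```python
# Hypothetical scores: higher intelligence/speed is better, lower price is better.
MODELS = {
    "model_a": {"intelligence": 80, "speed": 50, "price": 10.0},
    "model_b": {"intelligence": 75, "speed": 90, "price": 4.0},
    "model_c": {"intelligence": 60, "speed": 40, "price": 8.0},
}

def dominates(x, y):
    """True if x is no worse than y on every axis and strictly better on one."""
    no_worse = (x["intelligence"] >= y["intelligence"]
                and x["speed"] >= y["speed"]
                and x["price"] <= y["price"])
    strictly_better = (x["intelligence"] > y["intelligence"]
                       or x["speed"] > y["speed"]
                       or x["price"] < y["price"])
    return no_worse and strictly_better

def pareto_front(models):
    """Names of models not dominated by any other model, sorted."""
    return sorted(
        name
        for name, m in models.items()
        if not any(dominates(o, m) for other, o in models.items() if other != name)
    )
```

Here `model_c` drops out (dominated by `model_b` on all three axes), while `model_a` and `model_b` survive as different trade-offs — which is exactly why a single leaderboard number can't replace per-axis comparison.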

Graham Neubig (@gneubig)'s Twitter Profile Photo

A big two days of agents starts tomorrow at CMU (and then two days of agent hackathon after that!). Registration is still open, so if you're in or around Pittsburgh, come one, come all: cmu-agent-workshop.github.io We also plan to livestream for participants who can't make it in person.

Yueqi Song (@yueqi_song)'s Twitter Profile Photo

Humans can perform complex reasoning without relying on specific domain knowledge, but can multimodal models truly do that as well? Short answer: No. Even the best models perform below the 5th-percentile human on our VisualPuzzles tasks. 🚀 Introducing VisualPuzzles🧩: a new

Zhiqiu Lin (@zhiqiulin)'s Twitter Profile Photo

Fresh GPT‑o3 results on our vision‑centric #NaturalBench (NeurIPS’24) benchmark! 🎯 Its new visual chain‑of‑thought—by “zooming in” on details—cracks questions that still stump GPT‑4o. Yet vision reasoning isn’t solved: o3 can still hallucinate even after a full minute of

Zhiqiu Lin (@zhiqiulin)'s Twitter Profile Photo

📷 Can AI understand camera motion like a cinematographer? Meet CameraBench: a large-scale, expert-annotated dataset for understanding camera motion geometry (e.g., trajectories) and semantics (e.g., scene contexts) in any video – films, games, drone shots, vlogs, etc. Links

Graham Neubig (@gneubig)'s Twitter Profile Photo

How can we vibe code while still maintaining code quality? Over the past year, I've shifted 95% of my development from manually writing code to using coding agents. I wrote this blog on some tricks I learned to work successfully with agents: all-hands.dev/blog/vibe-codi…
