Rohit Malhotra (@rohit_malh5) Twitter Tweets • TwiCopy

Rohit Malhotra

@rohit_malh5

OpenHands maintainer | Ex-CTO @sitewizai | NLP @ CMU | Primarily interested in agents | Secondary interests in creative design

ID: 1020348326217052162

Link: http://malhotra5.github.io

Joined: 20-07-2018 16:42:52

142 Tweets

92 Followers

67 Following

Some users click with code agents. Others struggle. Why? Agents are flexible and creative - just like their users! It's confusing! Agents should understand, educate, and adapt to users. Even personalize. If the agent isn’t willing to grow, the user likely won’t either.

Likes: 1 | Replies: 0 | Reposts: 0

Rohit Malhotra

@rohit_malh5

5 months ago

PSA for engineering leadership exploring software agent solutions 🚨 This post nails the difference between agentic and agentless approaches — and why it actually matters for real software tasks, beyond SWE-Bench scores!

Likes: 5 | Replies: 0 | Reposts: 1

Graham Neubig

@gneubig

5 months ago

What will software development look like in 2026? With coding agents rapidly improving, dev roles may look quite different. My current workflow has changed a lot:
- Work in GitHub, not IDEs
- Agents in parallel
- Write English, not code
- More code review
Thoughts + a video👇

Likes: 119 | Replies: 3 | Reposts: 16

Mistral AI

@mistralai

5 months ago

Introducing Devstral Small and Medium 2507! This latest update offers improved performance and cost efficiency, perfectly suited for coding agents and software engineering tasks.

Likes: 2.2K | Replies: 87 | Reposts: 326

Rohit Malhotra

@rohit_malh5

4 months ago

OpenHands is so general-purpose that I now think of leveraging it with workflow-driven prompting. Also stating constraints works well for me. Examples: • Examine the existing architecture, read docs for Y, plan how to implement X, then do it → Instead of: "Implement feature

Likes: 3 | Replies: 0 | Reposts: 0

Qwen

@alibaba_qwen

4 months ago

>>> Qwen3-Coder is here! ✅ We’re releasing Qwen3-Coder-480B-A35B-Instruct, our most powerful open agentic code model to date. This 480B-parameter Mixture-of-Experts model (35B active) natively supports 256K context and scales to 1M context with extrapolation. It achieves

Likes: 8.8K | Replies: 264 | Reposts: 1.1K

Robert Brennan

@rbren_dev

4 months ago

Nothing more frustrating than seeing "private scaffold" on public benchmark results
I love that model providers like Qwen and Mistral are now reporting their results specifically using OpenHands as the scaffold--feels like we're becoming a standard here x.com/Alibaba_Qwen/s…

Likes: 94 | Replies: 2 | Reposts: 7

All Hands AI

@allhands_ai

3 months ago

We built OpenHands in the open (~60K ⭐️ on GitHub). Now we’re giving back to the OSS ecosystem. Announcing the OpenHands Cloud OSS Credit Program → $100–$500 credits for maintainers. 👉 Learn how to apply!

Likes: 77 | Replies: 1 | Reposts: 7

All Hands AI

@allhands_ai

3 months ago

Having appropriate tests makes a world of difference for agent-driven development. If your agent can write a test to localize a bug or exercise a new feature, the following implementation is much more solid. OpenHands+GPT-5 is now 🥇 on the SWT-Bench testing leaderboard!

Likes: 102 | Replies: 6 | Reposts: 18

Jiseung Hong

@jiseungh99

3 months ago

Introducing ⚔️PR Arena⚔️ - free AI coding agents to fix real GitHub issues. Claude Sonnet 4 vs Gemini 2.5 Pro… Who writes better pull requests? 👉 Install here: github.com/apps/openhands… Powered by OpenHands

Likes: 79 | Replies: 4 | Reposts: 12

Graham Neubig

@gneubig

3 months ago

Which LM is better at agentic coding? We have a bunch of useful academic benchmarks like SWE-Bench, but we don't have a good comparison of agentic coding LMs *in the wild*. To solve this, we released PR Arena: github.com/neulab/pr-arena

Likes: 122 | Replies: 7 | Reposts: 20

Robert Brennan

@rbren_dev

3 months ago

I'll be speaking about automating large-scale refactors with OpenHands at AI Engineer Paris! It's amazing how much software agents can get done if you orchestrate them thoughtfully.

Likes: 8 | Replies: 0 | Reposts: 3

Valerie Chen

@valeriechen_

2 months ago

A recent study by Becker et al. finds AI copilots like Cursor slowed expert OSS devs by 19%. But what happens when we compare copilots to more autonomous coding agents? Our study finds the opposite story: agents can boost productivity. 🧵

Likes: 28 | Replies: 1 | Reposts: 5

Jiseung Hong

@jiseungh99

2 months ago

We are excited to launch the ⚔️PR Arena⚔️ leaderboard! Full results will be revealed after a certain milestone of community votes. Fix your GitHub issues for free and vote for the better fix! 👉Leaderboard & Setup Guide: prarena.web.app

Likes: 22 | Replies: 1 | Reposts: 8

Rohit Malhotra

@rohit_malh5

2 months ago

SWE-Agents are crushing benchmarks like SWE-Bench but are still fragile in the wild. I argue A/B testing is the missing piece for evaluating and improving SWE-Agents. Proof in Production: Evaluating Effectiveness of SWE Agents with A/B Tests open.substack.com/pub/rohitmalh/…

Likes: 2 | Replies: 0 | Reposts: 0