Michael Qizhe Shieh (@mpulsewidth)'s Twitter Profile
Michael Qizhe Shieh

@mpulsewidth

Faculty in AI @NUSingapore.

ID: 977929946

Website: https://www.michaelshieh.com/

Joined: 29-11-2012 08:54:28

28 Tweets

674 Followers

214 Following

Eval Sys (@evalsysorg)

MCPMark Leaderboard Update 🚀 🌟 Qwen-3-Coder takes the #1 spot among open-source models, with an impressive per-run cost of just $36.46. ⚡️ Grok-Code-Fast-1 delivers the lowest per-run cost ($16.08) and the fastest average agent time (156.63s) across the top 10 models.

Binyuan Hui (@huybery)

Super excited to see all the new MCP benchmarks coming out lately! MCPMark looks amazing, it is both comprehensive and professional, and I am sure it will catch on fast. Thrilled that Qwen3-Coder is already SOTA among open-source models on MCPMark, and we are just getting

Michael Qizhe Shieh (@mpulsewidth)

One thing I appreciate about the US experience is highway driving at night. Went back and forth between Pittsburgh and (NYC, Ithaca, DC). Also enjoyed driving at the Grand Canyon area and in Alaska. There are just you, the winding road and cars flashing by. No lights. Can’t

Michael Qizhe Shieh (@mpulsewidth)

I have a little bit of jet lag visiting friends in the US. So I wake up in the middle of the night a lot and am able to remember some of my dreams after going back to sleep. It turns out I have been dreaming about doing exams and preparing for exams a lot. Like it doesn't make

Jiawei Wang (@jarvismsustc)

Excited to share our latest paper: "Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents" 🤖🔬 Credit assignment with sparse rewards is a huge challenge in long-horizon tasks. We identify & solve a fundamental issue in policy gradients: the

Michael Qizhe Shieh (@mpulsewidth)

It's kinda fun to bring my American friend to Chinese restaurants. Today we had a chopstick challenge: using chopsticks for noodles. I've also had burgers for many days in a row. I must say In-N-Out burgers with lettuce wraps are fun and tasty.

Michael Qizhe Shieh (@mpulsewidth)

I only learned yesterday that I am not supposed to wave to waiters in a restaurant and ask them to take an order in the US. It is considered not culturally aware or rude? I was like “I do that all the time! 😂” I was so surprised but his girlfriend confirmed this too. My

Michael Qizhe Shieh (@mpulsewidth)

It's interesting that we don't have a great tool for improving the writing of non-native speakers. GPT-5 is kind of critical and always tries to suggest alternatives when my sentence is perfectly correct. Gemini always tries to say nice things and wouldn't give me corrections.

Michael Qizhe Shieh (@mpulsewidth)

It’s kinda crazy that the population of the whole San Jose area is half Asian and half Hispanic. Of course there are the tech bros, but also In-N-Out staff, airport staff, etc.

Michael Qizhe Shieh (@mpulsewidth)

It’s remarkable how AIs and humans both exist in their own constrained environments. For AI, the biggest constraints are compute and data. Given different compute and data, the optimal architecture and algorithm are completely different. The evolution of compute and data is the

Eval Sys (@evalsysorg)

Congrats on the launch of Strata! Thrilled that Klavis AI (YC X25) chose MCPMark. 🚀 MCPMark now benchmarks not only model agentic performance, but also MCP Services and frameworks. Can’t wait to see what the community builds next — and always open to partnership!