Wes Nishio (@hnishio0105) 's Twitter Profile
Wes Nishio

@hnishio0105

Founder & CEO at GitAuto (@gitautoai) | Founder University, 500 Global, Blitzscaling Ventures, and AlchemistX cohort member supported by JETRO

ID: 1341066019490578432

Link: https://gitauto.ai | Joined: 21-12-2020 17:01:00

1.1K Tweets

377 Followers

817 Following

Wes Nishio (@hnishio0105) 's Twitter Profile Photo

Customer told us they wanted more integration tests from our agent. Checked the PRs. Every single one was pure mocks, even when their repo had test DB helpers sitting right there. The prompt said "use test infrastructure if available" but the model ignored it every time. Why? One


Our Lambda kept dying mid-Jest run. MongoMemoryServer plus a big import chain ate all 3 GB. Bumped to 4 GB and capped Node.js heap so it throws a catchable error instead of silently killing the whole process. The real lesson: if your subprocess can eat unbounded memory, cap it at
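The heap-cap idea above can be sketched in a few lines. This is a hypothetical illustration, not GitAuto's actual code: the key is passing `--max-old-space-size` through `NODE_OPTIONS`, which makes V8 raise a catchable out-of-memory error instead of letting the runtime kill the process.

```python
import os
import subprocess

# Hypothetical sketch (not GitAuto's actual code): build a Jest invocation whose
# Node.js heap is hard-capped via NODE_OPTIONS, so an out-of-memory run raises a
# catchable V8 error instead of silently killing the whole Lambda process.
def jest_command(heap_mb: int) -> tuple[list[str], dict[str, str]]:
    # --max-old-space-size bounds the V8 old-generation heap in megabytes.
    env = {"NODE_OPTIONS": f"--max-old-space-size={heap_mb}"}
    return ["npx", "jest", "--ci"], env

def run_jest_capped(repo_dir: str, heap_mb: int = 2048):
    cmd, extra_env = jest_command(heap_mb)
    return subprocess.run(
        cmd,
        cwd=repo_dir,
        env={**os.environ, **extra_env},
        capture_output=True,
        text=True,
        timeout=600,  # bound wall-clock time too, not just memory
    )
```

Keeping the cap below the Lambda memory limit leaves headroom for the parent process, so the failure surfaces as a subprocess error you can report instead of a dead Lambda.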


Spent a day splitting one giant function into six small ones. The function worked fine. But every time I touched it, I had to re-read 300 lines to find the part I needed. Now each piece is obvious. Boring refactor, faster future changes.


Was reviewing a new function this morning and noticed the tests covered one happy path for a function with three independent input dimensions. The existing quality gate passed it. Added a category that grades whether tests enumerate the full matrix. Ran it on a contrived
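A contrived example of the full-matrix idea (the tweet's actual function isn't shown): a function with three independent input dimensions, and a test that enumerates every combination instead of a single happy path.

```python
from itertools import product

# Contrived example: three independent input dimensions, 2x2x2 = 8 combinations.
def ship(priority: str, region: str, express: bool) -> str:
    mode = "express" if express else "ground"
    return f"{priority}-{region}-{mode}"

def test_full_matrix() -> None:
    # Enumerate the full matrix rather than asserting on one happy path.
    cases = list(product(["high", "low"], ["us", "eu"], [True, False]))
    assert len(cases) == 8  # every combination, not just one
    for priority, region, express in cases:
        result = ship(priority, region, express)
        assert result.startswith(priority)
        assert region in result
```

A grading category can then be as simple as: count the distinct values exercised per parameter and compare against the combinations the tests actually run.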


Kept writing "add a logger before that return" in reviews. Moved it into a pre-commit script and a Claude Code PostToolUse hook that blocks the edit right when it happens, with the line numbers in the block reason. Same rule, one place, no more nagging.
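A minimal sketch of what such a shared rule could look like, with hypothetical heuristics (the real check is not shown): flag `return` statements with no logger call on the line above, and report 1-indexed line numbers so both the pre-commit script and the editor hook can cite the exact spot in the block reason.

```python
import re

# Hedged sketch of a shared "add a logger before that return" rule.
# Deliberately simplistic: a real check would scope this to error-handling
# branches instead of every return.
def missing_log_lines(source: str) -> list[int]:
    lines = source.splitlines()
    flagged = []
    for i, line in enumerate(lines):
        if re.match(r"\s*return\b", line):
            prev = lines[i - 1] if i > 0 else ""
            if "logger." not in prev:
                flagged.append(i + 1)  # 1-indexed, for the block reason
    return flagged
```

The same function can back both entry points: the pre-commit script exits nonzero when the list is non-empty, and the editor hook rejects the edit with the line numbers in its message.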


When two webhooks hit the same PR branch close together, the second agent would happily keep editing on state the first push had already moved past. Added a typed race signal: the commit function flags it, every file-edit tool propagates it, the agent loop breaks the turn, the
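The propagation pattern can be sketched with a typed flag (the names here are hypothetical, not GitAuto's actual types): the commit step detects that the remote branch advanced, sets the flag on its result, and the agent loop breaks the turn instead of editing stale state.

```python
from dataclasses import dataclass

# Hypothetical sketch of a typed race signal flowing commit -> tools -> loop.
@dataclass
class ToolResult:
    output: str
    stale_branch: bool = False  # set when a push detects the branch advanced

def commit(local_sha: str, remote_sha: str) -> ToolResult:
    if local_sha != remote_sha:
        # Another webhook's agent pushed first; our local state is stale.
        return ToolResult(output="remote moved", stale_branch=True)
    return ToolResult(output="pushed")

def agent_turn(results: list[ToolResult]) -> str:
    for r in results:
        if r.stale_branch:
            return "abort-turn"  # stop editing state another push moved past
    return "continue"
```

Because the flag rides on the result type rather than a log line, every file-edit tool propagates it for free and the loop has one unambiguous place to check.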


The Gemini free-tier 429 takes a different shape than GitHub's 429, which takes a different shape than Anthropic's 429, and I had a per-SDK handler for exactly zero of them. Wrote one extractor that dispatches on error type, hooked it into the existing retry loop, honored
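A hedged sketch of the single-extractor idea. The exception classes below are stand-ins, since each real SDK raises its own type with the retry hint in a different field; the point is one function that normalizes them all into seconds for the existing retry loop.

```python
# Stand-in exception shapes (each real SDK differs; these are illustrative).
class GeminiRateLimit(Exception):
    def __init__(self, retry_delay_s):
        self.retry_delay_s = retry_delay_s

class GitHubRateLimit(Exception):
    def __init__(self, headers):
        self.headers = headers  # e.g. {"Retry-After": "30"}

class AnthropicRateLimit(Exception):
    def __init__(self, retry_after):
        self.retry_after = retry_after

def retry_after_seconds(err: Exception, default: float = 60.0) -> float:
    # One dispatch point instead of a per-SDK handler scattered through the code.
    if isinstance(err, GeminiRateLimit):
        return float(err.retry_delay_s)
    if isinstance(err, GitHubRateLimit):
        return float(err.headers.get("Retry-After", default))
    if isinstance(err, AnthropicRateLimit):
        return float(err.retry_after)
    return default  # unknown shape: fall back to a safe wait
```

New providers then cost one `isinstance` branch rather than a new handler wired into the loop.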


Gemini's free tier has two flavors of overload: 429 with a "retry in 60s" hint, and 499 CANCELLED with no hint at all. Same Sentry cluster, same repo, 1 hour apart. Four lines plus two tests to make 499 take the transient-retry path we already had. Overload windows self-heal now.


Spent the morning on a classifier that confidently called a code bug "infra" because the app logged an AccessDenied warning during test setup. Three empty-commit retries later, nothing fixed. Strip the console blocks before scanning and the same log classifies correctly. One
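The strip-then-scan step can be sketched like this. The `console.log`/`console.end` delimiters and the keyword rules are hypothetical simplifications, but the shape is the point: drop captured application output before classifying, so warnings the app itself printed can't masquerade as infra errors.

```python
import re

# Hypothetical delimiters around captured application-console output in a CI log.
CONSOLE_BLOCK = re.compile(r"console\.log\n.*?console\.end\n", re.DOTALL)

def strip_console_blocks(log: str) -> str:
    # Remove app-printed output so only the runner's own errors get scanned.
    return CONSOLE_BLOCK.sub("", log)

def classify(log: str) -> str:
    # Toy keyword classifier standing in for the real one.
    cleaned = strip_console_blocks(log)
    if "AccessDenied" in cleaned or "Timeout" in cleaned:
        return "infra"
    return "code-bug"
```

With the block stripped, an `AccessDenied` warning the app logged during setup no longer outweighs the assertion failure that actually ended the run.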


OK, now I confirmed Claude Opus 4.7 and OpenAI GPT-5.4 are both tunnel-visioned and single-minded. me: read this review comment and fix it llm: updated the test file me: run it llm: it failed me: the PR passes, so you broke it llm: you’re right that I should not change it just


Actually, it's been a while since I chatted with GPT... but his nature/character hasn't changed much. He swears he will do X, or that he will never do Y again. But when I say "then do it now, right here," he doesn't. Claude, by contrast, will do it, but he won't listen to me either...


Building agent products on a fixed per-task price means your LLM cost is variable but your revenue is fixed. Hard tasks regularly cost more than they earn. The model provider doesn't compensate, so I covered the overruns out of pocket. Then I tried capping spend and halting when


If your agent is looping on the same tool call, check what your agent loop is pruning. A task on a customer repo had Claude Opus 4.7 call the same tool with the same args 17 times in a row. About half the per-task budget gone on duplicate work. The tell in the logs: every one
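One guard against this failure mode can be sketched as a duplicate-call check before execution (a hypothetical illustration, not GitAuto's loop): if the model repeats the exact tool call, feed back the prior result instead of burning budget re-running it.

```python
# Hypothetical sketch: short-circuit exact-duplicate tool calls before they
# consume the per-task budget on identical work.
def is_duplicate_call(history: list[tuple[str, dict]], name: str, args: dict,
                      window: int = 3) -> bool:
    # Only check the last few calls; distant repeats may be legitimate.
    return (name, args) in history[-window:]

history: list[tuple[str, dict]] = []

def run_tool(name: str, args: dict) -> str:
    if is_duplicate_call(history, name, args):
        return "duplicate-call: reuse the prior result instead of re-running"
    history.append((name, args))
    return f"ran {name}"
```

The deeper fix the tweet points at is upstream: if the loop prunes prior tool results out of the context, the model can't see that it already made the call, so the dedup guard is a backstop, not a cure.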


The agent edited a customer's `.circleci/config.yml` to fix a bug that was inside our own AWS Lambda. Mongo's in-memory binary needed an OpenSSL library our Lambda OS didn't ship. Validation crashed with "library missing". The agent read that, concluded the customer's CircleCI


Every AI agent provider has the same problem: premium models cost too much to offer a real free tier. 3 free PRs at $8 each is a demo, not a trial. Google Gemma 4 31B changed the math. Near-zero API cost, thin routing layer, same agent loop. $2/PR, 12 free PRs instead of 3. A


100% line coverage can still mean garbage tests. A test that calls a function and never asserts the return value hits every line but proves nothing. We built a 44-check quality evaluator that runs after coverage passes. 9 categories: integration, business logic, adversarial,


The original quality-bar problem was developer discipline. Some run lint and type-check before pushing, some don't, the team's bar drifts to the slowest path. Pre-commit fixed it: gates in code, not in habits. AI agents reopened the same problem at a different layer. GitAuto