Wes Nishio (@hnishio0105) 's Twitter Profile
Wes Nishio

@hnishio0105

Founder & CEO at GitAuto (@gitautoai) | Founder University, 500 Global, Blitzscaling Ventures, and AlchemistX cohort member supported by JETRO

ID: 1341066019490578432

Link: https://gitauto.ai | Joined: 21-12-2020 17:01:00

1.1K Tweets

377 Followers

817 Following

Wes Nishio (@hnishio0105) 's Twitter Profile Photo

Customer told us they wanted more integration tests from our agent. Checked the PRs. Every single one was pure mocks, even when their repo had test DB helpers sitting right there. The prompt said "use test infrastructure if available" but the model ignored it every time. Why? One


Our Lambda kept dying mid-Jest run. MongoMemoryServer plus a big import chain ate all 3 GB. Bumped to 4 GB and capped Node.js heap so it throws a catchable error instead of silently killing the whole process. The real lesson: if your subprocess can eat unbounded memory, cap it at
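The heap-cap idea above can be sketched in a few lines. This is a hypothetical illustration, not GitAuto's actual code: the key is passing `--max-old-space-size` through `NODE_OPTIONS`, which makes V8 raise a catchable out-of-memory error instead of letting the runtime kill the process.

```python
import os
import subprocess

# Hypothetical sketch (not GitAuto's actual code): build a Jest invocation whose
# Node.js heap is hard-capped via NODE_OPTIONS, so an out-of-memory run raises a
# catchable V8 error instead of silently killing the whole Lambda process.
def jest_command(heap_mb: int) -> tuple[list[str], dict[str, str]]:
    # --max-old-space-size bounds the V8 old-generation heap in megabytes.
    env = {"NODE_OPTIONS": f"--max-old-space-size={heap_mb}"}
    return ["npx", "jest", "--ci"], env

def run_jest_capped(repo_dir: str, heap_mb: int = 2048):
    cmd, extra_env = jest_command(heap_mb)
    return subprocess.run(
        cmd,
        cwd=repo_dir,
        env={**os.environ, **extra_env},
        capture_output=True,
        text=True,
        timeout=600,  # bound wall-clock time too, not just memory
    )
```

Keeping the cap below the Lambda memory limit leaves headroom for the parent process, so the failure surfaces as a subprocess error you can report instead of a dead Lambda.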


Spent a day splitting one giant function into six small ones. The function worked fine. But every time I touched it, I had to re-read 300 lines to find the part I needed. Now each piece is obvious. Boring refactor, faster future changes.


Was reviewing a new function this morning and noticed the tests covered one happy path for a function with three independent input dimensions. The existing quality gate passed it. Added a category that grades whether tests enumerate the full matrix. Ran it on a contrived
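A contrived example of the full-matrix idea (the tweet's actual function isn't shown): a function with three independent input dimensions, and a test that enumerates every combination instead of a single happy path.

```python
from itertools import product

# Contrived example: three independent input dimensions, 2x2x2 = 8 combinations.
def ship(priority: str, region: str, express: bool) -> str:
    mode = "express" if express else "ground"
    return f"{priority}-{region}-{mode}"

def test_full_matrix() -> None:
    # Enumerate the full matrix rather than asserting on one happy path.
    cases = list(product(["high", "low"], ["us", "eu"], [True, False]))
    assert len(cases) == 8  # every combination, not just one
    for priority, region, express in cases:
        result = ship(priority, region, express)
        assert result.startswith(priority)
        assert region in result
```

A grading category can then be as simple as: count the distinct values exercised per parameter and compare against the combinations the tests actually run.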


Kept writing "add a logger before that return" in reviews. Moved it into a pre-commit script and a Claude Code PostToolUse hook that blocks the edit right when it happens, with the line numbers in the block reason. Same rule, one place, no more nagging.
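A minimal sketch of what such a shared rule could look like, with hypothetical heuristics (the real check is not shown): flag `return` statements with no logger call on the line above, and report 1-indexed line numbers so both the pre-commit script and the editor hook can cite the exact spot in the block reason.

```python
import re

# Hedged sketch of a shared "add a logger before that return" rule.
# Deliberately simplistic: a real check would scope this to error-handling
# branches instead of every return.
def missing_log_lines(source: str) -> list[int]:
    lines = source.splitlines()
    flagged = []
    for i, line in enumerate(lines):
        if re.match(r"\s*return\b", line):
            prev = lines[i - 1] if i > 0 else ""
            if "logger." not in prev:
                flagged.append(i + 1)  # 1-indexed, for the block reason
    return flagged
```

The same function can back both entry points: the pre-commit script exits nonzero when the list is non-empty, and the editor hook rejects the edit with the line numbers in its message.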


When two webhooks hit the same PR branch close together, the second agent would happily keep editing on state the first push had already moved past. Added a typed race signal: the commit function flags it, every file-edit tool propagates it, the agent loop breaks the turn, the
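The propagation pattern can be sketched with a typed flag (the names here are hypothetical, not GitAuto's actual types): the commit step detects that the remote branch advanced, sets the flag on its result, and the agent loop breaks the turn instead of editing stale state.

```python
from dataclasses import dataclass

# Hypothetical sketch of a typed race signal flowing commit -> tools -> loop.
@dataclass
class ToolResult:
    output: str
    stale_branch: bool = False  # set when a push detects the branch advanced

def commit(local_sha: str, remote_sha: str) -> ToolResult:
    if local_sha != remote_sha:
        # Another webhook's agent pushed first; our local state is stale.
        return ToolResult(output="remote moved", stale_branch=True)
    return ToolResult(output="pushed")

def agent_turn(results: list[ToolResult]) -> str:
    for r in results:
        if r.stale_branch:
            return "abort-turn"  # stop editing state another push moved past
    return "continue"
```

Because the flag rides on the result type rather than a log line, every file-edit tool propagates it for free and the loop has one unambiguous place to check.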


The Gemini free-tier 429 takes a different shape than GitHub's 429, which takes a different shape than Anthropic's 429, and I had a per-SDK handler for exactly zero of them. Wrote one extractor that dispatches on error type, hooked it into the existing retry loop, honored
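A hedged sketch of the single-extractor idea. The exception classes below are stand-ins, since each real SDK raises its own type with the retry hint in a different field; the point is one function that normalizes them all into seconds for the existing retry loop.

```python
# Stand-in exception shapes (each real SDK differs; these are illustrative).
class GeminiRateLimit(Exception):
    def __init__(self, retry_delay_s):
        self.retry_delay_s = retry_delay_s

class GitHubRateLimit(Exception):
    def __init__(self, headers):
        self.headers = headers  # e.g. {"Retry-After": "30"}

class AnthropicRateLimit(Exception):
    def __init__(self, retry_after):
        self.retry_after = retry_after

def retry_after_seconds(err: Exception, default: float = 60.0) -> float:
    # One dispatch point instead of a per-SDK handler scattered through the code.
    if isinstance(err, GeminiRateLimit):
        return float(err.retry_delay_s)
    if isinstance(err, GitHubRateLimit):
        return float(err.headers.get("Retry-After", default))
    if isinstance(err, AnthropicRateLimit):
        return float(err.retry_after)
    return default  # unknown shape: fall back to a safe wait
```

New providers then cost one `isinstance` branch rather than a new handler wired into the loop.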


Gemini's free tier has two flavors of overload: 429 with a "retry in 60s" hint, and 499 CANCELLED with no hint at all. Same Sentry cluster, same repo, 1 hour apart. Four lines plus two tests to make 499 take the transient-retry path we already had. Overload windows self-heal now.


Spent the morning on a classifier that confidently called a code bug "infra" because the app logged an AccessDenied warning during test setup. Three empty-commit retries later, nothing fixed. Strip the console blocks before scanning and the same log classifies correctly. One
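The strip-then-scan step can be sketched like this. The `console.log`/`console.end` delimiters and the keyword rules are hypothetical simplifications, but the shape is the point: drop captured application output before classifying, so warnings the app itself printed can't masquerade as infra errors.

```python
import re

# Hypothetical delimiters around captured application-console output in a CI log.
CONSOLE_BLOCK = re.compile(r"console\.log\n.*?console\.end\n", re.DOTALL)

def strip_console_blocks(log: str) -> str:
    # Remove app-printed output so only the runner's own errors get scanned.
    return CONSOLE_BLOCK.sub("", log)

def classify(log: str) -> str:
    # Toy keyword classifier standing in for the real one.
    cleaned = strip_console_blocks(log)
    if "AccessDenied" in cleaned or "Timeout" in cleaned:
        return "infra"
    return "code-bug"
```

With the block stripped, an `AccessDenied` warning the app logged during setup no longer outweighs the assertion failure that actually ended the run.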


OK, now I confirmed Claude Opus 4.7 and OpenAI GPT-5.4 are both tunnel-visioned and single-minded. me: read this review comment and fix it llm: updated the test file me: run it llm: it failed me: the PR passes, so you broke it llm: you’re right that I should not change it just


Actually, it's been a while since I chatted with GPT... but his nature/character hasn't changed much. He swears he will do X, or that he will never do Y again. But when I say "then do it now, right here," he doesn't. Claude, by contrast, will do it, but he won't listen to me either...


Building agent products on a fixed per-task price means your LLM cost is variable but your revenue is fixed. Hard tasks regularly cost more than they earn. The model provider doesn't compensate, so I covered the overruns out of pocket. Then I tried capping spend and halting when


If your agent is looping on the same tool call, check what your agent loop is pruning. A task on a customer repo had Claude Opus 4.7 call the same tool with the same args 17 times in a row. About half the per-task budget gone on duplicate work. The tell in the logs: every one
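One guard against this failure mode can be sketched as a duplicate-call check before execution (a hypothetical illustration, not GitAuto's loop): if the model repeats the exact tool call, feed back the prior result instead of burning budget re-running it.

```python
# Hypothetical sketch: short-circuit exact-duplicate tool calls before they
# consume the per-task budget on identical work.
def is_duplicate_call(history: list[tuple[str, dict]], name: str, args: dict,
                      window: int = 3) -> bool:
    # Only check the last few calls; distant repeats may be legitimate.
    return (name, args) in history[-window:]

history: list[tuple[str, dict]] = []

def run_tool(name: str, args: dict) -> str:
    if is_duplicate_call(history, name, args):
        return "duplicate-call: reuse the prior result instead of re-running"
    history.append((name, args))
    return f"ran {name}"
```

The deeper fix the tweet points at is upstream: if the loop prunes prior tool results out of the context, the model can't see that it already made the call, so the dedup guard is a backstop, not a cure.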


The agent edited a customer's `.circleci/config.yml` to fix a bug that was inside our own AWS Lambda. Mongo's in-memory binary needed an OpenSSL library our Lambda OS didn't ship. Validation crashed with "library missing". The agent read that, concluded the customer's CircleCI


Every AI agent provider has the same problem: premium models cost too much to offer a real free tier. 3 free PRs at $8 each is a demo, not a trial. Google Gemma 4 31B changed the math. Near-zero API cost, thin routing layer, same agent loop. $2/PR, 12 free PRs instead of 3. A


100% line coverage can still mean garbage tests. A test that calls a function and never asserts the return value hits every line but proves nothing. We built a 44-check quality evaluator that runs after coverage passes. 9 categories: integration, business logic, adversarial,


The original quality-bar problem was developer discipline. Some run lint and type-check before pushing, some don't, the team's bar drifts to the slowest path. Pre-commit fixed it: gates in code, not in habits. AI agents reopened the same problem at a different layer. GitAuto