Yifan (@yifanbth) Twitter Tweets • TwiCopy

Yifan

@yifanbth

+ Follow

tech founder cooking in AI -
dev deep dives: youtube.com/@YifanBTH -
ex: founding team @improbableio -
@cambridgeuni compsci

ID: 383951897

linkhttp://bento.me/yifan calendar_today02-10-2011 20:11:10

372 Tweet

1,1K Takipçi

340 Takip Edilen

Yifan

@yifanbth

2 months ago

hot take: $1000+ pro max plus LLM subscriptions are coming this year reasoning models like o3 already cost $17k per task on some benchmarks if you're running complex agents daily, even $1000/month starts looking reasonable vs per-token pricing providers will test this for sure

thumb_up_off_alt3

chat_bubble_outline3

repeat0

shareShare

Yifan

@yifanbth

2 months ago

our standards for sharing LLM progress is dropping rapidly 2022: peer-reviewed papers with months of review 2024: detailed company blog posts 2025: afternoon tweets from the CEO we went from rigorous science to "trust me bro" in just 3 years

thumb_up_off_alt2

chat_bubble_outline1

repeat0

shareShare

Yifan

@yifanbth

2 months ago

it's going to be a fun weekend!

thumb_up_off_alt2

chat_bubble_outline0

repeat0

shareShare

Yifan

@yifanbth

2 months ago

productivity hack: set youtube to 2x speed counterintuitive but you'll be MORE focused, not less. the faster pace forces your brain to stay engaged instead of drifting off works best for educational content, tutorials, even podcasts. saves time + improves attention

thumb_up_off_alt9

chat_bubble_outline3

repeat1

shareShare

Yifan

@yifanbth

2 months ago

I really don't get why chatgpt doesn't have a setting to turn on search by default

thumb_up_off_alt3

chat_bubble_outline0

repeat0

shareShare

Yifan

@yifanbth

2 months ago

found a weird LLM quirk: if you have only one function defined with tool_choice="auto", both sonnet-4 and gemini-2.5-pro will spam call it regardless of need tried everything - explicit "don't use unless needed" instructions, better function descriptions, prompt optimisation.

thumb_up_off_alt2

chat_bubble_outline1

repeat0

shareShare

Yifan

@yifanbth

2 months ago

been doing some deep dives into current browser offerings, just realised that Zen browser was mostly done by a single person, extremely impressive given the polish it's basically what Arc should have been - vertical tabs, open source

thumb_up_off_alt1

chat_bubble_outline0

repeat0

shareShare

Yifan

@yifanbth

2 months ago

been using o3 for most of my daily search tasks and it's become that perfect sweet spot between 4o's quick facts and deep research's verbosity interestingly, OpenAI o3 with search is quite a lot better than Perplexity with o3. it frequently goes 5-10 rounds of

thumb_up_off_alt10

chat_bubble_outline3

repeat0

shareShare

Yifan

@yifanbth

2 months ago

spent a day optimising my production support bot to add function call capabilities. tried o4-mini-high, sonnet-4, and gemini-2.5-pro with identical system prompts and tool definitions. completely unexpected results: • o4-mini-high: accurate tool calls, ok-ish responses •

thumb_up_off_alt7

chat_bubble_outline1

repeat0

shareShare

Yifan

@yifanbth

2 months ago

can't believe this is natively built into gmail now: one-click unsubscribe to all promo emails in one place

thumb_up_off_alt8

chat_bubble_outline2

repeat0

shareShare

Yifan

@yifanbth

2 months ago

grok's twitter search is that missing piece i didn't know i needed instead of manually scrolling through #AI twitter to gauge sentiment on new releases, just ask grok "how are people reacting to claude code's weekly limits" saves me 20+ mins of timeline archaeology daily

thumb_up_off_alt3

chat_bubble_outline1

repeat0

shareShare

Yifan

@yifanbth

a month ago

spent the whole weekend reverse engineering claude code, there's so much good stuff hidden in plain sight. Anthropic's effort in prompt engineering is insane big video coming up!

thumb_up_off_alt9

chat_bubble_outline0

repeat0

shareShare

Yifan

@yifanbth

a month ago

the price for gpt-oss-120b is extremely competitive if the real-world performance lives up to the benchmarks input: ~$0.20/mtok output: ~$0.65/mtok

thumb_up_off_alt3

chat_bubble_outline0

repeat0

shareShare

Yifan

@yifanbth

a month ago

finally, we can do an apples-to-apples comparison with Claude Code, now with GPT-5

thumb_up_off_alt3

chat_bubble_outline0

repeat0

shareShare

Yifan

@yifanbth

a month ago

kubernetes-related debugging via CLI coding agents is bliss caveat: you gotta review every command carefully to avoid accidental deletions

thumb_up_off_alt0

chat_bubble_outline0

repeat0

shareShare

Yifan

@yifanbth

a month ago

honestly, the years with ChatGPT turned me into that annoying person who asks "but how does that actually work?" about everything custom instructions with constructive feedback on every chat made me more reflective than years of attempted journaling and now there's zero excuse

thumb_up_off_alt6

chat_bubble_outline0

repeat0

shareShare

Yifan

@yifanbth

a month ago

being able to reference past chats is one of the best features that ChatGPT and Claude have ever added. despite coming out later, Claude's UX is superior, with more precise indications of which chats were referenced I suspect this will cause a slight spike in user retention,

thumb_up_off_alt7

chat_bubble_outline0

repeat0

shareShare