Oscar Le (@oscarle_x) Twitter Tweets • TwiCopy

Oscar Le

a month ago

Sadly, gpt-oss-120b may be even worse than gpt-4o-mini in terms of knowledge (and 4o-mini, according to rumour, has only 40B parameters). Here is my favorite question: 4o-mini answers it correctly without even thinking. All 70B+ models always answer this question correctly.

6 likes • 3 replies • 0 reposts

Oscar Le

@oscarle_x

a month ago

LLMs are now in an incremental-improvement phase, so this is a good time to build LLM wrappers. You don't have to worry that your startup will become a bullet point in the slides at the next Google I/O.

2 likes • 0 replies • 0 reposts

Oscar Le

@oscarle_x

25 days ago

Coding with LLMs may lead to the rise of the multi-tasking brain. Before, we valued single-tasking because it produced higher-quality work. But now that LLMs do the heavy lifting, someone who can oversee 10 coding sessions at the same time may well be 5x more efficient than others.

3 likes • 1 reply • 0 reposts

Oscar Le

@oscarle_x

25 days ago

I kind of like GPT-5-thinking, because I don't see it as much different from o3 at all. It's just o3 with a different name. The only difference is that while thinking, it shows only a headline instead of a paragraph of reasoning, so sometimes I thought it had frozen.

3 likes • 0 replies • 0 reposts

Oscar Le

@oscarle_x

25 days ago

I guess OpenAI couldn't improve GPT-5's performance much, so they focused on reducing hallucination to have good things to report. That trend will continue. There is a lot of room for reducing hallucination that AI labs haven't explored yet.

3 likes • 0 replies • 0 reposts

Oscar Le

@oscarle_x

24 days ago

I'm curious whether big AI companies play the espionage game like Cold War countries. For example, spying on when rivals will release new models and how good their benchmarks are before the announcement...

7 likes • 1 reply • 0 reposts

Oscar Le

@oscarle_x

24 days ago

When you receive a gig, most of the time you can say yes and figure out a solution within a week by assembling components that are already available. That requires a lot of reading and a lot of trying things out, but it can usually be done within the timeframe. The solution will be sloppy but…

4 likes • 1 reply • 0 reposts

Oscar Le

@oscarle_x

24 days ago

With the flop of GPT-5, Zuck is checking the return policy.

1 like • 1 reply • 0 reposts

Oscar Le

@oscarle_x

23 days ago

People think VR is the future, but VR never gives us the true feeling of immersion in a new world. What truly immerses us in a new world is dreaming. Better to develop a dream machine than VR.

2 likes • 1 reply • 0 reposts

Oscar Le

@oscarle_x

22 days ago

When I was in school, I learned that all you need to do is sit in the front row: professors will notice and remember you, and you will get a decent grade. I mostly sat at the back of the class, though. But every time I sat in the front, I got a good grade.

4 likes • 1 reply • 0 reposts

Oscar Le

@oscarle_x

21 days ago

Why are you mulling that much, brother? With these 10-minute waits, running 5 Claude Code sessions in parallel is the only way to go.

0 likes • 0 replies • 0 reposts

Oscar Le

@oscarle_x

20 days ago

With Cursor I'm aware of every change to my code, but with Claude Code I find it keeps adding code until I lose control. Maybe one reason is that Cursor's diff review is easier to follow. Another reason is that the terminal output in Claude Code is hidden under Ctrl+R.

6 likes • 5 replies • 0 reposts

Oscar Le

@oscarle_x

19 days ago

My current setup: VS Code + Claude Code + Continue.dev
- I plan the design, architecture, and implementation decisions... CC is bad at this. I break the work down into steps and tell CC to note them down in .md files.
- CC implements each step and its tests. When everything is clear, CC can implement 1000s…

14 likes • 1 reply • 2 reposts

Oscar Le

@oscarle_x

17 days ago

Before LLMs, novels and high-quality journals actually used em-dashes a lot. I don't think normal people use them much because they are hard to type. Maybe that is why LLMs treat the em-dash as the hallmark of high-quality writing and churn one out whenever they find a chance.

2 likes • 1 reply • 0 reposts

Oscar Le

@oscarle_x

17 days ago

We've started researching "body shaping". We had avoided working on it for a long time because the feature could create unrealistic expectations, and Chinese apps are already very good at it. But our user surveys say otherwise. So here we are.

3 likes • 0 replies • 0 reposts

Oscar Le

@oscarle_x

17 days ago

Hmm

0 likes • 0 replies • 0 reposts

Oscar Le

@oscarle_x

16 days ago

Hmm, Grok's Ani always talks and moves like a grandma telling ghost stories to spook kids.

3 likes • 1 reply • 0 reposts

Oscar Le

@oscarle_x

16 days ago

The more time I spend with Claude Code, the more it feels like a mildly toxic relationship. Claude keeps hiding errors from me, and when I find out, it just says "Sorry, you're right, I shouldn't have done that." I don't want to micromanage, though; that is even more toxic.

1 like • 1 reply • 0 reposts

Oscar Le

@oscarle_x

15 days ago

We are updating our onboarding. Now we use animated videos to guide users to try the right tool for them. For mobile apps, D1 retention is usually below 30%, so onboarding is one of the most important things to A/B test frequently. Good things may come out if you try enough.

3 likes • 1 reply • 0 reposts

Oscar Le

@oscarle_x

15 days ago

I'm quite sure it's because either (1) they are too optimistic and let LLMs do things they are not good at, or (2) they are too pessimistic and let LLMs do things that provide too little visible value. Failing the first time is okay; they need to try a lot until they find the sweet spot.

2 likes • 1 reply • 0 reposts