Ruben Hassid (@RubenHssd)'s Twitter Profile
Ruben Hassid

@RubenHssd

Daily LLM benchmarks & prompt engineering. Founder at https://t.co/n6tTy5Q7uX (bootstrapped)

ID:1269541526

Link: http://rubenhassid.ck.page/xprofile
Joined: 15-03-2013 11:51:18

11.5K Tweets

12.1K Followers

361 Following

Ruben Hassid (@RubenHssd)

new gpt-4-april-24 vs. old gpt-4

test #1 → logical test.
test #2 → math problem-solving.
test #3 → write a poem in ABAB.

test #1 → logical test:

I asked both models to solve this:

'I have 2 apples, then I buy 2 more.

I bake a pie with 2 of the apples.

After eating…

Ruben Hassid (@RubenHssd)

gemini 1.5 is out.

google claims it can 'watch & understand videos up to 1M token'.

I don't trust google:
→ so I made my own benchmark.

↓ Spoiler alert, gemini cannot do it:

Ruben Hassid (@RubenHssd)

google search feels like an endless ad.

So I ran 3 tests comparing google, chatgpt, perplexity & grok.

#1: 'how to write a founder agreement'
