echo.hive (@hive_echo)'s Twitter Profile
echo.hive

@hive_echo

🟣1000x Cursor course: tinyurl.com/LearnCursor 🟢 I learn and share my knowledge: echohive.live 🔴 Open Source: github.com/echohive42

ID: 1553828550909648896

Link: https://www.echohive.live · Joined: 31-07-2022 19:43:25

9.9K Tweets

10.1K Followers

722 Following

Rohan Paul (@rohanpaul_ai):

New NVIDIA paper makes models think before predicting, training this behavior during pretraining for stronger reasoning.

The novelty is that it makes base models practice reasoning during pretraining, not just after.

The reward needs no verifier and appears at every token, so
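
For intuition, here is a minimal sketch of one way a verifier-free, per-token reward like the one described above could be computed; the specifics (a hidden "thought" and a log-probability gain on the true next token) are my assumptions for illustration, not necessarily the paper's exact formulation.

```python
# Hypothetical sketch (not the paper's exact algorithm): a verifier-free,
# per-token reward defined as the gain in next-token log-probability when
# the model is allowed to emit a hidden "thought" before predicting.
import math

def per_token_reward(logp_with_thought: float, logp_without_thought: float) -> float:
    """Reward at one position: how much the generated thought helped the
    model assign probability to the true next token. No external verifier
    is needed -- the ground-truth next token from the pretraining corpus
    is the only supervision, so the reward exists at every position."""
    return logp_with_thought - logp_without_thought

# Toy example: the thought raises the true token's probability from 0.20 to 0.35.
r = per_token_reward(math.log(0.35), math.log(0.20))
print(f"reward = {r:.3f}")  # positive => reinforce this thought
```
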
Rohan Paul (@rohanpaul_ai):

A beautiful paper from MIT + Harvard + Google DeepMind 👏

Explains why Transformers miss multi-digit multiplication and shows a simple bias that fixes it.

The researchers trained two small Transformer models on 4-digit-by-4-digit multiplication.

One used a special training method
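
For context, a minimal sketch of what a 4-digit-by-4-digit multiplication training set can look like; the exact prompt/answer text format is an assumption for illustration, not taken from the paper.

```python
# Minimal sketch (format is an assumption, not necessarily the paper's):
# generate 4-digit-by-4-digit multiplication examples as plain text pairs
# for training a small decoder-only Transformer.
import random

def make_example(rng: random.Random) -> tuple[str, str]:
    a = rng.randint(1000, 9999)
    b = rng.randint(1000, 9999)
    prompt = f"{a}*{b}="
    answer = str(a * b)
    return prompt, answer

rng = random.Random(0)
dataset = [make_example(rng) for _ in range(100_000)]
print(dataset[0])  # (prompt, answer) text pair
```
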
Bartosz Naskręcki (@nasqret):

GPT-5-Pro solved, in just 15 minutes (without any internet search), the presentation problem known as “Yu Tsumura’s 554th Problem.”

arxiv.org/pdf/2508.03685

This is the first model to solve this task completely. I expect more such results soon — the model demonstrates a strong
echo.hive (@hive_echo):

I started using gpt-5 “non thinking” more and more when studying. Non thinking is quite pleasant when explaining things, unlike the thinking version. Grok 4 fast is still amazing, but gpt-5 non thinking is way faster, executes code faster, and does better visualizations. It

echo.hive (@hive_echo):

This new cheetah model is very fast and very good too! And it is $10 per million output tokens, so it can't exactly be a super small model, right? Inference-time active-parameter optimizations must have reached wild levels, no?
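
A quick back-of-the-envelope helper for the quoted price: only the $10 per 1M output tokens figure comes from the post above; the per-request token counts below are illustrative assumptions.

```python
# Back-of-the-envelope cost at the quoted $10 per 1M output tokens.
# Token counts below are illustrative assumptions, not measurements.
PRICE_PER_MILLION_OUTPUT = 10.00  # USD, as quoted in the post

def output_cost(output_tokens: int) -> float:
    """Cost of a response that produces the given number of output tokens."""
    return output_tokens / 1_000_000 * PRICE_PER_MILLION_OUTPUT

for tokens in (2_000, 20_000, 200_000):
    print(f"{tokens:>7} output tokens -> ${output_cost(tokens):.2f}")
```
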

echo.hive (@hive_echo):

Really loving these fast coding models. They totally put you in flow. But then you run into their limits very quickly. If these models get 10x better and even faster, coding will be an absolutely incredible experience!

echo.hive (@hive_echo):

Which is the better coding model all things considered (speed, quality, etc.), not just quality? Please answer only if you have tried them all and are taking a cumulative point of view, not just quality.

François Chollet (@fchollet):

You can teach a Transformer to execute a simple algorithm if you provide the exact step by step algorithm during training via CoT tokens. This is interesting, but the point of machine learning should be to *find* the algorithm during training, from input/output pairs only -- not
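
To make the contrast concrete, here is a toy illustration (my own format, using addition with carrying as a stand-in algorithm, not an example from the post) of the two training setups: step-by-step CoT supervision versus input/output pairs only.

```python
# Illustrative contrast (toy format): the same task, addition with carrying,
# presented two ways to a Transformer during training.
a, b = 47, 85

# 1) Step-by-step CoT supervision: the exact algorithm is spelled out,
#    so the model mainly learns to reproduce the given procedure.
cot_example = (
    f"{a}+{b}=",
    "ones: 7+5=12, write 2 carry 1; tens: 4+8+1=13, write 13; answer 132",
)

# 2) Input/output pairs only: the model must discover the carrying
#    algorithm itself from many (input, output) examples.
io_example = (f"{a}+{b}=", str(a + b))

print(cot_example)
print(io_example)
```
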

Rohan Paul (@rohanpaul_ai):

🫡 GPT-5-Pro just solved the math problem that no other LLM could solve. Took 14 minutes without any internet search.

An Oxford and Cambridge paper claimed that no LLM could solve ‘Yu Tsumura’s 554th Problem’.

OpenAI's GPT‑5 Pro produced a full proof in about 14 minutes.
echo.hive (@hive_echo):

I am fairly certain that math takes that sharp turn into the land of deep abstractions starting with first-order linear differential equations. This was the first time that I had to rewatch an entire chapter. Hey Grok, would you agree?
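
For reference, the standard integrating-factor solution of a first-order linear ODE, the topic mentioned above (textbook material, not from the post):

```latex
% General first-order linear ODE and its integrating-factor solution.
\[
  y' + p(x)\,y = q(x), \qquad \mu(x) = e^{\int p(x)\,dx}
\]
\[
  \frac{d}{dx}\bigl(\mu(x)\,y\bigr) = \mu(x)\,q(x)
  \;\Longrightarrow\;
  y(x) = \frac{1}{\mu(x)}\left(\int \mu(x)\,q(x)\,dx + C\right)
\]
% Concrete instance: y' + y = x gives mu(x) = e^x and y = x - 1 + C e^{-x}.
```
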

Gal Zajc (@zajcgal):

wow! quite impressive. a small channel uploading some insane builds for 15 years already youtube.com/watch?v=ubq5yV…

Haider. (@slow_developer):

now this is big...

GPT-5-based agentic frameworks have reached 70% on the OSWorld benchmark — a real-computer, cross-OS environment for multimodal agents

and that score is close to the 72% human mark.

this can only suggest that human-level computer use is now within reach
echo.hive (@hive_echo):

Yesterday was the first time I felt I had forgotten everything I learned the day before 🤔 I assume this is related to the size of the jump in abstraction? And my mind not being used to it? Or that I have overloaded my brain with 6-7 hours of math every day, non-stop 😆 But it all

Rohan Paul (@rohanpaul_ai):

New NVIDIA paper shows how to make text-to-image models render high-resolution images far faster without losing quality.

53x faster 4K on H100, 3.5 seconds on a 5090 with quantization for 138x total speedup.

It speeds up by moving generation into a smaller hidden image space.
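
A rough sketch of why a smaller hidden (latent) image space helps: the number of tokens the backbone attends over shrinks with the downsampling factor, and attention cost grows roughly quadratically in that count. The downsample and patch numbers below are illustrative assumptions, not values from the paper.

```python
# Rough illustration (downsample/patch numbers are assumptions): why
# generating in a smaller latent space cuts compute so sharply.
def num_tokens(height: int, width: int, downsample: int, patch: int = 2) -> int:
    """Tokens the diffusion backbone attends over for one image."""
    lh, lw = height // downsample, width // downsample   # latent grid size
    return (lh // patch) * (lw // patch)                 # patchified tokens

H, W = 3840, 2160  # 4K image
for ds in (8, 32):  # e.g. a standard VAE vs. a more aggressive latent space
    t = num_tokens(H, W, ds)
    print(f"downsample {ds:>2}: {t:>6} tokens, attention cost ~ {t**2:,}")
```
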
echo.hive (@hive_echo):

How many moments have we had with AI already?
- chatgpt moment
- 4o moment
- realtime voice moment
- sora moment
- deepseek moment
- o1 moment
- image-gen-1 moment
- veo3 moment
- nanobanana moment
- sora moment again
…
tomorrow we will probably have another moment maybe