Sumuk (@sumukx)'s Twitter Profile
Sumuk

@sumukx

evals @ huggingface 🤗 | working towards a plurality of autonomous intelligent systems

ID: 1704646735950168064

Link: https://sumuk.org · Joined: 21-09-2023 00:00:39

547 Tweets

291 Followers

423 Following

Sumuk (@sumukx)

super excited to see yourbench converge to be the default generative benchmarking / synthetic data creation solution for llms 💛

Clémentine Fourrier 🍊 (@clefourrier)

To make sure your AI agent is not bullshitting you, you need to evaluate its reasoning... but to do so automatically, you need an LLM... 🤔 So how do you evaluate the trace evaluator? With TRAIL, which contains:
- a full taxonomy of agent errors and most frequent failure cases,

Alina Lozovskaya (@ailozovskaya)

How well do LLMs really know about Hugging Face? 🤔

I used Yourbench to create a custom eval set across the HF docs to test 10 models

Next question: what were the hardest questions for the models? Drop your guesses in the comments ⬇️
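
For context on the workflow: a tool in this space turns a document corpus into question-answer pairs, then grades candidate models against them. The sketch below is a conceptual illustration only, not Yourbench's actual interface; the prompts, the `ask` helper, and the model IDs are hypothetical stand-ins.

```python
# Conceptual generate-then-grade eval loop (NOT Yourbench's real API).
# Assumes huggingface_hub is installed and HF_TOKEN is set; model IDs are placeholders.
from huggingface_hub import InferenceClient

GENERATOR = "meta-llama/Llama-3.1-70B-Instruct"  # hypothetical question writer / judge

def ask(model: str, prompt: str) -> str:
    client = InferenceClient(model=model)
    out = client.chat_completion(
        messages=[{"role": "user", "content": prompt}], max_tokens=512
    )
    return out.choices[0].message.content

def build_eval_set(doc_chunks: list[str]) -> list[dict]:
    """Turn each documentation chunk into one grounded question/answer pair."""
    pairs = []
    for chunk in doc_chunks:
        q = ask(GENERATOR, f"Write one hard question answerable only from:\n{chunk}")
        a = ask(GENERATOR, f"Answer using only this text:\n{chunk}\n\nQuestion: {q}")
        pairs.append({"question": q, "reference": a})
    return pairs

def grade(candidate: str, pairs: list[dict]) -> float:
    """Accuracy of a candidate model, judged against the reference answers."""
    correct = 0
    for p in pairs:
        pred = ask(candidate, p["question"])
        verdict = ask(GENERATOR,
                      f"Reference: {p['reference']}\nPrediction: {pred}\n"
                      "Does the prediction agree with the reference? Answer YES or NO.")
        correct += verdict.strip().upper().startswith("YES")
    return correct / len(pairs)
```
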
Sagnik Mukherjee (@saagnikkk)

🚨 Paper Alert: “RL Finetunes Small Subnetworks in Large Language Models”

From DeepSeek V3 Base to DeepSeek R1 Zero, a whopping 86% of parameters were NOT updated during RL training 😮😮
And this isn’t a one-off. The pattern holds across RL algorithms and models.
🧵A Deep Dive
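
The 86% figure is a claim about update sparsity: compare the base and RL-tuned checkpoints parameter by parameter and count entries that never moved. A minimal sketch of that measurement, assuming two same-architecture transformers checkpoints (the model IDs below are placeholders, not the paper's code):

```python
# Sketch: estimate the fraction of parameters an RL run left untouched.
import torch
from transformers import AutoModelForCausalLM

# Placeholder IDs: any base checkpoint and its RL-tuned descendant work.
base = AutoModelForCausalLM.from_pretrained("org/base-model", torch_dtype=torch.bfloat16)
tuned = AutoModelForCausalLM.from_pretrained("org/rl-tuned-model", torch_dtype=torch.bfloat16)

unchanged, total = 0, 0
for (name, p_base), (_, p_tuned) in zip(base.named_parameters(), tuned.named_parameters()):
    # A weight counts as "not updated" if it is exactly identical after RL.
    unchanged += (p_base == p_tuned).sum().item()
    total += p_base.numel()

print(f"{100 * unchanged / total:.1f}% of parameters unchanged by RL")
```
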
Tim Soret (@timsoret)

2 years apart. Again, don't look at the finger when it is pointing at the moon. Many mocked the early results, but it was already profound to witness a machine clumsily hallucinate from its learning. To me, it felt like mocking a child's drawing.

Ryan Marten (@ryanmart3n)

Announcing OpenThinker3-7B, the new SOTA open-data 7B reasoning model: improving over DeepSeek-R1-Distill-Qwen-7B by 33% on average over code, science, and math evals.

We also release our dataset, OpenThoughts3-1.2M, which is the best open reasoning dataset across all data
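
Assuming the checkpoint lives under the open-thoughts organization on the Hub (the repo ID below is inferred from the announcement's naming, not verified), trying it locally is a standard transformers call:

```python
# Quick local generation with the announced checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "open-thoughts/OpenThinker3-7B"  # assumed Hub repo ID
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Prove that the square root of 2 is irrational."}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=1024)
print(tok.decode(out[0, inputs.shape[-1]:], skip_special_tokens=True))
```
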
meowbooks (@untitled01ipynb)

i have to subtweet this one for legal reasons but i guess the 300 users that see my memes would understand the context anyhow and will not reveal it in the replies

Lisan al Gaib (@scaling01)

A few more observations after replicating the Tower of Hanoi game with their exact prompts:

- You need AT LEAST 2^N - 1 moves and the output format requires 10 tokens per move + some constant stuff.
- Furthermore the output limit for Sonnet 3.7 is 128k, DeepSeek R1 64K, and
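
The arithmetic behind this objection is easy to check: an optimal Tower of Hanoi solution takes 2^N - 1 moves, so at roughly 10 output tokens per move the transcript alone outgrows any fixed output window. A back-of-envelope sketch (the per-move cost and the output limits come from the tweet; the constant overhead is a guess):

```python
# At what disk count N does an optimal Tower of Hanoi transcript
# exceed a model's output-token limit?
TOKENS_PER_MOVE = 10   # estimate from the tweet
OVERHEAD = 200         # assumed constant for preamble/formatting

def tokens_needed(n_disks: int) -> int:
    moves = 2**n_disks - 1  # optimal solution length
    return moves * TOKENS_PER_MOVE + OVERHEAD

for model, limit in [("Sonnet 3.7", 128_000), ("DeepSeek R1", 64_000)]:
    # Smallest N whose full transcript no longer fits in the output window.
    n = next(n for n in range(1, 40) if tokens_needed(n) > limit)
    print(f"{model}: transcript no longer fits in {limit:,} tokens at N = {n}")
```

Under these assumptions, Sonnet 3.7 runs out of room at N = 14 and DeepSeek R1 at N = 13, well before the puzzle itself gets conceptually harder.
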
Sumuk (@sumukx)

there's something deeply fascinating about deep motor skills that we're failing to replicate properly with machines, but everything else seems to come relatively easily

Sumuk (@sumukx)

if we have human training data, human emotions are necessary for RL

few understand this now, but i suspect we’ll see lots of papers intensifying emotion vectors for improved performance