Niðal نضال (@imleslahdin)'s Twitter Profile
Niðal نضال

@imleslahdin

What's the Kolmogorov Complexity of Small Language Models?
DM me papers on LLMs, AI/ML, Data Science. Doing a PhD. Not so serious. Nidhal Selmi.

ID: 3227845866

Link: http://nselmi.com · Joined: 27-05-2015 05:28:13

19.19K Tweets

1.1K Followers

1.1K Following

Pratyush Maini (@pratyushmaini)'s Twitter Profile Photo

1/Pretraining is hitting a data wall; scaling raw web data alone leads to diminishing returns. Today DatologyAI shares BeyondWeb, our synthetic data approach & all the learnings from scaling it to trillions of tokens🧑🏼‍🍳
- 3B LLMs beat 8B models🚀
- Pareto frontier for performance
Vicarious Surgical (@vicarioussurg)'s Twitter Profile Photo

Our instrument arms enable superhuman joint flexibility that gives the surgeon the ability to work straight forward, left, right, straight up, straight down, and even flip all the way around to work facing back at the incision site. #medtech #surgicalrobotics #surgery

Ai2 (@allen_ai)'s Twitter Profile Photo

We’re releasing early pre-training checkpoints for OLMo-2-1B to help study how LLM capabilities emerge. They’re fine-grained snapshots intended for analysis, reproduction, and comparison. 🧵

Sebastien Bubeck (@sebastienbubeck)'s Twitter Profile Photo

Claim: gpt-5-pro can prove new interesting mathematics.

Proof: I took a convex optimization paper with a clean open problem in it and asked gpt-5-pro to work on it. It proved a better bound than the one in the paper, and I checked the proof; it's correct.

Details below.
Vercel (@vercel)'s Twitter Profile Photo

You can just leave instructions for AI agents inside your app. <script type="text/llms.txt"> A proposal for inline instructions in HTML, based on llms.txt. vercel.com/blog/a-proposa…
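
Since the proposed carrier is an ordinary HTML script tag, an agent can recover the instructions with a standard parser. A minimal Python sketch; the sample page and extraction class are invented for illustration, and only the `text/llms.txt` script type comes from the proposal:

```python
from html.parser import HTMLParser

class LLMsTxtExtractor(HTMLParser):
    """Collect the bodies of <script type="text/llms.txt"> blocks."""
    def __init__(self):
        super().__init__()
        self._in_block = False
        self.instructions = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "text/llms.txt":
            self._in_block = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_block = False

    def handle_data(self, data):
        # Script bodies arrive here as raw text, not as nested markup.
        if self._in_block and data.strip():
            self.instructions.append(data.strip())

page = """<html><body>
<script type="text/llms.txt">
Agents: use the JSON API at /api/v1 instead of scraping this page.
</script>
<p>Visible content for humans.</p>
</body></html>"""

parser = LLMsTxtExtractor()
parser.feed(page)
print(parser.instructions)
```

Browsers ignore script tags with unrecognized types, so the text never renders for human visitors; that property is what makes the proposal unobtrusive.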

DailyPapers (@huggingpapers)'s Twitter Profile Photo

ByteDance introduces FutureX:

The world's first live benchmark for real future prediction!

Evaluate LLM agents on politics, economy, culture & sports with real-time, contamination-free data.
ARC Prize (@arcprize)'s Twitter Profile Photo

NeurIPS 2025 - Google Code Golf Championship

Based on ARC-AGI, create the shortest program that transforms input -> output

$100,000 in prizes
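
For a concrete sense of what a submission looks like, here is a toy task invented for illustration (not an actual ARC-AGI puzzle): the output grid is the input grid mirrored left-to-right, solved readably and then golfed:

```python
# Hypothetical ARC-style task: output = input grid mirrored left-to-right.

def solve(grid):
    """Readable version: reverse every row."""
    return [row[::-1] for row in grid]

# Golfed version of the same transform; in the contest, fewer bytes wins.
p = lambda g: [r[::-1] for r in g]

inp = [[1, 0, 2],
       [0, 3, 0]]
print(solve(inp))  # -> [[2, 0, 1], [0, 3, 0]]
assert solve(inp) == p(inp)
```
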
Jiawei Zhao (@jiawzhao)'s Twitter Profile Photo

💡 The secret? LLMs already know when they're uncertain - we just weren't listening!
Previous methods use confidence/entropy AFTER full generation for test-time and RL. We're different - we capture reasoning errors DURING generation.
DeepConf monitors "local confidence" in
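
The tweet is cut off here, but the general idea it gestures at, watching confidence while tokens are generated instead of scoring the finished output, can be sketched. Everything below (the sliding-window mean of token log-probabilities, the window size, the threshold) is an illustrative stand-in, not DeepConf's actual algorithm:

```python
from collections import deque

def flag_low_confidence(token_logprobs, window=4, threshold=-2.5):
    """Flag positions where the mean log-probability over the last
    `window` tokens drops below `threshold`, i.e. where generation
    turns locally uncertain. Toy version: in a real decoder this
    would run online, step by step, and could trigger early exit."""
    recent = deque(maxlen=window)
    flagged = []
    for i, lp in enumerate(token_logprobs):
        recent.append(lp)
        if len(recent) == window and sum(recent) / window < threshold:
            flagged.append(i)
    return flagged

# Confident prefix (log-probs near 0), then an uncertain stretch.
logprobs = [-0.1, -0.2, -0.1, -0.3, -3.0, -4.0, -3.5, -0.2]
print(flag_low_confidence(logprobs))  # -> [6, 7]
```

The point of the windowed statistic is that a single low-probability token is normal, while a sustained run of them suggests the reasoning has gone off the rails mid-generation.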
murat 🍥 (@mayfer)'s Twitter Profile Photo

imo the best output format for realtime world models (wrt robotics use case) is *actual* 3d (i.e. gaussian splats or similar), not just emergent 3d from 2d frames. the reason is, the geometric invariants are a requirement for coherent multi-frame 3d, object permanence, amodal

DailyPapers (@huggingpapers)'s Twitter Profile Photo

xAI just released Grok 2 on Hugging Face. This massive 500GB model, a core part of xAI's 2024 work, is now openly available to push the boundaries of AI research. huggingface.co/xai-org/grok-2

Niðal نضال (@imleslahdin)'s Twitter Profile Photo

I want to buy the exact opposite machine. One that does everything else this machine doesn't do, so that i can (quickly and while having fun) do the things this machine is trying to do.

Niðal نضال (@imleslahdin)'s Twitter Profile Photo

We're at the cusp of what could be a multi-decades innovation cycle but the signs are pointing towards more years of engineering tweaks to the same architecture. The momentum around transformers will be very hard to break. (also i hate to say it but IBM doing SSMs is bearish

jack morris (@jxmnop)'s Twitter Profile Photo

first i thought scaling laws originated in OpenAI (2020)

then i thought they came from Baidu (2017)

now i am enlightened:
Scaling Laws were first explored at Bell Labs (1993)
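
Whatever its origin, the object itself is simple: a power law relating scale to loss, which becomes a straight line in log-log space. A self-contained sketch with synthetic numbers; the exponent 0.3 and prefactor 10 are made up for illustration, not values from any of the papers above:

```python
import math

# Synthetic (model size, loss) points obeying loss = 10 * N**-0.3.
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [10 * n ** -0.3 for n in sizes]

# A power law loss = a * N**-b is linear in log-log space:
# log(loss) = log(a) - b * log(N). Fit by ordinary least squares.
xs = [math.log(n) for n in sizes]
ys = [math.log(l) for l in losses]
mx = sum(xs) / len(xs)
my = sum(ys) / len(ys)
b = -sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = math.exp(my + b * mx)
print(f"exponent b = {b:.3f}, prefactor a = {a:.3f}")  # recovers 0.300 and 10.000
```

The log-log trick is why scaling-law plots look linear: fitting the exponent is ordinary least squares once both axes are logarithmic.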