AIneojk (@aineojk)'s Twitter Profile
AIneojk

@aineojk

RL (ML . CV)

ID: 1649323480104730624

Joined 21-04-2023 08:05:42

2.2K Tweets

255 Followers

7.7K Following

Alexander Doria (@dorialexander):

Really interesting experiment scaling generalist LLM search agents to 150k websites, with synthetic tasks (like "find a font suitable for a children's book") and synthetic evaluation of search patterns. Likely how OpenAI DeepResearch was made.
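The setup described above lends itself to a simple loop: synthesize a task per website, let the agent search, then grade the trajectory with a judge model. A hypothetical sketch of that shape, where `llm` and `run_search_agent` are stand-ins rather than any real API:

```python
# Hypothetical sketch of the pipeline the tweet describes; `llm` and
# `run_search_agent` are stand-ins, not a real library or the authors' code.
def llm(prompt: str) -> str:
    raise NotImplementedError  # any chat-completion call

def run_search_agent(task: str, website: str) -> list[str]:
    raise NotImplementedError  # queries, clicks, page reads, ...

def synthesize_task(website: str) -> str:
    # e.g. for a font foundry: "find a font suitable for a children's book"
    return llm(f"Write a realistic user task answerable by browsing {website}.")

def judge_trajectory(task: str, trajectory: list[str]) -> float:
    # Synthetic evaluation: score the search *pattern*, not a gold answer.
    return float(llm(f"Task: {task}\nActions: {trajectory}\nScore 0-1."))

scores = [
    judge_trajectory(task := synthesize_task(site), run_search_agent(task, site))
    for site in ["example-fonts.com"]  # the experiment scales this to ~150k sites
]
```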
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr):

PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding

"In this paper, we study building a Perception Language Model (PLM) in a fully open and reproducible framework for transparent research in image and video understanding. We analyze standard training …
ℏεsam (@hesamation):

the best researchers from DeepSeek, OpenAI, Microsoft, and ByteDance explored RL and Reasoning in LLMs,

here are some of their key findings:
Costa Huang (@vwxyzjn):

Fun GRPO puzzle: Can you guess why adding a format reward on top of the verification reward makes the sequence length fluctuate more?

Hint: it's related to leloy!'s blog post: leloykun.github.io/ponder/grpo-fl…

Answer in the 🧵
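Before peeking at the thread, it helps to see what GRPO's group-whitened advantages do when a second reward component breaks ties within a group. A toy sketch with illustrative numbers (showing the mechanism the hint points at, not claiming to be the thread's answer):

```python
# Minimal sketch of GRPO's group-normalized advantage with a combined
# reward. Reward values are illustrative, not from the thread.
import numpy as np

def grpo_advantages(rewards):
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-6)  # whiten within the group

# Group of 4 rollouts for one prompt: (verified correct?, format ok?)
verif = np.array([1.0, 1.0, 0.0, 0.0])
fmt   = np.array([1.0, 0.0, 1.0, 0.0])

print(grpo_advantages(verif))              # two reward levels: [1, 1, -1, -1]
print(grpo_advantages(verif + 0.2 * fmt))  # four levels: every rollout now gets
# a distinct advantage, so format-only differences (which correlate with
# length) also receive gradient signal.
```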
Kunhao Zheng @ ICLR 2025 (@kunhaoz):

🚨 Your RL only improves 𝗽𝗮𝘀𝘀@𝟭, not 𝗽𝗮𝘀𝘀@𝗸? 🚨

That’s not a bug — it’s a 𝗳𝗲𝗮𝘁𝘂𝗿𝗲 𝗼𝗳 𝘁𝗵𝗲 𝗼𝗯𝗷𝗲𝗰𝘁𝗶𝘃𝗲 you’re optimizing.

You get what you optimize for. If you want better pass@k, you need to optimize for pass@k at training time.

🧵 How?
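For reference, the quantity at stake is the standard unbiased pass@k estimator from Chen et al. (2021); the thread's point is to make training target this directly rather than pass@1. A minimal sketch:

```python
# Unbiased pass@k estimator (Chen et al., 2021):
# n = completions sampled, c = number that pass the verifier.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    # 1 - P(all k completions drawn without replacement fail)
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(n=16, c=4, k=1))  # 0.25
print(pass_at_k(n=16, c=4, k=8))  # ≈ 0.96: same policy, very different pass@k
```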
Hongyu Wang (@realhongyu_wang):

Thrilled to introduce BitNet v2, native 4-bit activations for 1-bit LLMs🚀🚀

With 1.58-bit weights and 4-bit activations, we have already pushed the limits of NVIDIA GPUs🔥🔥

Hope to see more hardware advancements to bridge the TensorCore gap between binary and 4-bit compute
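To make the two numbers concrete, here is a sketch of the quantizers involved: the absmean ternary (1.58-bit) weight quantizer used throughout the BitNet line, and a per-token absmax 4-bit activation quantizer. BitNet v2's actual recipe for making int4 activations work natively (e.g., how it handles outlier channels) is in the paper; this is a generic fake-quant illustration, not the authors' code:

```python
# Fake-quant sketch: 1.58-bit (ternary) weights + int4 activations.
import torch

def quantize_weights_ternary(w: torch.Tensor) -> torch.Tensor:
    # absmean scaling, then round-clip to {-1, 0, +1}
    scale = w.abs().mean().clamp(min=1e-5)
    return (w / scale).round().clamp(-1, 1) * scale

def quantize_activations_int4(x: torch.Tensor) -> torch.Tensor:
    # per-token absmax scaling into the signed 4-bit range [-8, 7]
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-5) / 7.0
    return (x / scale).round().clamp(-8, 7) * scale

x = torch.randn(2, 16)   # activations
w = torch.randn(8, 16)   # weights
y = quantize_activations_int4(x) @ quantize_weights_ternary(w).t()
```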
λux (@novasarc01):

Just published my blog site along with a new blog "Go with the Flow" - I've been diving deep into flow-based models over the past few months, and this is the first part where I break down how they work internally. I have covered topics like Normalizing Flows, Flow Matching, …
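As a taste of the material (a generic textbook sketch, not code from the blog): the core of flow matching is regressing a velocity network onto the conditional target x1 − x0 along a linear interpolant between noise and data.

```python
# Minimal conditional flow matching training loop on a toy 2-D target.
import torch
import torch.nn as nn

v = nn.Sequential(nn.Linear(3, 128), nn.SiLU(), nn.Linear(128, 2))  # v(x_t, t)
opt = torch.optim.Adam(v.parameters(), lr=1e-3)

def sample_data(n):  # toy target: two Gaussian blobs
    c = torch.randint(0, 2, (n, 1)).float() * 4 - 2
    return torch.randn(n, 2) * 0.3 + c

for step in range(2000):
    x1 = sample_data(256)            # data
    x0 = torch.randn_like(x1)        # noise
    t = torch.rand(x1.size(0), 1)
    xt = (1 - t) * x0 + t * x1       # linear interpolant
    target = x1 - x0                 # conditional velocity target
    pred = v(torch.cat([xt, t], dim=-1))
    loss = ((pred - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```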
eigenron (@eigenron):

writing a blog on Physics-Informed Neural Networks (PINNs). meanwhile, here's a simple example of a normal NN vs a PINN fitting a synthetic dataset (with noise) for projectile motion.

> simple NN overfits heavily to outliers
> PINN averts this with the help of a modified …
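A minimal sketch of the PINN side of that comparison, assuming the truncated "modified …" refers to the standard physics-residual loss (here y'' + g = 0 for a projectile); numbers and architecture are illustrative:

```python
# PINN sketch for 1-D projectile motion: data loss + physics-residual loss.
import torch
import torch.nn as nn

g = 9.81

model = nn.Sequential(nn.Linear(1, 64), nn.Tanh(),
                      nn.Linear(64, 64), nn.Tanh(), nn.Linear(64, 1))

def physics_residual(t):
    # Residual of y''(t) + g = 0, computed with autograd.
    t = t.requires_grad_(True)
    y = model(t)
    dy = torch.autograd.grad(y, t, torch.ones_like(y), create_graph=True)[0]
    d2y = torch.autograd.grad(dy, t, torch.ones_like(dy), create_graph=True)[0]
    return d2y + g

# Synthetic noisy data y(t) = v0*t - g*t^2/2, with a few large outliers.
v0 = 20.0
t_data = torch.rand(32, 1) * 3.0
y_data = v0 * t_data - 0.5 * g * t_data**2 + 0.1 * torch.randn_like(t_data)
y_data[::8] += 3.0  # outliers a plain NN would chase

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
t_col = torch.linspace(0, 3, 100).reshape(-1, 1)  # collocation points

for step in range(5000):
    opt.zero_grad()
    loss_data = ((model(t_data) - y_data) ** 2).mean()
    loss_phys = (physics_residual(t_col) ** 2).mean()
    (loss_data + loss_phys).backward()  # physics term regularizes vs outliers
    opt.step()
```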

zed (@zmkzmkz):

sorry for the late update. I bring disappointing news. softpick does NOT scale to larger models. overall training loss and benchmark results are worse than softmax on our 1.8B parameter models. we have updated the preprint on arxiv: arxiv.org/abs/2504.20966
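For context, softpick replaces softmax's normalization with a rectified one. A naive sketch following the preprint's definition, to the best of my reading (the paper's implementation subtracts a running max for numerical stability, as with standard softmax; check arxiv.org/abs/2504.20966 for the exact form):

```python
# Naive softpick sketch (not numerically stabilized; illustrative only).
import torch

def softpick(x: torch.Tensor, dim: int = -1, eps: float = 1e-8) -> torch.Tensor:
    # softpick(x)_i = relu(exp(x_i) - 1) / (sum_j |exp(x_j) - 1| + eps)
    # Unlike softmax, entries with x_i <= 0 get exactly zero weight,
    # and the outputs need not sum to 1.
    num = torch.relu(torch.exp(x) - 1)
    den = (torch.exp(x) - 1).abs().sum(dim=dim, keepdim=True) + eps
    return num / den

scores = torch.tensor([2.0, 0.5, -1.0])
print(torch.softmax(scores, dim=-1))  # every entry positive
print(softpick(scores))               # negative logit zeroed out
```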

Bryce Adelstein Lelbach (@blelbach):

Learn how to GPU-accelerate your code in modern CUDA C++ without writing everything from scratch!

During #ISC2025, I'll be giving a talk at the Hamburg C++ User Group on 2025-06-11. It's open to the public.

meetup.com/cppusergroupha…
Andi Marafioti (@andimarafioti):

📢 A new open-source OCR model is breaking the internet: Nanonets-OCR-s!

Nanonets understands context and semantic structures, transforming documents into clean, structured markdown.
It has an Apache 2.0 license, and the authors compare it to Mistral-OCR.

🧵 Let's look closer:
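A hedged sketch of trying the model with Hugging Face transformers: the model id comes from the announcement, but the prompt wording, model class, and generation settings here are my assumptions, not the authors' reference snippet (their model card is the authoritative usage):

```python
# Illustrative sketch; settings are assumptions, see the model card for the
# reference usage.
from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image

model_id = "nanonets/Nanonets-OCR-s"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id, device_map="auto")

image = Image.open("page.png")
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Extract this document as clean markdown."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(images=[image], text=prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=2048)
print(processor.batch_decode(out[:, inputs["input_ids"].shape[1]:],
                             skip_special_tokens=True)[0])
```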
Randall Balestriero (@randall_balestr):

Who got time to wait for delayed generalization (grokking)? We introduce GrokAlign, a provable solution to speed up the alignment between your model and your training data resulting in faster convergence + visual probing of your DN! Ofc it uses splines :)

arxiv.org/abs/2506.12284
alphaXiv (@askalphaxiv):

Introducing your arXiv Research Agent

A personal research assistant with access to arXiv + bioRxiv + medRxiv + Semantic Scholar. Upload drafts, conduct literature reviews, get insights across millions of papers.

MCP support coming soon 🚀

Mathurin Massias (@mathusmassias):

New paper on the generalization of Flow Matching arxiv.org/abs/2506.03719 🤯

Why does flow matching generalize? Did you know that the flow matching target you're trying to learn **can only generate training points**?

with Quentin Bertrand, Anne Gagneux & Rémi Emonet 👇👇👇
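To unpack the claim: with a finite training set, the conditional flow matching target is a weighted average of directions toward training samples, so the exact minimizer transports every point onto the training data at t = 1. A sketch of the argument in standard CFM notation (see the paper for the precise statement):

```latex
% With linear interpolant $x_t = (1-t)x_0 + t x_1$ and empirical data
% $\hat p_1 = \frac{1}{N}\sum_{i} \delta_{x_1^{(i)}}$, the optimal velocity is
\[
  u_t^\star(x) \;=\; \mathbb{E}\!\left[\, x_1 - x_0 \mid x_t = x \,\right]
  \;=\; \sum_{i=1}^{N} w_i(t, x)\, \frac{x_1^{(i)} - x}{1 - t},
\]
% where the $w_i(t,x)$ are posterior weights summing to 1. Every term points
% at a training sample, so integrating $\dot x_t = u_t^\star(x_t)$ to $t = 1$
% lands exactly on the training set: the exact target memorizes, and any
% generalization must come from how the learned network deviates from it.
```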

Tanishq Mathew Abraham, Ph.D. (@iscienceluvr):

From Bytes to Ideas: Language Modeling with Autoregressive U-Nets

"Byte Pair Encoding (BPE) and similar schemes split text once, build a static vocabulary, and leave the model stuck with that choice. We relax this rigidity by introducing an autoregressive U-Net that learns to …
Lerrel Pinto (@lerrelpinto):

We have developed a new tactile sensor, called e-Flesh, with a simple working principle: measure deformations in 3D printable microstructures. Now all you need to make tactile sensors is a 3D printer, magnets, and magnetometers! 🧵