AIneojk (@aineojk)'s Twitter Profile
AIneojk

@aineojk

RL (ML . CV)

ID: 1649323480104730624

Joined 21-04-2023 08:05:42

2.2K Tweets

255 Followers

7.7K Following

Alexander Doria (@dorialexander):

Really interesting experiment scaling generalist LLM search agents to 150k websites, with synthetic tasks (like "find a font suitable for a children's book") and synthetic evaluation of search patterns. Likely how OpenAI DeepResearch was made.
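The setup described above lends itself to a simple loop: synthesize a task per website, let the agent search, then grade the trajectory with a judge model. A hypothetical sketch of that shape, where `llm` and `run_search_agent` are stand-ins rather than any real API:

```python
# Hypothetical sketch of the pipeline the tweet describes; `llm` and
# `run_search_agent` are stand-ins, not a real library or the authors' code.
def llm(prompt: str) -> str:
    raise NotImplementedError  # any chat-completion call

def run_search_agent(task: str, website: str) -> list[str]:
    raise NotImplementedError  # queries, clicks, page reads, ...

def synthesize_task(website: str) -> str:
    # e.g. for a font foundry: "find a font suitable for a children's book"
    return llm(f"Write a realistic user task answerable by browsing {website}.")

def judge_trajectory(task: str, trajectory: list[str]) -> float:
    # Synthetic evaluation: score the search *pattern*, not a gold answer.
    return float(llm(f"Task: {task}\nActions: {trajectory}\nScore 0-1."))

scores = [
    judge_trajectory(task := synthesize_task(site), run_search_agent(task, site))
    for site in ["example-fonts.com"]  # the experiment scales this to ~150k sites
]
```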
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr):

PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding

"In this paper, we study building a Perception Language Model (PLM) in a fully open and reproducible framework for transparent research in image and video understanding. We analyze standard training …
ℏεsam (@hesamation):

the best researchers from DeepSeek, OpenAI, Microsoft, and ByteDance explored RL and Reasoning in LLMs,

here are some of their key findings:
Costa Huang (@vwxyzjn):

Fun GRPO puzzle: Can you guess why adding a format reward on top of the verification reward makes the sequence length fluctuate more?

Hint: it's related to leloy!'s blog post: leloykun.github.io/ponder/grpo-fl…

Answer in the 🧵
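Before peeking at the thread, it helps to see what GRPO's group-whitened advantages do when a second reward component breaks ties within a group. A toy sketch with illustrative numbers (showing the mechanism the hint points at, not claiming to be the thread's answer):

```python
# Minimal sketch of GRPO's group-normalized advantage with a combined
# reward. Reward values are illustrative, not from the thread.
import numpy as np

def grpo_advantages(rewards):
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-6)  # whiten within the group

# Group of 4 rollouts for one prompt: (verified correct?, format ok?)
verif = np.array([1.0, 1.0, 0.0, 0.0])
fmt   = np.array([1.0, 0.0, 1.0, 0.0])

print(grpo_advantages(verif))              # two reward levels: [1, 1, -1, -1]
print(grpo_advantages(verif + 0.2 * fmt))  # four levels: every rollout now gets
# a distinct advantage, so format-only differences (which correlate with
# length) also receive gradient signal.
```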
Kunhao Zheng @ ICLR 2025 (@kunhaoz):

🚨 Your RL only improves 𝗽𝗮𝘀𝘀@𝟭, not 𝗽𝗮𝘀𝘀@𝗸? 🚨

That’s not a bug — it’s a 𝗳𝗲𝗮𝘁𝘂𝗿𝗲 𝗼𝗳 𝘁𝗵𝗲 𝗼𝗯𝗷𝗲𝗰𝘁𝗶𝘃𝗲 you’re optimizing.

You get what you optimize for. If you want better pass@k, you need to optimize for pass@k at training time.

🧵 How?
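For reference, the quantity at stake is the standard unbiased pass@k estimator from Chen et al. (2021); the thread's point is to make training target this directly rather than pass@1. A minimal sketch:

```python
# Unbiased pass@k estimator (Chen et al., 2021):
# n = completions sampled, c = number that pass the verifier.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    # 1 - P(all k completions drawn without replacement fail)
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(n=16, c=4, k=1))  # 0.25
print(pass_at_k(n=16, c=4, k=8))  # ≈ 0.96: same policy, very different pass@k
```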
Hongyu Wang (@realhongyu_wang):

Thrilled to introduce BitNet v2, native 4-bit activations for 1-bit LLMs🚀🚀

With 1.58-bit weights and 4-bit activations, we have already pushed the limits of NVIDIA GPUs🔥🔥

Hope to see more hardware advancements to bridge the TensorCore gap between binary and 4-bit compute
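To make the two numbers concrete, here is a sketch of the quantizers involved: the absmean ternary (1.58-bit) weight quantizer used throughout the BitNet line, and a per-token absmax 4-bit activation quantizer. BitNet v2's actual recipe for making int4 activations work natively (e.g., how it handles outlier channels) is in the paper; this is a generic fake-quant illustration, not the authors' code:

```python
# Fake-quant sketch: 1.58-bit (ternary) weights + int4 activations.
import torch

def quantize_weights_ternary(w: torch.Tensor) -> torch.Tensor:
    # absmean scaling, then round-clip to {-1, 0, +1}
    scale = w.abs().mean().clamp(min=1e-5)
    return (w / scale).round().clamp(-1, 1) * scale

def quantize_activations_int4(x: torch.Tensor) -> torch.Tensor:
    # per-token absmax scaling into the signed 4-bit range [-8, 7]
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-5) / 7.0
    return (x / scale).round().clamp(-8, 7) * scale

x = torch.randn(2, 16)   # activations
w = torch.randn(8, 16)   # weights
y = quantize_activations_int4(x) @ quantize_weights_ternary(w).t()
```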
λux (@novasarc01):

Just published my blog site along with a new blog "Go with the Flow" - I've been diving deep into flow-based models over the past few months, and this is the first part where I break down how they work internally. I have covered topics like Normalizing Flows, Flow Matching, …
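As a taste of the material (a generic textbook sketch, not code from the blog): the core of flow matching is regressing a velocity network onto the conditional target x1 − x0 along a linear interpolant between noise and data.

```python
# Minimal conditional flow matching training loop on a toy 2-D target.
import torch
import torch.nn as nn

v = nn.Sequential(nn.Linear(3, 128), nn.SiLU(), nn.Linear(128, 2))  # v(x_t, t)
opt = torch.optim.Adam(v.parameters(), lr=1e-3)

def sample_data(n):  # toy target: two Gaussian blobs
    c = torch.randint(0, 2, (n, 1)).float() * 4 - 2
    return torch.randn(n, 2) * 0.3 + c

for step in range(2000):
    x1 = sample_data(256)            # data
    x0 = torch.randn_like(x1)        # noise
    t = torch.rand(x1.size(0), 1)
    xt = (1 - t) * x0 + t * x1       # linear interpolant
    target = x1 - x0                 # conditional velocity target
    pred = v(torch.cat([xt, t], dim=-1))
    loss = ((pred - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```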
eigenron (@eigenron):

writing a blog on Physics-Informed Neural Networks (PINNs). meanwhile, here's a simple example of a normal NN vs a PINN fitting a synthetic dataset (with noise) for projectile motion.

> simple NN overfits heavily to outliers
> PINN averts this with the help of a modified …
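A minimal sketch of the PINN side of that comparison, assuming the truncated "modified …" refers to the standard physics-residual loss (here y'' + g = 0 for a projectile); numbers and architecture are illustrative:

```python
# PINN sketch for 1-D projectile motion: data loss + physics-residual loss.
import torch
import torch.nn as nn

g = 9.81

model = nn.Sequential(nn.Linear(1, 64), nn.Tanh(),
                      nn.Linear(64, 64), nn.Tanh(), nn.Linear(64, 1))

def physics_residual(t):
    # Residual of y''(t) + g = 0, computed with autograd.
    t = t.requires_grad_(True)
    y = model(t)
    dy = torch.autograd.grad(y, t, torch.ones_like(y), create_graph=True)[0]
    d2y = torch.autograd.grad(dy, t, torch.ones_like(dy), create_graph=True)[0]
    return d2y + g

# Synthetic noisy data y(t) = v0*t - g*t^2/2, with a few large outliers.
v0 = 20.0
t_data = torch.rand(32, 1) * 3.0
y_data = v0 * t_data - 0.5 * g * t_data**2 + 0.1 * torch.randn_like(t_data)
y_data[::8] += 3.0  # outliers a plain NN would chase

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
t_col = torch.linspace(0, 3, 100).reshape(-1, 1)  # collocation points

for step in range(5000):
    opt.zero_grad()
    loss_data = ((model(t_data) - y_data) ** 2).mean()
    loss_phys = (physics_residual(t_col) ** 2).mean()
    (loss_data + loss_phys).backward()  # physics term regularizes vs outliers
    opt.step()
```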

zed (@zmkzmkz):

sorry for the late update. I bring disappointing news. softpick does NOT scale to larger models. overall training loss and benchmark results are worse than softmax on our 1.8B parameter models. we have updated the preprint on arxiv: arxiv.org/abs/2504.20966
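For context, softpick replaces softmax's normalization with a rectified one. A naive sketch following the preprint's definition, to the best of my reading (the paper's implementation subtracts a running max for numerical stability, as with standard softmax; check arxiv.org/abs/2504.20966 for the exact form):

```python
# Naive softpick sketch (not numerically stabilized; illustrative only).
import torch

def softpick(x: torch.Tensor, dim: int = -1, eps: float = 1e-8) -> torch.Tensor:
    # softpick(x)_i = relu(exp(x_i) - 1) / (sum_j |exp(x_j) - 1| + eps)
    # Unlike softmax, entries with x_i <= 0 get exactly zero weight,
    # and the outputs need not sum to 1.
    num = torch.relu(torch.exp(x) - 1)
    den = (torch.exp(x) - 1).abs().sum(dim=dim, keepdim=True) + eps
    return num / den

scores = torch.tensor([2.0, 0.5, -1.0])
print(torch.softmax(scores, dim=-1))  # every entry positive
print(softpick(scores))               # negative logit zeroed out
```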

Bryce Adelstein Lelbach (@blelbach):

Learn how to GPU-accelerate your code in modern CUDA C++ without writing everything from scratch!

During #ISC2025, I'll be giving a talk at the Hamburg C++ User Group on 2025-06-11. It's open to the public.

meetup.com/cppusergroupha…
Andi Marafioti (@andimarafioti):

📢 A new open-source OCR model is breaking the internet: Nanonets-OCR-s!

Nanonets understands context and semantic structures, transforming documents into clean, structured markdown.
It has an Apache 2.0 license, and the authors compare it to Mistral-OCR.

🧵 Let's look closer:
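A hedged sketch of trying the model with Hugging Face transformers: the model id comes from the announcement, but the prompt wording, model class, and generation settings here are my assumptions, not the authors' reference snippet (their model card is the authoritative usage):

```python
# Illustrative sketch; settings are assumptions, see the model card for the
# reference usage.
from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image

model_id = "nanonets/Nanonets-OCR-s"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id, device_map="auto")

image = Image.open("page.png")
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Extract this document as clean markdown."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(images=[image], text=prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=2048)
print(processor.batch_decode(out[:, inputs["input_ids"].shape[1]:],
                             skip_special_tokens=True)[0])
```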
Randall Balestriero (@randall_balestr):

Who got time to wait for delayed generalization (grokking)? We introduce GrokAlign, a provable solution to speed up the alignment between your model and your training data resulting in faster convergence + visual probing of your DN! Ofc it uses splines :)

arxiv.org/abs/2506.12284
alphaXiv (@askalphaxiv):

Introducing your arXiv Research Agent

A personal research assistant with access to arXiv + bioRxiv + medRxiv + Semantic Scholar. Upload drafts, conduct literature reviews, get insights across millions of papers.

MCP support coming soon 🚀

Mathurin Massias (@mathusmassias):

New paper on the generalization of Flow Matching arxiv.org/abs/2506.03719 🤯

Why does flow matching generalize? Did you know that the flow matching target you're trying to learn **can only generate training points**?

with Quentin Bertrand, Anne Gagneux & Rémi Emonet 👇👇👇
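To unpack the claim: with a finite training set, the conditional flow matching target is a weighted average of directions toward training samples, so the exact minimizer transports every point onto the training data at t = 1. A sketch of the argument in standard CFM notation (see the paper for the precise statement):

```latex
% With linear interpolant $x_t = (1-t)x_0 + t x_1$ and empirical data
% $\hat p_1 = \frac{1}{N}\sum_{i} \delta_{x_1^{(i)}}$, the optimal velocity is
\[
  u_t^\star(x) \;=\; \mathbb{E}\!\left[\, x_1 - x_0 \mid x_t = x \,\right]
  \;=\; \sum_{i=1}^{N} w_i(t, x)\, \frac{x_1^{(i)} - x}{1 - t},
\]
% where the $w_i(t,x)$ are posterior weights summing to 1. Every term points
% at a training sample, so integrating $\dot x_t = u_t^\star(x_t)$ to $t = 1$
% lands exactly on the training set: the exact target memorizes, and any
% generalization must come from how the learned network deviates from it.
```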

Tanishq Mathew Abraham, Ph.D. (@iscienceluvr):

From Bytes to Ideas: Language Modeling with Autoregressive U-Nets

"Byte Pair Encoding (BPE) and similar schemes split text once, build a static vocabulary, and leave the model stuck with that choice. We relax this rigidity by introducing an autoregressive U-Net that learns to …
Lerrel Pinto (@lerrelpinto):

We have developed a new tactile sensor, called e-Flesh, with a simple working principle: measure deformations in 3D printable microstructures. Now all you need to make tactile sensors is a 3D printer, magnets, and magnetometers! 🧵