Ali Fartoot (@ali_fartout)'s Twitter Profile
Ali Fartoot

@ali_fartout

Machine Learning Engineer

ID: 1473192360016236548

Joined: 21-12-2021 07:23:35

101 Tweets

43 Followers

451 Following

Jason Wei (@_jasonwei)'s Twitter Profile Photo

Becoming an RL diehard in the past year and thinking about RL for most of my waking hours inadvertently taught me an important lesson about how to live my own life. One of the big concepts in RL is that you always want to be “on-policy”: instead of mimicking other people’s

alphaXiv (@askalphaxiv)'s Twitter Profile Photo

"Deep Researcher with Test-Time Diffusion"

This paper treats report writing as an iterative retrieval‑augmented diffusion process that can be enhanced by component‑wise self‑evolution. 

This demonstrates SoTA on multi‑hop search‑and‑reasoning benchmarks.
Leonie (@helloiamleonie)'s Twitter Profile Photo

Apple just released Embedding Atlas:
An open-source visualization tool for your embeddings.

I just gave it a quick spin with some data stored in my vector database.

These are my first impressions:
- Nice exploration UX with hover and tool tip for single data points
- Shows you
Wen-Tse Chen (@wenzechen2)'s Twitter Profile Photo

[0/3] 🚀 Introducing Verlog – an open-source RL framework built specifically for training long-horizon, multi-turn LLM agents.

📊 Max episode length comparison:
• VeRL / RAGEN → ~10 turns
• verl-agent → ~50 turns
• Verlog (ours) → 400+ turns 🔥

⚙️ Technical foundation:

Rohan Paul (@rohanpaul_ai)'s Twitter Profile Photo

BRILLIANT Google DeepMind research.

Even the best embeddings cannot represent all possible query-document combinations, which means some answers are mathematically impossible to recover.

Reveals a sharp truth: embedding models can only capture so many pairings, and beyond that,
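The claim above can be illustrated with a small linear-algebra sketch (this shows the rank intuition only, not the paper's exact argument): every query-document score matrix produced by d-dimensional embeddings factors through the embedding space, so its rank can never exceed d, and higher-rank relevance patterns are mathematically out of reach.

```python
import numpy as np

rng = np.random.default_rng(0)
n_docs, dim = 8, 3  # more documents than embedding dimensions

# Random query and document embeddings in a low-dimensional space.
queries = rng.standard_normal((n_docs, dim))
docs = rng.standard_normal((n_docs, dim))

# The full query-document score matrix is a product of two rank-<=dim
# factors, so its rank can never exceed the embedding dimension.
scores = queries @ docs.T
print(np.linalg.matrix_rank(scores))  # <= dim, here 3

# Consequence: no choice of 3-d embeddings can reproduce an arbitrary
# 8x8 target relevance matrix (a generic 8x8 matrix has rank 8).
target = rng.standard_normal((n_docs, n_docs))
print(np.linalg.matrix_rank(target))  # 8
```

Whatever the training data, squeezing more distinct query-document pairings than the dimension allows is impossible, which is the combinatorial wall the tweet refers to.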
Ilia (@ilialarchenko)'s Twitter Profile Photo

Let’s talk about VLAs in robotics 🤖
(Vision-Language-Action models)

A relatively new class of robotics policies that brings the power of LLMs into the real world.

If you’ve seen robots folding laundry, washing dishes, or cleaning rooms – chances are they used something VLA-like.
TuringPost (@theturingpost)'s Twitter Profile Photo

RLAD (Reinforcement Learning with Abstraction and Deduction) trains models via RL using a 2-player setup:

▪️ An abstraction generator – proposes short, natural-language “reasoning hints” (abstractions) summarizing key facts and strategies.
▪️ A solution generator – uses them to
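The two-player loop above can be sketched as follows (a hypothetical illustration: the stub "models" and names are mine, not from the RLAD paper). The abstraction generator proposes a hint, the solution generator conditions on it, and a shared correctness reward would drive the RL updates for both players.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Episode:
    abstraction: str
    solution: str
    reward: float

def rlad_step(problem: str,
              abstraction_gen: Callable[[str], str],
              solution_gen: Callable[[str, str], str],
              check: Callable[[str], bool]) -> Episode:
    hint = abstraction_gen(problem)        # player 1: propose a reasoning hint
    answer = solution_gen(problem, hint)   # player 2: solve using the hint
    r = 1.0 if check(answer) else 0.0      # shared reward: solution correctness
    # In training, r would update BOTH policies (e.g. via a policy-gradient
    # method), so abstractions are reinforced through the solutions they enable.
    return Episode(hint, answer, r)

# Toy usage with stand-in "models":
ep = rlad_step(
    "2 + 2 = ?",
    abstraction_gen=lambda p: "hint: this is simple addition",
    solution_gen=lambda p, h: "4",
    check=lambda ans: ans == "4",
)
print(ep.reward)  # 1.0
```

The key design point is that the abstraction generator is never rewarded directly for its hints, only through the downstream solutions they make possible.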
Qwen (@alibaba_qwen)'s Twitter Profile Photo

🚀 Exciting updates in Qwen Code v0.0.12–v0.0.14!

✨ What’s new?
• Plan Mode: AI proposes a full implementation plan—you approve before a single line changes.
• Vision Intelligence: Auto-switch to vision models (Qwen3-VL-Plus with 256K input / 32K output!) when images

alphaXiv (@askalphaxiv)'s Twitter Profile Photo

Your Base Model is Smarter Than You Think

This paper proposes a way to beat the lack of generation diversity in RL without RL!

By using Markov Chain Monte Carlo’s ‘power sampling’ that reuses a base LLM’s own probabilities, it’s able to beat GRPO without training & verifiers
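A toy illustration of the power-sampling idea (my sketch, not the paper's implementation: a hand-made categorical distribution stands in for the base LLM, and sequences are reduced to single outcomes): sampling from p(x)^α with α > 1 via Metropolis-Hastings concentrates mass on the base model's most probable outputs, using only the model's own probabilities and no extra training.

```python
import random

random.seed(0)

# Stand-in "base model": a small categorical distribution.
p = {"a": 0.70, "b": 0.20, "c": 0.10}
alpha = 4.0  # alpha > 1 sharpens the distribution toward its mode

def mh_power_sample(p, alpha, steps=20000):
    """Metropolis-Hastings targeting p(x)**alpha (renormalized)."""
    xs = list(p)
    x = random.choice(xs)
    counts = {k: 0 for k in xs}
    for _ in range(steps):
        y = random.choice(xs)  # symmetric proposal: uniform over outcomes
        # Accept with min(1, (p(y)/p(x))**alpha); only ratios of the base
        # model's probabilities are needed, never the normalizer.
        if random.random() < min(1.0, (p[y] / p[x]) ** alpha):
            x = y
        counts[x] += 1
    return {k: v / steps for k, v in counts.items()}

est = mh_power_sample(p, alpha)

# Exact target for comparison: p**alpha, renormalized.
z = sum(v ** alpha for v in p.values())
exact = {k: v ** alpha / z for k, v in p.items()}
print(est["a"], exact["a"])  # both close to 1: mass concentrates on the mode
```

The same ratio-only acceptance rule is what makes the approach attractive for LLMs, where sequence probabilities are easy to score but the power-distribution normalizer is intractable.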
Weiwei Sun (@sunweiwei12)'s Twitter Profile Photo

AI agents are supposed to collaborate with us to solve real-world problems, but can they really? Even the most advanced models can still give us frustrating moments when working with them deeply.

We argue that real-world deployment requires more than productivity (e.g., task
🔥 Matt Dancho (Business Science) 🔥 (@mdancho84)'s Twitter Profile Photo

🔥 GPT-6 may not just be smarter. 

It literally might be alive (in the computational sense).

A new research paper, SEAL: Self-Adapting Language Models (arXiv:2506.10943), describes how an AI can continuously learn after deployment, evolving its own internal representations
Ali Behrouz (@behrouz_ali)'s Twitter Profile Photo

We keep scaling model parameters by increasing width and stacking more layers, but what if the truly missing axes for continual learning are compression and stacking the learning process?

Excited to share the full version of Nested Learning, a new paradigm for continual learning
Ahmad (@theahmadosman)'s Twitter Profile Photo

Hugging Face has released a 214-page
MASTERCLASS on how to train LLMs

> it’s called The Smol Training Playbook
> and if you want to learn how to train LLMs,
> this GIFT is for you

> this training bible walks you through the ENTIRE pipeline
> covers every concept that matters from
Mo Lotfollahi (@mo_lotfollahi)'s Twitter Profile Photo

Mixture-of-Experts (MoE) is a powerful way to scale large language models (LLMs): instead of running the full model for every token, a router activates only a few “experts,” giving more capacity at roughly the same compute. 

But routing is still a sore spot. Most MoE systems use
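For context on the routing being described, here is a minimal sketch of the standard learned top-k softmax gating that most MoE layers use (shapes and names are illustrative, not from any particular MoE codebase): a router scores every expert per token, only the k best experts run, and their outputs are mixed with renormalized softmax weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def top_k_moe(x, w_gate, experts, k=2):
    """x: (d,) token; w_gate: (d, n_experts); experts: list of callables."""
    logits = x @ w_gate                    # router scores per expert
    top = np.argsort(logits)[-k:]          # indices of the k best experts
    gate = np.exp(logits[top] - logits[top].max())
    gate /= gate.sum()                     # renormalized softmax over the top-k
    # Only k experts run per token -- this is where the compute saving
    # comes from: capacity grows with n_experts, cost grows with k.
    return sum(g * experts[i](x) for g, i in zip(gate, top))

d, n_experts = 4, 8
w_gate = rng.standard_normal((d, n_experts))
# Toy experts: each is a tiny linear layer with its own weights.
experts = [lambda x, W=rng.standard_normal((d, d)): W @ x
           for _ in range(n_experts)]

y = top_k_moe(rng.standard_normal(d), w_gate, experts, k=2)
print(y.shape)  # (4,)
```

The "sore spot" is that the router's hard top-k selection is non-differentiable and trained only indirectly, which is what motivates alternative routing schemes.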