Łukasz Stafiniak (@lukstafi)'s Twitter Profile
Łukasz Stafiniak

@lukstafi

Currently I'm an independent wannabe researcher working on Machine Learning and AI in OCaml.

ID: 956252431

Link: https://github.com/sponsors/lukstafi
Joined: 18-11-2012 21:15:56

671 Tweets

230 Followers

1.1K Following

机器之心 JIQIZHIXIN (@synced_global)

🚨 41 years in the making — and Dijkstra is no longer unbeatable. A Tsinghua, Stanford, and MPI for Informatics team has achieved the first deterministic algorithm to break the O(m + n log n) bound for single-source shortest paths in directed graphs with real non-negative

Shai Shalev-Shwartz (@shai_s_shwartz)

Are frontier AI models really capable of “PhD-level” reasoning? To answer this question, we introduce FormulaOne, a new reasoning benchmark of expert-level Dynamic Programming problems. We have curated a benchmark consisting of three tiers, in increasing complexity, which we call

Łukasz Stafiniak (@lukstafi)

There are psychological well-being benefits to pair programming with frontier LLMs that would not be offered by perfect (meaning, not making mistakes) but partial (meaning, not AI-complete) GOFAI systems.

Wes (@wmorrill3)

Ok hear me out - the sun keeps sending its unwanted radiation to America. This is unacceptable, we need to put a tariff on those solar rays. Let's build a wall that will absorb that unwanted light and turn it into electricity. But we need to make the wall horizontal and put one

Jay (@jayendra_ram)

Since everyone is talking about RL Environments and GRPO now, but no one knows how they work, we thought it would be cool to make an explainer video + code you can run. This is an example of using GRPO to train Qwen 2.5 to play 2048 (code in thread) 🧵:

Kevin Patrick Murphy (@sirbayes)

I just finished reading this interesting book by Druv Pai, Sam Buchanan and colleagues. It's fairly "heavy" but provides a very satisfying theoretical explanation for many different empirical approaches currently used in "generative AI", such as denoising diffusion models,

Łukasz Stafiniak (@lukstafi)

I'm currently having a Claude Code phase. I stopped using Cursor agents, but Cursor tab complete is by itself totally worth the price.

Sebastian Raschka (@rasbt)

Updated & turned my Big LLM Architecture Comparison article into a narrated video lecture. The 11 LLM architectures covered in this video: 1. DeepSeek V3/R1 2. OLMo 2 3. Gemma 3 4. Mistral Small 3.1 5. Llama 4 6. Qwen3 7. SmolLM3 8. Kimi 2 9. GPT-OSS 10. Grok 2.5 11. GLM-4.5

Łukasz Stafiniak (@lukstafi)

My old university page is dead :-| Truth be told, it was an attack vector: an insecure wiki content management setup. But it's a bit sad to lose it.

机器之心 JIQIZHIXIN (@synced_global)

Huawei proposed Tree-OPO! They explore how MCTS trajectories can fuel Group Relative Policy Optimization (GRPO), enabling preference-based RL without value networks. By staging training with partially revealed rollouts, they create tree-structured reward signals that better
