Abhinav Maurya (@ahmaurya) 's Twitter Profile
Abhinav Maurya

@ahmaurya

AI builder. Scientist (Google, Amazon), engineer (Microsoft), ML research (IIT Bombay, CMU). Chasing epistemological thrills. Making interesting mistakes.

ID: 17162491

Link: http://ahmaurya.github.io · Joined: 04-11-2008 17:46:27

7.7K Tweets

215 Followers

20 Following

Hamel Husain (@hamelhusain) 's Twitter Profile Photo

TOC for the open book "Beyond Naive RAG: Practical Advanced Methods" from our RAG series.

This condenses 5 hours of instruction into something you can read in ~30 minutes.

Link: maven.com/p/945082/beyon…

Ben Clavié, Nandan Thakur, Orion Weller, Antoine Chaffin, Bryan Bischof fka Dr. Donut
Rohan Paul (@rohanpaul_ai) 's Twitter Profile Photo

This survey maps the LLM benchmark landscape and shows where current tests fall short.

283 benchmarks across 3 categories: general, domain-specific, and target-specific.

The paper flags 3 traps that skew results.

Training data leaks inflate scores.

Culture and language bias tilt
Georgi Gerganov (@ggerganov) 's Twitter Profile Photo

To run gpt-oss-20b on a 16GB Mac, use these commands:

brew install llama.cpp
llama-server -hf ggml-org/gpt-oss-20b-GGUF --n-cpu-moe 12 -fa -c 32768 --jinja --no-mmap

Then open the browser at http://127.0.0.1:8080
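Besides the browser UI, llama-server also exposes an OpenAI-compatible chat completions endpoint. A minimal sketch of a request body you could POST to it (the model name and prompt here are placeholders, not values from the tweet):

```python
import json

# Sketch of a request body for llama-server's OpenAI-compatible
# chat endpoint at http://127.0.0.1:8080/v1/chat/completions.
# Model name and prompt are placeholders for illustration.
payload = {
    "model": "gpt-oss-20b",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "max_tokens": 64,
}
body = json.dumps(payload)
print(body)
# POST `body` with Content-Type: application/json, e.g. via curl or requests.
```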

Ahmad (@theahmadosman) 's Twitter Profile Photo

GLM 4.5 with Claude Code is the closest thing to Opus 4 imo.

For agentic coding tools: GLM 4.5 > Kimi K2 > Qwen 3 235B non-thinking > Qwen 3 Coder 480B

Raphaël Dabadie🇫🇷 (@raphaeldabadie) 's Twitter Profile Photo

🐺 Introducing the Werewolf Benchmark, an AI test for social reasoning under pressure.

Can models lead, bluff, and resist manipulation in live, adversarial play?

👉 We made 7 of the strongest LLMs, both open-source and closed-source, play 210 full games of Werewolf. 

Below is
Ahmad (@theahmadosman) 's Twitter Profile Photo

Comparing & Contrasting Recent LLMs Architecture

> DeepSeek-V3/R1
> OLMo 2
> Gemma 3
> Mistral Small 3.1
> Llama 4
> Qwen3 (dense+MoE)
> SmolLM3
> Kimi 2
> GPT-OSS

Are 2025 LLMs really that different from each other?

MoE, MLA, GQA, sliding window, normalization games & more.
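Of the techniques named, GQA (grouped-query attention) is simple to sketch: several query heads share a single key/value head, shrinking the KV cache by the group factor. A minimal NumPy illustration — head counts, sequence length, and dimensions here are made up for the example, not taken from any of the models above:

```python
import numpy as np

def gqa_attention(q, k, v):
    """Grouped-query attention: many query heads share fewer K/V heads.

    q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d), n_kv_heads divides n_q_heads.
    """
    group = q.shape[0] // k.shape[0]
    # Repeat each K/V head so every query head has a K/V pair to attend with.
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    # Numerically stable softmax over the key axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
seq, d = 8, 16
q = rng.standard_normal((8, seq, d))  # 8 query heads
k = rng.standard_normal((2, seq, d))  # only 2 K/V heads -> 4x smaller KV cache
v = rng.standard_normal((2, seq, d))
out = gqa_attention(q, k, v)
print(out.shape)  # (8, 8, 16)
```

With 8 query heads and 2 K/V heads, the KV cache holds a quarter of what full multi-head attention would, which is the point of the technique.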
Ivan Fioravanti ᯅ (@ivanfioravanti) 's Twitter Profile Photo

gpt-oss-120b-mxfp4-Q8 MMLU Pro is here!
After ~33 hours on M3 Ultra 512GB 🤯

Overall: 68.1% 

All default settings in this first run, so reasoning level should be low. I will now benchmark gpt-oss-120b-mxfp4-bf16 (Attention layers not quantized) and medium reasoning for Q8.
Simone Scardapane (@s_scardapane) 's Twitter Profile Photo

*Alice's book got a (minor) upgrade!*

Thanks to the dozens of people who gave feedback, now with 1000% less typos and errors, a novel set of Colab lab sessions, and a brand-new CC-BY-SA license. 🙃

sscardapane.it/alice-book/
SemiAnalysis (@semianalysis_) 's Twitter Profile Photo

With each successive generation of Tensor Cores, NVIDIA continues to add lower-precision data types, from 16-bit down to 4-bit. This is because deep learning workloads are extremely tolerant of low precision. This is especially true for inference, where even

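That tolerance is easy to see in a toy experiment: quantizing random Gaussian weights to 4-bit integers with a single symmetric scale loses surprisingly little. A sketch under simplifying assumptions (one per-tensor scale; real 4-bit kernels like mxfp4 use per-group scales and different encodings):

```python
import numpy as np

def quantize_int4(w):
    """Symmetric per-tensor 4-bit quantization: integer levels in [-7, 7]."""
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)

q, scale = quantize_int4(w)
w_hat = q.astype(np.float32) * scale  # dequantize back to float
rel_err = np.linalg.norm(w - w_hat) / np.linalg.norm(w)
print(f"relative error: {rel_err:.3f}")
```

Despite storing only 16 distinct levels per weight, the reconstruction error stays modest, which is why inference in particular can shed so many bits.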
Zhihu Frontier (@zhihufrontier) 's Twitter Profile Photo

🚀 Introducing slime v0.1.0 — An open-source RL infra powering models like GLM-4.5, built by THUDM & Zhipu AI.

Z.ai RL infra engineer Zhu Xiaolin shared a deep dive on Zhihu into how they redefined high-performance RL infra👇

🛠️ What's new in v0.1.0?
• High-performance inference for
Kyle Corbitt (@corbtt) 's Twitter Profile Photo

🚨 We’ve just published a recipe to train a frontier-level deep research agent using RL.

With just 30 hours on an H200, any developer can now beat Sonnet-4 on DeepResearch Bench using open-source tools.

(Thread 🧵)
Lucas Beyer (bl16) (@giffmana) 's Twitter Profile Photo

When an article about hpc/optimization is called "anatomy of" then you know the authors know their shit.

For those who don't know, this is basically HPC's "attention is all you need" but less overcooked:
The Metropolitan Museum of Art (@metmuseum) 's Twitter Profile Photo

On September 7, 1917, American modernist Jacob Lawrence was born. Lawrence is celebrated for his visual interpretations of Black life and history, which he often explored in series.

Today, Lawrence is considered one of the leading innovators of modernism in the United States,
Tim Dettmers (@tim_dettmers) 's Twitter Profile Photo

It feels like the coding-agent frontier is now open-weights:

GLM 4.5 costs only $3/month and is on par with Sonnet.
Kimi K2.1 Turbo is 3x the speed and 7x cheaper vs Opus 4.1, but just as good.
Kimi K2.1 feels clean. The best model for me.
GPT-5 is only good for complicated specs -- too slow.

lmarena.ai (formerly lmsys.org) (@lmarena_ai) 's Twitter Profile Photo

🚨 Leaderboard Disrupted!
Two new models have entered the Top 10 Text leaderboard:

🔸#6 Qwen3-max-preview (Proprietary) by Qwen
🔸#8 Kimi-K2-0905-preview (Modified MIT) by Kimi.ai, tied with 7 others.

Note that this puts Kimi-K2-0905-preview in a tight race for
bolt.new (@boltdotnew) 's Twitter Profile Photo

Claude Code & OpenAI Codex are coming to Bolt. Build enterprise-grade products visually, right in your browser. No setup. No CLI tools. No 💔 error loops. Which agent are you most excited for?

Greg Brockman (@gdb) 's Twitter Profile Photo

We've released a large-scale study on how people are using ChatGPT.

Consumer adoption has broadened beyond early-user groups, and lots of economic value is being created through both personal and professional use:

openai.com/index/how-peop…
Ethan Mollick (@emollick) 's Twitter Profile Photo

Some useful findings:
1) Working with AI boosts the performance of people solving math, science & ethics questions
2) The biggest boost is for the hardest problems
3) High performers remain highest performing, but low performers gain more
4) People who are good with AI gain most