Muhammad Hammad Khan (@hammad_khan23)'s Twitter Profile
Muhammad Hammad Khan

@hammad_khan23

Views are my own. Search & RecSys | LLM

ID: 1609328383

Link: https://www.linkedin.com/in/muhammad-hammad-khan-b84822142/ | Joined: 21-07-2013 00:02:44

2.2K Tweets

828 Followers

5.5K Following

Teknium (e/λ) (@teknium1)'s Twitter Profile Photo

Looks like OpenAI's been using Nous' YaRN and kaiokendev's RoPE scaling for context length extension all along - of course never any credit, but... Anyone who says "open source just steals from their 'real' research and rides on their shoulders" is completely wrong. I called it.
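
For anyone unfamiliar with the technique being referenced: RoPE scaling stretches the rotary position embedding so a model trained on a short context can attend over a longer one. Here is a minimal NumPy sketch of the linear "position interpolation" variant kaiokendev popularized; YaRN refines it by scaling different frequency bands unevenly. All of this is illustrative and says nothing about OpenAI's actual implementation.

```python
import numpy as np

def rope_frequencies(head_dim: int, base: float = 10000.0, scale: float = 1.0):
    # Standard RoPE inverse frequencies; dividing by `scale` compresses
    # positions so a model trained on length L can cover scale * L tokens.
    inv_freq = 1.0 / (base ** (np.arange(0, head_dim, 2) / head_dim))
    return inv_freq / scale  # linear position interpolation

def apply_rope(x, positions, inv_freq):
    # x: (seq_len, head_dim) query or key slice for one attention head.
    angles = np.outer(positions, inv_freq)      # (seq_len, head_dim // 2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# 4x context extension: positions beyond the trained window are squeezed
# back into the numeric range the model saw during pretraining.
q = np.random.randn(8192, 64)
q_rot = apply_rope(q, np.arange(8192), rope_frequencies(64, scale=4.0))
```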

Marina Simakov (@simakov_marina)'s Twitter Profile Photo

Connect your powerful AI agent to an MCP server. Enable auto-run. What could possibly go wrong? 😈 Turns out, when using Cursor with a Jira MCP, any local secret - API keys, AWS creds, SSH keys - is up for grabs. labs.zenity.io/p/when-a-jira-…

Samuel Albanie 🇬🇧 (@samuelalbanie)'s Twitter Profile Photo

We just shipped Gemini 2.5 Deep Think

it doesn't just recall research papers - it fuses ideas across papers in ways I haven't seen before

this level of capability demands careful evaluation

model card below 👇

Anish Athalye (@anishathalye)'s Twitter Profile Photo

Missing Semester has grown past 100K subscribers on YouTube. Appreciate all the engagement and support!

We plan to teach another iteration of the course in January 2026, revising the curriculum and covering new topics like AI IDEs and vibe coding.

Eitan Turok (@eitanturok)'s Twitter Profile Photo

I annotated the tinygrad flash attention kernel to make sure I understand it. 

automatically generating this GENERICALLY is pretty cool!
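
The kernel itself is dense, so as a companion, here is a minimal NumPy sketch of the algorithm it implements: tile over keys and values while keeping a running max and normalizer, so the full softmax matrix is never materialized. Tile size and the non-causal setting are illustrative; this mirrors the math, not tinygrad's generated code.

```python
import numpy as np

def flash_attention(q, k, v, tile=128):
    seq, d = q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(q)
    m = np.full(seq, -np.inf)   # running row-wise max of the logits
    l = np.zeros(seq)           # running softmax normalizer
    for i in range(0, seq, tile):
        kt, vt = k[i:i + tile], v[i:i + tile]
        s = (q @ kt.T) * scale                 # logits for this K/V tile
        m_new = np.maximum(m, s.max(axis=1))
        corr = np.exp(m - m_new)               # rescale old accumulators
        p = np.exp(s - m_new[:, None])
        l = l * corr + p.sum(axis=1)
        out = out * corr[:, None] + p @ vt
        m = m_new
    return out / l[:, None]

# Check against plain softmax attention.
q, k, v = (np.random.randn(512, 64) for _ in range(3))
logits = (q @ k.T) / np.sqrt(64)
p = np.exp(logits - logits.max(axis=1, keepdims=True))
ref = (p / p.sum(axis=1, keepdims=True)) @ v
assert np.allclose(flash_attention(q, k, v), ref)
```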

Cloudflare (@cloudflare)'s Twitter Profile Photo

Perplexity is repeatedly modifying their user agent and changing IPs and ASNs to hide their crawling activity, in direct conflict with explicit no-crawl preferences expressed by websites. cfl.re/4l7RV9b

Qwen (@alibaba_qwen)'s Twitter Profile Photo

🚀 Meet Qwen-Image — a 20B MMDiT model for next-gen text-to-image generation. Especially strong at creating stunning graphic posters with native text. Now open-source.

🔍 Key Highlights:
🔹 SOTA text rendering — rivals GPT-4o in English, best-in-class for Chinese
🔹 In-pixel

Logan Kilpatrick (@officiallogank)'s Twitter Profile Photo

Introducing Genie 3, the most advanced world simulator ever created, enabled by numerous research breakthroughs. 🤯 Featuring high fidelity visuals, 20-24 fps, prompting on the go, world memory, and more.

Jason Lee (@jasondeanlee)'s Twitter Profile Photo

Answer: model is complete junk, it's a hallucination machine. Overfit to reasoning benchmarks and has absolutely zero recall ability

Tim Dettmers (@tim_dettmers)'s Twitter Profile Photo

It seems the closed-source vs open-weights landscape has been leveled. GPT-5 is just 10% better at coding than an open-weight model you can run on a consumer desktop and soon laptop. If Anthropic cannot come up with a good model, then we will probably not see AGI for a while.

Jason Weston (@jaseweston)'s Twitter Profile Photo

...is today a good day for new paper posts? 
🤖Learning to Reason for Factuality 🤖
📝: arxiv.org/abs/2508.05618
- New reward func for GRPO training of long CoTs for *factuality*
- Design stops reward hacking by favoring precision, detail AND quality
- Improves base model across
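
The anti-hacking point is worth unpacking: if the reward were factual precision alone, a policy could learn to emit a few safe, vague claims. A hypothetical composite reward in that spirit (placeholder weights and claim counting, not the paper's actual formula; see the arXiv link above) might look like:

```python
def factuality_reward(claims_supported: int, claims_total: int,
                      quality_score: float) -> float:
    # Hypothetical sketch: combine precision, detail, and quality
    # multiplicatively so no single axis can be gamed in isolation.
    if claims_total == 0:
        return 0.0  # refusing to make claims should not be rewarded
    precision = claims_supported / claims_total   # fraction of true claims
    detail = min(claims_total / 10.0, 1.0)        # volume of checkable claims
    return precision * detail * quality_score

# A terse answer with 2/2 true claims scores below a rich answer
# with 9/10 true claims and similar quality:
print(factuality_reward(2, 2, 0.9))    # 0.18
print(factuality_reward(9, 10, 0.9))   # 0.81
```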

Qwen (@alibaba_qwen)'s Twitter Profile Photo

🚀 Qwen3-30B-A3B-2507 and Qwen3-235B-A22B-2507 now support ultra-long context—up to 1 million tokens!

🔧 Powered by:

• Dual Chunk Attention (DCA) –  A length extrapolation method that splits long sequences into manageable chunks while preserving global coherence.  

•
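
A rough sketch of the chunking intuition, as I understand it: reuse position ids inside fixed-size chunks so the relative distances fed to RoPE never exceed the pretrained window. Actual DCA is more involved (separate intra-chunk, inter-chunk, and successive-chunk attention with remapped positions), and the numbers below are toy values, not Qwen's configuration.

```python
import numpy as np

def chunked_positions(seq_len: int, chunk_size: int) -> np.ndarray:
    # Every chunk reuses position ids 0 .. chunk_size - 1.
    return np.arange(seq_len) % chunk_size

seq_len, chunk_size, trained_window = 2048, 256, 512

naive = np.arange(seq_len)
print((naive[:, None] - naive[None, :]).max())  # 2047: overflows the window

pos = chunked_positions(seq_len, chunk_size)
rel = pos[:, None] - pos[None, :]               # distances attention sees
print(np.abs(rel).max())                        # 255: stays inside it
```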

Lili (@lchen915)'s Twitter Profile Photo

Self-Questioning Language Models: LLMs that learn to generate their own questions and answers via asymmetric self-play RL.

There is no external training data – the only input is a single prompt specifying the topic.
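
The loop is easy to picture. A minimal sketch of the asymmetric self-play setup, with a stub standing in for the model and placeholder rewards (the paper's actual scoring of question difficulty and answer agreement may differ):

```python
import random

def llm(prompt: str) -> str:
    # Stub standing in for a real language-model call.
    return f"<generation for: {prompt[:40]}>"

# The only external input: a single prompt naming the topic.
topic_prompt = "Pose a challenging question about Python programming."

def self_play_round():
    question = llm(topic_prompt)                 # proposer turn
    answers = [llm(question) for _ in range(4)]  # solver samples
    # Placeholder scoring: plausibly the solver is rewarded when its samples
    # agree (self-consistency) and the proposer for questions of intermediate
    # difficulty, so neither side can trivially game the other.
    solver_reward = random.random()
    proposer_reward = 1.0 - 2.0 * abs(solver_reward - 0.5)
    return question, answers, proposer_reward, solver_reward

print(self_play_round())
```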

jack morris (@jxmnop)'s Twitter Profile Photo

curious about the training data of OpenAI's new gpt-oss models? i was too. 

so i generated 10M examples from gpt-oss-20b, ran some analysis, and the results were... pretty bizarre

time for a deep dive 🧵
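
If you want to poke at this yourself, one way to draw samples with Hugging Face transformers is sketched below. The checkpoint id is the released gpt-oss-20b; the prompt and decoding settings are guesses, since the thread doesn't spell out the exact setup used for the 10M examples.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Start from (almost) nothing so the model surfaces whatever its
# post-training distribution wants to talk about.
prompt = tok.bos_token or " "  # assumption: fall back to a bare space
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, do_sample=True, temperature=1.0,
                     max_new_tokens=256)
print(tok.decode(out[0], skip_special_tokens=True))
```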

Sayash Kapoor (@sayashk)'s Twitter Profile Photo

How does GPT-5 compare against Claude Opus 4.1 on agentic tasks? 

Since their release, we have been evaluating these models on challenging science, web, service, and code tasks. 

Headline result: While cost-effective, so far GPT-5 never tops agentic leaderboards. More evals 🧵

Sayash Kapoor (@sayashk)'s Twitter Profile Photo

1) CORE-Bench (scientific reproducibility) gives agents two hours to reproduce the results from a scientific paper, given access to its code and data.

Opus 4.1 is the first model to break the 50% barrier on CORE-Bench. GPT-5 is far behind — even behind Sonnet 3.7 and GPT-4.1.