Muhammad Hammad Khan (@hammad_khan23)'s Twitter Profile
Muhammad Hammad Khan

@hammad_khan23

Views are my own. Search & RecSys | LLM

ID: 1609328383

Link: https://www.linkedin.com/in/muhammad-hammad-khan-b84822142/ | Joined: 21-07-2013 00:02:44

2.2K Tweets

828 Followers

5.5K Following

Teknium (e/λ) (@teknium1)'s Twitter Profile Photo

Looks like OpenAI's been using Nous' YaRN and kaiokendev's RoPE scaling for context length extension all along - of course never any credit, but... Anyone who says "open source just steals from their 'real' research and rides on their shoulders" is completely wrong. I called it.
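
For anyone unfamiliar with the technique being referenced: RoPE scaling stretches the rotary position embedding so a model trained on a short context can attend over a longer one. Here is a minimal NumPy sketch of the linear "position interpolation" variant kaiokendev popularized; YaRN refines it by scaling different frequency bands unevenly. All of this is illustrative and says nothing about OpenAI's actual implementation.

```python
import numpy as np

def rope_frequencies(head_dim: int, base: float = 10000.0, scale: float = 1.0):
    # Standard RoPE inverse frequencies; dividing by `scale` compresses
    # positions so a model trained on length L can cover scale * L tokens.
    inv_freq = 1.0 / (base ** (np.arange(0, head_dim, 2) / head_dim))
    return inv_freq / scale  # linear position interpolation

def apply_rope(x, positions, inv_freq):
    # x: (seq_len, head_dim) query or key slice for one attention head.
    angles = np.outer(positions, inv_freq)      # (seq_len, head_dim // 2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# 4x context extension: positions beyond the trained window are squeezed
# back into the numeric range the model saw during pretraining.
q = np.random.randn(8192, 64)
q_rot = apply_rope(q, np.arange(8192), rope_frequencies(64, scale=4.0))
```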

Marina Simakov (@simakov_marina)'s Twitter Profile Photo

Connect your powerful AI agent to an MCP server. Enable auto-run. What could possibly go wrong? 😈 Turns out, when using Cursor with a Jira MCP, any local secret - API keys, AWS creds, SSH keys - is up for grabs. labs.zenity.io/p/when-a-jira-…

Samuel Albanie 🇬🇧 (@samuelalbanie)'s Twitter Profile Photo

We just shipped Gemini 2.5 Deep Think

it doesn't just recall research papers - it fuses ideas across papers in ways I haven't seen before

this level of capability demands careful evaluation

model card below 👇

Anish Athalye (@anishathalye)'s Twitter Profile Photo

Missing Semester has grown past 100K subscribers on YouTube. Appreciate all the engagement and support!

We plan to teach another iteration of the course in January 2026, revising the curriculum and covering new topics like AI IDEs and vibe coding.

Eitan Turok (@eitanturok)'s Twitter Profile Photo

I annotated the tinygrad flash attention kernel to make sure I understand it. 

automatically generating this GENERICALLY is pretty cool!
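
The kernel itself is dense, so as a companion, here is a minimal NumPy sketch of the algorithm it implements: tile over keys and values while keeping a running max and normalizer, so the full softmax matrix is never materialized. Tile size and the non-causal setting are illustrative; this mirrors the math, not tinygrad's generated code.

```python
import numpy as np

def flash_attention(q, k, v, tile=128):
    seq, d = q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(q)
    m = np.full(seq, -np.inf)   # running row-wise max of the logits
    l = np.zeros(seq)           # running softmax normalizer
    for i in range(0, seq, tile):
        kt, vt = k[i:i + tile], v[i:i + tile]
        s = (q @ kt.T) * scale                 # logits for this K/V tile
        m_new = np.maximum(m, s.max(axis=1))
        corr = np.exp(m - m_new)               # rescale old accumulators
        p = np.exp(s - m_new[:, None])
        l = l * corr + p.sum(axis=1)
        out = out * corr[:, None] + p @ vt
        m = m_new
    return out / l[:, None]

# Check against plain softmax attention.
q, k, v = (np.random.randn(512, 64) for _ in range(3))
logits = (q @ k.T) / np.sqrt(64)
p = np.exp(logits - logits.max(axis=1, keepdims=True))
ref = (p / p.sum(axis=1, keepdims=True)) @ v
assert np.allclose(flash_attention(q, k, v), ref)
```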

Cloudflare (@cloudflare)'s Twitter Profile Photo

Perplexity is repeatedly modifying their user agent and changing IPs and ASNs to hide their crawling activity, in direct conflict with explicit no-crawl preferences expressed by websites. cfl.re/4l7RV9b

Qwen (@alibaba_qwen)'s Twitter Profile Photo

🚀 Meet Qwen-Image — a 20B MMDiT model for next-gen text-to-image generation. Especially strong at creating stunning graphic posters with native text. Now open-source.

🔍 Key Highlights:
🔹 SOTA text rendering — rivals GPT-4o in English, best-in-class for Chinese
🔹 In-pixel

Logan Kilpatrick (@officiallogank)'s Twitter Profile Photo

Introducing Genie 3, the most advanced world simulator ever created, enabled by numerous research breakthroughs. 🤯 Featuring high fidelity visuals, 20-24 fps, prompting on the go, world memory, and more.

Jason Lee (@jasondeanlee)'s Twitter Profile Photo

Answer: model is complete junk, it's a hallucination machine. Overfit to reasoning benchmarks and has absolutely zero recall ability

Tim Dettmers (@tim_dettmers)'s Twitter Profile Photo

It seems the closed-source vs open-weights landscape has been leveled. GPT-5 is just 10% better at coding than an open-weight model you can run on a consumer desktop and soon laptop. If Anthropic cannot come up with a good model, then we will probably not see AGI for a while.

Jason Weston (@jaseweston)'s Twitter Profile Photo

...is today a good day for new paper posts? 
🤖Learning to Reason for Factuality 🤖
📝: arxiv.org/abs/2508.05618
- New reward func for GRPO training of long CoTs for *factuality*
- Design stops reward hacking by favoring precision, detail AND quality
- Improves base model across
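
The anti-hacking point is worth unpacking: if the reward were factual precision alone, a policy could learn to emit a few safe, vague claims. A hypothetical composite reward in that spirit (placeholder weights and claim counting, not the paper's actual formula; see the arXiv link above) might look like:

```python
def factuality_reward(claims_supported: int, claims_total: int,
                      quality_score: float) -> float:
    # Hypothetical sketch: combine precision, detail, and quality
    # multiplicatively so no single axis can be gamed in isolation.
    if claims_total == 0:
        return 0.0  # refusing to make claims should not be rewarded
    precision = claims_supported / claims_total   # fraction of true claims
    detail = min(claims_total / 10.0, 1.0)        # volume of checkable claims
    return precision * detail * quality_score

# A terse answer with 2/2 true claims scores below a rich answer
# with 9/10 true claims and similar quality:
print(factuality_reward(2, 2, 0.9))    # 0.18
print(factuality_reward(9, 10, 0.9))   # 0.81
```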

Qwen (@alibaba_qwen)'s Twitter Profile Photo

🚀 Qwen3-30B-A3B-2507 and Qwen3-235B-A22B-2507 now support ultra-long context—up to 1 million tokens!

🔧 Powered by:

• Dual Chunk Attention (DCA) –  A length extrapolation method that splits long sequences into manageable chunks while preserving global coherence.  

•
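
A rough sketch of the chunking intuition, as I understand it: reuse position ids inside fixed-size chunks so the relative distances fed to RoPE never exceed the pretrained window. Actual DCA is more involved (separate intra-chunk, inter-chunk, and successive-chunk attention with remapped positions), and the numbers below are toy values, not Qwen's configuration.

```python
import numpy as np

def chunked_positions(seq_len: int, chunk_size: int) -> np.ndarray:
    # Every chunk reuses position ids 0 .. chunk_size - 1.
    return np.arange(seq_len) % chunk_size

seq_len, chunk_size, trained_window = 2048, 256, 512

naive = np.arange(seq_len)
print((naive[:, None] - naive[None, :]).max())  # 2047: overflows the window

pos = chunked_positions(seq_len, chunk_size)
rel = pos[:, None] - pos[None, :]               # distances attention sees
print(np.abs(rel).max())                        # 255: stays inside it
```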

Lili (@lchen915)'s Twitter Profile Photo

Self-Questioning Language Models: LLMs that learn to generate their own questions and answers via asymmetric self-play RL.

There is no external training data – the only input is a single prompt specifying the topic.
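
The loop is easy to picture. A minimal sketch of the asymmetric self-play setup, with a stub standing in for the model and placeholder rewards (the paper's actual scoring of question difficulty and answer agreement may differ):

```python
import random

def llm(prompt: str) -> str:
    # Stub standing in for a real language-model call.
    return f"<generation for: {prompt[:40]}>"

# The only external input: a single prompt naming the topic.
topic_prompt = "Pose a challenging question about Python programming."

def self_play_round():
    question = llm(topic_prompt)                 # proposer turn
    answers = [llm(question) for _ in range(4)]  # solver samples
    # Placeholder scoring: plausibly the solver is rewarded when its samples
    # agree (self-consistency) and the proposer for questions of intermediate
    # difficulty, so neither side can trivially game the other.
    solver_reward = random.random()
    proposer_reward = 1.0 - 2.0 * abs(solver_reward - 0.5)
    return question, answers, proposer_reward, solver_reward

print(self_play_round())
```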

jack morris (@jxmnop)'s Twitter Profile Photo

curious about the training data of OpenAI's new gpt-oss models? i was too. 

so i generated 10M examples from gpt-oss-20b, ran some analysis, and the results were... pretty bizarre

time for a deep dive 🧵
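
If you want to poke at this yourself, one way to draw samples with Hugging Face transformers is sketched below. The checkpoint id is the released gpt-oss-20b; the prompt and decoding settings are guesses, since the thread doesn't spell out the exact setup used for the 10M examples.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Start from (almost) nothing so the model surfaces whatever its
# post-training distribution wants to talk about.
prompt = tok.bos_token or " "  # assumption: fall back to a bare space
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, do_sample=True, temperature=1.0,
                     max_new_tokens=256)
print(tok.decode(out[0], skip_special_tokens=True))
```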

Sayash Kapoor (@sayashk)'s Twitter Profile Photo

How does GPT-5 compare against Claude Opus 4.1 on agentic tasks? 

Since their release, we have been evaluating these models on challenging science, web, service, and code tasks. 

Headline result: While cost-effective, so far GPT-5 never tops agentic leaderboards. More evals 🧵

Sayash Kapoor (@sayashk)'s Twitter Profile Photo

1) CORE-Bench (scientific reproducibility) gives agents two hours to reproduce the results from a scientific paper, given access to its code and data.

Opus 4.1 is the first model to break the 50% barrier on CORE-Bench. GPT-5 is far behind — even behind Sonnet 3.7 and GPT-4.1.