Zizheng Pan (@zizhpan)'s Twitter Profile
Zizheng Pan

@zizhpan

Researcher @deepseek_ai | Previously @nvidia @MonashUni @UniofAdelaide. Words are my own.

ID: 1496348031494819842

Joined: 23-02-2022 04:55:48

241 Tweets

61.61K Followers

779 Following

DeepSeek (@deepseek_ai)

🎉 Excited to see everyone’s enthusiasm for deploying DeepSeek-R1! Here are our recommended settings for the best experience:
• No system prompt
• Temperature: 0.6
• Official prompts for search & file upload: bit.ly/4hyH8np
• Guidelines to mitigate model bypass

DeepSeek (@deepseek_ai)

🚀 Introducing NSA: A Hardware-Aligned and Natively Trainable Sparse Attention mechanism for ultra-fast long-context training & inference!

Core components of NSA:
• Dynamic hierarchical sparse strategy
• Coarse-grained token compression
• Fine-grained token selection

💡 With

DeepSeek (@deepseek_ai)

🚀 Day 1 of #OpenSourceWeek: FlashMLA

Honored to share FlashMLA - our efficient MLA decoding kernel for Hopper GPUs, optimized for variable-length sequences and now in production.

✅ BF16 support
✅ Paged KV cache (block size 64)
⚡ 3000 GB/s memory-bound & 580 TFLOPS

DeepSeek (@deepseek_ai)

🚀 Day 2 of #OpenSourceWeek: DeepEP

Excited to introduce DeepEP - the first open-source EP communication library for MoE model training and inference.

✅ Efficient and optimized all-to-all communication
✅ Both intranode and internode support with NVLink and RDMA
✅

DeepSeek (@deepseek_ai)

🚀 Day 3 of #OpenSourceWeek: DeepGEMM

Introducing DeepGEMM - an FP8 GEMM library that supports both dense and MoE GEMMs, powering V3/R1 training and inference.

⚡ Up to 1350+ FP8 TFLOPS on Hopper GPUs
✅ No heavy dependency, as clean as a tutorial
✅ Fully Just-In-Time compiled

DeepSeek (@deepseek_ai)

🚨 Off-Peak Discounts Alert!

Starting today, enjoy off-peak discounts on the DeepSeek API Platform from 16:30–00:30 UTC daily:

🔹 DeepSeek-V3 at 50% off
🔹 DeepSeek-R1 at a massive 75% off

Maximize your resources smarter — save more during these high-value hours!
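
The discount window in the announcement above runs past midnight UTC, so a plain start <= t < end comparison would not work. The following is a minimal sketch of how a client might check the window and apply the stated rates. The function names, the half-open treatment of the 00:30 boundary, and the list price used in the usage note are illustrative assumptions, not part of the DeepSeek API.

```python
from datetime import datetime, time, timezone

# Window and discount rates as stated in the announcement.
OFF_PEAK_START = time(16, 30)  # 16:30 UTC
OFF_PEAK_END = time(0, 30)     # 00:30 UTC on the following day
DISCOUNTS = {"DeepSeek-V3": 0.50, "DeepSeek-R1": 0.75}


def is_off_peak(moment: datetime) -> bool:
    """Return True if a timezone-aware `moment` is inside 16:30-00:30 UTC.

    Assumption: the end of the window (00:30) is treated as exclusive.
    """
    t = moment.astimezone(timezone.utc).time()
    # The window wraps past midnight, so it is the union of two ranges:
    # [16:30, 24:00) and [00:00, 00:30).
    return t >= OFF_PEAK_START or t < OFF_PEAK_END


def discounted_price(model: str, list_price: float, moment: datetime) -> float:
    """Apply the off-peak discount to a hypothetical list price."""
    if is_off_peak(moment):
        return list_price * (1.0 - DISCOUNTS.get(model, 0.0))
    return list_price
```

For example, with a made-up list price of 2.0, `discounted_price("DeepSeek-R1", 2.0, datetime(2025, 2, 26, 17, 0, tzinfo=timezone.utc))` falls inside the window and is charged at 25% of the list price, while the same call at 12:00 UTC is charged in full.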

DeepSeek (@deepseek_ai)

🚀 Day 4 of #OpenSourceWeek: Optimized Parallelism Strategies

✅ DualPipe - a bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
🔗 github.com/deepseek-ai/Du…

✅ EPLB - an expert-parallel load balancer for V3/R1.
🔗

DeepSeek (@deepseek_ai)

🚀 Day 5 of #OpenSourceWeek: 3FS, Thruster for All DeepSeek Data Access

Fire-Flyer File System (3FS) - a parallel file system that utilizes the full bandwidth of modern SSDs and RDMA networks.

⚡ 6.6 TiB/s aggregate read throughput in a 180-node cluster
⚡ 3.66 TiB/min

Yefei He (@yefei_he)

Introducing NAR, our latest breakthrough in visual generation! 🎨

NAR adopts a "next neighbor prediction" mechanism, transforming visual generation into a step-by-step "outpainting" process.

📄 Paper: arxiv.org/abs/2503.10696
🌍 Project Page: yuanyu0.github.io/nar

Artificial Analysis (@artificialanlys)

DeepSeek takes the lead: DeepSeek V3-0324 is now the highest scoring non-reasoning model

This is the first time an open weights model is the leading non-reasoning model, a milestone for open source.

DeepSeek V3-0324 has jumped forward 7 points in Artificial Analysis

Zhihong Shao (@zhs05232838)

We just released DeepSeek-Prover V2.
- Solves nearly 90% of miniF2F problems
- Significantly improves the SoTA performance on the PutnamBench
- Achieves a non-trivial pass rate on AIME 24 & 25 problems in their formal version

Github: github.com/deepseek-ai/De…

Haihao Shen (@haihaoshen)

🔥DeepSeek-R1-0528-Qwen3-8B INT4 model with AutoRound, AWQ, GPTQ, and GGUF formats (quantized by Intel AutoRound & Neural Compressor) are available at HF Intel space. Run with vLLM, SGLang and Transformers😍 huggingface.co/Intel/DeepSeek… huggingface.co/Intel/DeepSeek… huggingface.co/Intel/DeepSeek…

Weijie Wang (@wjwang2003)

🚀 We're excited to introduce ZPressor, a bottleneck-aware compression module for scalable feed-forward 3DGS.

Existing feed-forward 3DGS models struggle with dense views, facing performance drops & massive redundancy. ZPressor leverages Information Bottleneck Theory to compress

lmarena.ai (formerly lmsys.org) (@lmarena_ai)

🚨Breaking: New DeepSeek-r1 (0528) just tied for #1 in WebDev Arena, matching Claude Opus 4!

More highlights:
💠 #6 Overall on Text Arena
💠 #2 in Coding, #4 in Hard Prompts, #5 in Math category
💠 MIT-licensed, currently the best open model on the leaderboard!

Huge congrats

Zizheng Pan (@zizhpan)

If you’re interested in AI/ML and vision research, don’t miss this great opportunity to work with Chuanxia — an amazing mentor and researcher who’s looking for new students!

Xingyi Yang (@yxy2168)

Life Update: Thrilled to join The Hong Kong Polytechnic University (PolyU) as Tenure-Track Assistant Professor in Data Science & AI!

🔍 Now hiring PhDs in the hottest topics of AI:
🔥 Generative AI 👁️ Computer Vision 🤖 Agentic AI
Just Apply → adamdad.github.io/opening

#AIResearch #AI #GenAI #PhD #PolyU #HongKong