Zizheng Pan (@zizhpan) Twitter Tweets • TwiCopy

DeepSeek

7 months ago

🎉 Excited to see everyone’s enthusiasm for deploying DeepSeek-R1! Here are our recommended settings for the best experience: • No system prompt • Temperature: 0.6 • Official prompts for search & file upload: bit.ly/4hyH8np • Guidelines to mitigate model bypass

thumb_up_off_alt16,16K

chat_bubble_outline707

repeat1,1K

shareShare

DeepSeek

@deepseek_ai

6 months ago

🚀 Introducing NSA: A Hardware-Aligned and Natively Trainable Sparse Attention mechanism for ultra-fast long-context training & inference! Core components of NSA: • Dynamic hierarchical sparse strategy • Coarse-grained token compression • Fine-grained token selection 💡 With

thumb_up_off_alt16,16K

chat_bubble_outline901

repeat2,2K

shareShare

Zizheng Pan

@zizhpan

6 months ago

Hope everyone can enjoy the benefits of open science. Join us in celebrating #OpenSourceWeek! 🔥

thumb_up_off_alt829

chat_bubble_outline27

repeat31

shareShare

DeepSeek

@deepseek_ai

6 months ago

🚀 Day 1 of #OpenSourceWeek: FlashMLA Honored to share FlashMLA - our efficient MLA decoding kernel for Hopper GPUs, optimized for variable-length sequences and now in production. ✅ BF16 support ✅ Paged KV cache (block size 64) ⚡ 3000 GB/s memory-bound & 580 TFLOPS

thumb_up_off_alt10,10K

chat_bubble_outline562

repeat1,1K

shareShare

DeepSeek

@deepseek_ai

6 months ago

🚀 Day 2 of #OpenSourceWeek: DeepEP Excited to introduce DeepEP - the first open-source EP communication library for MoE model training and inference. ✅ Efficient and optimized all-to-all communication ✅ Both intranode and internode support with NVLink and RDMA ✅

thumb_up_off_alt8,8K

chat_bubble_outline519

repeat1,1K

shareShare

DeepSeek

@deepseek_ai

6 months ago

🚀 Day 3 of #OpenSourceWeek: DeepGEMM Introducing DeepGEMM - an FP8 GEMM library that supports both dense and MoE GEMMs, powering V3/R1 training and inference. ⚡ Up to 1350+ FP8 TFLOPS on Hopper GPUs ✅ No heavy dependency, as clean as a tutorial ✅ Fully Just-In-Time compiled

thumb_up_off_alt6,6K

chat_bubble_outline474

repeat1,1K

shareShare

DeepSeek

@deepseek_ai

6 months ago

🚨 Off-Peak Discounts Alert! Starting today, enjoy off-peak discounts on the DeepSeek API Platform from 16:30–00:30 UTC daily: 🔹 DeepSeek-V3 at 50% off 🔹 DeepSeek-R1 at a massive 75% off Maximize your resources smarter — save more during these high-value hours!

thumb_up_off_alt6,6K

chat_bubble_outline514

repeat717

shareShare

DeepSeek

@deepseek_ai

6 months ago

🚀 Day 4 of #OpenSourceWeek: Optimized Parallelism Strategies ✅ DualPipe - a bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training. 🔗 github.com/deepseek-ai/Du… ✅ EPLB - an expert-parallel load balancer for V3/R1. 🔗

thumb_up_off_alt6,6K

chat_bubble_outline451

repeat838

shareShare

DeepSeek

@deepseek_ai

6 months ago

🚀 Day 5 of #OpenSourceWeek: 3FS, Thruster for All DeepSeek Data Access Fire-Flyer File System (3FS) - a parallel file system that utilizes the full bandwidth of modern SSDs and RDMA networks. ⚡ 6.6 TiB/s aggregate read throughput in a 180-node cluster ⚡ 3.66 TiB/min

thumb_up_off_alt10,10K

chat_bubble_outline532

repeat1,1K

shareShare

Zizheng Pan

@zizhpan

6 months ago

One More Thing! Happy weekend!

thumb_up_off_alt887

chat_bubble_outline32

repeat29

shareShare

Yefei He

@yefei_he

6 months ago

Introducing NAR, our latest breakthrough in visual generation! 🎨 NAR adopts a "next neighbor prediction" mechanism, transforming visual generation into a step-by-step "outpainting" process. 📄 Paper: arxiv.org/abs/2503.10696 🌍 Project Page: yuanyu0.github.io/nar

thumb_up_off_alt20

chat_bubble_outline2

repeat6

shareShare

Artificial Analysis

@artificialanlys

5 months ago

DeepSeek takes the lead: DeepSeek V3-0324 is now the highest scoring non-reasoning model This is the first time an open weights model is the leading non-reasoning model, a milestone for open source. DeepSeek V3-0324 has jumped forward 7 points in Artificial Analysis

thumb_up_off_alt3,3K

chat_bubble_outline67

repeat662

shareShare

Zizheng Pan

@zizhpan

5 months ago

Guess you probably already knew that we have an update for V3!

thumb_up_off_alt1,1K

chat_bubble_outline40

repeat38

shareShare

Zhihong Shao

@zhs05232838

4 months ago

We just released DeepSeek-Prover V2. - Solves nearly 90% of miniF2F problems - Significantly improves the SoTA performance on the PutnamBench - Achieves a non-trivial pass rate on AIME 24 & 25 problems in their formal version Github: github.com/deepseek-ai/De…

thumb_up_off_alt2,2K

chat_bubble_outline74

repeat329

shareShare

Zizheng Pan

@zizhpan

3 months ago

R1-0528 is out!🎉

thumb_up_off_alt1,1K

chat_bubble_outline42

repeat105

shareShare

Haihao Shen

@haihaoshen

3 months ago

🔥DeepSeek-R1-0528-Qwen3-8B INT4 model with AutoRound, AWQ, GPTQ, and GGUF formats (quantized by Intel AutoRound & Neural Compressor) are available at HF Intel space. Run with vLLM, SGLang and Transformers😍 huggingface.co/Intel/DeepSeek… huggingface.co/Intel/DeepSeek… huggingface.co/Intel/DeepSeek…

thumb_up_off_alt140

chat_bubble_outline3

repeat32

shareShare

Weijie Wang

@wjwang2003

3 months ago

🚀 We're excited to introduce ZPressor, a bottleneck-aware compression module for scalable feed-forward 3DGS. Existing feed-forward 3DGS models struggle with dense views, facing performance drops & massive redundancy. ZPressor leverages Information Bottleneck Theory to compress

thumb_up_off_alt33

chat_bubble_outline6

repeat8

shareShare

lmarena.ai (formerly lmsys.org)

@lmarena_ai

3 months ago

🚨Breaking: New DeepSeek-r1 (0528) just tied for #1 in WebDev Arena, matching Claude Opus 4! More highlights: 💠 #6 Overall on Text Arena 💠 #2 in Coding, #4 in Hard Prompts, #5 in Math category 💠 MIT-licensed, currently the best open model on the leaderboard! Huge congrats

thumb_up_off_alt857

chat_bubble_outline21

repeat106

shareShare

Zizheng Pan

@zizhpan

2 months ago

If you’re interested in AI/ML and vision research, don’t miss this great opportunity to work with Chuanxia — an amazing mentor and researcher who’s looking for new students!

thumb_up_off_alt45

chat_bubble_outline4

repeat7

shareShare

Xingyi Yang

@yxy2168

2 months ago

Life Update: Thrilled to join The Hong Kong Polytechnic University (PolyU) as Tenure-Track Assistant Professor in Data Science & AI! 🔍 Now hiring PhDs in the hottest topics of AI: 🔥 Generative AI👁️Computer Vision🤖 Agentic AI Just Apply→adamdad.github.io/opening #AIResearch #AI #GenAI #PhD #PolyU #HongKong

Life Update: Thrilled to join <a href="/HongKongPolyU/">The Hong Kong Polytechnic University (PolyU)</a> as Tenure-Track Assistant Professor in Data Science & AI!

🔍 Now hiring PhDs in the hottest topics of AI:
🔥 Generative AI👁️Computer Vision🤖 Agentic AI
Just Apply→adamdad.github.io/opening

#AIResearch #AI #GenAI #PhD #PolyU #HongKong

thumb_up_off_alt31

chat_bubble_outline4

repeat4

shareShare