Zhenda Xie (@zdaxie)'s Twitter Profile
Zhenda Xie

@zdaxie

Researcher @ DeepSeek AI // Pre-training and Scaling of Foundation Models

ID: 1677270954370822147

Joined: 07-07-2023 10:58:59

11 Tweets

610 Followers

122 Following

AK (@_akhaliq):

DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior

paper page: huggingface.co/papers/2310.16…

We present DreamCraft3D, a hierarchical 3D content generation method that produces high-fidelity and coherent 3D objects. We tackle the problem by leveraging a 2D

DeepSeek (@deepseek_ai):

🚀 DeepSeek Coder 33B is NOW LIVE! Open-source & absolutely FREE! #DeepSeekCoder

💥 Try out here: coder.deepseek.com
🤗 Also on Huggingface: huggingface.co/deepseek-ai
💬 Got questions? Join our Discord fam! discord.gg/Tc7c45Zzu5
🤖 Github page: deepseekcoder.github.io

DeepSeek (@deepseek_ai):

📚Out Now: Technical Report on DeepSeek LLM 67B!  Read more: arxiv.org/abs/2401.02954

🔍Discover our in-depth study on scaling laws and how data quality influences them.

✨In the MT-Bench evaluation, DeepSeek surpassed GPT-3.5-turbo, ranking just behind GPT-4. 
#DeepSeekLLM
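
A note on what the scaling-law study covers: such studies fit a power law relating pretraining loss to model size N and token count D, then use it to split a fixed compute budget. A minimal sketch of that use, with illustrative Chinchilla-style constants rather than the coefficients DeepSeek actually fit:

```python
# Chinchilla-style scaling law: loss(N, D) = E + A/N^alpha + B/D^beta.
# All constants below are illustrative placeholders, NOT DeepSeek's values.
import numpy as np

def loss(N, D, E=1.7, A=400.0, B=410.0, alpha=0.34, beta=0.28):
    """Predicted pretraining loss for N parameters trained on D tokens."""
    return E + A / N**alpha + B / D**beta

# Compute-optimal allocation: under a FLOP budget C ~ 6*N*D, sweep model
# sizes and pick the (N, D) pair that minimizes the predicted loss.
C = 1e21                      # illustrative FLOP budget
Ns = np.logspace(8, 11, 400)  # candidate sizes: 100M .. 100B parameters
Ds = C / (6 * Ns)             # tokens each size could afford under the budget
print(f"compute-optimal size ~ {Ns[np.argmin(loss(Ns, Ds))]:.3g} parameters")
```

The report's data-quality angle corresponds to these fitted coefficients shifting with the corpus, which in turn moves the optimal model/data split.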

DeepSeek (@deepseek_ai):

🌟 Meet #DeepSeekMoE: The Next Gen of Large Language Models!

Performance Highlights:
📈 DeepSeekMoE 2B matches its 2B dense counterpart with 17.5% computation.
🚀 DeepSeekMoE 16B rivals LLaMA2 7B with 40% computation.
🛠 DeepSeekMoE 145B significantly outperforms Gshard,
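
What "17.5% computation" means mechanically: an MoE layer routes each token to only a few experts, so per-token FLOPs scale with the activated experts rather than the total parameter count. Below is a minimal top-k routing sketch; the dimensions and k are illustrative, and DeepSeekMoE's actual design adds fine-grained expert segmentation and shared experts, which this omits:

```python
# Minimal mixture-of-experts layer with top-k token routing (illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=1024, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):  # x: (tokens, d_model)
        probs = F.softmax(self.gate(x), dim=-1)
        topv, topi = probs.topk(self.k, dim=-1)  # k experts per token
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topi[:, slot] == e        # tokens routed to expert e
                if mask.any():
                    out[mask] += topv[mask, slot:slot + 1] * expert(x[mask])
        return out

y = TopKMoE()(torch.randn(4, 512))  # only k of the 8 expert FFNs run per token
```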

DeepSeek (@deepseek_ai):

🚀 Just Out: Tech Report On #DeepSeekCoder - An Open-Source Model Competing with #GPT4's Coding Capabilities. Paper Link: arxiv.org/abs/2401.14196

⭐ Highlights:
- Repo-level Data Construction
- Topological Sort for Dependency Parsing
- Fill-In-Middle Pre-training Strategy
-
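
Of the listed highlights, Fill-In-Middle is the easiest to make concrete: the model is trained to generate a missing span given the code on both sides of it. A sketch of building one FIM sample in the prefix-suffix-middle (PSM) layout; the sentinel strings here are placeholders, not necessarily DeepSeek Coder's actual special tokens:

```python
# Fill-In-Middle (FIM) sample construction, PSM layout (sketch).
import random

FIM_BEGIN, FIM_HOLE, FIM_END = "<fim_begin>", "<fim_hole>", "<fim_end>"

def make_fim_sample(code: str, rng: random.Random) -> str:
    """Cut the document at two random points; the middle span becomes the
    training target, conditioned on the surrounding prefix and suffix."""
    i, j = sorted(rng.sample(range(len(code)), 2))
    prefix, middle, suffix = code[:i], code[i:j], code[j:]
    return f"{FIM_BEGIN}{prefix}{FIM_HOLE}{suffix}{FIM_END}{middle}"

print(make_fim_sample("def add(a, b):\n    return a + b\n", random.Random(0)))
```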

DeepSeek (@deepseek_ai):

[1/5] 🚀 Announcing DeepSeek-VL, sota 1.3B and 7B visual-language models!

Paper: arxiv.org/abs/2403.05525
GitHub: github.com/deepseek-ai/De…

📚 Diverse training corpus
👯 Hybrid Vision Encoder
🧠 3-stage training strategy
🆓 Totally free for commercial use and fully open-source
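
A rough sketch of what a hybrid vision encoder can look like: tokens from a low-resolution semantic branch are fused with tokens from a high-resolution detail branch, then projected into the LLM's embedding space. The stand-in modules and dimensions below are assumptions for illustration, not DeepSeek-VL's exact components:

```python
# Hybrid vision encoder sketch: semantic branch (low-res) + detail branch
# (high-res), fused per spatial position and projected for the LLM.
import torch
import torch.nn as nn

class HybridVisionEncoder(nn.Module):
    def __init__(self, d_sem=1024, d_det=256, d_llm=2048):
        super().__init__()
        # Patchify stand-ins for the two real backbones (e.g. a ViT for
        # semantics, a high-res conv encoder for fine detail).
        self.semantic = nn.Conv2d(3, d_sem, kernel_size=16, stride=16)  # 384 -> 24x24
        self.detail = nn.Conv2d(3, d_det, kernel_size=64, stride=64)    # 1536 -> 24x24
        self.proj = nn.Linear(d_sem + d_det, d_llm)  # adapter into the LLM

    def forward(self, low_res, high_res):
        s = self.semantic(low_res).flatten(2).transpose(1, 2)  # (B, 576, d_sem)
        d = self.detail(high_res).flatten(2).transpose(1, 2)   # (B, 576, d_det)
        return self.proj(torch.cat([s, d], dim=-1))            # (B, 576, d_llm)

tok = HybridVisionEncoder()(torch.randn(1, 3, 384, 384),
                            torch.randn(1, 3, 1536, 1536))
```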

DeepSeek (@deepseek_ai):

🚀 Launching DeepSeek-V2: The Cutting-Edge Open-Source MoE Model!

🌟 Highlights:
> Places top 3 in AlignBench, surpassing GPT-4 and close to GPT-4-Turbo.
> Ranks top-tier in MT-Bench, rivaling LLaMA3-70B and outperforming Mixtral 8x22B.
> Specializes in math, code and reasoning.

DeepSeek (@deepseek_ai):

🚀 DeepSeek-R1-Lite-Preview is now live: unleashing supercharged reasoning power!

🔍 o1-preview-level performance on AIME & MATH benchmarks.
💡 Transparent thought process in real-time.
🛠️ Open-source models & API coming soon!

🌐 Try it now at chat.deepseek.com
#DeepSeek
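
The API was still "coming soon" when this was posted; the DeepSeek API as later released is OpenAI-compatible. A sketch of a call that surfaces the transparent thought process alongside the answer; the model name and the reasoning_content field follow DeepSeek's later public docs, so treat them as assumptions here:

```python
# Querying a reasoning model through an OpenAI-compatible endpoint (sketch).
from openai import OpenAI

client = OpenAI(base_url="https://api.deepseek.com", api_key="YOUR_KEY")
resp = client.chat.completions.create(
    model="deepseek-reasoner",  # per DeepSeek's later docs; an assumption here
    messages=[{"role": "user", "content": "What is 17 * 24?"}],
)
msg = resp.choices[0].message
print(msg.reasoning_content)  # the model's visible chain of thought
print(msg.content)            # the final answer
```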

DeepSeek (@deepseek_ai):

🚀 Introducing DeepSeek-V3!

Biggest leap forward yet:
⚡ 60 tokens/second (3x faster than V2!)
💪 Enhanced capabilities
🛠 API compatibility intact
🌍 Fully open-source models & papers

🐋 1/n

DeepSeek (@deepseek_ai):

🚀 Day 6 of #OpenSourceWeek: One More Thing – DeepSeek-V3/R1 Inference System Overview

Optimized throughput and latency via:
🔧 Cross-node EP-powered batch scaling
🔄 Computation-communication overlap
⚖️ Load balancing

Statistics of DeepSeek's Online Service:
⚡ 73.7k/14.8k
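
Of these, computation-communication overlap is the most self-contained to illustrate: the all-to-all that shuttles tokens between expert-parallel ranks is issued on its own stream so it runs concurrently with expert computation. A single-GPU sketch of the pattern, with a host-to-device copy standing in for the cross-node all-to-all:

```python
# Computation-communication overlap via CUDA streams (single-GPU sketch).
import torch

assert torch.cuda.is_available()
comm = torch.cuda.Stream()
x = torch.randn(4096, 4096, device="cuda")
tokens = torch.randn(4096, 4096, pin_memory=True)  # pinned => async H2D copy

with torch.cuda.stream(comm):                      # "communication" stream
    tokens_gpu = tokens.to("cuda", non_blocking=True)
y = x @ x                                          # "computation", overlapped
torch.cuda.current_stream().wait_stream(comm)      # sync before consuming
z = y + tokens_gpu
```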