Danny Hallwood 🇺🇦 (@stepbystepnomad)'s Twitter Profile
Danny Hallwood 🇺🇦

@stepbystepnomad

digital nomad not by choice, lived in Kyiv for the last decade

ID: 1504388732270657543

Joined: 17-03-2022 09:27:03

983 Tweets

127 Followers

1.1K Following

HotSotin 🇫🇮🇺🇦🇪🇺△ NAFO (@hotsotin)'s Twitter Profile Photo

1923 – earthquake in Kamchatka
1924 – Lenin died
1952 – earthquake in Kamchatka
1953 – Stalin died
2006 – earthquake in Kamchatka
2007 – Yeltsin died
2025 – earthquake in Kamchatka
2026 – ???

็™’ใ—ใฎๅ‹•็‰ฉ (@animalkyat) 's Twitter Profile Photo

ใพใ˜ใงใใ†ใ„ใ†ใฎใ„ใ„ใ‹ใ‚‰ๆ—ฉใๆธกใฃใฆใใ‚Œโ€ฆ

DeepSeek (@deepseek_ai)'s Twitter Profile Photo


Structural Innovation & Ultra-High Context Efficiency

🔹 Novel Attention: Token-wise compression + DSA (DeepSeek Sparse Attention).
🔹 Peak Efficiency: World-leading long context with drastically reduced compute & memory costs.
🔹 1M Standard: 1M context is now the default

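The "sparse attention" idea in the tweet can be sketched minimally: each query attends to only a small top-k subset of keys rather than all of them, which is where the compute savings come from. This is a toy illustration under assumed shapes, not DeepSeek's actual DSA (the tweet doesn't describe its selection mechanism); `topk_sparse_attention` is an invented name.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def topk_sparse_attention(Q, K, V, k=4):
    """Toy sparse attention: each query keeps only its k highest-scoring
    keys and masks out the rest before the softmax."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                      # (n_q, n_k) full scores
    idx = np.argpartition(scores, -k, axis=-1)[:, -k:] # top-k key indices per query
    mask = np.full_like(scores, -np.inf)               # -inf = excluded
    np.put_along_axis(mask, idx, 0.0, axis=-1)         # 0 = kept
    w = softmax(scores + mask, axis=-1)                # zero weight outside top-k
    return w @ V

rng = np.random.default_rng(0)
Q = rng.standard_normal((8, 16))
K = rng.standard_normal((32, 16))
V = rng.standard_normal((32, 16))
out = topk_sparse_attention(Q, K, V, k=4)
print(out.shape)  # (8, 16)
```

A real implementation would pick the top-k with a cheap learned scorer instead of the full score matrix; computing full scores here is only for clarity.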
Deli Chen (@victor207755822)'s Twitter Profile Photo

DeepSeek-V3: Dec 26, 2024
DeepSeek-V4: Apr 24, 2026

484 days later, we humbly share our labor of love. As always, we stay true to long-termism and open source for all. AGI belongs to everyone. ❤️🌍

#DeepSeekV4 #AGIforEveryone #OpenSource

Danny Hallwood 🇺🇦 (@stepbystepnomad)'s Twitter Profile Photo


Structural Innovation & Ultra-High Context Efficiency

90% compression on context is a massive deal.

This tech will be ported to other, smaller models; 1M context windows on home models are coming.

Novel Attention: Token-wise compression + DSA (DeepSeek Sparse Attention).
Peak Efficiency: World-leading long context with drastically reduced compute & memory costs.
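Why 90% context compression is "a massive deal" is easy to see with back-of-envelope KV-cache arithmetic: cache size grows linearly with context length. All layer/head/precision numbers below are illustrative assumptions, not DeepSeek's published specs.

```python
# Rough KV-cache size: 2 (K and V) x tokens x layers x kv_heads x head_dim x bytes.
# Layer/head/precision values are assumed for illustration only.
def kv_cache_gb(tokens, layers=60, kv_heads=8, head_dim=128, bytes_per=2):
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per / 1e9

full = kv_cache_gb(1_000_000)   # hypothetical 1M-token context, uncompressed
compressed = full * 0.10        # keep ~10% after 90% compression
print(round(full, 1), round(compressed, 1))
```

Under these assumed dimensions, an uncompressed 1M-token cache would need hundreds of GB; cutting it by 90% is what puts long context within reach of smaller machines.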
Danny Hallwood 🇺🇦 (@stepbystepnomad)'s Twitter Profile Photo

DeepSeek has always excelled at new architecture. V3 gave us MoE (mixture of experts), mixed-precision training, multi-token prediction, and multi-head latent attention. That cascaded into wildly improved smaller open-source models for self-hosting. Expect 1M context on other models.
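The MoE idea mentioned above — route each token to a few experts out of many, so only a fraction of the parameters run per token — can be sketched as a toy top-k router. This is a generic illustration with invented shapes; real MoE layers use full feed-forward experts and load-balancing losses, not single matrices.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def moe_forward(x, gate_w, experts, top_k=2):
    """Toy top-k mixture of experts: a gate scores experts per token,
    the top_k experts run, and their outputs are mixed by
    renormalized gate weights."""
    probs = softmax(x @ gate_w)                  # (tokens, n_experts)
    top = np.argsort(probs, axis=-1)[:, -top_k:] # chosen experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        w = probs[t, top[t]]
        w = w / w.sum()                          # renormalize over chosen experts
        for j, e in enumerate(top[t]):
            out[t] += w[j] * (x[t] @ experts[e]) # each "expert" is one matrix here
    return out

rng = np.random.default_rng(1)
d, n_experts, tokens = 8, 4, 5
x = rng.standard_normal((tokens, d))
gate_w = rng.standard_normal((d, n_experts))
experts = rng.standard_normal((n_experts, d, d))
y = moe_forward(x, gate_w, experts)
print(y.shape)  # (5, 8)
```

The payoff is the same one the tweet points at: per-token compute scales with `top_k`, not with the total number of experts.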

송준 Jun Song (@songjunkr)'s Twitter Profile Photo


Performance comparison of fully open-source local LLM models

Each runs on a Mac with the RAM below:

Qwen3.6-27b : 32 GB
Minimax-M2.7 : 64 GB+
DeepSeek-V4-Flash : 128 GB+
GLM-5.1 : 256 GB+
Kimi-K2.6 : 512 GB+
DeepSeek-V4-Pro : 512 GB+

Which model will you use?