Alex Bishka (@alex_bishka)'s Twitter Profile

Tinkerer | MI Enthusiast: bishka.dev/ml | Mind The Abstract: mindtheabstract.com

ID: 1906228607187668992

Link: https://bishka.dev | Joined: 30-03-2025 06:15:00

75 Tweets

11 Followers

103 Following

DeepSeek (@deepseek_ai)

🚀 Introducing NSA: A Hardware-Aligned and Natively Trainable Sparse Attention mechanism for ultra-fast long-context training & inference!

Core components of NSA:
• Dynamic hierarchical sparse strategy
• Coarse-grained token compression
• Fine-grained token selection

💡 With

DeepSeek (@deepseek_ai)

🚀 DeepSeek-R1-0528 is here!

🔹 Improved benchmark performance
🔹 Enhanced front-end capabilities
🔹 Reduced hallucinations
🔹 Supports JSON output & function calling

✅ Try it now: chat.deepseek.com
🔌 No change to API usage. Docs here: api-docs.deepseek.com/guides/reasoni…
🔗

Alex Bishka (@alex_bishka)

I've added navigation between weekly summaries on Mind The Abstract.

You can now go back in time to see previous AI/ML weekly arXiv summaries (mindtheabstract.com/newsletter/202… to mindtheabstract.com/newsletter/202…).

I'll be adding more newsletters and exploration features between them!

Anthropic (@anthropicai)

New Anthropic Research: Agentic Misalignment.

In stress-testing experiments designed to identify risks before they cause real harm, we find that AI models from multiple providers attempt to blackmail a (fictional) user to avoid being shut down.

Alex Bishka (@alex_bishka)

What do you do if your validation accuracy is 99.90% or higher? How do you meaningfully select a model at that point? Can you somehow "uncap" your ceiling without affecting model performance? Asking for a friend 🙃