Alex Bishka (@alex_bishka) Twitter Tweets • TwiCopy

Alex Bishka

@alex_bishka


Tinkerer | MI Enthusiast: bishka.dev/ml | Mind The Abstract: mindtheabstract.com

ID: 1906228607187668992

Link: https://bishka.dev

Date: 30-03-2025 06:15:00

75 Tweets

11 Followers

103 Following

DeepSeek

@deepseek_ai

6 months ago

🚀 Introducing NSA: A Hardware-Aligned and Natively Trainable Sparse Attention mechanism for ultra-fast long-context training & inference! Core components of NSA: • Dynamic hierarchical sparse strategy • Coarse-grained token compression • Fine-grained token selection 💡 With

Likes: 16.16K

Replies: 901

Retweets: 2.2K

DeepSeek

@deepseek_ai

3 months ago

🚀 DeepSeek-R1-0528 is here! 🔹 Improved benchmark performance 🔹 Enhanced front-end capabilities 🔹 Reduced hallucinations 🔹 Supports JSON output & function calling ✅ Try it now: chat.deepseek.com 🔌 No change to API usage — docs here: api-docs.deepseek.com/guides/reasoni… 🔗

Likes: 9.9K

Replies: 386

Retweets: 1.1K

benedict neo

@benxneo

3 months ago

@saurabhalonee arxiv.org/abs/1906.02691

Likes: 20

Replies: 0

Retweets: 7

Yann LeCun

@ylecun

3 months ago

V-JEPA-v2

Likes: 1.1K

Replies: 65

Retweets: 123

Alex Bishka

@alex_bishka

2 months ago

I've added navigation b/w weekly summaries on Mind The Abstract. You can now go back in time to see previous AI/ML weekly arXiv summaries (mindtheabstract.com/newsletter/202… to mindtheabstract.com/newsletter/202…). I'll be adding more newsletters and exploration features b/w them!

Likes: 0

Replies: 0

Retweets: 0

Yann LeCun

@ylecun

2 months ago

I don't wanna say 'I told you so', but I told you so.

Likes: 5.5K

Replies: 291

Retweets: 508

Anthropic

@anthropicai

2 months ago

New Anthropic Research: Agentic Misalignment. In stress-testing experiments designed to identify risks before they cause real harm, we find that AI models from multiple providers attempt to blackmail a (fictional) user to avoid being shut down.

Likes: 3.3K

Replies: 165

Retweets: 573

Alex Bishka

@alex_bishka

21 days ago

What do you do if your validation loss is 99.90% or higher? How do you meaningfully select a model at that point? Can you somehow "uncap" your ceiling without affecting model performance? Asking for a friend🙃

Likes: 0

Replies: 0

Retweets: 0