José Carlos (@josegarciaor)'s Twitter Profile
José Carlos

@josegarciaor

mathematician. free software. [email protected]. jerez. devsecops

ID: 246469625

Link: https://nebux.cloud · Joined: 02-02-2011 21:06:42

13.13K Tweets

628 Followers

1.1K Following

EL PAÍS (@el_pais) 's Twitter Profile Photo

🔴 BREAKING | Former Uruguayan president José ‘Pepe’ Mujica, the quiet revolutionary, dies at 89 elpais.com/america/2025-0…

AI at Meta (@aiatmeta) 's Twitter Profile Photo

Announcing the newest releases from Meta FAIR. We’re releasing new groundbreaking models, benchmarks, and datasets that will transform the way researchers approach molecular property prediction, language processing, and neuroscience. 1️⃣ Open Molecules 2025 (OMol25): A dataset

J. OB (@jortega407) 's Twitter Profile Photo

How normalized it has become for the staff at the Jerez hospital that 5 or 6 people wait hours or days for a bed and are left any which way, with no attention to their symptoms, taking it as acceptable that they spend the whole night in an armchair. Some even repeat a second night.

Mistral AI (@mistralai) 's Twitter Profile Photo

Announcing Magistral, our first reasoning model designed to excel in domain-specific, transparent, and multilingual reasoning.

Josh (@josh_bickett) 's Twitter Profile Photo

Today is my first day at Google Cloud. I'm using GPT-4o for coding. It's super powerful. It helped me find and delete some code that wasn't doing anything!

Zephyr (@zephyr_z9) 's Twitter Profile Photo

Huge drop from Baidu
From 0.3B to 424B
Multiple checkpoints (Pretrained, Post-trained, and base models) @kalomaze
Competitive with Qwen3 and DeepSeek V3 0324
Trained on Kunlun III chips using the PaddlePaddle framework (Baidu's CUDA)
clem 🤗 (@clementdelangue) 's Twitter Profile Photo

Every tech company can and should train their own DeepSeek R1, Llama, or GPT-5, just like every tech company writes their own code (and AI is no more than software 2.0).

This is why we're releasing the Ultra-Scale Playbook. 200 pages to master:
- 5D parallelism (DP, TP, PP, EP,
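
A tiny sketch of one of those parallelism axes may help: tensor parallelism (the "TP" above) in its simplest column-parallel form, with two numpy array shards standing in for two devices. This is an illustration under my own assumptions, not code from the playbook, which covers the real multi-GPU collectives.

```python
# Illustrative tensor parallelism (TP): shard a linear layer's weight by columns
# across two "devices", compute partial outputs, then gather them.
# Not from the Ultra-Scale Playbook; numpy arrays stand in for GPUs.
import numpy as np

x = np.random.randn(8, 16)                 # activations: batch of 8, hidden size 16
W = np.random.randn(16, 32)                # full weight matrix of one linear layer

W0, W1 = np.split(W, 2, axis=1)            # column shards, one per "device"

y0 = x @ W0                                # device 0 computes its half of the output
y1 = x @ W1                                # device 1 computes the other half

y_tp = np.concatenate([y0, y1], axis=1)    # stand-in for an all-gather

assert np.allclose(y_tp, x @ W)            # matches the unsharded computation
```
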
Unsloth AI (@unslothai) 's Twitter Profile Photo

Can a 1-bit or 3-bit quantized model outperform GPT-4.1 or Claude-Opus-4?

Yes!

Today, we're excited to show how LLMs like DeepSeek-V3.1 can be quantized to just 1-bit or 3-bit, and still beat SOTA models like Claude-Opus-4 (thinking) on Aider Polyglot.

Details and blog below!
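
As rough intuition for what "3-bit" means, here is a minimal per-tensor round-to-nearest quantization sketch in numpy. It is only an assumption-laden illustration: Unsloth's dynamic 1-bit and 3-bit quants are far more elaborate than this, and a true 1-bit scheme keeps only signs plus a scale.

```python
# Minimal symmetric k-bit quantization sketch (illustrative only; not Unsloth's method).
import numpy as np

def quantize_symmetric(w: np.ndarray, bits: int):
    """Round a float tensor to signed k-bit integers that share one scale factor."""
    qmax = 2 ** (bits - 1) - 1                     # e.g. +/-3 for 3-bit (formula breaks for 1-bit)
    scale = np.abs(w).max() / qmax                 # per-tensor scale
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_symmetric(w, bits=3)
print("3-bit worst-case reconstruction error:", np.abs(w - dequantize(q, scale)).max())
```
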
Gabriele Berton (@gabriberton) 's Twitter Profile Photo

[paper release!]

Did you know that you can

- speed up any LLM by 4x
- and reduce its memory footprint by 2x
- and improve its results
- without modifying the model at all

How???

Here is how we do it 🧵
Thinking Machines (@thinkymachines) 's Twitter Profile Photo

Efficient training of neural networks is difficult. Our second Connectionism post introduces Modular Manifolds, a theoretical step toward more stable and performant training by co-designing neural net optimizers with manifold constraints on weight matrices.
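
For intuition, here is a minimal sketch of one classic manifold constraint: keeping a weight matrix's columns orthonormal (the Stiefel manifold) by projecting back after every step. This naive project-after-step loop is my own illustration of the general idea, not the post's method; the post's point is to co-design the optimizer with the constraint rather than bolt on a projection.

```python
# Illustrative manifold-constrained training loop: retract the weight matrix onto
# the Stiefel manifold (orthonormal columns) after each update.
# A generic sketch, not the Modular Manifolds construction.
import numpy as np

def project_to_stiefel(W: np.ndarray) -> np.ndarray:
    """Nearest matrix with orthonormal columns (the polar factor, via SVD)."""
    U, _, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ Vt

rng = np.random.default_rng(0)
W = project_to_stiefel(rng.normal(size=(64, 16)))    # start on the manifold
lr = 1e-2
for _ in range(100):
    grad = rng.normal(size=W.shape)                  # stand-in for a real gradient
    W = project_to_stiefel(W - lr * grad)            # gradient step, then retraction

print(np.allclose(W.T @ W, np.eye(16), atol=1e-6))   # True: columns stay orthonormal
```
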
Kimi.ai (@kimi_moonshot) 's Twitter Profile Photo

Kimi Linear Tech Report is dropped! 🚀 huggingface.co/moonshotai/Kim…

Kimi Linear: A novel architecture that outperforms full attention with faster speeds and better performance, ready to serve as a drop-in replacement for full attention, featuring our open-sourced KDA kernels! Kimi
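
As background for why a linear-attention architecture can act as a drop-in replacement for full attention, here is a generic kernelized linear-attention sketch using an elu(x)+1 feature map. The feature map and everything else here are illustrative assumptions; Kimi Linear's KDA kernels implement a more elaborate gated design.

```python
# Generic linear attention vs. full softmax attention (non-causal, single head).
# Illustrative only; not Kimi Linear's KDA formulation.
import numpy as np

def softmax_attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(Q.shape[-1])              # (N, N): quadratic in sequence length
    P = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return (P / P.sum(axis=-1, keepdims=True)) @ V

def linear_attention(Q, K, V, eps=1e-6):
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1 feature map (keeps values positive)
    Qf, Kf = phi(Q), phi(K)
    KV = Kf.T @ V                                        # (d, d): cost independent of sequence length
    Z = Qf @ Kf.sum(axis=0) + eps                        # per-query normalizer
    return (Qf @ KV) / Z[:, None]

rng = np.random.default_rng(0)
N, d = 128, 16
Q, K, V = (rng.normal(size=(N, d)) for _ in range(3))
print(softmax_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)  # (128, 16) (128, 16)
```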

Priyanka Lakhara (@codewithpri) 's Twitter Profile Photo

> created the Linux kernel at 21 > built Git because no existing tool was good enough > accidentally became the backbone of servers, Android, cloud, supercomputers > never chased fame, money, titles or hype > stayed private, consistent, and brutally honest for decades > still

> created the Linux kernel at 21
> built Git because no existing tool was good enough
> accidentally became the backbone of servers, Android, cloud, supercomputers
> never chased fame, money, titles or hype
> stayed private, consistent, and brutally honest for decades
> still
Haider. (@slow_developer) 's Twitter Profile Photo

Mathematician Terence Tao: Training and running LLMs isn't mathematically difficult; any math undergrad could understand the basics. The mystery is that we have no theory to predict why models excel at certain tasks and fail at others: "we can only make empirical experiments".