Manuel Alejandro de Brito Fontes (@aledbf) 's Twitter Profile
Manuel Alejandro de Brito Fontes

@aledbf

ID: 94353906

Joined: 03-12-2009 15:45:29

426 Tweets

362 Followers

457 Following

Binyuan Hui (@huybery) 's Twitter Profile Photo

We'll continuously enhance qwen code (the CLI tool) based on your feedback, and even release improved qwen-coder (the model)! Our goal is to match Claude Code's performance while remaining fully open-source!

Gergely Orosz (@gergelyorosz) 's Twitter Profile Photo

Something deeply ironic: the startups asking devs to put in 6+ days per week, 80+ hours per week, are... AI startups. You'd assume the value-add of AI would be humans needing to do less work! So devs could spin off agents, go home, and sleep. But no, it doesn't work like this.

Rhys (@rhyssullivan) 's Twitter Profile Photo

Got inspired, so I recreated a demo of this with Claude Code & Vercel Sandbox. Each thread gets its own sandbox to develop in, but if you wanted, they could all use the same sandbox via worktrees.

vLLM (@vllm_project) 's Twitter Profile Photo

🚀 Amazing community project!

vLLM CLI — a command-line tool for serving LLMs with vLLM:
✅ Interactive menu-driven UI & scripting-friendly CLI
✅ Local + HuggingFace Hub model management
✅ Config profiles for perf/memory tuning
✅ Real-time server & GPU monitoring
✅ Error

Matt Beton (@mattbeton) 's Twitter Profile Photo

Linear scaling achieved with multiple DeepSeek V3.1 instances. 4x Macs = 4x throughput.

2x M3 Ultra Mac Studios = 1x DeepSeek @ 14 tok/sec
4x M3 Ultra Mac Studios = 2x DeepSeek @ 28 tok/sec

DeepSeek V3.1 is a 671B parameter model - so at its native 8-bit quantization, it
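For a rough sense of where ~14 tok/sec comes from: single-request decode is memory-bandwidth-bound, so throughput is capped by how fast the active weights stream from memory. A back-of-envelope sketch, assuming the M3 Ultra's 819GB/s bandwidth and DeepSeek V3.1's ~37B active MoE parameters (public figures, not numbers from this thread):

```python
# Back-of-envelope roofline for single-request decode, using assumed specs:
bandwidth_bytes_s = 819e9   # M3 Ultra unified memory bandwidth (assumed spec)
active_params = 37e9        # DeepSeek V3.1 active MoE params per token (assumed)
bytes_per_param = 1.0       # native 8-bit quantization

# Each decoded token must stream every active weight from memory once.
# With layers pipelined across machines, those reads are serialized, so a
# single request is bounded by one machine's bandwidth at a time:
upper_bound = bandwidth_bytes_s / (active_params * bytes_per_param)
print(f"decode upper bound ~ {upper_bound:.0f} tok/sec")  # ~22 tok/sec

# The measured 14 tok/sec sits under this roofline (interconnect hops, KV
# cache reads, overhead). Independent instances don't share the bottleneck,
# which is why 4 Macs = 2 instances = 2x aggregate throughput.
```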

Junyang Lin (@justinlin610) 's Twitter Profile Photo

Qwen3-Next, or rather a preview of our next generation (3.5?), is out! This time we tried to be bold, but we have actually been doing experiments on hybrid models and linear attention for about a year. We believe that our solution should be at least a stable and solid solution to
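For readers unfamiliar with the term: "linear attention" replaces the softmax in attention with a feature map so the sequence dimension can be factored out, giving O(n) instead of O(n^2) cost in sequence length. A minimal non-causal sketch in the style of Katharopoulos et al., not necessarily Qwen3-Next's actual design:

```python
import numpy as np

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    """softmax(Q K^T) V is approximated by phi(Q) (phi(K)^T V) with a
    positive feature map phi, costing O(n d^2) instead of O(n^2 d)."""
    Qp, Kp = phi(Q), phi(K)           # (n, d) feature-mapped queries/keys
    kv = Kp.T @ V                     # (d, d) summary, independent of n
    z = Qp @ Kp.sum(axis=0)           # (n,) normalizer
    return (Qp @ kv) / z[:, None]

# Example: 1024 tokens, 64-dim head
n, d = 1024, 64
Q, K, V = (np.random.randn(n, d) for _ in range(3))
out = linear_attention(Q, K, V)       # shape (1024, 64)
```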

Alex Cheema - e/acc (@alexocheema) 's Twitter Profile Photo

NVIDIA sent us 2 DGX Sparks. For a while we wondered what we would do with them. The memory bandwidth is 273GB/s, making it 3x slower than an M3 Ultra (819GB/s) for batch_size=1 inference. But it has 4x more FLOPS (100 TFLOPS compared to 26 TFLOPS). So we thought, what if we
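The trade-off being teased here is arithmetic intensity: batch_size=1 decode does only a couple of FLOPs per weight byte moved, so bandwidth wins; prefill and large batches do many more, so FLOPS win. A quick sketch using only the figures quoted in the tweet:

```python
# Ridge point = FLOPs available per byte moved; workloads below it are
# memory-bound, above it compute-bound. Figures are from the tweet.
specs = {"DGX Spark": (273e9, 100e12), "M3 Ultra": (819e9, 26e12)}

for name, (bw, flops) in specs.items():
    print(f"{name}: ridge ~ {flops / bw:.0f} FLOP/byte")
# DGX Spark: ~366 FLOP/byte; M3 Ultra: ~32 FLOP/byte.

# batch_size=1 decode runs at ~2 FLOPs per weight byte, far below both
# ridges -> memory-bound, so the M3 Ultra's 3x bandwidth decides it.
# Prefill or batched serving raises intensity toward the Spark's strength.
```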

Manuel Alejandro de Brito Fontes (@aledbf) 's Twitter Profile Photo

Wrapping up my nearly five-year journey at Gitpod today! Grateful for all the experiences and the amazing people I've met along the way. On to the next chapter! 🚀

Kohei Tokunaga (@tokunagakohei) 's Twitter Profile Photo

LLMlet: P2P distributed LLM inference in browsers with Wasm-compiled llama.cpp + WebRTC

Repo: github.com/ktock/llmlet
Demo: ktock.github.io/llmlet-demo/

A model that can't fit in a single tab can be split and run across multiple browsers. Still experimental, and missing parallelism and a TURN service.

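The "split across multiple browsers" part amounts to pipeline partitioning: give each peer a contiguous range of layers that fits its tab's memory budget, and stream activations between peers over WebRTC data channels. A generic sketch of the placement step (a hypothetical helper, not LLMlet's actual code):

```python
def partition_layers(layer_bytes, peer_budgets):
    """Greedy contiguous split: each peer takes layers until its memory
    budget is exhausted. Hypothetical sketch, not LLMlet's real logic."""
    plan, start = [], 0
    for budget in peer_budgets:
        used, end = 0, start
        while end < len(layer_bytes) and used + layer_bytes[end] <= budget:
            used += layer_bytes[end]
            end += 1
        plan.append(range(start, end))
        start = end
    if start < len(layer_bytes):
        raise MemoryError("model does not fit across the available peers")
    return plan

# e.g. 32 layers of 200MB split across three tabs with ~2.5GB free each:
plan = partition_layers([200_000_000] * 32, [2_500_000_000] * 3)
# -> [range(0, 12), range(12, 24), range(24, 32)]; each peer runs its range
# and forwards activations to the next peer over a WebRTC data channel.
```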
Awni Hannun (@awnihannun) 's Twitter Profile Photo

The latest MLX is out!

And it has a new distributed back-end (JACCL) that uses RDMA over TB5 for super low-latency communication across multiple Macs.

Thanks to Angelos Katharopoulos

Alex Cheema - e/acc (@alexocheema) 's Twitter Profile Photo

Total unified memory: 2TB @ 3.2TB/s. Apple Silicon leads in memory / memory bandwidth unit economics. This is what matters for local AI where batch_size is small and workloads are memory-bound.
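Those totals line up with four maxed-out M3 Ultra Mac Studios, which is my inference rather than something stated in the tweet:

```python
# Assumed composition: 4x Mac Studio (M3 Ultra, 512GB). Per-machine specs
# are public; the cluster makeup is an inference, not stated in the tweet.
n, mem_gb, bw_gb_s = 4, 512, 819

print(f"{n * mem_gb / 1024:.0f}TB unified memory")          # 2TB
print(f"{n * bw_gb_s / 1000:.2f}TB/s aggregate bandwidth")  # 3.28, i.e. ~3.2TB/s
```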

Manuel Alejandro de Brito Fontes (@aledbf) 's Twitter Profile Photo

Built qemubox: an experimental containerd shim that runs containers in lightweight QEMU/KVM VMs:
- ~300ms boot with full systemd
- Docker works inside the VM
- Snapshot & commit like regular images

Demo: asciinema.org/a/5GJ0fPswxolR…
GitHub: github.com/aledbf/qemubox