Manuel Alejandro de Brito Fontes (@aledbf) 's Twitter Profile
Manuel Alejandro de Brito Fontes

@aledbf

ID: 94353906

Joined: 03-12-2009 15:45:29

426 Tweets

362 Followers

457 Following

Binyuan Hui (@huybery) 's Twitter Profile Photo

We'll continuously enhance qwen code (the CLI tool) based on your feedback, and even release improved qwen-coder (the model)! Our goal is to match Claude Code's performance while remaining fully open-source!

Gergely Orosz (@gergelyorosz) 's Twitter Profile Photo

Something deeply ironic: the startups asking devs to put in 6+ days per week, 80+ hours per week, are... AI startups. You'd assume the value-add of AI would be humans needing to do less work! So devs could spin off agents, go home, and sleep. But no, it doesn't work like this.

Rhys (@rhyssullivan) 's Twitter Profile Photo

Got inspired, so I recreated a demo of this with Claude Code & Vercel Sandbox. Each thread gets its own sandbox to develop in, but if you wanted, they could all use the same sandbox via worktrees.

vLLM (@vllm_project) 's Twitter Profile Photo

🚀 Amazing community project!

vLLM CLI — a command-line tool for serving LLMs with vLLM:
✅ Interactive menu-driven UI & scripting-friendly CLI
✅ Local + HuggingFace Hub model management
✅ Config profiles for perf/memory tuning
✅ Real-time server & GPU monitoring
✅ Error

Matt Beton (@mattbeton) 's Twitter Profile Photo

Linear scaling achieved with multiple DeepSeek V3.1 instances. 4x Macs = 4x throughput.

2x M3 Ultra Mac Studios = 1x DeepSeek @ 14 tok/sec
4x M3 Ultra Mac Studios = 2x DeepSeek @ 28 tok/sec

DeepSeek V3.1 is a 671B parameter model - so at its native 8-bit quantization, it
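For a rough sense of where ~14 tok/sec comes from: single-request decode is memory-bandwidth-bound, so throughput is capped by how fast the active weights stream from memory. A back-of-envelope sketch, assuming the M3 Ultra's 819GB/s bandwidth and DeepSeek V3.1's ~37B active MoE parameters (public figures, not numbers from this thread):

```python
# Back-of-envelope roofline for single-request decode, using assumed specs:
bandwidth_bytes_s = 819e9   # M3 Ultra unified memory bandwidth (assumed spec)
active_params = 37e9        # DeepSeek V3.1 active MoE params per token (assumed)
bytes_per_param = 1.0       # native 8-bit quantization

# Each decoded token must stream every active weight from memory once.
# With layers pipelined across machines, those reads are serialized, so a
# single request is bounded by one machine's bandwidth at a time:
upper_bound = bandwidth_bytes_s / (active_params * bytes_per_param)
print(f"decode upper bound ~ {upper_bound:.0f} tok/sec")  # ~22 tok/sec

# The measured 14 tok/sec sits under this roofline (interconnect hops, KV
# cache reads, overhead). Independent instances don't share the bottleneck,
# which is why 4 Macs = 2 instances = 2x aggregate throughput.
```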

Junyang Lin (@justinlin610) 's Twitter Profile Photo

Qwen3-Next, or rather a preview of our next generation (3.5?), is out! This time we tried to be bold, but we have actually been doing experiments on hybrid models and linear attention for about a year. We believe that our solution should be at least a stable and solid solution to
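For readers unfamiliar with the term: "linear attention" replaces the softmax in attention with a feature map so the sequence dimension can be factored out, giving O(n) instead of O(n^2) cost in sequence length. A minimal non-causal sketch in the style of Katharopoulos et al., not necessarily Qwen3-Next's actual design:

```python
import numpy as np

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    """softmax(Q K^T) V is approximated by phi(Q) (phi(K)^T V) with a
    positive feature map phi, costing O(n d^2) instead of O(n^2 d)."""
    Qp, Kp = phi(Q), phi(K)           # (n, d) feature-mapped queries/keys
    kv = Kp.T @ V                     # (d, d) summary, independent of n
    z = Qp @ Kp.sum(axis=0)           # (n,) normalizer
    return (Qp @ kv) / z[:, None]

# Example: 1024 tokens, 64-dim head
n, d = 1024, 64
Q, K, V = (np.random.randn(n, d) for _ in range(3))
out = linear_attention(Q, K, V)       # shape (1024, 64)
```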

Alex Cheema - e/acc (@alexocheema) 's Twitter Profile Photo

NVIDIA sent us 2 DGX Sparks. For a while we wondered what we would do with them. The memory bandwidth is 273GB/s, making it 3x slower than an M3 Ultra (819GB/s) for batch_size=1 inference. But it has 4x more FLOPS (100 TFLOPS compared to 26 TFLOPS). So we thought, what if we
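The trade-off being teased here is arithmetic intensity: batch_size=1 decode does only a couple of FLOPs per weight byte moved, so bandwidth wins; prefill and large batches do many more, so FLOPS win. A quick sketch using only the figures quoted in the tweet:

```python
# Ridge point = FLOPs available per byte moved; workloads below it are
# memory-bound, above it compute-bound. Figures are from the tweet.
specs = {"DGX Spark": (273e9, 100e12), "M3 Ultra": (819e9, 26e12)}

for name, (bw, flops) in specs.items():
    print(f"{name}: ridge ~ {flops / bw:.0f} FLOP/byte")
# DGX Spark: ~366 FLOP/byte; M3 Ultra: ~32 FLOP/byte.

# batch_size=1 decode runs at ~2 FLOPs per weight byte, far below both
# ridges -> memory-bound, so the M3 Ultra's 3x bandwidth decides it.
# Prefill or batched serving raises intensity toward the Spark's strength.
```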

Manuel Alejandro de Brito Fontes (@aledbf) 's Twitter Profile Photo

Wrapping up my nearly five-year journey at Gitpod today! Grateful for all the experiences and the amazing people I've met along the way. On to the next chapter! 🚀

Kohei Tokunaga (@tokunagakohei) 's Twitter Profile Photo

LLMlet: P2P distributed LLM inference in browsers with Wasm-compiled llama.cpp + WebRTC

Repo: github.com/ktock/llmlet
Demo: ktock.github.io/llmlet-demo/

A model that can't fit in a single tab can be split and run across multiple browsers. Still experimental, and missing parallelism and a TURN service.

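The "split across multiple browsers" part amounts to pipeline partitioning: give each peer a contiguous range of layers that fits its tab's memory budget, and stream activations between peers over WebRTC data channels. A generic sketch of the placement step (a hypothetical helper, not LLMlet's actual code):

```python
def partition_layers(layer_bytes, peer_budgets):
    """Greedy contiguous split: each peer takes layers until its memory
    budget is exhausted. Hypothetical sketch, not LLMlet's real logic."""
    plan, start = [], 0
    for budget in peer_budgets:
        used, end = 0, start
        while end < len(layer_bytes) and used + layer_bytes[end] <= budget:
            used += layer_bytes[end]
            end += 1
        plan.append(range(start, end))
        start = end
    if start < len(layer_bytes):
        raise MemoryError("model does not fit across the available peers")
    return plan

# e.g. 32 layers of 200MB split across three tabs with ~2.5GB free each:
plan = partition_layers([200_000_000] * 32, [2_500_000_000] * 3)
# -> [range(0, 12), range(12, 24), range(24, 32)]; each peer runs its range
# and forwards activations to the next peer over a WebRTC data channel.
```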
Awni Hannun (@awnihannun) 's Twitter Profile Photo

The latest MLX is out!

And it has a new distributed back-end (JACCL) that uses RDMA over TB5 for super low-latency communication across multiple Macs.

Thanks to Angelos Katharopoulos

Alex Cheema - e/acc (@alexocheema) 's Twitter Profile Photo

Total unified memory: 2TB @ 3.2TB/s. Apple Silicon leads in memory / memory bandwidth unit economics. This is what matters for local AI where batch_size is small and workloads are memory-bound.
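Those totals line up with four maxed-out M3 Ultra Mac Studios, which is my inference rather than something stated in the tweet:

```python
# Assumed composition: 4x Mac Studio (M3 Ultra, 512GB). Per-machine specs
# are public; the cluster makeup is an inference, not stated in the tweet.
n, mem_gb, bw_gb_s = 4, 512, 819

print(f"{n * mem_gb / 1024:.0f}TB unified memory")          # 2TB
print(f"{n * bw_gb_s / 1000:.2f}TB/s aggregate bandwidth")  # 3.28, i.e. ~3.2TB/s
```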

Manuel Alejandro de Brito Fontes (@aledbf) 's Twitter Profile Photo

Built qemubox: an experimental containerd shim that runs containers in lightweight QEMU/KVM VMs:
- ~300ms boot with full systemd
- Docker works inside the VM
- Snapshot & commit like regular images

Demo: asciinema.org/a/5GJ0fPswxolR…
GitHub: github.com/aledbf/qemubox