David Pissarra (@davidpissarra)'s Twitter Profile
David Pissarra

@davidpissarra

PhD Student @NYU_Courant | MSc @istecnico @Tsinghua_Uni | prev: Research Intern @CSDatCMU

ID: 1213860292330848261

Website: http://davidpissarra.com | Joined: 05-01-2020 16:30:52

14 Tweets

114 Followers

133 Following

Charlie Ruan (@charlie_ruan)

Run Mistral AI's 7B model in your browser with WebGPU acceleration! Try it out at webllm.mlc.ai

For native LLM deployment, sliding window attention is particularly helpful for enjoying longer context with lower memory requirements.
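
As a rough illustration of how little code such an in-browser deployment takes, here is a minimal TypeScript sketch using the @mlc-ai/web-llm package; the exact model ID string is an assumption, so check the library's prebuilt model list.

```ts
// Minimal WebLLM sketch: load a quantized Mistral build and chat with it,
// all on the local GPU via WebGPU. The model ID below is an assumption;
// consult the prebuilt model list shipped with @mlc-ai/web-llm.
import { CreateMLCEngine } from "@mlc-ai/web-llm";

async function main() {
  // First run downloads the weights and compiles WebGPU kernels.
  const engine = await CreateMLCEngine("Mistral-7B-Instruct-v0.2-q4f16_1-MLC", {
    initProgressCallback: (report) => console.log(report.text),
  });

  // OpenAI-style chat completion, executed entirely in the browser.
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: "Why does sliding window attention save memory?" }],
  });
  console.log(reply.choices[0].message.content);
}

main();
```
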
David Pissarra (@davidpissarra)

Run the Mistral-7B-Instruct-v0.2 model on iPhone! It now supports StreamingLLM for endless generation. Try the MLC Chat App via TestFlight: llm.mlc.ai. For native LLM deployment, attention sinks are particularly helpful for longer generation with lower memory requirements.
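
For intuition: StreamingLLM keeps a handful of initial "attention sink" tokens plus a rolling window of recent tokens in the KV cache and evicts everything in between, so memory stays bounded during endless generation. A toy TypeScript sketch of that eviction policy (all names and parameters here are illustrative, not the MLC LLM API):

```ts
// Toy illustration of the StreamingLLM / attention-sink policy: keep the
// first `numSinks` positions plus the most recent `windowSize` positions,
// evict the middle. Illustrative only, not MLC LLM's implementation.
function keptPositions(seqLen: number, numSinks: number, windowSize: number): number[] {
  const kept: number[] = [];
  for (let pos = 0; pos < seqLen; pos++) {
    const isSink = pos < numSinks;               // early tokens are never evicted
    const isRecent = pos >= seqLen - windowSize; // rolling window of recent tokens
    if (isSink || isRecent) kept.push(pos);
  }
  return kept;
}

// With 4 sinks and a 1024-token window, a 6000-token stream keeps only
// 1028 KV entries, and positions 0-3 survive forever:
console.log(keptPositions(6000, 4, 1024).length); // 1028
```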

Guangxuan Xiao (@guangxuan_xiao)

Exciting news: StreamingLLM is now available on iPhone! 🎉 A huge thanks to David Pissarra for his fantastic extension to our work. Can't wait to explore the possibilities with StreamingLLM!

Charlie Ruan (@charlie_ruan)

New WizardMath V1.1 from WizardLM on WebLLM! It took me only ~20 mins to deploy it in the browser with WebGPU acceleration. WebLLM can be an easy way for folks to try new models -- a laptop with Chrome, that's it! We are actively working on WebLLM to make it even better!

Charlie Ruan (@charlie_ruan)

With Chrome v121, you can run webllm.mlc.ai in your Android web browser with WebGPU acceleration, everything locally! Here is a 1x-speed demo running 4-bit quantized Phi-2 on a Samsung S23. Thank you François and Jason Mayes for the support and suggestions!
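
Before loading a model, a page can probe whether the browser actually exposes a usable WebGPU adapter; a minimal sketch (assumes the @webgpu/types typings for TypeScript):

```ts
// Feature-detect WebGPU (shipped in Chrome 121+ on Android) before
// attempting to initialize WebLLM. Requires @webgpu/types for TS.
async function hasWebGPU(): Promise<boolean> {
  if (!("gpu" in navigator)) return false; // API not exposed at all
  const adapter = await navigator.gpu.requestAdapter();
  return adapter !== null; // null means no usable GPU backend
}

hasWebGPU().then((ok) =>
  console.log(ok ? "WebGPU ready, WebLLM can run locally" : "No WebGPU available"),
);
```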

Charlie Ruan (@charlie_ruan)

CodeLlama 70B is now on MLC LLM -- local deployment everywhere!

Thanks to JIT compilation, running on different platforms (even w/ multi-GPU) is made easy -- see how M2 Mac (left) and 2 x RTX4090 (right) have almost the same code.

llm.mlc.ai/docs/
huggingface.co/mlc-ai
Ruihang Lai (@ruihanglai)

Run the Gemma model locally on iPhone - we get a blazing-fast 20 tok/s for the 2B model. This shows amazing potential for Gemma fine-tunes on phones, made possible by the new MLC SLM compilation flow by Junru Shao from octoaicloud and many other contributors. github.com/mlc-ai/mlc-llm

Charlie Ruan (@charlie_ruan)

webllm.mlc.ai now adds Gemma from Google DeepMind! The 2B model is perfect for building in-browser agents with WebGPU acceleration -- everything local! Here is a 1x-speed demo of 4-bit quantized gemma-2b-it on a Google Pixel 7 Pro with Chrome.
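
A hedged sketch of what streaming generation from the 2B model looks like with WebLLM's OpenAI-style API; the model ID string is an assumption, so check the current prebuilt list:

```ts
// Stream tokens from a quantized Gemma build in the browser. The model ID
// is an assumption; see WebLLM's prebuilt model list for the exact string.
import { CreateMLCEngine } from "@mlc-ai/web-llm";

async function demo() {
  const engine = await CreateMLCEngine("gemma-2b-it-q4f16_1-MLC");

  // stream: true yields OpenAI-style chunks as tokens are generated locally.
  const chunks = await engine.chat.completions.create({
    messages: [{ role: "user", content: "Plan my day in three bullet points." }],
    stream: true,
  });

  let text = "";
  for await (const chunk of chunks) {
    text += chunk.choices[0]?.delta?.content ?? "";
  }
  console.log(text);
}

demo();
```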

Charlie Ruan (@charlie_ruan)

Excited to share the WebLLM engine: a high-performance in-browser LLM inference engine! WebLLM offers local GPU acceleration via WebGPU, a fully OpenAI-compatible API, and built-in web worker support to move backend execution off the main thread. Check out the blog post: blog.mlc.ai/2024/06/13/web…
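
The web worker support follows a two-file pattern: a worker script hosts the engine, and the page talks to it through the same OpenAI-style API. A minimal sketch (the model ID is an assumption):

```ts
// worker.ts -- hosts the engine off the UI thread.
import { WebWorkerMLCEngineHandler } from "@mlc-ai/web-llm";

const handler = new WebWorkerMLCEngineHandler();
self.onmessage = (msg: MessageEvent) => handler.onmessage(msg);
```

```ts
// main.ts -- same chat API, but inference now runs inside the worker.
import { CreateWebWorkerMLCEngine } from "@mlc-ai/web-llm";

async function run() {
  const engine = await CreateWebWorkerMLCEngine(
    new Worker(new URL("./worker.ts", import.meta.url), { type: "module" }),
    "Llama-3-8B-Instruct-q4f16_1-MLC", // assumed model ID; any prebuilt works
  );

  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: "Hello from the main thread!" }],
  });
  console.log(reply.choices[0].message.content);
}

run();
```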

Shanli Xing (@0xsling0)

🤔 Can AI optimize the systems it runs on?

🚀 Introducing FlashInfer-Bench, a workflow that makes AI systems self-improving with agents:

- Standardized signature for LLM serving kernels
- Implement kernels with your preferred language
- Benchmark them against real-world serving
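
To make that workflow concrete, here is a purely hypothetical TypeScript sketch of the "standardized signature" idea: every candidate kernel exposes one typed entry point, so a harness can swap implementations and time them. None of these names come from FlashInfer-Bench itself.

```ts
// Hypothetical sketch only -- NOT FlashInfer-Bench's actual API.
// The point: a shared signature lets a harness benchmark any kernel.
interface AttentionKernel {
  name: string;
  // Plain numbers stand in for real tensor handles / device buffers.
  run(batchSize: number, seqLen: number, numHeads: number): Promise<void>;
}

async function benchmark(kernels: AttentionKernel[], iters = 100): Promise<void> {
  for (const k of kernels) {
    const start = performance.now();
    for (let i = 0; i < iters; i++) await k.run(8, 4096, 32);
    const msPerIter = (performance.now() - start) / iters;
    console.log(`${k.name}: ${msPerIter.toFixed(3)} ms/iter`);
  }
}

// A do-nothing baseline just to make the harness runnable end to end.
const baseline: AttentionKernel = { name: "noop-baseline", run: async () => {} };
benchmark([baseline]);
```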