John T Davies 🇺🇦🇪🇺🌍 (@jtdavies)'s Twitter Profile

@jtdavies

Entrepreneur, CTO in Gen-AI, investor, father to 3 grown boys, husband to Rachel, astrophysicist, keen photographer, cyclist, รผber-geek, travelled a lot.

ID: 15784290

Link: http://www.johntdavies.com · Joined: 08-08-2008 23:20:02

3.3K Tweets

1.1K Followers

531 Following

Rod Johnson (@springrod) 's Twitter Profile Photo

John Davies giving an important talk on local LLMs. Currently describing how Incept5 has built a banking workflow on Embabel using only local models. Java John T Davies 🇪🇺 Incept5

John T Davies 🇺🇦🇪🇺🌍 (@jtdavies)'s Twitter Profile Photo

Travelling back from a great week at Devoxx. I love travelling by train, this is the relatively new Frecciarossa service from Paris to Marseille. Working with a full meal service and an office at 320km/h (over 200mph), cheaper than a flight and very civilised!

John T Davies 🇺🇦🇪🇺🌍 (@jtdavies)'s Twitter Profile Photo

I fed this image (below) from Prince Canuma's ongoing work on MLX-VLM to support Qwen3-VL and batching. Qwen's Qwen3-VL-30B-a3B-Instruct-4bit (MLX) with the following prompt. For me on an M4 it was...

Prompt: 895 tokens, 711.575 tokens-per-sec
Generation: 134 tokens,

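As a quick sanity check on the throughput figures quoted above, the implied prompt-processing time is just tokens divided by tokens-per-second (plain arithmetic, not MLX code):

```python
# Throughput figures quoted in the tweet above
prompt_tokens = 895
prompt_tps = 711.575

# Prompt-processing time = tokens / tokens-per-second
prompt_time_s = prompt_tokens / prompt_tps
print(f"prompt processing: {prompt_time_s:.2f} s")  # ~1.26 s
```

So the whole 895-token prompt is ingested in well under two seconds on the M4.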
John T Davies 🇺🇦🇪🇺🌍 (@jtdavies)'s Twitter Profile Photo

So it's just a souped-up Raspberry Pi. Get a 48 or 64GB Mac Mini for half the price and it will run the same models way faster; mine runs the same qwen3-32b-8bit at over 30 tps, and that's not even with MLX.
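A back-of-envelope check on why qwen3-32b-8bit fits on a 48 or 64GB machine: model weights take roughly one byte per parameter at 8-bit quantisation. This ignores KV cache and activation overhead, so treat it as a lower bound:

```python
# Rough model memory: parameters x bytes per parameter.
# qwen3-32b at 8-bit quantisation; KV cache and activations
# add more on top, so this is a lower bound.
params = 32e9          # 32B parameters
bytes_per_param = 1    # 8-bit quantisation
weights_gb = params * bytes_per_param / 1e9
print(f"weights alone: {weights_gb:.0f} GB")  # 32 GB -> headroom on 48/64 GB
```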

John T Davies 🇺🇦🇪🇺🌍 (@jtdavies)'s Twitter Profile Photo

Just downloaded the MLX version (lmstudio-community/Qwen3-VL-2B-Instruct-MLX-bf16), great image decoding and over 100 tokens/second. Next the 32B versions.

John T Davies 🇺🇦🇪🇺🌍 (@jtdavies)'s Twitter Profile Photo

310 tokens/second with an unquantised (bf16) model using MLX, this is CRAZY fast! I've just tried the new MLX-VLM beta from Prince Canuma (using an M4 Max). DeepSeek-OCR (3B) on MLX is a game-changer.

John T Davies 🇺🇦🇪🇺🌍 (@jtdavies)'s Twitter Profile Photo

This looks very interesting, a perfect size too...

moonshotai/Kimi-Linear-48B-A3B-Instruct

On MMLU-Pro (4k context length), Kimi Linear achieves 51.0 performance with similar speed as full attention.

Offering significant speedups at long sequence lengths (1M tokens).

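The "1M tokens" claim is about attention cost scaling: full attention compares every token against every other token (quadratic in sequence length), while linear-attention designs like Kimi Linear keep a fixed-size running state (linear). A toy operation-count comparison, illustrative only and not the model's actual kernel:

```python
def full_attention_ops(n: int) -> int:
    # Full attention scores every token against every other token: O(n^2)
    return n * n

def linear_attention_ops(n: int) -> int:
    # Linear attention maintains a fixed-size running state: O(n)
    return n

for n in (4_000, 1_000_000):
    ratio = full_attention_ops(n) // linear_attention_ops(n)
    print(f"n={n:,}: full attention does {ratio:,}x the work of linear")
```

At the 4k context of the MMLU-Pro number the gap is modest, which is why the speeds are similar there; at 1M tokens the quadratic term dominates, hence the significant speedups.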
John T Davies 🇺🇦🇪🇺🌍 (@jtdavies)'s Twitter Profile Photo

Wow!!! We spoke over lunch for 6 hours, so many ideas, Prince Canuma is going to take over the AI world. Remember this day folks, it started in Krakow! Top secret for now, co-investors, apply now!
