Daniel Han (@danielhanchen)'s Twitter Profile
Daniel Han

@danielhanchen

Building @UnslothAI. Finetune LLMs 30x faster https://t.co/aRyAAgKOR7. Prev ML at NVIDIA. Hyperlearn used by NASA. I like maths, making code go fast

ID:717359704226172928

Link: https://unsloth.ai/ · Joined: 05-04-2016 14:34:16

703 Tweets

7.0K Followers

930 Following

Daniel Han (@danielhanchen):

Phi-3's sliding window is 2048 and not 2047! So not an odd number! Glad it got resolved quickly!

Also, it looks like Phi-3 (3.8B) does in fact use sliding window attention like Mistral: 2048 context length, but SWA up to 4096.

Link to PR: huggingface.co/microsoft/Phi-…
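A quick way to verify this against the released config (a minimal sketch; the repo id is assumed to be the 4k-context Phi-3 mini instruct checkpoint on the Hugging Face Hub):

```python
# Minimal sketch: inspect the sliding window value in the released config.
# The repo id is an assumption (the 4k-context Phi-3 mini instruct checkpoint).
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("microsoft/Phi-3-mini-4k-instruct",
                                 trust_remote_code=True)
print(cfg.sliding_window)           # expected 2048 after the PR, not 2047
print(cfg.max_position_embeddings)  # 4096 for the 4k-context variant
```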

Daniel Han (@danielhanchen):

Phi 3 (3.8B) got released! The paper said it was just a Llama arch, but I found some quirks while adding this to Unsloth AI:

1. Sliding window of 2047? Mistral v1 4096. So does Phi mini have SWA? (And odd num?) Max RoPE position is 4096?
2. Upcasted RoPE? Like Gemma?
3. Dynamic
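On point 2: "upcasted RoPE" usually means the rotary tables are built in float32 even when the model runs in half precision. A rough illustration of the idea, not Phi-3's or Gemma's actual code:

```python
import torch

def rope_cos_sin(position_ids, dim, base=10000.0, out_dtype=torch.bfloat16):
    # position_ids: 1-D tensor of token positions; dim: head dimension.
    # Build the rotary frequencies in float32 for numerical accuracy,
    # then cast the cos/sin tables back to the model's compute dtype.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    freqs = position_ids.to(torch.float32)[:, None] * inv_freq[None, :]
    emb = torch.cat([freqs, freqs], dim=-1)
    return emb.cos().to(out_dtype), emb.sin().to(out_dtype)
```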

Unsloth AI (@UnslothAI):

Unsloth is currently trending on GitHub! 🙌🦥

If you want to finetune LLMs like Llama 3 or Mistral, now is a good time to try!⭐️

github.com/unslothai/unsl…
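For anyone trying it, a minimal setup looks roughly like the public notebooks; the model name and hyperparameters below are illustrative, not prescriptive:

```python
from unsloth import FastLanguageModel

# Load a 4-bit base model (a Mistral checkpoint works the same way).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters; from here the model drops into a standard TRL SFTTrainer.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```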

Daniel Han (@danielhanchen):

Phi-3 Mini 3.8b Instruct is out!!
68.8 MMLU vs Llama-3 8b Instruct's 66.0 MMLU (Phi team's own evals)

The long context 128K model is also out at huggingface.co/microsoft/Phi-…

Working on adding this into Unsloth AI! Some fused linear modules need unfusing :)
huggingface.co/microsoft/Phi-…
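On the "fused linear modules" point: Phi-3's reference modeling code fuses the q/k/v (and gate/up) projections into single linear layers. A hypothetical helper sketching how such a fused projection could be split back into separate layers (names and shapes are illustrative):

```python
import torch.nn as nn

def unfuse_qkv(qkv: nn.Linear, num_heads: int, num_kv_heads: int, head_dim: int):
    # Split a fused [q; k; v] projection into three separate nn.Linear layers
    # so kernels written for per-projection weights can be reused.
    q_dim, kv_dim = num_heads * head_dim, num_kv_heads * head_dim
    q_w, k_w, v_w = qkv.weight.split([q_dim, kv_dim, kv_dim], dim=0)

    def to_linear(w):
        lin = nn.Linear(w.shape[1], w.shape[0], bias=False)
        lin.weight.data.copy_(w)
        return lin

    return to_linear(q_w), to_linear(k_w), to_linear(v_w)
```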

Aran Komatsuzaki (@arankomatsuzaki):

Microsoft just released Phi-3

- phi-3-mini: 3.8B model trained on 3.3T tokens rivals Mixtral 8x7B and GPT-3.5
- phi-3-medium: 14B model trained on 4.8T tokens w/ 78% on MMLU and 8.9 on MT-bench

arxiv.org/abs/2404.14219

Jeremy Howard (@jeremyphoward):

Today at Answer.AI we've got something new for you: FSDP/QDoRA. We've tested it with AI at Meta Llama3 and the results blow away anything we've seen before.

I believe that this combination is likely to create better task-specific models than anything else at any cost. 🧵
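For context, DoRA reparameterises each frozen weight into a learnable per-row magnitude and a direction updated through a low-rank delta; QDoRA keeps the base weights quantised (as in QLoRA) and FSDP shards the whole thing across GPUs. A purely conceptual sketch of the DoRA part, not Answer.AI's implementation:

```python
import torch
import torch.nn as nn

class DoRALinear(nn.Module):
    # Conceptual DoRA layer: frozen base weight, low-rank direction update,
    # learnable per-row magnitude initialised to the base weight's norm.
    def __init__(self, base: nn.Linear, r: int = 16):
        super().__init__()
        self.register_buffer("base_weight", base.weight.detach())  # frozen (4-bit in QDoRA)
        out_f, in_f = base.weight.shape
        self.lora_A = nn.Parameter(torch.randn(r, in_f) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_f, r))
        self.magnitude = nn.Parameter(base.weight.norm(dim=1, keepdim=True).detach())

    def forward(self, x):
        direction = self.base_weight + self.lora_B @ self.lora_A
        weight = self.magnitude * direction / direction.norm(dim=1, keepdim=True)
        return x @ weight.t()
```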

Daniel Han (@danielhanchen):

Highly recommend Kaggle! You get Tesla T4 GPUs with (I think) 12-hour runs, and 30 GPU hours free per week!

I also have an Unsloth AI Kaggle notebook for Llama-3 8B which makes finetuning 2x faster and uses 60% less VRAM! Kaggle notebook: kaggle.com/code/danielhan…

Andrej Karpathy (@karpathy):

🔥llm.c update: Our single file of 2,000 ~clean lines of C/CUDA code now trains GPT-2 (124M) on GPU at speeds ~matching PyTorch (fp32, no flash attention)
github.com/karpathy/llm.c…

On my A100 I'm seeing 78ms/iter for llm.c and 80ms/iter for PyTorch. Keeping in mind this is fp32,
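A rough sketch of how the PyTorch side of such a number could be measured (fp32, no flash attention; the batch size and sequence length here are assumptions, not Karpathy's exact setup):

```python
import time
import torch
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2").cuda()   # GPT-2 (124M), fp32
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
tokens = torch.randint(0, 50257, (4, 1024), device="cuda")  # assumed batch/seq

for step in range(20):
    torch.cuda.synchronize(); start = time.time()
    loss = model(tokens, labels=tokens).loss   # forward with LM loss
    loss.backward()
    optimizer.step(); optimizer.zero_grad()
    torch.cuda.synchronize()
    print(f"iter {step}: {(time.time() - start) * 1e3:.1f} ms")
```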

Teknium (e/λ) (@Teknium1):

I think Meta and Llama-3 is the final nail in the coffin to several misconceptions I've been fighting against for the last year.

Llama-3 Chat was trained on over 10M Instruction/Chat samples, and is one of the only finetunes that shows significant improvements to MMLU.

Annie ❤️‍🔥💫 (@AnnieLiao_2000):

Come build with us and get $100K+ in credits!

Build Club has just launched a 6-week AI accelerator for top builders, backed by AWS 🚀✨

Build with the likes of Daniel Han, Micah Hill-Smith and receive mentorship from swyx, Logan Kilpatrick + more...

Details 🧵

Daniel Han (@danielhanchen):

For burning Qs, if you don't know, Unsloth AI has a wiki page at github.com/unslothai/unsl…!
1. How to update Unsloth so Llama-3 works
2. Fix OOM in eval loops
3. Train lm_head, embed_tokens
4. Resume from checkpoint
5. Saving to GGUF
6. Chat templates
7. Enable 2x faster inference
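As a taste of what the wiki covers, items 5 and 7 look roughly like this with the Unsloth API (method names as documented; the exact model name and arguments are illustrative):

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/llama-3-8b-bnb-4bit", max_seq_length=2048, load_in_4bit=True,
)

# 7. Enable 2x faster inference after finetuning
FastLanguageModel.for_inference(model)

# 5. Save the merged model to GGUF for llama.cpp
model.save_pretrained_gguf("model_gguf", tokenizer, quantization_method="q4_k_m")
```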

Andrej Karpathy (@karpathy):

Congrats to AI at Meta on Llama 3 release!! 🎉
ai.meta.com/blog/meta-llam…
Notes:

Releasing 8B and 70B (both base and finetuned) models, strong-performing in their model class (but we'll see when the rankings come in @ lmsys.org :))
400B is still training, but already encroaching
