Shenbin Qian (@shenbinqian)'s Twitter Profile
Shenbin Qian

@shenbinqian

PhD student in Natural Language Processing at the University of Surrey

ID: 2856348493

Joined: 15-10-2014 04:43:58

32 Tweets

30 Followers

101 Following

AK (@_akhaliq)'s Twitter Profile Photo

Toolformer: Language Models Can Teach Themselves to Use Tools 

Introduces Toolformer, a model trained to decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction.

abs: arxiv.org/abs/2302.04761
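A minimal sketch of the idea the abstract describes: the model emits inline API-call markup, an executor runs the call, and the result is spliced back into the text before generation continues. The tool name `calc`, the marker syntax, and `run_tool` are illustrative assumptions, not from the paper's code.

```python
# Toy version of Toolformer-style inline API calls: [name(arg)] markers
# in generated text are executed and replaced with [name(arg) -> result].
import re

def run_tool(name: str, arg: str) -> str:
    """Toy tool executor; the paper uses real APIs (calculator, QA, MT, ...)."""
    if name == "calc":
        return str(eval(arg, {"__builtins__": {}}))  # toy calculator only
    raise ValueError(f"unknown tool: {name}")

def expand_api_calls(text: str) -> str:
    """Replace [name(arg)] markers with [name(arg) -> result], mimicking
    how API results are incorporated into the token stream."""
    def repl(m):
        name, arg = m.group(1), m.group(2)
        return f"[{name}({arg}) -> {run_tool(name, arg)}]"
    return re.sub(r"\[(\w+)\(([^)]*)\)\]", repl, text)

print(expand_api_calls("The total is [calc(3*7)] dollars."))
# -> The total is [calc(3*7) -> 21] dollars.
```

In the paper the interesting part is upstream of this: the model teaches itself *where* to insert such calls via self-supervised filtering.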
Chen Lu (@_chen_lu_)'s Twitter Profile Photo

I wrote a UNet diffusion model in pure CUDA: github.com/clu0/unet.cu. This project was inspired by Andrej Karpathy's llm.c (github.com/karpathy/llm.c). I also learnt a lot about CUDA kernels from Simon Boehm's Matmul blog (siboehm.com/articles/22/CU…). (1/3)

Sanchit Gandhi (@sanchitgandhi99)'s Twitter Profile Photo

Local Gemma: a package for running LARGE language models with SMALL amounts of memory 🤏

The Gemma-2 27b model typically requires 70GB GPU memory. With Local Gemma, this is just 5GB 🤯

Try it for yourself in two steps:

pip install local-gemma
local-gemma --preset memory_extreme
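A back-of-the-envelope check of the memory claim, assuming weights dominate: a 27B-parameter model needs about 2 bytes/param in fp16 but only about 0.5 in 4-bit. The attribution of the remaining savings to CPU offloading is an assumption here, not taken from the package docs.

```python
# Rough weight-memory estimate for a 27B-parameter model at different precisions.
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory for weights alone, ignoring activations and KV cache."""
    return n_params * bytes_per_param / 1e9

fp16 = weight_memory_gb(27e9, 2.0)   # ~54 GB: why the 70GB figure is plausible
int4 = weight_memory_gb(27e9, 0.5)   # ~13.5 GB; offloading shrinks the GPU share further
print(f"fp16: {fp16:.1f} GB, 4-bit: {int4:.1f} GB")
```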
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr)'s Twitter Profile Photo

AI Agents That Matter

abs: arxiv.org/abs/2407.01502

Performs a careful analysis of existing agent benchmarks along additional axes such as cost, and proposes new baselines:

1. AI agent evaluations must be cost-controlled
2. Jointly optimizing accuracy and cost can yield better
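The "jointly optimize accuracy and cost" point can be sketched as a Pareto-frontier filter over evaluated agents. The agent names and numbers below are invented for illustration.

```python
# Keep only agents not dominated by another agent that is both cheaper
# and at least as accurate (with at least one strict improvement).
def pareto_frontier(agents):
    """agents: list of (name, cost_usd, accuracy). Returns non-dominated names."""
    frontier = []
    for name, cost, acc in agents:
        dominated = any(c <= cost and a >= acc and (c < cost or a > acc)
                        for _, c, a in agents)
        if not dominated:
            frontier.append(name)
    return frontier

agents = [("simple", 0.01, 0.70), ("retry-5x", 0.05, 0.72), ("giant", 0.50, 0.71)]
print(pareto_frontier(agents))  # "giant" is dominated by the cheaper "retry-5x"
```

Reporting only accuracy would rank "giant" above "simple"; a cost-controlled view shows it is strictly worse than an alternative.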
Unsloth AI (@unslothai)'s Twitter Profile Photo

We made a step-by-step tutorial on how to finetune Llama-3 with Google Colab & deploy it to @Ollama

Tutorial: docs.unsloth.ai/tutorials/how-…

Colab notebook: colab.research.google.com/drive/1WZDi7AP…

Blog post & video coming soon. 🦥
Zhaorun Chen @ICLR2025 (@zrchen_aisafety)'s Twitter Profile Photo

We know LLM agents 🤖 are powerful and popular these days, but can they be subverted to act as killer agents 😈 just like in Westworld?😱

Sadly, the answer is YES! 😱😱

🔥🔥 We reveal the vulnerability and potential threats of generic LLM agents in our new work AgentPoison:
Andrej Karpathy (@karpathy)'s Twitter Profile Photo

To help explain the weirdness of LLM Tokenization I thought it could be amusing to translate every token to a unique emoji. This is a lot closer to truth - each token is basically its own little hieroglyph and the LLM has to learn (from scratch) what it all means based on
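The token-as-hieroglyph point can be made concrete by rendering each token id as a unique symbol. The toy five-entry vocabulary and emoji choices below are invented; real tokenizers have vocabularies of roughly 100k entries.

```python
# Map each token id in a toy BPE-style vocabulary to a unique emoji,
# making visible the "alphabet" the model actually sees.
vocab = {"The": 0, "re": 1, " cat": 2, " sat": 3, ".": 4}
emojis = ["🦊", "🌋", "🎲", "🪐", "🧩"]

def to_hieroglyphs(tokens):
    """Render a token-id sequence as one emoji per token."""
    return "".join(emojis[t] for t in tokens)

ids = [vocab[t] for t in ["The", "re", " cat", " sat", "."]]
print(to_hieroglyphs(ids))  # 🦊🌋🎲🪐🧩
```

Note that "There" splits into two glyphs while " cat" (with its leading space) is a single one; none of the glyphs carries any inherent meaning the model doesn't learn from scratch.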
PyTorch (@pytorch)'s Twitter Profile Photo

Introducing FlexAttention: a new API that lets you implement diverse attention variants in just a few lines of idiomatic PyTorch code. 🔥

Check out the blog post for more details: hubs.la/Q02KsKNR0
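The core idea is that a user-supplied function edits each attention score given its (query, key) position, and that one hook expresses causal masking, ALiBi, sliding windows, and more. Below is a plain-Python illustration of that score-modification concept on a tiny score grid, not the actual compiled PyTorch API.

```python
# Apply a score_mod-style hook to a tiny attention score matrix,
# then softmax each query's row; causal() masks future keys.
import math

def causal(score, q_idx, kv_idx):
    """Keep the score for past/current keys, mask out future ones."""
    return score if kv_idx <= q_idx else float("-inf")

def softmax(row):
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    z = sum(exps)
    return [e / z for e in exps]

def attend(scores, score_mod):
    """Run score_mod elementwise, then normalize each query's row."""
    return [softmax([score_mod(s, q, k) for k, s in enumerate(row)])
            for q, row in enumerate(scores)]

weights = attend([[0.0, 0.0], [0.0, 0.0]], causal)
print(weights)  # query 0 masks its future key; query 1 attends to both
```

The appeal of the real API is that such variants compile to a single fused kernel instead of materializing the full score matrix as done here.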
AI at Meta (@aiatmeta)'s Twitter Profile Photo

Using structured weight pruning and knowledge distillation, the NVIDIA AI research team refined Llama 3.1 8B into a new Llama-3.1-Minitron 4B.

They're releasing the new models on Hugging Face and shared a deep dive on how they did it ➡️ go.fb.me/b2h2c8
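The distillation half of this recipe typically trains the pruned student to match the teacher's output distribution via a KL-divergence loss on softened logits. The plain-Python math below illustrates that loss; the exact losses and temperatures NVIDIA used are in their write-up, not assumed here.

```python
# KL(teacher || student) on softmaxed logits: the standard distillation signal.
import math

def softmax(logits, temp=1.0):
    """Temperature-softened probabilities from raw logits."""
    exps = [math.exp(l / temp) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def kl_div(p, q):
    """KL(p || q): how far the student distribution q is from the teacher p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = softmax([2.0, 1.0, 0.1])   # invented logits for illustration
student = softmax([1.8, 1.1, 0.2])
print(f"distillation loss: {kl_div(teacher, student):.4f}")
```

Minimizing this over the training corpus pushes the 4B student toward the 8B teacher's full distribution rather than just its argmax labels.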
Shenbin Qian (@shenbinqian)'s Twitter Profile Photo

Thank you Sui Sui HE スイ for the invitation! I'm looking forward to sharing my research on MT evaluation at the upcoming seminar. Hope to see many of you there for an engaging discussion!

CTS Surrey (@cts_surrey)'s Twitter Profile Photo

Due to its popularity, we are extending our 'ASR for interpreting' course applications! Are you a company interested in #ASR & #AI for #interpreting? Fill in our survey to share your experience & join a 4-day online course on #bespoke interpreting ASR.

🤩surreyfbel.qualtrics.com/jfe/form/SV_8u…
Cheng Han Chiang (姜成翰) (@dcml0714)'s Twitter Profile Photo

🚀 New Paper Alert! 🚀
Want better LLM-as-a-Judge?
TRACT: 🧠 CoT + Regression-Aware Fine-tuning (RAFT) = Better numerical predictions! 📊
arxiv.org/abs/2503.04381
🧵👇 A thread on TRACT:
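The regression-aware idea TRACT builds on can be sketched as follows: instead of taking the judge LLM's single most likely score token, read the probabilities it assigns to each score and return the expectation. The probabilities below are invented for illustration.

```python
# Expected-score decoding over an LLM judge's distribution on score tokens 1..5.
def expected_score(score_probs: dict[int, float]) -> float:
    """Expectation over the model's probabilities on each numeric score."""
    return sum(s * p for s, p in score_probs.items())

probs = {1: 0.0, 2: 0.1, 3: 0.2, 4: 0.5, 5: 0.2}
print(expected_score(probs))   # ≈ 3.8, smoother than the argmax prediction of 4
```

This expectation is differentiable in the probabilities, which is what makes regression-aware fine-tuning possible on top of the CoT reasoning.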