arlo_son (@gson_ai) 's Twitter Profile
arlo_son

@gson_ai

Undergraduate @ Yonsei. UIC Economics.

ID: 1621735807232126976

Joined: 04-02-2023 05:02:21

254 Tweets

133 Followers

213 Following

Seungone Kim @ NAACL2025 (@seungonekim) 's Twitter Profile Photo

#NLProc 
Just because GPT-4o is 17 times more expensive than GPT-4o-mini, does that mean it generates synthetic data 17 times better? 

Introducing AgoraBench, a benchmark for evaluating the data generation capabilities of LMs.
TuringPost (@theturingpost) 's Twitter Profile Photo

10 Free Comprehensive Datasets for Supervised Fine-Tuning:

▪️ Awesome ChatGPT Prompts
▪️ FineWeb from Hugging Face
▪️ FineWeb 2
▪️ OpenO1-SFT
▪️ Cleaned Alpaca Dataset
▪️ LMSYS-Chat-1M
▪️ Dolma from Ai2

Math datasets:
▪️ FineMath
▪️ QwQ-LongCoT-130K
▪️ GSM8K

Save the
Lifan Yuan (@lifan__yuan) 's Twitter Profile Photo

lessons learned: (1) *capable* (small) base models are good enough to start rl, where (2) reasoning patterns *tailored to each task* just emerge, e.g. self-verification for countdown and decomposition for multiplication. will keep working on demystifying long cot, stay tuned🫡

Trelis Research (@trelisresearch) 's Twitter Profile Photo

+ GRPO is Poor and for the GPU-Rich +
-------------------------------

*A specific GRPO vs SFT video will be out next week, but I'm putting initial results here*

I trained Llama 3.2 1B on GSM8K with:
1. SFT
2. ORPO
3. GRPO

For SFT and ORPO, I generated training data using Llama
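The core idea that sets GRPO apart from SFT and ORPO is its group-relative reward normalization: several completions are sampled per prompt, scored, and each completion's advantage is its reward standardized against the group. Below is a minimal sketch of that advantage computation (based on the published GRPO formulation, not on Trelis Research's actual training code; the reward scheme shown is an assumed binary correct/incorrect signal for GSM8K-style answers):

```python
def group_relative_advantages(rewards, eps=1e-6):
    """Normalize one group's rewards to zero mean and unit std.

    In GRPO, each prompt gets a group of sampled completions; the
    advantage of each completion is its reward standardized within
    that group, so no learned value model is needed.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]


# Example: four sampled answers to one GSM8K question, with an
# assumed binary reward of 1.0 for a correct final answer, 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
# Correct answers receive positive advantages, incorrect ones negative,
# which is what makes GRPO expensive: it needs many samples per prompt
# (hence "for the GPU-Rich"), while SFT needs only one target per prompt.
```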
Seungone Kim @ NAACL2025 (@seungonekim) 's Twitter Profile Photo

NAACL HLT 2025: I'll also be presenting our KMMLU paper with arlo_son! It is one of the most widely adopted benchmarks for Korean LLMs, used by companies such as NAVER_official, LG AI Research, and Kakao Corp. 📅 Session C: Wednesday, April 30th, 14:00-15:30 x.com/gson_AI/status…

Stella Biderman (@blancheminerva) 's Twitter Profile Photo

People are really eager to use AIs "to accelerate science" (whatever that means). Designing meaningful tests tailored to proposed use-cases is a lot of work, but it's work I'm quite excited about. Bottom line: Current models aren't usable at identifying major flaws in papers.