arlo_son (@gson_ai) 's Twitter Profile
arlo_son

@gson_ai

Undergraduate @ Yonsei. UIC Economics.

ID: 1621735807232126976

Joined: 04-02-2023 05:02:21

254 Tweets

133 Followers

213 Following

Seungone Kim @ NAACL2025 (@seungonekim) 's Twitter Profile Photo

#NLProc 
Just because GPT-4o is 17 times more expensive than GPT-4o-mini, does that mean it generates synthetic data 17 times better? 

Introducing AgoraBench, a benchmark for evaluating the data generation capabilities of LMs.
TuringPost (@theturingpost) 's Twitter Profile Photo

10 Free Comprehensive Datasets for Supervised Fine-Tuning:

▪️ Awesome ChatGPT Prompts
▪️ FineWeb from Hugging Face
▪️ FineWeb 2
▪️ OpenO1-SFT
▪️ Cleaned Alpaca Dataset
▪️ LMSYS-Chat-1M
▪️ Dolma from Ai2

Math datasets:
▪️ FineMath
▪️ QwQ-LongCoT-130K
▪️ GSM8K

Save the
Lifan Yuan (@lifan__yuan) 's Twitter Profile Photo

lessons learned: (1) *capable* (small) base models are good enough to start rl, where (2) reasoning patterns *tailored to each task* just emerge, e.g. self-verification for countdown and decomposition for multiplication. will keep working on demystifying long cot, stay tuned🫡

Trelis Research (@trelisresearch) 's Twitter Profile Photo

+ GRPO is Poor and for the GPU-Rich +
-------------------------------

*A specific GRPO vs SFT video will be out next week, but I'm putting initial results here*

I trained Llama 3.2 1B on GSM8K with:
1. SFT
2. ORPO
3. GRPO

For SFT and ORPO, I generated training data using Llama
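The core idea that sets GRPO apart from SFT and ORPO is its group-relative reward normalization: several completions are sampled per prompt, scored, and each completion's advantage is its reward standardized against the group. Below is a minimal sketch of that advantage computation (based on the published GRPO formulation, not on Trelis Research's actual training code; the reward scheme shown is an assumed binary correct/incorrect signal for GSM8K-style answers):

```python
def group_relative_advantages(rewards, eps=1e-6):
    """Normalize one group's rewards to zero mean and unit std.

    In GRPO, each prompt gets a group of sampled completions; the
    advantage of each completion is its reward standardized within
    that group, so no learned value model is needed.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]


# Example: four sampled answers to one GSM8K question, with an
# assumed binary reward of 1.0 for a correct final answer, 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
# Correct answers receive positive advantages, incorrect ones negative,
# which is what makes GRPO expensive: it needs many samples per prompt
# (hence "for the GPU-Rich"), while SFT needs only one target per prompt.
```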
Seungone Kim @ NAACL2025 (@seungonekim) 's Twitter Profile Photo

NAACL HLT 2025: I'll also be presenting our KMMLU paper with arlo_son! It is one of the most widely adopted benchmarks for Korean LLMs, used by companies such as NAVER_official, LG AI Research, and Kakao Corp. 📅 Session C: Wednesday, April 30th, 14:00-15:30 x.com/gson_AI/status…

Stella Biderman (@blancheminerva) 's Twitter Profile Photo

People are really eager to use AIs "to accelerate science" (whatever that means). Designing meaningful tests tailored to proposed use-cases is a lot of work, but it's work I'm quite excited about. Bottom line: Current models aren't usable at identifying major flaws in papers.