not engineer (@wjhtomwjh) 's Twitter Profile
not engineer

@wjhtomwjh

engineer

ID: 1669202882

Joined: 14-08-2013 01:38:25

1.1K Tweets

165 Followers

617 Following

not engineer (@wjhtomwjh) 's Twitter Profile Photo

You can just copy-paste the Python code that calls the o1 structured-outputs API and prints response.choices[0].message.parsed, and use that code as your Qwen-0.5B coder prompt; it performs better than a natural-language prompt on a chat-based instruct model. This is especially true for low-entropy tasks.
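
A minimal sketch of the copy-paste trick, assuming an Invoice schema and the OpenAI SDK's beta parse helper inside the snippet string; build_code_prompt and the document text are hypothetical names introduced here for illustration:

```python
# "Code as prompt": the snippet you would normally run against the o1
# structured-outputs API is pasted verbatim into the prompt for a small
# coder model, which then completes the printed output.
# The Invoice schema and client code below are illustrative assumptions.

O1_SNIPPET = """\
from pydantic import BaseModel

class Invoice(BaseModel):
    vendor: str
    total: float

response = client.beta.chat.completions.parse(
    model="o1",
    messages=[{"role": "user", "content": document}],
    response_format=Invoice,
)
print(response.choices[0].message.parsed)
"""

def build_code_prompt(document: str) -> str:
    """Wrap the API-calling snippet as a code-completion prompt for a
    small coder model (e.g. a 0.5B Qwen coder checkpoint)."""
    return (
        f"document = {document!r}\n\n"
        + O1_SNIPPET
        + "\n# stdout of the script above:\n"
    )

print(build_code_prompt("ACME Corp, total due: $42.00"))
```

The trailing comment line is the completion target: a coder model asked to continue this "script" tends to emit the parsed fields directly, which is the behavior the tweet describes.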


Small models need to hyperfit for reasoning. Some paper will talk about it in a few months; just some observations from the trenches.
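
A loose, self-contained illustration of the hyperfitting dynamic (a toy classifier, not an LM): pure-Python logistic regression driven to near-zero training loss on four hand-picked points. The data, learning rate, and epoch count are all made up:

```python
# "Hyperfit" toy: take far more gradient steps on a tiny, clean sample
# than the data seems to deserve, until training loss is essentially zero.
import math

samples = [([1.0, 0.0], 1), ([0.9, 0.1], 1), ([0.0, 1.0], 0), ([0.1, 0.9], 0)]
w = [0.0, 0.0]
b = 0.0
lr = 0.5

def loss() -> float:
    """Mean logistic loss over the training sample."""
    total = 0.0
    for x, y in samples:
        p = 1 / (1 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(samples)

for _ in range(5000):              # many epochs over the same four points
    for x, y in samples:
        p = 1 / (1 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
        g = p - y                  # dL/dlogit for logistic loss
        w[0] -= lr * g * x[0]
        w[1] -= lr * g * x[1]
        b -= lr * g

print(loss())                      # collapses toward zero after hyperfitting
```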


One simple reward based on an offline eval metric, no fancy formatting: GRPO just works on 0.5B for structured data extraction. The key is to hyperfit on a small, high-quality sample first.
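
A sketch of what "one simple reward" can look like for structured extraction, plus GRPO's group-normalized advantages; extraction_reward, the gold record, and the sampled group are illustrative assumptions, not the author's actual setup:

```python
# One scalar reward from an offline eval metric (field-level exact match
# against a gold JSON record), then GRPO's per-group normalized advantages.
import json
import statistics

def extraction_reward(completion: str, gold: dict) -> float:
    """Fraction of gold fields reproduced exactly; 0.0 if output isn't JSON."""
    try:
        pred = json.loads(completion)
    except json.JSONDecodeError:
        return 0.0
    hits = sum(1 for k, v in gold.items() if pred.get(k) == v)
    return hits / len(gold)

def grpo_advantages(rewards: list[float]) -> list[float]:
    """GRPO: advantage = (reward - group mean) / group std, per sampled group."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0   # avoid div-by-zero on ties
    return [(r - mu) / sigma for r in rewards]

gold = {"vendor": "ACME", "total": 42.0}
group = [
    '{"vendor": "ACME", "total": 42.0}',   # perfect extraction
    '{"vendor": "ACME"}',                  # partial extraction
    'not json at all',                     # format failure -> zero reward
]
rewards = [extraction_reward(c, gold) for c in group]
print(rewards, grpo_advantages(rewards))
```

Note the format failure already lands at zero reward, so malformed outputs get negative advantage within the group without any separate formatting term.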


You don't need a format reward for GRPO as long as you start from a reasonably finetuned 0.5B model; you would use constrained generation at inference time anyway.
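
A toy of the constrained-generation point: if decoding can only emit tokens that keep the output a prefix of a JSON template, format correctness is guaranteed by construction, so a format reward buys nothing. The single-slot template and vocabulary here are made up; real systems use grammar- or schema-constrained decoders:

```python
# Constrained decoding toy: a template with one numeric slot. At each step,
# tokens that would break the template are masked out, so every finished
# output is well-formed JSON regardless of what the model "wants" to emit.

TEMPLATE_HEAD = '{"total": '   # fixed prefix of the target JSON
TEMPLATE_TAIL = '}'            # fixed suffix; digits go in between

def allowed(prefix: str, token: str) -> bool:
    """True if prefix+token can still be extended to match the template."""
    cand = prefix + token
    if len(cand) <= len(TEMPLATE_HEAD):
        return TEMPLATE_HEAD.startswith(cand)
    if not cand.startswith(TEMPLATE_HEAD):
        return False
    rest = cand[len(TEMPLATE_HEAD):]
    if rest.endswith(TEMPLATE_TAIL):           # slot closed: digits only inside
        rest = rest[:-len(TEMPLATE_TAIL)]
    return rest.isdigit()

def filter_tokens(prefix: str, vocab: list[str]) -> list[str]:
    """The masking step of constrained decoding: keep only legal tokens."""
    return [t for t in vocab if allowed(prefix, t)]

print(filter_tokens('{"total": 4', ['2', '}', '"', 'foo']))  # only '2' and '}' survive
```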