not engineer (@wjhtomwjh) 's Twitter Profile
not engineer

@wjhtomwjh

engineer

ID: 1669202882

Joined: 14-08-2013 01:38:25

1.1K Tweets

165 Followers

617 Following

not engineer (@wjhtomwjh) 's Twitter Profile Photo

You can just copy-paste the Python code that calls the o1 structured-outputs API and prints response.choices[0].message.parsed, and use that code as your Qwen-0.5B coder prompt; it performs better than a natural-language prompt on a chat-based instruct model. This is especially true for low-entropy tasks.
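
A minimal sketch of the copy-paste trick, assuming an Invoice schema and the OpenAI SDK's beta parse helper inside the snippet string; build_code_prompt and the document text are hypothetical names introduced here for illustration:

```python
# "Code as prompt": the snippet you would normally run against the o1
# structured-outputs API is pasted verbatim into the prompt for a small
# coder model, which then completes the printed output.
# The Invoice schema and client code below are illustrative assumptions.

O1_SNIPPET = """\
from pydantic import BaseModel

class Invoice(BaseModel):
    vendor: str
    total: float

response = client.beta.chat.completions.parse(
    model="o1",
    messages=[{"role": "user", "content": document}],
    response_format=Invoice,
)
print(response.choices[0].message.parsed)
"""

def build_code_prompt(document: str) -> str:
    """Wrap the API-calling snippet as a code-completion prompt for a
    small coder model (e.g. a 0.5B Qwen coder checkpoint)."""
    return (
        f"document = {document!r}\n\n"
        + O1_SNIPPET
        + "\n# stdout of the script above:\n"
    )

print(build_code_prompt("ACME Corp, total due: $42.00"))
```

The trailing comment line is the completion target: a coder model asked to continue this "script" tends to emit the parsed fields directly, which is the behavior the tweet describes.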


Small models need to hyperfit for reasoning. Some paper will talk about it in a few months; just some observations from the trenches.
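
A loose, self-contained illustration of the hyperfitting dynamic (a toy classifier, not an LM): pure-Python logistic regression driven to near-zero training loss on four hand-picked points. The data, learning rate, and epoch count are all made up:

```python
# "Hyperfit" toy: take far more gradient steps on a tiny, clean sample
# than the data seems to deserve, until training loss is essentially zero.
import math

samples = [([1.0, 0.0], 1), ([0.9, 0.1], 1), ([0.0, 1.0], 0), ([0.1, 0.9], 0)]
w = [0.0, 0.0]
b = 0.0
lr = 0.5

def loss() -> float:
    """Mean logistic loss over the training sample."""
    total = 0.0
    for x, y in samples:
        p = 1 / (1 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(samples)

for _ in range(5000):              # many epochs over the same four points
    for x, y in samples:
        p = 1 / (1 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
        g = p - y                  # dL/dlogit for logistic loss
        w[0] -= lr * g * x[0]
        w[1] -= lr * g * x[1]
        b -= lr * g

print(loss())                      # collapses toward zero after hyperfitting
```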


One simple reward based on an offline eval metric, no fancy formatting: GRPO just works on 0.5B for structured data extraction. The key is to hyperfit on a small, high-quality sample first.
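
A sketch of what "one simple reward" can look like for structured extraction, plus GRPO's group-normalized advantages; extraction_reward, the gold record, and the sampled group are illustrative assumptions, not the author's actual setup:

```python
# One scalar reward from an offline eval metric (field-level exact match
# against a gold JSON record), then GRPO's per-group normalized advantages.
import json
import statistics

def extraction_reward(completion: str, gold: dict) -> float:
    """Fraction of gold fields reproduced exactly; 0.0 if output isn't JSON."""
    try:
        pred = json.loads(completion)
    except json.JSONDecodeError:
        return 0.0
    hits = sum(1 for k, v in gold.items() if pred.get(k) == v)
    return hits / len(gold)

def grpo_advantages(rewards: list[float]) -> list[float]:
    """GRPO: advantage = (reward - group mean) / group std, per sampled group."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0   # avoid div-by-zero on ties
    return [(r - mu) / sigma for r in rewards]

gold = {"vendor": "ACME", "total": 42.0}
group = [
    '{"vendor": "ACME", "total": 42.0}',   # perfect extraction
    '{"vendor": "ACME"}',                  # partial extraction
    'not json at all',                     # format failure -> zero reward
]
rewards = [extraction_reward(c, gold) for c in group]
print(rewards, grpo_advantages(rewards))
```

Note the format failure already lands at zero reward, so malformed outputs get negative advantage within the group without any separate formatting term.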


You don't need a format reward for GRPO as long as you start from a reasonably finetuned 0.5B model; you would use constrained generation at inference time anyway.
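
A toy of the constrained-generation point: if decoding can only emit tokens that keep the output a prefix of a JSON template, format correctness is guaranteed by construction, so a format reward buys nothing. The single-slot template and vocabulary here are made up; real systems use grammar- or schema-constrained decoders:

```python
# Constrained decoding toy: a template with one numeric slot. At each step,
# tokens that would break the template are masked out, so every finished
# output is well-formed JSON regardless of what the model "wants" to emit.

TEMPLATE_HEAD = '{"total": '   # fixed prefix of the target JSON
TEMPLATE_TAIL = '}'            # fixed suffix; digits go in between

def allowed(prefix: str, token: str) -> bool:
    """True if prefix+token can still be extended to match the template."""
    cand = prefix + token
    if len(cand) <= len(TEMPLATE_HEAD):
        return TEMPLATE_HEAD.startswith(cand)
    if not cand.startswith(TEMPLATE_HEAD):
        return False
    rest = cand[len(TEMPLATE_HEAD):]
    if rest.endswith(TEMPLATE_TAIL):           # slot closed: digits only inside
        rest = rest[:-len(TEMPLATE_TAIL)]
    return rest.isdigit()

def filter_tokens(prefix: str, vocab: list[str]) -> list[str]:
    """The masking step of constrained decoding: keep only legal tokens."""
    return [t for t in vocab if allowed(prefix, t)]

print(filter_tokens('{"total": 4', ['2', '}', '"', 'foo']))  # only '2' and '}' survive
```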