Tatsunori Hashimoto (@tatsu_hashimoto)

We are releasing AlpacaFarm, a simulator that lets anyone run and study the full RLHF pipeline at a fraction of the usual time (<24h) and cost (<$200) with LLM-simulated annotators. Starting from Alpaca, we show RLHF gives a big 10%+ win-rate gain over davinci003 (crfm.stanford.edu/2023/05/22/alp…)
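The win rate above is a head-to-head comparison: the fraction of prompts on which a judge prefers the model's output over the baseline's. A minimal sketch in plain Python, with a hypothetical `judge` callable standing in for the LLM-simulated annotator (names are illustrative, not the AlpacaFarm API):

```python
import random

def simulated_winrate(model_outputs, baseline_outputs, judge, seed=0):
    """Fraction of head-to-head comparisons the model wins against the
    baseline, as decided by a preference function `judge(a, b)` that
    returns True if `a` is preferred over `b`. Presentation order is
    randomized per pair to control for position bias."""
    rng = random.Random(seed)
    wins = 0
    for ours, theirs in zip(model_outputs, baseline_outputs):
        if rng.random() < 0.5:
            wins += judge(ours, theirs)
        else:
            wins += not judge(theirs, ours)
    return wins / len(model_outputs)

# Toy judge that prefers longer answers, standing in for an LLM annotator.
longer = lambda a, b: len(a) > len(b)
print(simulated_winrate(["a long answer", "hi"], ["ok", "a longer reply"], longer))
```

With this toy judge the model wins exactly one of the two comparisons, so the win rate is 0.5 regardless of the ordering seed.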

Simeng Sun (@simeng_ssun)

Happy to share our paper on aligning LLaMA 7B with LoRA-based RLHF!

LoRA needs 2 A100s instead of the 8 required for full-model tuning, and yields a higher win rate on AlpacaFarm with only 10h of training ✅

More details below:

tinyurl.com/apclr
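For intuition on why LoRA fits on fewer GPUs: only two small low-rank factors per adapted weight are trained, while the pretrained matrix stays frozen. A minimal NumPy sketch of the idea (sizes and scaling are illustrative, not the paper's configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 4096, 8, 16           # hidden size, LoRA rank, scaling (illustrative)

W = rng.standard_normal((d, d))     # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01  # trainable low-rank factor
B = np.zeros((d, r))                # B starts at zero, so the update starts at zero

def lora_forward(x):
    # y = x W^T + (alpha/r) * x A^T B^T  -- only A and B would receive gradients
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

full_params = W.size
lora_params = A.size + B.size
print(f"trainable params: {lora_params:,} vs {full_params:,} "
      f"({100 * lora_params / full_params:.2f}% of full fine-tuning)")
```

At rank 8 on a 4096x4096 weight, the trainable parameters shrink from ~16.8M to 65,536, i.e. under 0.4% of full fine-tuning, which is where the optimizer-state and gradient memory savings come from.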

Seungone Kim (@seungonekim)

🤔 How can you evaluate whether your LLM is humorous or not? Among various versions during development, how can you track whether your LLM is inspiring while being culturally sensitive?

Current evaluation resources (e.g., MMLU, Big Bench, AlpacaFarm) are confined to generic,

Xuechen Li (@lxuechen)

Belatedly, I had a chance to update the AlpacaFarm paper with DPO results.

TL;DR: DPO performs similarly to RLHF+PPO but is much more memory-friendly. Previously, PPO fine-tuning took ~2 hours on 8 A100 GPUs. Our DPO runs take about the same time on 4 GPUs. DPO with LoRA
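The memory friendliness mentioned above comes from DPO's objective: it needs only log-probabilities of the chosen and rejected responses under the policy and a frozen reference model, with no value network or PPO rollout loop. A minimal sketch of the per-example loss (numbers are illustrative):

```python
import math

def dpo_loss(pi_logp_w, pi_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Per-example DPO loss from summed token log-probs of the chosen (w)
    and rejected (l) responses under the policy (pi) and a frozen reference:

        L = -log sigmoid(beta * ((pi_w - ref_w) - (pi_l - ref_l)))
    """
    margin = beta * ((pi_logp_w - ref_logp_w) - (pi_logp_l - ref_logp_l))
    return -math.log(1 / (1 + math.exp(-margin)))

# If the policy prefers the chosen response more strongly than the
# reference does, the margin is positive and the loss falls below log(2).
print(dpo_loss(-10.0, -30.0, -12.0, -25.0))
```

At initialization, when the policy equals the reference, the margin is zero and the loss is exactly log(2); training pushes it down by widening the preference margin.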

Creekwater Alpaca Farm (@cwafarm)

Did you know we are open 7 days a week? In addition to our tours on the weekends, we do private tours during the week. That's right, a tour with just your group! The cost is $50 for the first 2 people, and $12 per person after that. #tour #alpaca #alpacafarm
770-465-5181