Tesfay Zemuy Gebrekidan (@tesfayzemuy)'s Twitter Profile
Tesfay Zemuy Gebrekidan

@tesfayzemuy

ID: 751064294981238784

07-07-2016 14:44:16

18 Tweets

71 Followers

62 Following

Sebastian Stein (@profsebstein)

Presenting Tesfay Gebrekidan's (Tesfay Zemuy Gebrekidan) work on Deep Reinforcement Learning with Coalition Action Selection in Crystal Room 2 at The AAMAS Conference now (10:30-12:30 session). Hope to see you there!

Tesfay Zemuy Gebrekidan (@tesfayzemuy)

My PhD thesis, titled "Deep Reinforcement Learning for Online Combinatorial Resource Allocation with Arbitrary State and Action Spaces", is now publicly available. See the highlight on LinkedIn: linkedin.com/posts/tesfayz_…

Tom Yeh (@proftomyeh)

[VAE] by Hand ✍️ A Variational Auto Encoder (VAE) learns the structure (mean and variance) of hidden features and generates new data from the learned structure. In contrast, GANs only learn to generate new data to fool a discriminator; they may not necessarily know the

Sebastian Raschka (@rasbt)

Direct Preference Optimization (DPO) has become one of the go-to methods to align large language models (LLMs) more closely with user preferences. If you want to learn how it works, I coded it from scratch: github.com/rasbt/LLMs-fro…

Tesfay Zemuy Gebrekidan (@tesfayzemuy)

Understanding full RL requires understanding how it differs from the multi-armed bandit, which has a single observation and multiple actions, and from the contextual bandit (semi-RL), which has multiple observations. Full RL adds delayed reward on top of both. #RLHF is not fully #RL.
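
The three settings named above can be contrasted in a minimal Python sketch. Everything in it is an illustrative assumption rather than anything from the thread: the function names, the payoff probabilities, the five-state chain, and the use of epsilon-greedy exploration with tabular Q-learning for the full-RL case.

```python
import random


def epsilon_greedy(values, rng, eps=0.1):
    """Pick a random action with probability eps, else a best-valued one (ties at random)."""
    if rng.random() < eps:
        return rng.randrange(len(values))
    best = max(values)
    return rng.choice([a for a, v in enumerate(values) if v == best])


def run_bandit(steps, rng):
    """Multi-armed bandit: nothing to observe, reward arrives immediately."""
    win_prob = [0.2, 0.5, 0.8]  # made-up arm payoffs
    counts, values = [0] * 3, [0.0] * 3
    for _ in range(steps):
        arm = epsilon_greedy(values, rng)
        reward = 1.0 if rng.random() < win_prob[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # running mean
    return values


def run_contextual_bandit(steps, rng):
    """Contextual bandit: an observation is drawn each step, independent of past actions."""
    win_prob = {0: [0.9, 0.1], 1: [0.1, 0.9]}  # made-up: best arm depends on context
    counts = {c: [0, 0] for c in win_prob}
    values = {c: [0.0, 0.0] for c in win_prob}
    for _ in range(steps):
        context = rng.randrange(2)
        arm = epsilon_greedy(values[context], rng)
        reward = 1.0 if rng.random() < win_prob[context][arm] else 0.0
        counts[context][arm] += 1
        values[context][arm] += (reward - values[context][arm]) / counts[context][arm]
    return values


def run_q_learning(episodes, rng, n_states=5, alpha=0.5, gamma=0.9):
    """Full RL: actions change the next state, and the only reward comes at the goal."""
    q = [[0.0, 0.0] for _ in range(n_states)]  # actions: 0 = left, 1 = right
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            action = epsilon_greedy(q[state], rng, eps=0.2)
            nxt = min(state + 1, goal) if action == 1 else max(state - 1, 0)
            reward = 1.0 if nxt == goal else 0.0
            # Bootstrapped target: the delayed reward is propagated back step by step.
            target = reward + (0.0 if nxt == goal else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q
```

In the first two loops the chosen arm never affects what is seen next, so each step can be learned from in isolation. Only run_q_learning has to credit early moves for a reward that shows up several steps later.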

Tesfay Zemuy Gebrekidan (@tesfayzemuy)

AI is going to be the rational reviewer. At the least, it can take over the role of reviewers with minimal involvement from chairs. I am astonished at how well the AI and ML papers site has summarized our work, linked here: aimodels.fyi/papers/arxiv/c… #AI #ML

elvis (@omarsar0)

A Deep Dive into Reasoning LLMs: This is a really nice summary of the progress made in post-training and reasoning LLMs. Highly recommend this one!
