Tesfay Zemuy Gebrekidan (@tesfayzemuy) Twitter Tweets • TwiCopy

Tesfay Zemuy Gebrekidan

@tesfayzemuy

5 years ago

This two-legged robot taught itself how to walk blogs.mathworks.com/headlines/2021…

Stat.ML Papers

@statmlpapers

2 years ago

An Overview of Diffusion Models: Applications, Guided Generation, Statistical Rates and Optimization ift.tt/FJiwevm

Very excited to attend The AAMAS Conference this week! The Citizen-Centric Artificial Intelligence Systems team is presenting 2 full papers, 1 Blue Sky Ideas paper, 2 extended abstracts and 1 demo. More details below 👇

Sebastian Stein

@profsebstein

2 years ago

Presenting Tesfay Gebrekidan's (Tesfay Zemuy Gebrekidan) work on Deep Reinforcement Learning with Coalition Action Selection in Crystal Room 2 at The AAMAS Conference now (10:30-12:30 session). Hope to see you there!

Tesfay Zemuy Gebrekidan

@tesfayzemuy

2 years ago

My PhD thesis titled "Deep Reinforcement Learning for Online Combinatorial Resource Allocation with Arbitrary State and Action Spaces" is now available publicly. See the highlight on LinkedIn. linkedin.com/posts/tesfayz_…

Tom Yeh

@proftomyeh

2 years ago

[VAE] by Hand ✍️ A Variational Auto Encoder (VAE) learns the structure (mean and variance) of hidden features and generates new data from the learned structure. In contrast, GANs only learn to generate new data to fool a discriminator; they may not necessarily know the

Sebastian Raschka

@rasbt

2 years ago

Direct Preference Optimization (DPO) has become one of the go-to methods to align large language models (LLMs) more closely with user preferences. If you want to learn how it works, I coded it from scratch: github.com/rasbt/LLMs-fro…

Tesfay Zemuy Gebrekidan

@tesfayzemuy

2 years ago

Understanding full RL requires understanding the difference between the multi-armed bandit, which has a single observation and multiple actions; the contextual bandit (semi-RL), which has multiple observations; and full RL, which adds delayed reward. #RLHF is not fully #RL.
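
The three settings named in this post can be contrasted with a minimal Python sketch. Everything here is a hypothetical illustration and not taken from the author's work: the function names, the arm payout probabilities, the "best arm equals the context" rule, and the small chain environment are all assumptions chosen only to show where observations, state transitions, and delayed reward enter.

```python
import random


def bandit_step(arm, rng):
    """Multi-armed bandit: one fixed situation, several actions, immediate reward."""
    win_prob = [0.2, 0.5, 0.8][arm]  # hypothetical payout probability per arm
    return 1.0 if rng.random() < win_prob else 0.0


def contextual_bandit_step(context, arm, rng):
    """Contextual bandit: reward depends on (context, arm), still immediate.

    The next context is drawn independently of the chosen arm, so an action
    never influences future observations.
    """
    reward = 1.0 if arm == context else 0.0  # hypothetical reward rule
    next_context = rng.randrange(3)
    return reward, next_context


def chain_mdp_step(state, action, length=4):
    """Full RL: the action changes the next state and the reward is delayed.

    Action 1 moves one step along the chain, any other action resets to the
    start. Reward is paid only when the end of the chain is reached.
    """
    next_state = min(state + 1, length) if action == 1 else 0
    done = next_state == length
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def run_chain(policy, length=4, max_steps=20):
    """Roll out a policy on the chain and return the per-step rewards."""
    state, rewards = 0, []
    for _ in range(max_steps):
        state, reward, done = chain_mdp_step(state, policy(state), length)
        rewards.append(reward)
        if done:
            break
    return rewards


if __name__ == "__main__":
    rng = random.Random(0)
    print("bandit reward:", bandit_step(2, rng))
    print("contextual bandit:", contextual_bandit_step(1, 1, rng))
    print("chain rewards:", run_chain(lambda s: 1))
```

In the chain rollout, every reward before the final step is zero, so the learner has to credit earlier actions for a payoff that arrives later. Neither bandit function has that property, because each reward there depends only on the current choice.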

Tesfay Zemuy Gebrekidan

@tesfayzemuy

a year ago

AI is going to be the rational reviewer. At least it replaces the role of reviewers with minimal involvement of chairs. I am astonished by how well AI and ML papers has summarized our work, as linked here: aimodels.fyi/papers/arxiv/c… #AI #ML
