Harsh Bhatt (@harshbhatt7585) Twitter Tweets • TwiCopy

Harsh Bhatt

@harshbhatt7585

19 | training neural networks since 16.
prev @ remyx.ai secta.ai aragon.ai voice.ai
alumni@ tks.world launchx.com

ID: 1578325960272662528

Link: https://youtube.com/channel/UCiD7kslR7lKSaPGSQ-heOWg
Joined: 07-10-2022 10:07:19

943 Tweets

474 Followers

565 Following

2 months ago

The main problem with using dropout in reinforcement learning is that the policy network is used twice: once when collecting rollouts and again during the gradient update. Each time, a new dropout mask is randomly generated, meaning the network behaves slightly differently even

Likes: 16 | Replies: 1 | Reposts: 0
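
The dropout point above can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the tiny linear softmax policy, the `policy_log_probs` helper, the observation, the weights, and the PPO-style probability ratio are hypothetical and not taken from the tweet. The sketch only shows that two forward passes with identical parameters give different log-probabilities when each pass draws its own dropout mask.

```python
import math
import random

def policy_log_probs(obs, weights, drop_p, rng):
    """Log-probabilities of a toy linear softmax policy with inverted dropout on its inputs."""
    keep = 1.0 - drop_p
    # A fresh mask is sampled on every call, as a dropout layer in training mode would do.
    mask = [1.0 / keep if rng.random() < keep else 0.0 for _ in obs]
    dropped = [x * m for x, m in zip(obs, mask)]
    logits = [sum(w * x for w, x in zip(row, dropped)) for row in weights]
    # Numerically stable log-softmax.
    top = max(logits)
    log_z = top + math.log(sum(math.exp(l - top) for l in logits))
    return [l - log_z for l in logits]

rng = random.Random(0)
obs = [0.5, -1.2, 0.8, 2.0]                                # hypothetical observation
weights = [[0.3, -0.1, 0.7, 0.2], [-0.4, 0.6, 0.1, -0.3]]  # hypothetical parameters
action = 0

# Pass 1 stands in for rollout collection, pass 2 for the gradient update.
# The parameters are identical in both passes; only the dropout mask changes.
logp_rollout = policy_log_probs(obs, weights, 0.5, rng)[action]
logp_update = policy_log_probs(obs, weights, 0.5, rng)[action]
ratio = math.exp(logp_update - logp_rollout)

# With dropout disabled the two passes agree, so the ratio is exactly 1.
logp_a = policy_log_probs(obs, weights, 0.0, rng)[action]
logp_b = policy_log_probs(obs, weights, 0.0, rng)[action]
ratio_no_dropout = math.exp(logp_b - logp_a)

print("ratio with dropout:", ratio)
print("ratio without dropout:", ratio_no_dropout)
```

In a PPO-style update the ratio between new and old action probabilities is expected to equal 1 before any parameter change, so a mask-induced deviation acts like noise in the objective.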

2 months ago

Culprit of AI film festival Suhas Sumukh

Likes: 33 | Replies: 3 | Reposts: 1

2 months ago

GRPO vs PPO in the same env.

Likes: 16 | Replies: 3 | Reposts: 0

2 months ago

Anyone in Delhi?

Likes: 6 | Replies: 3 | Reposts: 0

a month ago

Delhi.

Likes: 8 | Replies: 2 | Reposts: 1

a month ago

GRPO merged! find PR here: github.com/Metta-AI/metta…

Likes: 14 | Replies: 0 | Reposts: 0

a month ago

I am live, join from here: youtube.com/live/p6oqsrAZG…

Likes: 3 | Replies: 2 | Reposts: 0

a month ago

Revisiting this paper.

Likes: 8 | Replies: 2 | Reposts: 0

a month ago

Monte-Carlo learning is purely trial and error learning.

Likes: 6 | Replies: 1 | Reposts: 0
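
As a rough illustration of the Monte-Carlo remark above, here is a minimal sketch that estimates a state value purely from sampled episodes, with no bootstrapping from other estimates. The two-step task, the `run_episode` and `mc_value_estimate` helpers, and all parameter values are hypothetical and chosen only to keep the example small.

```python
import random

def run_episode(rng, p_success=0.7):
    """Hypothetical two-step task: each step pays reward 1 with probability p_success, else 0."""
    return [1.0 if rng.random() < p_success else 0.0 for _ in range(2)]

def mc_value_estimate(num_episodes, gamma=0.9, seed=0):
    """Estimate the start-state value by averaging complete-episode returns."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_episodes):
        rewards = run_episode(rng)
        # Discounted return of the whole episode, computed only after the episode ends.
        episode_return = sum(gamma ** t * r for t, r in enumerate(rewards))
        total += episode_return
    return total / num_episodes

estimate = mc_value_estimate(20000)
# Analytic expected return for this toy task: 0.7 + 0.9 * 0.7.
expected = 0.7 + 0.9 * 0.7

print("Monte-Carlo estimate:", estimate)
print("analytic value:", expected)
```

The estimate improves only by trying more episodes, which is the trial-and-error character the tweet refers to.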

a month ago

Our brain is an off-policy learner with on-policy corrections. 1. When we sleep or rest, the brain replays experiences (like the “replay buffer” in RL). That’s an off-policy mechanism. 2. Learning from observation: We can learn by watching others (imitation learning).

Likes: 7 | Replies: 2 | Reposts: 0

a month ago

Built a UI for live training of an agent in a gridworld env with Q-learning. You can see how the agent is training and how the Q-value of a state is being updated. During training you can tweak parameters like learning rate, exploration (epsilon), and discount factor and see how training

Likes: 15 | Replies: 4 | Reposts: 0
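
For readers who want to see what those three parameters do, here is a minimal tabular Q-learning sketch on a small gridworld. It is not the UI or code from the tweet: the grid size, reward scheme, and the `step`, `train`, and `greedy_path_length` helpers are all assumptions made for illustration. `alpha` is the learning rate, `epsilon` the exploration rate, and `gamma` the discount factor.

```python
import random

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, action, size):
    """Move inside a size x size grid; reaching the bottom-right corner ends the episode."""
    row, col = state
    d_row, d_col = ACTIONS[action]
    nxt = (min(max(row + d_row, 0), size - 1), min(max(col + d_col, 0), size - 1))
    done = nxt == (size - 1, size - 1)
    reward = 1.0 if done else -0.01  # assumed reward scheme: small step cost, +1 at the goal
    return nxt, reward, done

def train(size=4, episodes=500, alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * 4 for r in range(size) for c in range(size)}
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(200):  # cap on episode length
            if rng.random() < epsilon:
                action = rng.randrange(4)  # explore
            else:
                best = max(q[state])
                # Exploit, breaking ties between equally good actions at random.
                action = rng.choice([a for a in range(4) if q[state][a] == best])
            nxt, reward, done = step(state, action, size)
            # Q-learning target: reward plus the discounted best value of the next state.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q

def greedy_path_length(q, size=4, limit=50):
    """Number of steps the greedy policy takes from the start, capped at limit."""
    state, steps = (0, 0), 0
    while state != (size - 1, size - 1) and steps < limit:
        action = max(range(4), key=lambda a: q[state][a])
        state, _, _ = step(state, action, size)
        steps += 1
    return steps

q_table = train()
print("greedy path length:", greedy_path_length(q_table))
```

Calling `train` again with different `alpha`, `gamma`, or `epsilon` values and comparing the resulting tables is a crude offline version of the live tweaking the tweet describes.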

a month ago

Wrote a simple article on the Q-learning algorithm. I will also create a video on it :) plume-robin-b8f.notion.site/Q-learning-29b…

Likes: 10 | Replies: 1 | Reposts: 0

a month ago

RL Madness x.com/i/broadcasts/1…

Likes: 3 | Replies: 0 | Reposts: 0

a month ago

Congratulations to Women’s Team India for winning the Cricket World Cup. I really enjoyed watching the match. This cup will inspire the whole next generation, not just girls but also boys, to follow their passion irrespective of circumstances. It is a great cricket memory to

Likes: 5 | Replies: 0 | Reposts: 2