AvivTamarLab (@avivtamarlab) 's Twitter Profile
AvivTamarLab

@avivtamarlab

Aviv Tamar's Robot Learning Lab at Technion ECE

ID: 1756314953613537280

Created: 10-02-2024 13:52:30

14 Tweets

15 Followers

1 Following

Aviv Tamar (@avivtamar1) 's Twitter Profile Photo

On Mar 21 we forgot to shut down a machine running a simple sanity check experiment. A couple days later, we were surprised to see beautiful results, which we couldn’t quite explain! 👇

Aviv Tamar (@avivtamar1) 's Twitter Profile Photo

DOTE won BEST PAPER at #nsdi23 !!!

DOTE trains a deep neural network that directly outputs traffic engineering configurations. This works great for traffic that is difficult to predict accurately, e.g., MSFT's customer-facing WAN.

Yarin Perry will present Wed 14:40 EDT
👇
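"Directly outputs traffic engineering configurations" means the network maps demand history straight to routing decisions and is trained on the TE objective itself, rather than first predicting future demands. A toy sketch of the output side only (hypothetical names, not DOTE's actual architecture): a softmax per demand over its candidate paths yields valid split ratios.

```python
import numpy as np

def logits_to_split_ratios(logits):
    """Turn raw network outputs into a valid TE configuration:
    for each demand (row), a softmax over its candidate paths gives
    non-negative split ratios that sum to 1."""
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(42)
raw = rng.normal(size=(3, 4))        # 3 demands, 4 candidate paths each
ratios = logits_to_split_ratios(raw)
```

Because the output is valid routing by construction, the whole pipeline can be trained end-to-end against the TE cost, which is why inaccurate demand forecasts stop being a bottleneck.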
Aviv Tamar (@avivtamar1) 's Twitter Profile Photo

Working on robotic bin picking? #ICRA2023 work led by Osaro’s research team shows how to improve throughput at deployment time. Idea: optimize the sequence of tool changes based on pretrained grasp success maps = better throughput for free! arxiv.org/abs/2302.07940 Poster Wed 9am

Aviv Tamar (@avivtamar1) 's Twitter Profile Photo

Meta-RL is all about inferring the task from a history of observations. But how to best learn a history embedding? In ContraBAR (#ICML2023 w/ Era Choshen) we investigate a contrastive learning approach. Paper: arxiv.org/pdf/2306.02418… Code: github.com/ec2604/ContraB… 👇
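The specific loss isn't spelled out in the tweet; as a rough illustration of the contrastive idea, a generic InfoNCE objective over history embeddings might look like this (all names and the batch setup are hypothetical, not ContraBAR's actual formulation):

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """Generic InfoNCE: each anchor's positive is the matching row;
    all other rows in the batch serve as negatives."""
    # Normalize so the dot product is a cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the diagonal (matching pairs) as labels.
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
h = rng.normal(size=(8, 16))                      # history embeddings (anchors)
loss = info_nce_loss(h, h + 0.01 * rng.normal(size=(8, 16)))
```

Minimizing this pulls embeddings of matching histories together and pushes mismatched ones apart, which is the sense in which the embedding learns to identify the task.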

Aviv Tamar (@avivtamar1) 's Twitter Profile Photo

Check out these beautiful videos 🤩 Deep *dynamic* latent particles is a new object-based video prediction method, led by Tal Daniel. Key idea: latent variables = particles, making it easier to learn latent dynamics 👇
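The "latents = particles" idea can be caricatured in a few lines (a toy sketch, not the actual architecture): each latent is a particle with a 2-D position plus features, and dynamics act per particle instead of on one monolithic latent vector.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy particle-structured latent state: K particles, each with an (x, y)
# position and a small feature vector.
K, feat_dim = 5, 4
positions = rng.uniform(0, 1, size=(K, 2))
features = rng.normal(size=(K, feat_dim))
velocities = rng.normal(scale=0.01, size=(K, 2))

def step(positions, velocities):
    """Trivial per-particle dynamics: constant velocity. The point is
    that dynamics act on each low-dimensional particle independently,
    a much easier prediction target than a full image latent."""
    return positions + velocities, velocities

for _ in range(10):
    positions, velocities = step(positions, velocities)
```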

Aviv Tamar (@avivtamar1) 's Twitter Profile Photo

Teacher-student algos are great for learning w/ partial observability: a teacher is trained with full info -> the student imitates it. But what if the full-info policy is very different from the partial-info one? TGRL cleverly balances imitation with RL, leading to a very practical method #ICML2023
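The balancing act the tweet describes can be sketched generically. In this toy version (hypothetical names; TGRL's contribution is adapting the trade-off automatically, whereas here the coefficient is a fixed hyperparameter), the student's objective adds a KL term pulling it toward the teacher's actions:

```python
import numpy as np

def kl(p, q):
    """KL divergence between two discrete action distributions."""
    return float(np.sum(p * np.log(p / q)))

def teacher_student_loss(rl_loss, student_probs, teacher_probs, alpha):
    """Hypothetical teacher-student objective: task (RL) loss plus an
    imitation term. Large alpha trusts the full-info teacher; small
    alpha lets RL take over when the teacher is not achievable under
    partial observability."""
    return rl_loss + alpha * kl(teacher_probs, student_probs)

teacher = np.array([0.7, 0.2, 0.1])   # full-information policy
student = np.array([0.4, 0.4, 0.2])   # partial-information policy
loss = teacher_student_loss(1.0, student, teacher, alpha=0.5)
```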

Orr Krupnik (@orrkrup) 's Twitter Profile Photo

What do you do when your robot world model just doesn't cut it? Fine-tune it, of course! New paper in #CoRL2023 next week, "Fine-Tuning Generative Models as an Inference Method for Robotic Tasks" 1/ >>> orrkrup.com/mace

Aviv Tamar (@avivtamar1) 's Twitter Profile Photo

We recently had a bit of a breakthrough in generalization in RL, led by Ev Zisselman. TL;DR: learning MaxEnt exploration generalizes better than maximizing reward. We use this to set a new SOTA for ProcGen + significantly improve on hard games like Heist! #NeurIPS2023 Details👇
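To make "learning MaxEnt exploration" concrete, here is a generic illustration of the kind of quantity such an objective maximizes (a sketch with hypothetical names, not the paper's method): the entropy of the policy's empirical state-visitation distribution, which rewards covering states rather than camping on reward.

```python
import numpy as np

def visitation_entropy(states, num_states):
    """Entropy of the empirical state-visitation distribution.
    A MaxEnt exploration objective trains the policy to maximize this,
    instead of directly maximizing task reward."""
    counts = np.bincount(states, minlength=num_states).astype(float)
    p = counts / counts.sum()
    nz = p[p > 0]
    return float(-(nz * np.log(nz)).sum())

# A policy that spreads visits uniformly achieves maximum entropy...
uniform_visits = np.repeat(np.arange(8), 10)
# ...while a reward-greedy policy camping in one state scores zero.
greedy_visits = np.zeros(80, dtype=int)

h_uniform = visitation_entropy(uniform_visits, 8)   # log(8)
h_greedy = visitation_entropy(greedy_visits, 8)     # 0.0
```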

Zohar Rimon (@zoharrimon) 's Twitter Profile Photo

Our work "MAMBA: An Effective World Model Approach for Meta Reinforcement Learning" got accepted to ICLR 2024! It was super fun working on this one with tom jurgenson, Orr Krupnik, Gilad Adler, and Aviv Tamar. Paper: arxiv.org/abs/2403.09859 Code: github.com/zoharri/mamba 🧵 [1/9]

Aviv Tamar (@avivtamar1) 's Twitter Profile Photo

Generalization in RL is hard. Compositional generalization is even harder… We made some progress in our #ICLR2024 spotlight w/ Dan Haramati and Tal Daniel: RL trains a robotic manipulation policy that generalizes to different numbers of objects. Code+paper: sites.google.com/view/entity-ce…

Mirco Mutti (@mirco_mutti) 's Twitter Profile Photo

When does meta-training truly benefit RL efficiency? In our #ICML2024 paper, Aviv Tamar and I analyse the conditions under which fast regret rates can be achieved at test time arxiv.org/abs/2406.02282 1/5

Aviv Tamar (@avivtamar1) 's Twitter Profile Photo

This project completely reshaped my view on tree search + neural networks arxiv.org/abs/2406.02103 Using a NN for value/policy in MCTS is standard, but if the network errs, search performance goes down. We asked: if we have uncertainty estimates, can we exploit them?

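The question posed above, exploiting value-uncertainty estimates during search, can be illustrated with a generic uncertainty-aware selection rule (a sketch, not the paper's algorithm): add the value's standard deviation as an optimism bonus when picking which child node to expand.

```python
def select_child(children, beta=1.0):
    """Pick the child maximizing value + beta * uncertainty.
    'children' is a list of (mean_value, std_estimate) pairs from a
    network that also reports how unsure it is. With beta > 0 the
    search hedges against network errors by trying uncertain nodes."""
    scores = [v + beta * s for v, s in children]
    return max(range(len(children)), key=lambda i: scores[i])

# Child 0 looks slightly better on mean value, but child 1's estimate
# is far less certain, so an optimistic search tries child 1 first.
children = [(0.60, 0.01), (0.55, 0.30)]
best = select_child(children, beta=1.0)   # -> 1
```

With beta = 0 this reduces to trusting the network's point estimate, which is exactly the failure mode when the network errs.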
Aviv Tamar (@avivtamar1) 's Twitter Profile Photo

Want to learn / teach RL? Check out new book draft: Reinforcement Learning - Foundations
sites.google.com/view/rlfoundat…
W/ shiemannor and Yishay Mansour
This is a rigorous first course in RL, based on our teaching at TAU CS and Technion ECE.
Aviv Tamar (@avivtamar1) 's Twitter Profile Photo

Robots will eventually be able to explore/adapt, but how can we trust strategies that are hard to interpret? We take the first step in *interpretable* exploration, and find a tree-like exploration rule that is both efficient (low regret) and interpretable (shallow tree) #ICML25