https://www.youtube.com/watch?v=lL-nq8zhi18
- Learns a reward that reflects human preference from sequential queries, aiming to address the reward engineering and reward exploitation problems
- Human-in-the-loop RL
- Related papers:
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training (ICML 2021)
- Paper link: https://arxiv.org/abs/2106.05091
- Site: https://sites.google.com/view/icml21pebble
- Code: https://github.com/pokaxpoka/B_Pref
- B-Pref: Benchmark for Preference-based RL (NeurIPS 2021, Track)
- OpenReview link: https://openreview.net/forum?id=ps95-mkHF_
- Code: https://github.com/pokaxpoka/B_Pref
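
As a rough illustration of "learning a reward from preference queries", here is a minimal sketch assuming a Bradley-Terry-style preference model, which is common in preference-based RL. Everything in it is an assumption made for illustration and is not taken from the papers' code: the linear reward over per-step feature vectors, the scripted teacher that always prefers the segment with the higher true return, and the function names `preference_prob`, `fit_reward`, and `make_synthetic_queries`. The actual methods use neural-network reward models and a full RL loop, which this sketch leaves out.

```python
import math
import random


def segment_features(segment):
    """Sum the per-step feature vectors of one trajectory segment."""
    return [sum(col) for col in zip(*segment)]


def preference_prob(w, seg0, seg1):
    """Bradley-Terry probability that seg1 is preferred over seg0.

    Assumes a linear reward r(step) = w . features(step), so the
    probability is sigmoid(return(seg1) - return(seg0)).
    """
    f0, f1 = segment_features(seg0), segment_features(seg1)
    diff = sum(wi * (b - a) for wi, a, b in zip(w, f0, f1))
    return 0.5 * (1.0 + math.tanh(0.5 * diff))  # numerically stable sigmoid


def fit_reward(queries, dim, lr=0.1, epochs=200):
    """Fit reward weights by gradient descent on the cross-entropy loss.

    Each query is (seg0, seg1, label), with label 1 if seg1 was preferred.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for seg0, seg1, label in queries:
            p = preference_prob(w, seg0, seg1)
            f0, f1 = segment_features(seg0), segment_features(seg1)
            for i in range(dim):
                grad[i] += (p - label) * (f1[i] - f0[i])
        w = [wi - lr * g / len(queries) for wi, g in zip(w, grad)]
    return w


def make_synthetic_queries(true_w, n_queries, seg_len, rng):
    """Hypothetical scripted teacher: prefers the segment with higher true return."""
    dim = len(true_w)

    def random_segment():
        return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(seg_len)]

    def true_return(seg):
        return sum(sum(wi * xi for wi, xi in zip(true_w, step)) for step in seg)

    queries = []
    for _ in range(n_queries):
        s0, s1 = random_segment(), random_segment()
        queries.append((s0, s1, 1 if true_return(s1) > true_return(s0) else 0))
    return queries
```

A possible usage: build queries with `make_synthetic_queries([1.0, -2.0, 0.5], 200, 10, random.Random(0))`, call `fit_reward(queries, dim=3)`, and check how often `preference_prob` agrees with the teacher's labels. Only the direction of the learned weights is meaningful, since preferences do not pin down the reward's scale.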