[Bookmark] Recursively Summarizing Books with Human Feedback (Jeff Wu, arXiv 2021)

2021. 9. 25. 20:29

Authors : Jeff Wu, Long Ouyang, Daniel M. Ziegler, Nisan Stiennon, Ryan Lowe, Jan Leike, Paul Christiano
Paper Link : https://arxiv.org/abs/2109.10862v1
Blog : https://openai.com/blog/summarizing-books/

Uses RL to learn a summarization method (= policy) that reflects preferences derived from human feedback (= reward modeling)
-> More plausible than supervised learning
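
To make the reward modeling step concrete, here is a minimal sketch of a pairwise preference loss of the kind commonly used when a reward model is trained from human comparisons between two summaries. The function name `preference_loss` and the scalar inputs are assumptions made for illustration; these notes do not state the exact objective, so treat this as one plausible formulation rather than the paper's implementation.

```python
import math


def preference_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Pairwise loss: -log(sigmoid(reward_preferred - reward_rejected)).

    The inputs are scalar scores that a (hypothetical) reward model assigns
    to the summary a human preferred and to the one the human rejected.
    """
    margin = reward_preferred - reward_rejected
    # -log(sigmoid(margin)) equals softplus(-margin); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

The loss is small when the reward model already ranks the human-preferred summary higher, and large when it ranks the pair the wrong way round. The RL policy is then optimized against the learned reward.
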
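Rather than summarizing the whole book at once, splits the book into sections, summarizes each section, and learns a single shared RL policy that recursively summarizes those summaries
-> Scalability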
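
The recursive scheme can be sketched as follows. This is a simplified illustration, not the paper's code: `summarize` stands in for the learned policy, chunks are cut by character count rather than by tokens or natural section boundaries, and `chunk_size` and `max_depth` are hypothetical parameters added so the sketch always terminates.

```python
from typing import Callable, List


def split_into_chunks(text: str, chunk_size: int) -> List[str]:
    """Cut the text into consecutive pieces of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def recursive_summarize(
    text: str,
    summarize: Callable[[str], str],
    chunk_size: int = 2000,
    max_depth: int = 10,
) -> str:
    """Summarize chunks, then summarize the joined summaries, using one policy."""
    # Base case: the text fits in one chunk (or the depth guard is exhausted).
    if len(text) <= chunk_size or max_depth == 0:
        return summarize(text)
    # Recursive case: the same policy summarizes every chunk at every level.
    partial_summaries = [summarize(c) for c in split_into_chunks(text, chunk_size)]
    combined = " ".join(partial_summaries)
    return recursive_summarize(combined, summarize, chunk_size, max_depth - 1)
```

Because every call to `summarize` sees at most `chunk_size` characters (unless the depth guard is hit), the same policy can be applied to inputs of any length, which is where the scalability comes from.
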
Fine-tuning of GPT-3 based on human-in-the-loop RL
