Author : Minmin Chen, Bo Chang, Can Xu, Ed H. Chi
Paper Link : https://dl.acm.org/doi/10.1145/3437963.3441764

 

User Response Models to Improve a REINFORCE Recommender System | Proceedings of the 14th ACM International Conference on Web Search and Data Mining (WSDM '21)


 

 

  • A follow-up to Google's earlier RL-based recommendation work, the REINFORCE Recommender System (covered in an earlier post)
  • When RL is applied to a recommender system, the state and action dimensions are far larger than in the problems RL has traditionally tackled, while reward signals are assigned very sparsely, creating a sample-efficiency problem
  • To address this, user response modeling is added as an auxiliary task to improve learning efficiency (see the sketch below)
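A minimal PyTorch-style sketch of the auxiliary-task idea, assuming a GRU user-state encoder shared between the REINFORCE policy head and a head that predicts the immediate user response (e.g. a click) to the recommended item; the class/argument names, architecture, and the `aux_weight` coefficient are illustrative assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecommenderWithUserResponse(nn.Module):
    """REINFORCE policy over items plus an auxiliary user-response head (sketch)."""

    def __init__(self, num_items, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, emb_dim)
        self.encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.policy_head = nn.Linear(hidden_dim, num_items)      # pi(a | s)
        self.response_head = nn.Linear(hidden_dim + emb_dim, 1)  # p(click | s, a)

    def forward(self, history, action):
        h, _ = self.encoder(self.item_emb(history))   # (B, T, H)
        state = h[:, -1]                               # last hidden state as the user state
        logits = self.policy_head(state)               # scores over the item catalog
        resp_logit = self.response_head(
            torch.cat([state, self.item_emb(action)], dim=-1)).squeeze(-1)
        return logits, resp_logit

def loss_fn(model, history, action, reward, clicked, aux_weight=0.3):
    """REINFORCE loss plus auxiliary user-response (click) prediction loss."""
    logits, resp_logit = model(history, action)
    log_pi = F.log_softmax(logits, dim=-1).gather(1, action.unsqueeze(1)).squeeze(1)
    policy_loss = -(reward * log_pi).mean()            # vanilla REINFORCE term
    aux_loss = F.binary_cross_entropy_with_logits(resp_logit, clicked)
    return policy_loss + aux_weight * aux_loss
```

The dense, supervised response loss provides a learning signal for every logged impression, which is the general mechanism by which an auxiliary task can compensate for the sparse RL reward.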

 

  • An A/B test was run on a real live service (not named, but given the previous paper and the phrase `billions of users`, presumably YouTube)
  • A month-long experiment showed a 0.12% performance gain over the existing baseline RL algorithm (0.26% for less active users)

 

  • However, extending the training window actually hurt performance, from which the authors infer that users' content preferences change quickly

 

 

Personal thoughts

  • The last observation can be read as underscoring the importance of sequential recommendation in recommender systems.

Author : Minmin Chen, Alex Beutel, Paul Covington, Sagar Jain, Francois Belletti, Ed H. Chi
Paper Link :
[WSDM 2019 version] https://dl.acm.org/doi/10.1145/3289600.3290999

[ArXiv 2020 version] https://arxiv.org/abs/1812.02353

Related talk : https://youtu.be/Ys3YY7sSmIA

Talk : https://youtu.be/HEqQ2_1XRTs

 

  • Google's RL-based YouTube recommender system (REINFORCE with top-K off-policy correction); a minimal sketch of the corrected update follows below
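As a reminder of the core update, here is a hedged sketch of a one-step REINFORCE loss with the two corrections the paper introduces: an importance weight pi(a|s)/beta(a|s) for learning from logs produced by a behavior policy beta, and the top-K multiplier K(1 - pi(a|s))^(K-1). Tensor shapes, the clipping cap, and variable names are assumptions, not the production code:

```python
import torch
import torch.nn.functional as F

def off_policy_reinforce_loss(pi_logits, behavior_probs, actions, rewards,
                              top_k=16, weight_cap=10.0):
    """One-step REINFORCE loss with off-policy and top-K corrections (sketch).

    pi_logits      : (B, num_items) scores from the learned policy pi
    behavior_probs : (B,) probability the logging/behavior policy gave the action
    actions        : (B,) logged item ids, rewards : (B,) observed rewards
    """
    log_pi = F.log_softmax(pi_logits, dim=-1)
    log_pi_a = log_pi.gather(1, actions.unsqueeze(1)).squeeze(1)
    pi_a = log_pi_a.exp()

    # importance weight pi(a|s) / beta(a|s), clipped and treated as a constant
    iw = (pi_a / behavior_probs).detach().clamp(max=weight_cap)
    # top-K correction multiplier: K * (1 - pi(a|s))^(K-1)
    lam = top_k * (1.0 - pi_a.detach()) ** (top_k - 1)

    return -(iw * lam * rewards * log_pi_a).mean()
```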

Author : Jeff Wu, Long Ouyang, Daniel M. Ziegler, Nissan Stiennon, Ryan Lowe, Jan Leike, Paul Christiano
Paper Link : https://arxiv.org/abs/2109.10862
Blog : https://openai.com/blog/summarizing-books/

 

Summarizing Books with Human Feedback

Scaling human oversight of AI systems for tasks that are difficult to evaluate.


  • Uses RL to learn a summarization method (= policy) that reflects preferences derived from human feedback (= reward modeling)
    -> more plausible summaries than with supervised learning alone
  • Rather than summarizing the whole book at once, the book is split into parts, each part is summarized, and the summaries are recursively summarized again by a single shared RL policy (see the sketch after this list)
    -> scalability
  • Human-in-the-loop RL-based fine-tuning of GPT-3
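A minimal sketch of the recursive decomposition from the second bullet, where `summarize` stands in for the learned (RL-fine-tuned) policy and the chunk sizes and depth limit are made-up illustration values:

```python
def recursive_summarize(text, summarize, chunk_chars=4000, max_chars=2000,
                        depth=0, max_depth=6):
    """Recursively summarize a long text with a single summarization policy.

    `summarize` is a placeholder for the learned model: split the text into
    chunks, summarize each chunk, then apply the same procedure to the
    concatenated chunk summaries until the input is short enough.
    """
    if len(text) <= max_chars or depth >= max_depth:
        return summarize(text)
    chunks = [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
    partial = "\n".join(summarize(chunk) for chunk in chunks)  # first-level summaries
    # reuse the same policy on the concatenated summaries
    return recursive_summarize(partial, summarize, chunk_chars, max_chars,
                               depth + 1, max_depth)
```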

https://www.youtube.com/watch?v=lL-nq8zhi18 

  • To tackle the reward-engineering and reward-exploitation problems, a reward reflecting human preference is learned from sequential queries to a human (a minimal sketch follows below)
  • Human-in-the-loop RL
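A hedged sketch of this style of preference-based reward learning (in the spirit of PEBBLE and the related papers listed below): per-step rewards from a small network are summed over each queried segment and fit to the human's pairwise choices with a Bradley-Terry / cross-entropy loss; module and argument names are hypothetical:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Reward model r(s, a) learned from pairwise human preferences (sketch)."""

    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):                         # (B, T, obs_dim), (B, T, act_dim)
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)  # per-step rewards (B, T)

def preference_loss(reward_model, seg0, seg1, prefer0):
    """Bradley-Terry cross-entropy on a batch of queried segment pairs.

    seg0, seg1 : (obs, act) tensor pairs for the two segments shown to the human
    prefer0    : float tensor, 1.0 where the human preferred segment 0
    """
    r0 = reward_model(*seg0).sum(dim=1)                  # predicted return of segment 0
    r1 = reward_model(*seg1).sum(dim=1)
    logits = torch.stack([r0, r1], dim=1)                # softmax -> P(human prefers 0)
    labels = (1.0 - prefer0).long()                      # index of the preferred segment
    return F.cross_entropy(logits, labels)
```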

 

  • Related papers:
  1. PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training (ICML 2021)
    - Paper link: https://arxiv.org/abs/2106.05091
    - Site: https://sites.google.com/view/icml21pebble
    - Code: https://github.com/pokaxpoka/B_Pref
  2. B-Pref: Benchmarking Preference-Based Reinforcement Learning (NeurIPS 2021, Datasets and Benchmarks Track)
    - OpenReview link: https://openreview.net/forum?id=ps95-mkHF_
    - Code: https://github.com/pokaxpoka/B_Pref

Author : Gabriel de Souza Pereira Moreira, Sara Rabhi, Jeong Min Lee, Ronay Ak, Even Oldridge
Paper Link : https://dl.acm.org/doi/abs/10.1145/3460231.3474255

Code: https://github.com/NVIDIA-Merlin/Transformers4Rec

Blog : https://medium.com/nvidia-merlin/transformers4rec-4523cc7d8fa8

 

Transformers4Rec: A flexible library for Sequential and Session-based recommendation

Sequential recommendation algorithms are able to capture sequential patterns in users' browsing that might help to anticipate the next user…

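For intuition only, a generic sketch of session-based next-item prediction with a causally masked Transformer encoder; this is deliberately not the Transformers4Rec API, just a plain PyTorch illustration with made-up dimensions:

```python
import torch
import torch.nn as nn

class NextItemTransformer(nn.Module):
    """Generic session-based next-item prediction with a Transformer encoder (sketch)."""

    def __init__(self, num_items, d_model=128, n_heads=4, n_layers=2, max_len=50):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, d_model, padding_idx=0)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.out = nn.Linear(d_model, num_items)

    def forward(self, session_items):                    # (B, T) item ids within a session
        B, T = session_items.shape
        pos = torch.arange(T, device=session_items.device).unsqueeze(0)
        x = self.item_emb(session_items) + self.pos_emb(pos)
        # causal mask so each position only attends to earlier interactions
        causal = torch.triu(torch.full((T, T), float("-inf"), device=x.device), diagonal=1)
        h = self.encoder(x, mask=causal)
        return self.out(h[:, -1])                        # scores for the next item
```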
