Author : Minmin Chen, Bo Chang, Can Xu, Ed H. Chi
Paper Link : https://dl.acm.org/doi/10.1145/3437963.3441764

 

User Response Models to Improve a REINFORCE Recommender System | Proceedings of the 14th ACM International Conference on Web Search and Data Mining (WSDM '21)


 

 

  • A follow-up to Google's earlier RL-based recommendation work, the REINFORCE Recommender System (covered in an earlier post)
  • When RL is applied to a recommender system, the state and action dimensions are far larger than in the problems RL has traditionally tackled, while reward signals are assigned very sparsely, creating a sample-efficiency problem
  • To address this, user response modeling is added as an auxiliary task to improve learning efficiency (see the sketch below)
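A minimal PyTorch-style sketch of the auxiliary-task idea, assuming a GRU user-state encoder shared between the REINFORCE policy head and a head that predicts the immediate user response (e.g. a click) to the recommended item; the class/argument names, architecture, and the `aux_weight` coefficient are illustrative assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecommenderWithUserResponse(nn.Module):
    """REINFORCE policy over items plus an auxiliary user-response head (sketch)."""

    def __init__(self, num_items, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, emb_dim)
        self.encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.policy_head = nn.Linear(hidden_dim, num_items)      # pi(a | s)
        self.response_head = nn.Linear(hidden_dim + emb_dim, 1)  # p(click | s, a)

    def forward(self, history, action):
        h, _ = self.encoder(self.item_emb(history))   # (B, T, H)
        state = h[:, -1]                               # last hidden state as the user state
        logits = self.policy_head(state)               # scores over the item catalog
        resp_logit = self.response_head(
            torch.cat([state, self.item_emb(action)], dim=-1)).squeeze(-1)
        return logits, resp_logit

def loss_fn(model, history, action, reward, clicked, aux_weight=0.3):
    """REINFORCE loss plus auxiliary user-response (click) prediction loss."""
    logits, resp_logit = model(history, action)
    log_pi = F.log_softmax(logits, dim=-1).gather(1, action.unsqueeze(1)).squeeze(1)
    policy_loss = -(reward * log_pi).mean()            # vanilla REINFORCE term
    aux_loss = F.binary_cross_entropy_with_logits(resp_logit, clicked)
    return policy_loss + aux_weight * aux_loss
```

The dense, supervised response loss provides a learning signal for every logged impression, which is the general mechanism by which an auxiliary task can compensate for the sparse RL reward.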

 

  • An A/B test was run on a real live service (not named, but given the previous paper and the phrase `billions of users`, presumably YouTube)
  • A month-long experiment showed a 0.12% performance gain over the existing baseline RL algorithm (0.26% for less active users)

 

  • However, extending the training window actually hurt performance, from which the authors infer that users' content preferences change quickly

 

 

Personal thoughts

  • The last observation can be read as underscoring the importance of sequential recommendation in recommender systems.

Author : Minmin Chen, Alex Beutel, Paul Covington, Sagar Jain, Francois Belletti, Ed H. Chi
Paper Link :
[WSDM 2019 version] https://dl.acm.org/doi/10.1145/3289600.3290999

[ArXiv 2020 version] https://arxiv.org/abs/1812.02353

Related talk : https://youtu.be/Ys3YY7sSmIA

Talk : https://youtu.be/HEqQ2_1XRTs

 

  • Google's RL-based YouTube recommender system (REINFORCE with top-K off-policy correction); a minimal sketch of the corrected update follows below
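As a reminder of the core update, here is a hedged sketch of a one-step REINFORCE loss with the two corrections the paper introduces: an importance weight pi(a|s)/beta(a|s) for learning from logs produced by a behavior policy beta, and the top-K multiplier K(1 - pi(a|s))^(K-1). Tensor shapes, the clipping cap, and variable names are assumptions, not the production code:

```python
import torch
import torch.nn.functional as F

def off_policy_reinforce_loss(pi_logits, behavior_probs, actions, rewards,
                              top_k=16, weight_cap=10.0):
    """One-step REINFORCE loss with off-policy and top-K corrections (sketch).

    pi_logits      : (B, num_items) scores from the learned policy pi
    behavior_probs : (B,) probability the logging/behavior policy gave the action
    actions        : (B,) logged item ids, rewards : (B,) observed rewards
    """
    log_pi = F.log_softmax(pi_logits, dim=-1)
    log_pi_a = log_pi.gather(1, actions.unsqueeze(1)).squeeze(1)
    pi_a = log_pi_a.exp()

    # importance weight pi(a|s) / beta(a|s), clipped and treated as a constant
    iw = (pi_a / behavior_probs).detach().clamp(max=weight_cap)
    # top-K correction multiplier: K * (1 - pi(a|s))^(K-1)
    lam = top_k * (1.0 - pi_a.detach()) ** (top_k - 1)

    return -(iw * lam * rewards * log_pi_a).mean()
```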

Author : Jeff Wu, Long Ouyang, Daniel M. Ziegler, Nissan Stiennon, Ryan Lowe, Jan Leike, Paul Christiano
Paper Link : https://arxiv.org/abs/2109.10862
Blog : https://openai.com/blog/summarizing-books/

 

Summarizing Books with Human Feedback

Scaling human oversight of AI systems for tasks that are difficult to evaluate.


  • Uses RL to learn a summarization method (= policy) that reflects preferences derived from human feedback (= reward modeling)
    -> more plausible summaries than with supervised learning alone
  • Rather than summarizing the whole book at once, the book is split into parts, each part is summarized, and the summaries are recursively summarized again by a single shared RL policy (see the sketch after this list)
    -> scalability
  • Human-in-the-loop RL-based fine-tuning of GPT-3
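A minimal sketch of the recursive decomposition from the second bullet, where `summarize` stands in for the learned (RL-fine-tuned) policy and the chunk sizes and depth limit are made-up illustration values:

```python
def recursive_summarize(text, summarize, chunk_chars=4000, max_chars=2000,
                        depth=0, max_depth=6):
    """Recursively summarize a long text with a single summarization policy.

    `summarize` is a placeholder for the learned model: split the text into
    chunks, summarize each chunk, then apply the same procedure to the
    concatenated chunk summaries until the input is short enough.
    """
    if len(text) <= max_chars or depth >= max_depth:
        return summarize(text)
    chunks = [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
    partial = "\n".join(summarize(chunk) for chunk in chunks)  # first-level summaries
    # reuse the same policy on the concatenated summaries
    return recursive_summarize(partial, summarize, chunk_chars, max_chars,
                               depth + 1, max_depth)
```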

https://www.youtube.com/watch?v=lL-nq8zhi18 

  • To tackle the reward-engineering and reward-exploitation problems, a reward reflecting human preference is learned from sequential queries to a human (a minimal sketch follows below)
  • Human-in-the-loop RL
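A hedged sketch of this style of preference-based reward learning (in the spirit of PEBBLE and the related papers listed below): per-step rewards from a small network are summed over each queried segment and fit to the human's pairwise choices with a Bradley-Terry / cross-entropy loss; module and argument names are hypothetical:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Reward model r(s, a) learned from pairwise human preferences (sketch)."""

    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):                         # (B, T, obs_dim), (B, T, act_dim)
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)  # per-step rewards (B, T)

def preference_loss(reward_model, seg0, seg1, prefer0):
    """Bradley-Terry cross-entropy on a batch of queried segment pairs.

    seg0, seg1 : (obs, act) tensor pairs for the two segments shown to the human
    prefer0    : float tensor, 1.0 where the human preferred segment 0
    """
    r0 = reward_model(*seg0).sum(dim=1)                  # predicted return of segment 0
    r1 = reward_model(*seg1).sum(dim=1)
    logits = torch.stack([r0, r1], dim=1)                # softmax -> P(human prefers 0)
    labels = (1.0 - prefer0).long()                      # index of the preferred segment
    return F.cross_entropy(logits, labels)
```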

 

  • Related papers:
  1. PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training (ICML 2021)
    - Paper link: https://arxiv.org/abs/2106.05091
    - Site: https://sites.google.com/view/icml21pebble
    - Code: https://github.com/pokaxpoka/B_Pref
  2. B-Pref: Benchmarking Preference-Based Reinforcement Learning (NeurIPS 2021, Datasets and Benchmarks Track)
    - OpenReview link: https://openreview.net/forum?id=ps95-mkHF_
    - Code: https://github.com/pokaxpoka/B_Pref

Author : Gabriel de Souza Pereira Moreira, Sara Rabhi, Jeong Min Lee, Ronay Ak, Even Oldridge
Paper Link : https://dl.acm.org/doi/abs/10.1145/3460231.3474255

Code: https://github.com/NVIDIA-Merlin/Transformers4Rec

Blog : https://medium.com/nvidia-merlin/transformers4rec-4523cc7d8fa8

 

Transformers4Rec: A flexible library for Sequential and Session-based recommendation

Sequential recommendation algorithms are able to capture sequential patterns in users' browsing that might help to anticipate the next user…

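For intuition only, a generic sketch of session-based next-item prediction with a causally masked Transformer encoder; this is deliberately not the Transformers4Rec API, just a plain PyTorch illustration with made-up dimensions:

```python
import torch
import torch.nn as nn

class NextItemTransformer(nn.Module):
    """Generic session-based next-item prediction with a Transformer encoder (sketch)."""

    def __init__(self, num_items, d_model=128, n_heads=4, n_layers=2, max_len=50):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, d_model, padding_idx=0)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.out = nn.Linear(d_model, num_items)

    def forward(self, session_items):                    # (B, T) item ids within a session
        B, T = session_items.shape
        pos = torch.arange(T, device=session_items.device).unsqueeze(0)
        x = self.item_emb(session_items) + self.pos_emb(pos)
        # causal mask so each position only attends to earlier interactions
        causal = torch.triu(torch.full((T, T), float("-inf"), device=x.device), diagonal=1)
        h = self.encoder(x, mask=causal)
        return self.out(h[:, -1])                        # scores for the next item
```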
