AI & Medicine

All posts

[Bookmark] Randomized Ensembled Double Q-Learning: Learning Fast Without a Model (Xinyue Chen, ICLR 2021) 2021.05.24
[Summary] Towards Content Provider Aware Recommender Systems (Ruohan Zhan, WWW 2021) 2021.05.09
[Bookmark] Healthcare’s AI Future: A Conversation with Fei-Fei Li & Andrew Ng (May, 2021) 2021.05.09
Let's Do Inverse RL Project 2021.03.08
[Bookmark] Learning to be Safe: Deep RL with a Safety Critic (Krishnan Srinivasan, arXiv 2020) 2021.03.08

[Bookmark] Randomized Ensembled Double Q-Learning: Learning Fast Without a Model (Xinyue Chen, ICLR 2021)

2021. 5. 24. 16:40

Author : Xinyue Chen, Che Wang, Zijian Zhou, Keith Ross
Paper Link : https://arxiv.org/abs/2101.05982

OpenReview : https://openreview.net/forum?id=AY8zfZm0tDd

Code: https://github.com/watchernyu/REDQ

SOTA in Model-free RL
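As a rough illustration of what REDQ changes relative to SAC: per the paper's abstract, it (i) uses an update-to-data (UTD) ratio G much greater than 1, (ii) keeps an ensemble of N Q-functions, and (iii) computes the TD target with a minimum over a random subset of M of those Q-functions. The sketch below shows only ingredient (iii) for a single transition. The function name `redq_target` is hypothetical, the entropy temperature `alpha` is held fixed for simplicity, and the paper samples one subset per gradient update (shared across the mini-batch) rather than per transition as done here.

```python
import numpy as np


def redq_target(rng, reward, next_q_values, next_log_prob,
                gamma=0.99, alpha=0.2, m=2, done=False):
    """REDQ-style TD target for one transition (SAC-style entropy bonus).

    next_q_values: the N target-network estimates Q_i(s', a') for one action a'
                   sampled from the current policy at s'.
    next_log_prob: log pi(a' | s') for that sampled action.
    alpha:         entropy temperature, fixed here as a simplifying assumption.
    """
    next_q_values = np.asarray(next_q_values, dtype=float)
    # Draw a random subset of M distinct ensemble members.
    subset = rng.choice(len(next_q_values), size=m, replace=False)
    # In-target minimization over the subset only, not over all N members.
    min_q = next_q_values[subset].min()
    bootstrap = 0.0 if done else min_q - alpha * next_log_prob
    return reward + gamma * bootstrap


# Usage with an ensemble of N = 10 target estimates (M = 2 by default).
rng = np.random.default_rng(0)
q_next = np.array([3.0, 5.0, 4.0, 6.0, 2.5, 5.5, 4.5, 3.5, 6.5, 4.0])
y = redq_target(rng, reward=1.0, next_q_values=q_next, next_log_prob=-1.0)
# All N critics are regressed toward this same y; with UTD ratio G,
# such an update is repeated G times per environment step.
```

Setting `m` equal to N recovers a plain minimum over the whole ensemble, and N = M = 2 recovers the clipped double-Q target used in SAC/TD3.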

References

https://www.microsoft.com/en-us/research/blog/three-mysteries-in-deep-learning-ensemble-knowledge-distillation-and-self-distillation/?OCID=msr_blog_ensemble_tw&fbclid=IwAR16837BMbhV0f565yolrGn7vJCGrZxCN6ZTH0TXfUSJin3xkhM5bI4tDJI

3 deep learning mysteries: Ensemble, knowledge- and self-distillation

Microsoft and CMU researchers begin to unravel 3 mysteries in deep learning related to ensemble, knowledge distillation & self-distillation. Discover how their work leads to the first theoretical proof with empirical evidence for ensemble in deep learning.


[Summary] Towards Content Provider Aware Recommender Systems (Ruohan Zhan, WWW 2021)

2021. 5. 9. 22:36

Author : Ruohan Zhan, Konstantina Christakopoulou, Ya Le, Jayden Ooi, Martin Mladenov, Alex Beutel, Craig Boutilier, Ed H. Chi, Minmin Chen
Paper Link : https://dl.acm.org/doi/10.1145/3442381.3449889

Talk : https://youtu.be/QpHR22q99Bg

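A follow-up to Google's earlier RL-based recommendation algorithm, the REINFORCE Recommender System (post).

Unlike previous recommender algorithms, which consider only the user's side, this is a recommender system that accounts for all stakeholders on the platform, including the users who create content.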

Trains an RL agent ('EcoAgent') that promotes content-provider activity as well as user preference.
Evaluated in RecSim (https://github.com/google-research/recsim), Google's gym-style environment for recommender systems.
Aims to prevent the 'superstar economy' phenomenon, in which only a few content providers receive attention and most providers lose motivation.
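The post does not spell out EcoAgent's objective. Purely to illustrate why a provider-aware reward counteracts the superstar effect, here is a minimal sketch assuming a linear trade-off between user utility and a counterfactual "uplift" for the provider (provider utility if recommended minus if not). `Candidate`, `blended_reward`, `pick`, the weight `lam`, and all numbers are hypothetical; this is not the paper's exact formulation, and a real agent would optimize a long-run return rather than a one-step score.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Candidate:
    """Hypothetical per-item estimates a multi-stakeholder recommender might use."""
    name: str
    user_utility: float       # predicted user satisfaction with the item
    provider_with: float      # provider's predicted utility if the item is recommended
    provider_without: float   # provider's predicted utility if it is not (counterfactual)


def blended_reward(c: Candidate, lam: float) -> float:
    """Weighted sum of user utility and provider uplift; lam in [0, 1] is hypothetical."""
    provider_uplift = c.provider_with - c.provider_without
    return (1.0 - lam) * c.user_utility + lam * provider_uplift


def pick(candidates: Sequence[Candidate], lam: float) -> Candidate:
    # Greedy one-step choice; an RL agent would instead maximize the long-run return.
    return max(candidates, key=lambda c: blended_reward(c, lam))


# An established "superstar" barely benefits from one more impression,
# while a small provider's utility (and motivation) rises noticeably.
superstar = Candidate("superstar", user_utility=0.9, provider_with=1.00, provider_without=0.98)
newcomer = Candidate("newcomer", user_utility=0.7, provider_with=0.60, provider_without=0.20)

print(pick([superstar, newcomer], lam=0.0).name)  # user-only objective -> superstar
print(pick([superstar, newcomer], lam=0.5).name)  # provider-aware objective -> newcomer
```

With `lam = 0` the recommender reduces to a user-only objective and keeps favoring the superstar; raising `lam` shifts exposure toward providers for whom a recommendation actually makes a difference.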

Personal thoughts

This looks applicable to ecosystem-style platforms where content is also produced mainly by users, such as Google's YouTube and Apple's App Store.

Other posts in the 'AI & RL > Recommender System' category

[Notes] Know Your Action Set: Learning Action Relations for Reinforcement Learning (Ayush Jain, ICLR 2022) 2022.04.26
[Summary] User Response Models to Improve a REINFORCE Recommender System (Minmin Chen, WSDM 2021) 2021.10.02
[Bookmark] Top-K Off-Policy Correction for a REINFORCE Recommender System (Minmin Chen, WSDM 2019) 2021.10.02
[Bookmark] Transformers4Rec: Bridging the Gap between NLP and Sequential / Session-Based Recommendation (RecSys 2021) 2021.09.25
[References] Reinforcement Learning for Recommender Systems 2021.02.07

[Bookmark] Healthcare’s AI Future: A Conversation with Fei-Fei Li & Andrew Ng (May, 2021)

2021. 5. 9. 22:32

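A conversation between Professor Andrew Ng and Professor Fei-Fei Li.

Because it discusses healthcare AI from an engineer's perspective, it should be very useful for AI researchers interested in medicine and healthcare.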

Other posts in the 'MEDICAL & HEALTHCARE AI' category

[Bookmark] G-BERT: Pre-training of Graph Augmented Transformers for Medication Recommendation (Junyuan Shang, IJCAI 2019) 2022.01.17
[Summary] Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction (Yikuan Li, npj Digital Medicine 2021) 2022.01.17
[Summary] BEHRT: Transformer for Electronic Health Records (Yikuan Li, Scientific Reports 2020) 2022.01.16
[Bookmark] Transfer Learning in Electronic Health Records through Clinical Concept Embedding (Jose Roberto Ayala Solares, ArXiv 2021) 2022.01.16
[Bookmark] MedGPT: Medical Concept Prediction from Clinical Narratives (Zeljko Kraljevic, ArXiv 2021) 2022.01.16

Let's Do Inverse RL Project

2021. 3. 8. 16:22

Inverse Reinforcement Learning

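In reinforcement learning, the reward is extremely important. Whether intrinsic or extrinsic, it is the channel through which the agent receives almost all of the information it uses to learn a policy.

In RL, a human usually designs the reward by hand, but the resulting reward may not actually lead to "desirable" actions. This hand-design process, called "reward shaping", is also very difficult: it requires a lot of domain knowledge and extensive manual tuning. The more complex the target task, the harder (and at some point practically impossible) it becomes to specify the reward function explicitly.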

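Inverse Reinforcement Learning (IRL) emerged from these difficulties. IRL infers the reward backward from the optimal or suboptimal behavior of an expert or demonstrator and, more broadly, also covers learning a policy from the reward inferred this way. From a behavioral-psychology perspective, it can be seen as an algorithm that tries to figure out what people want from their observed behavior.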
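To make "inferring the reward from behavior" concrete, here is a minimal sketch of Maximum Entropy IRL (Ziebart et al., 2008), one well-known IRL algorithm, on a hypothetical toy problem: a 5-state chain with deterministic left/right moves, a fixed start state, a fixed horizon, and a reward that is linear in one-hot state features. The backward pass computes a soft-optimal policy for the current reward, the forward pass computes expected state visitation counts, and the reward is nudged toward matching the expert's counts. The MDP, function names, and hyperparameters are all assumptions for illustration, not code from the project.

```python
import numpy as np

# Hypothetical toy MDP: a 5-state chain, actions 0 = left and 1 = right,
# deterministic transitions clipped at both ends, fixed start state 0.
N_STATES, N_ACTIONS, HORIZON = 5, 2, 8
NEXT = np.array([[max(0, s - 1), min(N_STATES - 1, s + 1)] for s in range(N_STATES)])


def soft_policy(theta):
    """Finite-horizon soft value iteration for reward r(s) = theta[s].

    Returns pi[t, s, a], the MaxEnt policy at each time step.
    """
    policy = np.zeros((HORIZON, N_STATES, N_ACTIONS))
    v = np.zeros(N_STATES)
    for t in reversed(range(HORIZON)):
        q = theta[:, None] + v[NEXT]                      # Q_t(s, a) = r(s) + V_{t+1}(s')
        q_max = q.max(axis=1)
        v = q_max + np.log(np.exp(q - q_max[:, None]).sum(axis=1))  # soft max over actions
        policy[t] = np.exp(q - v[:, None])                # pi(a|s) = exp(Q - V)
    return policy


def expected_visits(policy, start=0):
    """Expected number of visits to each state over HORIZON steps (forward pass)."""
    d = np.zeros(N_STATES)
    d[start] = 1.0
    total = d.copy()
    for t in range(HORIZON - 1):
        d_next = np.zeros(N_STATES)
        # Push probability mass d[s] * pi(a|s) to the successor state NEXT[s, a].
        np.add.at(d_next, NEXT, d[:, None] * policy[t])
        d = d_next
        total += d
    return total


def maxent_irl(expert_visits, lr=0.05, iters=300):
    """Gradient ascent on the MaxEnt log-likelihood with one-hot state features."""
    theta = np.zeros(N_STATES)
    for _ in range(iters):
        # Gradient = expert feature counts - expected feature counts under current reward.
        theta += lr * (expert_visits - expected_visits(soft_policy(theta)))
    return theta


# One expert demonstration: walk right from state 0, then stay at state 4.
demo = [0, 1, 2, 3, 4, 4, 4, 4]
expert_visits = np.bincount(demo, minlength=N_STATES).astype(float)
theta = maxent_irl(expert_visits)
```

The expert never states its goal, yet the recovered reward `theta` is highest at the state the expert lingers in, and the soft policy derived from it moves right from the start. Because every trajectory has the same length, adding a constant to all of `theta` changes nothing, so only relative reward values are meaningful here.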

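Pablo Picasso said, "Good artists copy, great artists steal." From an RL perspective, the saying resonates with the goal of IRL: rather than merely copying a behavior, understand the intent behind it and learn a policy that surpasses it.

Wanting to study and implement this interesting topic together, I ran a short side project with several members of Reinforcement Learning Korea from October 2018 to February 2019, in which we read six related papers and implemented them.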

Blog: reinforcement-learning-kr.github.io/2019/01/22/0_lets-do-irl-guide/

Let's do Inverse RL Guide

RLKorea Blog


GitHub: github.com/reinforcement-learning-kr/lets-do-irl

Presentations:

[Bookmark] Learning to be Safe: Deep RL with a Safety Critic (Krishnan Srinivasan, arXiv 2020)

2021. 3. 8. 15:45

Author : Krishnan Srinivasan, Benjamin Eysenbach, Sehoon Ha, Jie Tan, Chelsea Finn
Paper Link : arxiv.org/abs/2010.14603

Other posts in the 'AI & RL > Real-world (Safe) RL' category

[Bookmark] Safe Reinforcement Learning for Legged Locomotion (Tsung-Yen Yang, ArXiv 2022) 2022.05.10
[Summary] Learning robust perceptive locomotion for quadrupedal robots in the wild (Takahiro Miki, Science Robotics 2022) 2022.02.10
[Summary] RMA: Rapid Motor Adaptation for Legged Robot (Ashish Kumar, RSS 2021) 2021.07.14
[Bookmark] Conservative Safety Critics for Exploration (Homanga Bharadhwaj, ICLR 2021) 2020.10.29
[Notes] Learning to Walk in the Real World with Minimal Human Effort (Sehoon Ha, 2020) 2020.06.06

