Posts in the 'AI & RL' category

AI & RL

[Bookmark] Revolutionizing Medical Data Sharing Using Advanced Privacy-Enhancing Technologies: Technical, Legal, and Ethical Synthesis (James Scheibner, JMIR 2020) 2023.10.14
[Bookmark] Towards Causal Foundation Model: on Duality between Causal Inference and Attention (Jiaqi Zhang, Arxiv 2023) 2023.10.14
[Bookmark] Passive learning of active causal strategies in agents and language models (Andrew Kyle Lampinen, NeurIPS2023) 2023.10.02
[Bookmark] (Review paper) Understanding Causality with Large Language Models: Feasibility and Opportunities (Cheng Zhang, Arxiv 2023) 2023.10.02
[Bookmark] Causal Parrots: Large Language Models May Talk Causality But Are Not Causal (Matej Zečević, TMLR 2023) 2023.10.02
[Bookmark] Zero-shot causal learning (Hamed Nilforoshan, NeurIPS 2023) 2023.09.30
[Bookmark] Conformal Meta-learners for Predictive Inference of Individual Treatment Effects (Ahmed Alaa, NeurIPS 2023) 2023.09.25
[Bookmark] Secrets of RLHF in Large Language Models Part I: PPO (Rui Zheng, Arxiv 2023) 2023.09.11
Physical Grounding of LLM (SayTap, RoboCat, Language to Rewards for Robotic Skill Synthesis) 2023.06.22
[Summary] Causal Transformer for Estimating Counterfactual Outcomes (Valentyn Melnychuk, ICML 2022) 2022.10.20
[Reference] Transformer for tabular data 2022.07.25
[Bookmark] Formal Algorithms for Transformers (DeepMind, 2022) 2022.07.25
[Bookmark] Pure Transformers are Powerful Graph Learners (Jinwoo Kim, Arxiv 2022) 2022.07.12
[Summary] Multi-Game Decision Transformers (Kuang-Huei Lee, arxiv 2022) 2022.06.07
[Bookmark] Planning with Diffusion for Flexible Behavior Synthesis (Michael Janner, ICML 2022) 2022.05.25
[Bookmark] Safe Reinforcement Learning for Legged Locomotion (Tsung-Yen Yang, ArXiv 2022) 2022.05.10
[Notes] Know Your Action Set: Learning Action Relations for Reinforcement Learning (Ayush Jain, ICLR 2022) 2022.04.26
[Bookmark] Generalized Decision Transformer for Offline Hindsight Information Matching (Hiroki Furuta, ICLR 2022 Spotlight) 2022.04.26
[Summary] Do Prompt-Based Models Really Understand the Meaning of their Prompts? (Albert Webson, NAACL 2022) 2022.04.26
[Bookmark] Visual Prompting: Modifying Pixel Space to Adapt Pre-trained Models (Hyojin Bahng, ArXiv 2022) 2022.04.03

[Bookmark] Revolutionizing Medical Data Sharing Using Advanced Privacy-Enhancing Technologies: Technical, Legal, and Ethical Synthesis (James Scheibner, JMIR 2020)

2023. 10. 14. 20:00

Author : James Scheibner, Jean Louis Raisaro, Juan Ramón Troncoso-Pastoriza, Marcello Ienca, Jacques Fellay, Effy Vayena, Jean-Pierre Hubaux

Paper Link: https://www.jmir.org/2021/2/e25120/

See also

Preventing Verbatim Memorization in Language Models Gives a False Sense of Privacy (Daphne Ippolito, Arxiv 2023)

[Bookmark] Towards Causal Foundation Model: on Duality between Causal Inference and Attention (Jiaqi Zhang, Arxiv 2023)

2023. 10. 14. 20:00

Author :(Microsoft Research) Jiaqi Zhang, Joel Jennings, Cheng Zhang, Chao Ma
Paper Link: https://arxiv.org/abs/2310.00809

[Bookmark] Passive learning of active causal strategies in agents and language models (Andrew Kyle Lampinen, NeurIPS2023)

2023. 10. 2. 21:30

Author :(Google DeepMind) Andrew Kyle Lampinen, Stephanie C Y Chan, Ishita Dasgupta, Andrew J Nam, Jane X Wang
Paper Link: https://arxiv.org/abs/2305.16183
Talk1: https://www.youtube.com/watch?v=XkPv9bk4O3I (http://lxmls.it.pt/2023/slides/andrew.pdf)
Talk2: https://www.youtube.com/watch?v=3Go7yF5n62c

[Bookmark] (Review paper) Understanding Causality with Large Language Models: Feasibility and Opportunities (Cheng Zhang, Arxiv 2023)

2023. 10. 2. 21:20

Author :(Microsoft) Cheng Zhang, Stefan Bauer, Paul Bennett, Jiangfeng Gao, Wenbo Gong, Agrin Hilmkil, Joel Jennings, Chao Ma, Tom Minka, Nick Pawlowski, James Vaughan
Paper Link: https://arxiv.org/abs/2304.05524

[Bookmark] Causal Parrots: Large Language Models May Talk Causality But Are Not Causal (Matej Zečević, TMLR 2023)

2023. 10. 2. 21:08

Author : Matej Zečević, Moritz Willig, Devendra Singh Dhami, Kristian Kersting
Paper Link: https://arxiv.org/abs/2301.12292

Code: https://github.com/moritzwillig/causalparrots

TMLR presentation: https://www.youtube.com/watch?v=vbwrhbuvedE

[Bookmark] Zero-shot causal learning (Hamed Nilforoshan, NeurIPS 2023)

2023. 9. 30. 01:49

Author : Hamed Nilforoshan, Michael Moor, Yusuf Roohani, Yining Chen, Anja Šurina, Michihiro Yasunaga, Sara Oblak, Jure Leskovec
Paper Link: https://arxiv.org/abs/2301.12292

[Bookmark] Conformal Meta-learners for Predictive Inference of Individual Treatment Effects (Ahmed Alaa, NeurIPS 2023)

2023. 9. 25. 16:34

Author : Ahmed Alaa, Zaid Ahmad, Mark van der Laan
Paper Link: https://arxiv.org/abs/2308.14895

Github: https://github.com/AlaaLab/conformal-metalearners

[Bookmark] Secrets of RLHF in Large Language Models Part I: PPO (Rui Zheng, Arxiv 2023)

2023. 9. 11. 01:05

Author : Rui Zheng, Shihan Dou, Songyang Gao, Yuan Hua, Wei Shen, Binghai Wang, Yan Liu, Senjie Jin, Qin Liu, Yuhao Zhou, Limao Xiong, Lu Chen, Zhiheng Xi, Nuo Xu, Wenbin Lai, Minghao Zhu, Cheng Chang, Zhangyue Yin, Rongxiang Weng, Wensen Cheng, Haoran Huang, Tianxiang Sun, Hang Yan, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang
Paper Link :https://arxiv.org/abs/2307.04964

Github: https://github.com/OpenLMLab/MOSS-RLHF

Physical Grounding of LLM (SayTap, RoboCat, Language to Rewards for Robotic Skill Synthesis)

2023. 6. 22. 13:26

Google DeepMind released several LLM-for-robotics papers at once, so I gave a very brief introduction to them on Weekly AI ArXiv, the talk series run by Dr. Jung-Woo Ha of NAVER Cloud.

https://github.com/jungwoo-ha/WeeklyArxivTalk/issues/87#issuecomment-1606018653

https://saytap.github.io/

SayTap: Language to Quadrupedal Locomotion

Use foot contact pattern to bridge LLM and low-level controller.

https://www.deepmind.com/blog/robocat-a-self-improving-robotic-agent?utm_source=twitter&utm_medium=social&utm_campaign=robocat

RoboCat: A self-improving robotic agent

Robots are quickly becoming part of our everyday lives, but they’re often only programmed to perform specific tasks well. While harnessing recent advances in AI could lead to robots that could help in many more ways, progress in building general-purpose

https://language-to-reward.github.io/

Language to Reward for Robotic Skill Synthesis

Project page for Language to Reward for Robotic Skill Synthesis.

https://sites.google.com/view/agile-catching

[Summary] Causal Transformer for Estimating Counterfactual Outcomes (Valentyn Melnychuk, ICML 2022)

2022. 10. 20. 00:36

Author: Valentyn Melnychuk, Dennis Frauen, Stefan Feuerriegel
Paper Link: https://arxiv.org/abs/2204.07258

Code: https://github.com/Valentyn1997/CausalTransformer

ICML slide: https://icml.cc/media/icml-2022/Slides/17693.pdf

ICML presentation: https://slideslive.ch/38983812/causal-transformer-for-estimating-counterfactual-outcomes?ref=recommended

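To handle time-varying confounders, CRN used an LSTM; this work uses a Transformer instead, aiming to handle long and complex sequence data better.
- According to the authors, this is the first application of a Transformer to causal inference.
To reduce selection bias, the method balances the representations. Like CRN, it uses an adversarial objective, but it differs in using a domain confusion loss as the loss.
- The authors argue that CRN's approach is sensitive to the learning-rate parameter, so their own approach is better, and they verify this experimentally.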

[Domain confusion loss]

[Gradient reversal]
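
As a rough illustration of the domain confusion loss mentioned above: one common form is the cross-entropy between a uniform target over treatments and the treatment classifier's predicted distribution. Whether this matches the paper's exact loss is an assumption, and the function name `domain_confusion_loss` is hypothetical.

```python
import math

def domain_confusion_loss(treatment_probs):
    """Cross-entropy between a uniform target and the predicted treatment distribution.

    Assumption: treatment_probs are strictly positive and sum to 1.
    """
    k = len(treatment_probs)
    return -sum(math.log(p) for p in treatment_probs) / k

# The loss is smallest when the classifier cannot tell the treatments apart.
balanced = domain_confusion_loss([0.5, 0.5])
skewed = domain_confusion_loss([0.9, 0.1])
```

Minimizing this quantity with respect to the representation pushes the treatment classifier toward a uniform prediction, which is the sense in which the representation becomes balanced.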

In the experiments, the method gives more accurate counterfactual predictions than prior methods, both under time-dependent confounding and for long-term prediction.

[Reference] Transformer for tabular data

2022. 7. 25. 10:07

Paper list

TabTransformer: Tabular Data Modeling Using Contextual Embeddings (Xin Huang, arXiv 2020)

Paper Link: https://arxiv.org/abs/2012.06678
Talk: https://www.youtube.com/watch?v=-ZdHhyQsvRc
AWS Code: https://github.com/awslabs/autogluon/tree/master/tabular/src/autogluon/tabular/models/tab_transformer
Other Repo 1: https://github.com/lucidrains/tab-transformer-pytorch
Other Repo 2: https://github.com/timeseriesAI/tsai/blob/main/tsai/models/TabTransformer.py

by AWS
Uses a Transformer as the embedding layer
For tabular data, each column gets a Column Embedding, and a Transformer then produces contextual embeddings
The contextual embeddings are concatenated and fed to an MLP classifier
Column Embedding
- When a column has d classes, missing values also get an index, so the column is encoded through a lookup table with d+1 entries (indices 0 to d)
- Learning a parametric embedding performs better than one-hot encoding
- For scalar columns, three re-scaling methods (quantiles, normalization, log) and quantization are all used
- The column identifier and the feature value are embedded separately and concatenated
- Of the embedding dimension, 4 and 28 are used for the column part and the value part respectively
Transformer architecture
- Transformer hidden dim: 32
- Transformer layers: 6
- Transformer attention heads: 8
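
The column embedding described above can be sketched roughly as follows. This is a minimal illustration, not the AWS implementation: reserving index 0 for missing values, the random initialization, and all function names are assumptions.

```python
import random

COLUMN_DIM, VALUE_DIM = 4, 28  # split of the 32-dim embedding given in the notes

def build_vocab(values):
    """Map each observed class to 1..d; index 0 is reserved for missing values (assumption)."""
    classes = sorted({v for v in values if v is not None})
    return {c: i + 1 for i, c in enumerate(classes)}

def make_column_embedding(vocab, rng):
    """Randomly initialised stand-ins for the learnable parameters."""
    column_id = [rng.gauss(0, 1) for _ in range(COLUMN_DIM)]
    value_table = [[rng.gauss(0, 1) for _ in range(VALUE_DIM)]
                   for _ in range(len(vocab) + 1)]  # d + 1 rows
    return column_id, value_table

def embed(value, vocab, column_id, value_table):
    """Concatenate the column identifier with the embedding of the feature value."""
    index = vocab.get(value, 0)  # missing (None) or unseen values fall back to index 0
    return column_id + value_table[index]

rng = random.Random(0)
colors = ["red", "blue", None, "green", "blue"]
vocab = build_vocab(colors)
column_id, value_table = make_column_embedding(vocab, rng)
vectors = [embed(v, vocab, column_id, value_table) for v in colors]
```

Each resulting vector has 32 entries, and these per-column vectors are what the Transformer would turn into contextual embeddings.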

MET: Masked Encoding for Tabular Data (Kushal Majmundar, Arxiv 2022)

Paper Link: https://arxiv.org/abs/2206.08564

by Google Research India
Learns embeddings of tabular data with Masked-AutoEncoder (MAE) style SSL
Contributions on top of MAE:
1. When passing embeddings to the downstream task, the per-column contextual embeddings are concatenated rather than averaged
2. Adversarial perturbations are added to the input data
Both the encoder and the decoder are Transformers
For unmasked columns only, a learnable embedding of size e (the column identifier) is concatenated with the scalar feature value, giving an (e+1)-dimensional embedding that is fed to the Transformer encoder
For masked columns, the column identifier is concatenated with a mask scalar, a learnable special token, to form the masked embedding
The contextual embeddings coming out of the Transformer encoder are combined with the masked embeddings and fed to the Transformer decoder, which reconstructs all columns
When passing to the downstream task, the per-column contextual embeddings are concatenated rather than averaged
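
A small sketch of how the encoder inputs and masked embeddings described above might be assembled. The embedding size `E`, the mask ratio, and the function name `build_met_inputs` are hypothetical; the encoder and decoder themselves are left out.

```python
import random

E = 4  # size of the learnable column-identifier embedding (hypothetical value)

def build_met_inputs(row, column_ids, mask_scalar, mask_ratio, rng):
    """Split a row into encoder inputs (unmasked) and masked embeddings, each of size E + 1."""
    n_masked = int(len(row) * mask_ratio)
    masked = set(rng.sample(range(len(row)), n_masked))
    encoder_inputs, masked_embeddings = {}, {}
    for j, value in enumerate(row):
        if j in masked:
            # masked column: column identifier + learnable mask scalar
            masked_embeddings[j] = column_ids[j] + [mask_scalar]
        else:
            # unmasked column: column identifier + actual scalar value
            encoder_inputs[j] = column_ids[j] + [value]
    return encoder_inputs, masked_embeddings

rng = random.Random(0)
row = [0.3, 1.7, -0.5, 2.2]
column_ids = [[rng.gauss(0, 1) for _ in range(E)] for _ in row]
encoder_inputs, masked_embeddings = build_met_inputs(
    row, column_ids, mask_scalar=0.0, mask_ratio=0.5, rng=rng)
```

Only `encoder_inputs` would go through the encoder; the decoder would then see the encoder outputs together with `masked_embeddings` and reconstruct every column.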

Tabular Transformers for Modeling Multivariate Time Series (Inkit Padhi, ICASSP 2021)

Paper Link: https://arxiv.org/abs/2011.01843
Code: https://github.com/IBM/TabFormer

by IBM
BERT- and GPT-style sequence encoding for tabular data
TabBERT
- Each time-ordered row is turned into a row embedding by a Field Transformer and then fed to BERT as a token
- Masking is applied per field within a row, not per row, and the model is trained to predict the masked fields
TabGPT
- Rows are fed consecutively, separated by [SEP], and the model is trained to predict future rows given the current row
Continuous columns are quantized and converted into categorical columns
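
The quantization step can be illustrated with a simple binning sketch. Using quantile-based bin edges is an assumption made here for illustration (the notes only say that continuous columns are quantized), and the function names are hypothetical.

```python
import bisect

def quantile_edges(values, n_bins):
    """Interior bin edges taken at evenly spaced ranks of the sorted values."""
    ordered = sorted(values)
    return [ordered[(len(ordered) * k) // n_bins] for k in range(1, n_bins)]

def quantize(value, edges):
    """Return the bin index of value, usable as a categorical token id."""
    return bisect.bisect_right(edges, value)

amounts = [3.0, 12.5, 7.2, 150.0, 48.9, 0.8, 22.1, 9.4]
edges = quantile_edges(amounts, n_bins=4)      # [7.2, 12.5, 48.9]
tokens = [quantize(a, edges) for a in amounts]  # [0, 2, 1, 3, 3, 0, 2, 1]
```

After this step every field in a row is a discrete token, so a row can be handled like a short sentence.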

TAPEX: Table Pre-training via Learning a Neural SQL Executor (Qian Liu, ICLR 2022)

Paper Link: https://arxiv.org/abs/2107.07653
Code: https://github.com/microsoft/Table-Pretraining

by Microsoft
Uses BART as the architecture

Revisiting Deep Learning Models for Tabular Data (Yura Gorishniy, NeurIPS 2021)

Paper Link: https://arxiv.org/abs/2106.11959v2
Code: https://github.com/Yura52/tabular-dl-revisiting-models

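by Yandex
Prediction uses the tokens produced by the Feature Tokenizer together with a [CLS] token
Each column has its own weight and bias, so every column is embedded individually
For a categorical column, the one-hot vector is multiplied with the lookup table to select the vector of that category, and the bias vector of the column is then added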
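
The Feature Tokenizer described above can be sketched as below. The token dimension `D`, the example columns, and the random parameters are all hypothetical stand-ins for learned values.

```python
import random

D = 8  # token dimension (hypothetical value)
rng = random.Random(0)

def rand_vec():
    return [rng.gauss(0, 1) for _ in range(D)]

def tokenize_numerical(x, weight, bias):
    """Token for a numerical feature: x * w_j + b_j, with per-column w_j and b_j."""
    return [x * w + b for w, b in zip(weight, bias)]

def tokenize_categorical(category_index, table, bias):
    """Token for a categorical feature: lookup row (one-hot times table) + b_j."""
    return [e + b for e, b in zip(table[category_index], bias)]

# Per-column parameters for one numerical and one categorical column.
age_w, age_b = rand_vec(), rand_vec()
city_table, city_b = [rand_vec() for _ in range(3)], rand_vec()
cls_token = rand_vec()

tokens = [cls_token,
          tokenize_numerical(0.5, age_w, age_b),
          tokenize_categorical(2, city_table, city_b)]
```

The list `tokens` is the sequence a Transformer would consume, with the final [CLS] representation used for prediction.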

On Embeddings for Numerical Features in Tabular Deep Learning (Yura Gorishniy, arXiv 2022)

Paper Link: https://arxiv.org/abs/2203.05556v1
Code: https://github.com/Yura52/tabular-dl-num-embeddings

by Yandex
Studies how best to do feature binning for the numerical features of tabular data
The tokenized tabular data is passed through a Transformer for the prediction task
Compared with feeding scalars directly, using PLE (piecewise linear encoding), an improved version of one-hot encoding, sometimes even outperforms CatBoost
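
A minimal sketch of piecewise linear encoding, to show why it is described as an improved one-hot. It assumes the bin edges are already given and clamps values outside the edge range, which is a simplification; the function name is hypothetical.

```python
def piecewise_linear_encode(x, edges):
    """Encode scalar x against sorted bin edges b_0 < ... < b_T into T components."""
    encoding = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        if x >= hi:
            encoding.append(1.0)                   # bin lies entirely below x
        elif x < lo:
            encoding.append(0.0)                   # bin lies entirely above x
        else:
            encoding.append((x - lo) / (hi - lo))  # x falls inside this bin
    return encoding

encoded = piecewise_linear_encode(2.5, [0.0, 1.0, 2.0, 3.0, 4.0])  # [1.0, 1.0, 0.5, 0.0]
```

Unlike a one-hot bin indicator, the fractional component keeps the position of the value inside its bin, so nearby values get nearby encodings.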

Revisiting Pretraining Objectives for Tabular Deep Learning (Ivan Rubachev, arXiv 2022)

Paper Link: https://arxiv.org/abs/2207.03208
Code: https://github.com/puhsu/tabular-dl-pretrain-objectives

by Yandex

Other GitHub repos

https://github.com/yandex-research/rtdl (combined repo for the three Yandex papers above)

[Bookmark] Formal Algorithms for Transformers (DeepMind, 2022)

2022. 7. 25. 09:25

Author: Mary Phuong, Marcus Hutter
Paper Link: https://arxiv.org/abs/2207.09238
- Summary to come

[Bookmark] Pure Transformers are Powerful Graph Learners (Jinwoo Kim, Arxiv 2022)

2022. 7. 12. 19:28

Author: Jinwoo Kim, Tien Dat Nguyen, Seonwoo Min, Sungjun Cho, Moontae Lee, Honglak Lee, Seunghoon Hong
Paper Link: https://arxiv.org/abs/2207.02505

Code: https://github.com/jw9730/tokengt

[Summary] Multi-Game Decision Transformers (Kuang-Huei Lee, arxiv 2022)

2022. 6. 7. 15:52

Author: Kuang-Huei Lee, Ofir Nachum, Mengjiao Yang, Lisa Lee, Daniel Freeman, Winnie Xu, Sergio Guadarrama, Ian Fischer, Eric Jang, Henryk Michalewski, Igor Mordatch
Paper Link: https://arxiv.org/abs/2205.15241

Website: https://sites.google.com/view/multi-game-transformers

Code: not yet released

Summary

On multi-task problems, Decision Transformer-based sequence modeling performs best.
The paper confirms trends similar to those seen in large-scale language and vision models:
1. For a large-scale generalist RL agent, a power-law relation between model size and performance
2. For a pretrained RL agent, fast fine-tuning on a small amount of newly given data
The offline training data does not consist only of expert demonstrations. To make use of all of it while still generating high-reward sequences, the guided-generation approach from language modeling is applied to Decision Transformer, and this gives the best performance.

Methods

The model builds on Decision Transformer, but unlike Decision Transformer it also models the return and the reward, which strengthens the model's representation for multi-task learning.
Unlike Trajectory Transformer, however, it does not model observations; this is left as future work.

To generate high-reward behavior, the paper applies an approach similar to discriminator-guided generation in language models.
Following the method of the GeDi paper, it uses a binary classifier $P(expert^t \mid \cdots )$.
This approach lets the model use more of the dataset and so understand the environments better, and at the same time removes the need for manual return conditioning as in Decision Transformer.
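
One way to picture the classifier-guided generation above is Bayes-rule reweighting of the predicted return distribution, $P(R \mid expert) \propto P(expert \mid R)\,P(R)$. The sketch below assumes an expert likelihood proportional to $\exp(\kappa R)$; that functional form, the value of `kappa`, the return bins, and the function name are assumptions for illustration, not details taken from the notes.

```python
import math

def expert_guided_return_distribution(return_probs, returns, kappa):
    """Reweight P(R) by an assumed expert likelihood exp(kappa * R) and renormalise."""
    weights = [p * math.exp(kappa * r) for p, r in zip(return_probs, returns)]
    total = sum(weights)
    return [w / total for w in weights]

returns = [0.0, 0.5, 1.0]      # normalised return bins (hypothetical)
predicted = [0.6, 0.3, 0.1]    # model's P(R | history), learned from mixed-quality data
guided = expert_guided_return_distribution(predicted, returns, kappa=5.0)
```

Sampling the return from `guided` instead of `predicted` shifts generation toward expert-like behavior without anyone choosing a target return by hand.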

Training uses 41 games in total, and OOD generalization is evaluated on 5 held-out games with different characteristics.
Training runs for 4.1B steps, 160B tokens in total.

Key questions and experiment results

How do different online and offline methods perform in the multi-game regime?

A single agent alone reaches 126% of human-level performance on Atari games.
In the comparison graph, this is lower than the first two specialist agents but close to them.

How do different methods scale with model size?

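The performance of Transformer-based RL agents shows a trend similar to the power law seen in large language models.
As the number of parameters grows, performance improves on both ID and OOD games, and the model learns faster from the same number of tokens.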

How effective are different methods at transfer to novel games?

In fine-tuning experiments on the 5 held-out games of different character, the pretrained DT performs well overall.
Fine-tuning works with as little as 1% of the amount of training data.

Does Multi-Game Decision Transformer improve upon training data?

Compared with the top 3 scores per game in the dataset, the agent mostly improves substantially on the dataset.

Does expert action inference improve upon behavioral cloning?

It outperforms Behavior Cloning trained on optimal actions overall (31/41).

Does training on expert and non-expert data bring benefits over expert-only training?

DeepMind recently announced Gato, a generalist agent that covers not only RL but deep learning tasks in general.
The difference is that Gato uses only expert data and needs an expert trajectory as a prompt, whereas Multi-Game DT also uses non-expert data and needs no prompt.
The experiments show that 1) Behavior Cloning improves when only expert data is used, 2) Multi-Game DT instead improves when the full dataset is used, and 3) Multi-Game DT trained on the full dataset outperforms BC trained on expert data only.
(I have not read the Gato paper closely, but the expert BC transformer seems similar to Gato's training architecture.)

Are there benefits to specifically using transformer architecture?

Upside-Down RL (UDRL), which conditions on a target return, is the defining feature of the Decision Transformer family.
The paper confirms that UDRL gains a lot when used with a Transformer, that is, when applied to sequence modeling.

What does Multi-Game Decision Transformer attend to?

Attention analysis shows that, among the input image patches, the agent attends to the patches that matter for playing the game.

[Bookmark] Planning with Diffusion for Flexible Behavior Synthesis (Michael Janner, ICML 2022)

2022. 5. 25. 23:28

Author: Michael Janner, Yilun Du, Joshua B. Tenenbaum, Sergey Levine
Paper Link: https://arxiv.org/abs/2205.09991

Site: https://diffusion-planning.github.io/

Code: https://github.com/jannerm/diffuser

[Bookmark] Safe Reinforcement Learning for Legged Locomotion (Tsung-Yen Yang, ArXiv 2022)

2022. 5. 10. 10:51

Author: Tsung-Yen Yang, Tingnan Zhang, Linda Luu, Sehoon Ha, Jie Tan, Wenhao Yu
Paper Link: https://arxiv.org/abs/2203.02638

Site: https://sites.google.com/view/saferlleggedlocomotion/

Google AI Blog: https://ai.googleblog.com/2022/05/learning-locomotion-skills-safely-in.html

[Notes] Know Your Action Set: Learning Action Relations for Reinforcement Learning (Ayush Jain, ICLR 2022)

2022. 4. 26. 17:58

Author: Ayush Jain, Norio Kosaka, Kyung-Min Kim, Joseph J Lim
Paper Link: https://openreview.net/forum?id=MljXVdp4A3N

Site: https://sites.google.com/view/varyingaction

Code: https://github.com/clvrai/agile

0. Abstract

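An intelligent agent can solve a task flexibly depending on which actions are currently available, whereas standard RL assumes a fixed action set.
For example, in a woodworking repair task, the action of hammering a nail only makes sense when the action of picking up a hammer is available.
This work proposes applying a graph attention network over the available actions in order to exploit these relations between actions.
With this relational approach, both value-based and policy-based RL algorithms are shown to be able to make use of related actions,
and on problems with a changing action space, such as recommender systems and physical reasoning, it outperforms existing non-relational architectures.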

1. Introduction

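Consider the task of hanging a picture frame on a wall: if a hammer is available, a nail can be used, but if a hook is available, adhesive tape has to be used.

In other words, the best decision depends not only on the environment but also on the currently available actions.
Because standard RL assumes a fixed action space, recent RL work has started to address changing action spaces and unseen actions, but it has not addressed learning the interdependence between actions as in the example above.
Examples of interdependence problems in varying action spaces are recommender systems, where the set of articles to recommend changes every day, and physical reasoning over tools, goals, and skills.
This work proposes AGILE (Action Graph for Interdependence Learning), a policy architecture built around a graph attention network (GAT),
whose purpose is 1. to summarize the action set as an input and 2. to learn relational information between actions.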

2. Related Work

2.1 Stochastic Action Sets

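When the available action set is sampled at random from a fixed overall action pool, it is called a stochastic action set. Prior work handles unavailable actions by, for example, masking the corresponding outputs of the probability distribution.
However, assuming that the whole action pool is fixed in advance is impractical in settings such as recommender systems, which frequently receive unseen items.
Letting the action set change at every timestep, as in prior work, is also impractical.
This work therefore keeps the sampled action set fixed within an episode and uses action representations to address the limitation with unseen actions.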

2.2 Action Representations

Prior work uses action representations for problems such as large action spaces, transfer learning, and shared action structure.
This work uses action representations so that the agent does not rely on prior knowledge of a base action pool and can handle unseen actions.

2.3 List-wise Action Space

When the action is a subset of $k$ items chosen from a large set $\mathcal{I}$, that is, when the RL problem is to find the best action list in the combinatorial action space $\binom{\mathcal{I}}{k}$, it is called list-wise RL or slate RL.
This work aims to show that the proposed AGILE policy architecture also applies to list-wise RL.

2.4 Relational Reinforcement Learning

GNNs have been used in prior RL work on tasks where relations matter.
This work targets problems where interactions between actions matter for solving the task, and uses a graph attention network (GAT) to model the meaningful action interactions.

3. Problem Formulation

As in the picture-hanging example above, the core of the problem is learning the interdependence between actions needed to act optimally with a given action set.

3.1 Reinforcement Learning with Varying Action Space

To approach the problem with reinforcement learning, define the following MDP:

$\left\{ \mathcal{S}, \mathbb{A}, \mathcal{T}, \mathcal{R}, \gamma \right\}$

Here $\mathbb{A}$ is a countably infinite set of actions.
To handle the infinite action set, a $D$-dimensional action representation $c_a \in \mathbb{R}^D$ is additionally defined for each $a \in \mathbb{A}$.
Given an action subset $\mathcal{A} \subset \mathbb{A}$ and the corresponding representations $\mathcal{C}$, the agent's goal is to learn a policy $\pi\left ( a \mid s, \mathcal{A} \right )$ that maximizes the following return over subsets that may also contain unseen actions:

$\mathbb{E}_{\mathcal{A}\subset\mathbb{A}}\left [ \sum_t \gamma^{t-1}r_t \right ]$

3.2 Challenges of Varying Action Space

The interdependence problem in varying action spaces poses the following challenges, each with a corresponding response.
1. Not all actions are given, so the policy framework must be flexible. Action representations $\mathcal{C}$ are used for this.
2. When the set of currently available actions changes, the original state space $\mathcal{S}$ alone cannot fully describe the state of the environment, so $\mathcal{S}$ is not Markovian. A hyper-state space $\mathcal{S}^{'}$ that adds the action set representation is therefore defined: $\mathcal{S}^{'}=\left\{ s\circ \mathcal{C}_{\mathcal{A}} : s \in \mathcal{S}, \mathcal{A} \subset \mathbb{A} \right\}$
3. The interdependence between available actions must be learned. Concretely, an optimal agent must be able to explicitly model the relation between the characteristics of the current action $c_{a_t}$ and those of the possible future actions $c_{a_i} \forall a_i \in \mathcal{A}$.

4. Approach

The core of the proposed approach is to use a GNN to embed the set of action representations while learning the relations between actions.

4.1 AGILE: Action Graph for Interdependence Learning

Given the list of action representations $\mathcal{C}$ and the state encoding, each action representation is concatenated with the state encoding to build a fully-connected action graph.
A graph attention network (GAT) is applied to it to infer the relations between actions as attention weights.
e.g., in Figure 2, the cannon and the fire get a high attention weight
The utility network computes the Q-value or logit for the RL algorithm from the state encoding, the relational action representation output by the GAT, and the mean-pooled action summary.

Action Graph:

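The relations between actions depend on the state.
(e.g., a screwdriver and a power drill relate to a screw in a similar way, but the screwdriver is preferred when assembling furniture and the drill when working on a wall)
So the graph is not built from action representations alone: a new action representation combined with the state, $c^{'}_{a_i}=\left ( s,c_{a_i} \right )$, is used as the feature of each node to form the fully-connected action graph $\mathcal{G}$.
The experiments later show that adding the state in this way gives results closer to optimal.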

Graph Attention Network (GAT):

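The action graph $\mathcal{G}$ is fed to the GAT, which learns to focus more on the actions in the given action set that are strongly related to each other.
For an explanation of GAT, see the seminar by 강현규, formerly of the DMQA lab at Korea University.

To allow sufficient propagation, two graph attention layers are stacked with an ELU between them.
The experiments later show that the residual connection after the second layer matters, whereas multi-head attention has no effect.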

Action Set Summary:

The GAT outputs the relational action representations $\mathcal{C}^R=\left\{ c^R_{a_1}, \cdots ,c^R_{a_K} \right\}$, where each action representation carries information about the other available actions and its relations to them.
As discussed in the MDP definition, the current action set is treated as part of the state, so the action set information is summarized by mean-pooling: $\overline{c}^R=\frac{1}{K}\sum_{i=1}^{K} c^R_{a_i}$

Action Utility:

The values computed above are passed as input to the utility network $\pi_u$, which computes the utility score handed to the RL network, that is, $\pi_u\left ( c^R_a, s, \overline{c}^R \right )$
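
The pipeline from action graph to utility-network input can be sketched in plain Python as below. This is a simplified illustration under several assumptions: a single GAT-style attention layer with one head and self-loops (the notes describe two layers with an ELU and a residual connection), random weights instead of trained ones, hypothetical dimensions, and no actual utility network.

```python
import math
import random

rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.gauss(0, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def gat_layer(nodes, w, a):
    """Single-head graph attention over a fully-connected graph (self-loops included)."""
    h = [matvec(w, n) for n in nodes]
    out, attention = [], []
    for hi in h:
        # e_ij = LeakyReLU(a^T [W h_i || W h_j]), normalised over all nodes j
        scores = [leaky_relu(sum(ak * x for ak, x in zip(a, hi + hj))) for hj in h]
        alpha = softmax(scores)
        attention.append(alpha)
        out.append([sum(al * hj[d] for al, hj in zip(alpha, h)) for d in range(len(hi))])
    return out, attention

state = [0.2, -0.1]                                                  # state encoding s
action_reprs = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5], [0.3, 0.3, 0.0]]   # c_a for 3 available actions
nodes = [state + c for c in action_reprs]                            # node feature (s, c_a)

hidden = 4
w = rand_matrix(hidden, len(nodes[0]))
a = [rng.gauss(0, 0.5) for _ in range(2 * hidden)]
relational, attention = gat_layer(nodes, w, a)                       # c^R_a and attention weights
summary = [sum(col) / len(relational) for col in zip(*relational)]   # mean-pooled action set summary
utility_inputs = [c_r + state + summary for c_r in relational]       # one input per action for pi_u
```

Because every step operates on however many nodes are present, the same code handles action sets of any size, which is the property the varying action space setting needs.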

4.2 Training AGILE framework with Reinforcement Learning

The AGILE architecture is trained end-to-end with PPO, DQN, and CDQN, each in turn.
(Details of the RL algorithms are omitted in this post. CDQN is an RL algorithm for list-wise action problems.)

5. Environments

AGILE is evaluated in three environments with different characteristics.
1) Dig Lava Grid World: finding the shortest path using sampled skills
2) CREATE: choosing among sampled tools through physical reasoning to move an object to a goal.
3) Recommender Systems:

5.1 Dig Lava Grid Navigation

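The classic 2D Grid World, a standard toy environment for shortest-path finding, is extended: in addition to the 5 default actions, 2 of 4 extra actions are given at random, and the agent has to use them to improve the shortest path.
If the agent enters lava and on the next step uses the dig skill matching the lava's color, the lava disappears, but staying in lava for two consecutive steps fails the episode.
PPO is used as the RL algorithm.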

5.2 Chain Reaction Tool Environment: CREATE

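The task is to move a red ball to a goal in a 2D space using the given tools.
Basic tools and the power devices needed to use them are each sampled and given, and the agent has to work out the physical relations between them.
PPO is used as the RL algorithm.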

5.3 Recommender Systems

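RecSys, i.e. a recommender system, can be seen as a varying action space RL problem.
(e.g., new articles or YouTube videos are uploaded every day, and recommendations are made from among them)
Complementary Product Recommendation (CPR): making recommendations that are related at a high level but more diverse at a low level, which matters a great deal when running a service with a long-term view.
(e.g., if the primary category is something like shirts or pants, the subcategory is the color)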

$CPR = \frac{Entropy\;of\;subcategory}{Entropy\;of\;category}$

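To make the scenario realistic, this work uses CPR, in addition to user preference, as a metric of how well the listwise items are recommended.
By the definition of CPR, to optimize it the recommender agent has to find the most common primary category among the items.
In the experiments, CPR is maximized implicitly through the user's click count in the simulated recommendation environment, and explicitly as part of the reward on the real service data.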
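
To make the CPR ratio concrete, here is a small sketch that computes it for a recommended list. Using Shannon entropy over empirical label frequencies and adding a small `eps` to the denominator (to avoid dividing by zero when all items share one category) are assumptions made for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in nats) of the empirical label distribution."""
    counts = Counter(labels)
    n = len(labels)
    return sum((c / n) * math.log(n / c) for c in counts.values())

def cpr(items, eps=1e-8):
    """items: list of (category, subcategory) pairs; eps is a hypothetical smoothing term."""
    categories = [c for c, _ in items]
    subcategories = [s for _, s in items]
    return entropy(subcategories) / (entropy(categories) + eps)

coherent = [("shirt", "red"), ("shirt", "blue"), ("shirt", "green")]   # one category, varied colors
scattered = [("shirt", "red"), ("pants", "red"), ("hat", "red")]       # varied categories, one color
```

The `coherent` list scores far higher than the `scattered` one, matching the idea that good lists stay within one primary category while varying the subcategory.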

5.3.1 Simulated Recommender System: RecSim

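RecSim, an environment developed by Google for simulating user interaction and applying reinforcement learning, is extended to a listwise recommendation task. (For details on RecSim, see the posts [1], [2] by DataBro.)
There are 250 train items and 250 test items; every episode 20 items are sampled and given to the agent, and the agent recommends 6 items at every step.
The user state is a vector embedding the user's preference for each item.
Depending on preference, the user clicks one of the recommended items or clicks nothing.
The click probability follows a categorical distribution obtained by computing the score below, which adds the CPR metric to the base user preference, and passing it through a softmax: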

$s_{item}=\alpha_{user}\cdot\langle e_u,e_i\rangle+\alpha_{metric}\cdot m$

$p_{item}=\frac{e^{s_{item}}}{\sum_{j} e^{s_{j}}}$

$R = f_{click\,or\,skip}\left ( p_{item} \right )$

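In other words, a high click count implicitly means that the recommended listwise action has a high CPR.
The reward is 1 for a click and 0 for a skip.
The user preference embedding and the item embeddings are updated at every step to reflect the user's response (click or not) at the previous step.
The setting is fully observable: both the user preference vector and the action representations are given.
The CPR of a recommended list increases when category coherence among the items is high and subcategory diversity is large, so making such recommendations requires reasoning about the relations among the sampled items.
CDQN is used to maximize the number of clicks within a user session.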
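The click model in this section can be sketched as below. The score and the 1/0 reward follow the formulas above, but how the "click nothing" option enters the softmax is not spelled out in the post, so the extra `skip_score` logit is an assumption of this sketch, as are the embedding sizes and coefficient values.

```python
import numpy as np

def click_probabilities(e_user, e_items, metric, alpha_user, alpha_metric,
                        skip_score=0.0):
    """Softmax over item scores plus one extra 'skip' option.

    skip_score is an assumption; the post only says the user may click nothing.
    """
    scores = alpha_user * (e_items @ e_user) + alpha_metric * metric
    logits = np.append(scores, skip_score)
    z = np.exp(logits - logits.max())
    return z / z.sum()              # last entry is the skip probability

def simulate_step(rng, probs):
    """Sample click-or-skip; reward is 1 for a click and 0 for a skip."""
    choice = rng.choice(len(probs), p=probs)
    clicked = choice < len(probs) - 1
    return (int(choice) if clicked else None), (1 if clicked else 0)

rng = np.random.default_rng(0)
e_user = rng.normal(size=4)          # hypothetical 4-dim preference embedding
e_items = rng.normal(size=(6, 4))    # 6 recommended items
probs = click_probabilities(e_user, e_items, metric=1.5,
                            alpha_user=1.0, alpha_metric=1.0)
item, reward = simulate_step(rng, probs)
```

With a fixed skip logit, a higher list-level metric `m` raises every item score together and so lowers the probability of skipping, which is one way clicks can implicitly reflect CPR.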

5.3.2 Real-Data Recommender System

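Offline data was collected from LINE's online ad recommendation service over two weeks in late August 2021 and used for training; offline testing was done on two weeks of data from early September.
Users are represented by region, age, and occupation, and items include text, image, and reward points as features.
The training data contains 68,775 users and 57 items; the test data contains 82,445 users and 58 items.
The reward function combines the user's clicks and the CPR value of the recommended list.
CDQN is used for training, and evaluation is by test reward.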

6. Experiments

The experiments are designed to answer the following five questions.
For varying action spaces:
1) How effective is AGILE compared with prior approaches that treat actions independently or assume a fixed action set?
2) How effective are AGILE's relational action representations for computing the action set summary and the action utility scores?
3) Does AGILE's attention represent action relations meaningfully?
4) Is attention essential in AGILE's GNN?
5) Is learning state-dependent action relations important for solving general varying action space tasks?

6.1 Effectiveness of AGILE in Varying Action Spaces

AGILE is evaluated against baselines that treat actions independently or assume a fixed action set.
Ablation tests are run to assess the effect of the relational action features and the summary.

6.1.1 Baselines

Mask-Output: assumes a fixed action space and masks the Q-network or policy outputs that correspond to unavailable actions.
Mask-Input-Output: in addition to Mask-Output, feeds the binary availability of each action into the input.
Utility-Policy: handles unseen actions using action representations and a utility policy applied to each action, but uses no graph.
Simple DQN: plain DQN. It ignores interactions among actions and selects the top k actions with the highest Q-values.
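The output masking and top-k selection used by these baselines can be sketched as follows. This is a generic illustration with made-up Q-values, not the authors' implementation.

```python
import numpy as np

def masked_q_values(q_values, available):
    """Mask-Output: keep Q-values of available actions, set the rest to -inf."""
    return np.where(available, q_values, -np.inf)

def top_k_actions(q_values, available, k):
    """Simple-DQN-style list: the k available actions with the highest Q-values."""
    masked = masked_q_values(q_values, available)
    order = np.argsort(-masked)          # descending by Q-value
    return [int(i) for i in order[:k] if available[i]]

q = np.array([0.2, 1.5, -0.3, 0.9, 2.0])
available = np.array([True, False, True, True, False])
best = int(np.argmax(masked_q_values(q, available)))   # greedy single action
slate = top_k_actions(q, available, k=2)
```

Note that masking requires one output slot per action in a fixed, known action space, and each list position is filled independently of the others.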

6.1.2 Ablations

Summary-LSTM: no relational action representations; the summary is computed with a bi-LSTM instead of a GAT.
Summary-Deep Set: no relational action representations; the summary is computed with a deep set architecture instead of a GAT. (See the corresponding paper for details on deep sets.)
Summary-GAT: the GAT is used only for the summary, and relational action representations are not used.

6.1.3 Results

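Figure 4 shows the results against the baselines during training (top) and in testing with unseen actions (bottom).
AGILE performs best in every environment, although on Real-Data RecSys the margin is not pronounced.
This confirms that, in a varying action space, knowing which actions are available and how they relate is very important for finding the optimal action.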

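Figure 5 shows test results for the ablations, which use only information about the available actions (the action summary) without considering relations.
In Dig Lava Grid, where there are few actions and the relations are simple, there is little difference.
In RecSys, recommending common items is likely to raise CPR, and the action summary alone is enough to find them, so AGILE, which also uses relational information, shows only a modest 5-20% gain.
CREATE, in contrast, has very complex and wide-ranging relations between tools and power sources that are hard to capture from the summary alone, so it is the environment where AGILE shows the largest gain.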

6.2 Does the Attention in AGILE Learn Meaningful Action Relations?

Figure 6 analyzes the trained agent's behavior more qualitatively.
(a) is the attention map in CREATE: after the spring is selected, attention to the trampoline becomes very strong.
(b) is the attention map of Summary-GAT in Grid World: attention is very high between the move-right action and the action that digs out pink lava.
(c) shows the interaction with the user in RecSim: 5 of the 6 items AGILE recommends come from category 7, the most common category, which maximizes CPR.

6.3 Additional Analysis

6.3.1 Importance of Attention in the Graph Network

Figure 7 compares results when AGILE's GAT is replaced with another GNN architecture, a graph convolutional network (GCN).
In Grid World and RecSys, where action relations are simple as seen earlier, the GCN reaches optimal performance.
In CREATE, where action relations are complex, performance drops substantially compared with the GAT.
In RecSim as well, when the complexity of action relations is raised by pairing items so that a click occurs only when a pair is matched, the GAT outperforms the GCN.
The authors hypothesize that the GAT makes the graph sparser, which eases RL training, and that a fully-connected GCN has difficulty doing so.
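To make the GAT versus GCN contrast concrete, here is a single-layer NumPy sketch of both aggregations on a fully connected action graph. It is a simplified illustration: one attention head, no output nonlinearity, random weights, and hypothetical sizes, none of which are taken from the paper.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(h, w, a):
    """Single-head graph attention over a fully connected action graph."""
    z = h @ w                                   # (n, d_out)
    d = z.shape[1]
    # e[i, j] = LeakyReLU(a^T [z_i || z_j]), split into source and target parts.
    e = leaky_relu((z @ a[:d])[:, None] + (z @ a[d:])[None, :])
    e = e - e.max(axis=1, keepdims=True)        # numerical stability
    alpha = np.exp(e) / np.exp(e).sum(axis=1, keepdims=True)
    return alpha @ z, alpha

def gcn_layer(h, w):
    """GCN on the same fully connected graph with self-loops.

    With every node connected to every node, the normalized adjacency is
    uniform (1/n), so each node receives the plain mean of all nodes.
    """
    z = h @ w
    n = z.shape[0]
    return np.full((n, n), 1.0 / n) @ z

rng = np.random.default_rng(0)
h = rng.normal(size=(5, 6))       # 5 actions, 6 features (hypothetical sizes)
w = rng.normal(size=(6, 4))
a = rng.normal(size=8)
gat_out, alpha = gat_layer(h, w, a)
gcn_out = gcn_layer(h, w)
```

In this sketch every row of `gcn_out` is identical, while `alpha` lets each action weight its neighbors differently and can push most of the weight onto a few edges, which is in line with the sparsity explanation the authors give.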

6.3.2 Importance of State-Dependent Learning of Action Relations

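AGILE-Only Action: the GAT uses only the actions, instead of concatenating the state with each action to form a new action representation.
Figure 7 also compares the training results of AGILE-Only Action.
In environments such as Grid World and CREATE, where action relations change with the state, performance drops when the state is not concatenated.
In RecSim, on the other hand, state dependence has little effect, because CPR only requires knowing the most common category, independent of the user's state.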
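The difference between the full model and the AGILE-Only Action variant comes down to what each graph node receives as input. A minimal sketch, with hypothetical feature sizes and an assumed concatenation order (state first):

```python
import numpy as np

def node_features(action_reps, state=None):
    """Build graph node inputs; pass state=None for the action-only variant."""
    if state is None:
        return action_reps
    tiled = np.tile(state, (action_reps.shape[0], 1))
    return np.concatenate([tiled, action_reps], axis=1)

action_reps = np.arange(12, dtype=float).reshape(3, 4)   # 3 actions, 4 features
state = np.array([0.5, -1.0])
with_state = node_features(action_reps, state)    # shape (3, 6)
action_only = node_features(action_reps)          # shape (3, 4)
```

Only with the state included can the relations computed between two actions change from one situation to the next.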

7. Conclusion

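The paper proposes AGILE, an architecture that can exploit relations among actions in varying action space RL problems.
It verifies in four environments, including a real-data recommender system, that AGILE can learn the interdependence among actions using a GAT.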

[Bookmark] Generalized Decision Transformer for Offline Hindsight Information Matching (Hiroki Furuta, ICLR 2022 Spotlight)

2022. 4. 26. 17:55

Author: Hiroki Furuta, Yutaka Matsuo, Shixiang Shane Gu
Paper Link: https://openreview.net/forum?id=CAjxVodl_v

Code: https://github.com/frt03/generalized_dt

[Summary] Do Prompt-Based Models Really Understand the Meaning of their Prompts? (Albert Webson, NAACL 2022)

2022. 4. 26. 09:12

Author: Albert Webson, Ellie Pavlick
Paper Link(arXiv): https://arxiv.org/abs/2109.01247

Paper Link(NAACL): https://openreview.net/forum?id=BhGMkxhZrW9

Code: https://github.com/awebson/prompt_semantics

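[Based on the introduction in Weekly arXiv by Dr. Jung-Woo Ha (하정우) of NAVER AI Lab]

The success of prompt-based learning in large LMs has been attributed to the task instructions contained in the prompt; this paper tests experimentally whether that is really the case.

The prompts use templates from a variety of categories, such as those relevant or irrelevant to the task, as shown above.

The table above shows the experimental results; a check mark indicates that instructive prompts differ from non-instructive prompts with statistical significance.
The surprising result is that performance does not differ much with the instruction, and the last column shows that few-shot performance improves as long as any prompt is present. That is, contrary to the conventional view, the LLM did not understand the instruction in the prompt. GPT-3 in particular has no check marks.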

[Bookmark] Visual Prompting: Modifying Pixel Space to Adapt Pre-trained Models (Hyojin Bahng, ArXiv 2022)

2022. 4. 3. 22:28

Author: Hyojin Bahng, Ali Jahanian, Swami Sankaranarayanan, Phillip Isola
Paper Link: https://arxiv.org/abs/2203.17274

Page: https://hjbahng.github.io/visual_prompting/

Code: not yet released.
