Parallel Reinforcement Learning with Communication and Adaptivity Constraints

Collaborative Research: AF: Small: Parallel Reinforcement Learning with Communication and Adaptivity Constraints

PI: Qin Zhang
NSF CCF-2006591 (October 2020 - September 2023)

Abstract

Reinforcement learning has seen major research advances in recent years and has achieved success in many practical applications. However, reinforcement-learning algorithms also have a reputation for being data- and computation-hungry in large-scale applications. This project will address this issue by studying the important question of how to make reinforcement-learning algorithms scalable by introducing multiple learning agents and allowing them to collect data and learn optimal strategies collaboratively. The outcomes of this project will have an impact on numerous areas where reinforcement learning is used at scale, e.g., multi-phase clinical trials, training autonomous-driving algorithms, crowdsourcing tasks, pricing, and assortment optimization for stores at different locations.

Papers

Communication-Efficient Collaborative Best Arm Identification
with N. Karpov
Proc. AAAI Conference on Artificial Intelligence (AAAI 23). Washington, D.C., USA. February 2023.
Meta Proximal Policy Optimization for Cooperative Multi-Agent Continuous Control
B. Fang, Z. Peng, H. Sun, and Q. Zhang
Proc. International Joint Conference on Neural Networks (IJCNN 22). Hybrid conference, July 2022.
Instance-Sensitive Algorithms for Pure Exploration in Multinomial Logit Bandit
with N. Karpov
Proc. AAAI Conference on Artificial Intelligence (AAAI 22). Virtual conference, February 2022.
Variance-Dependent Best Arm Identification
P. Lu, C. Tao, and X. Zhang
Proc. Conference on Uncertainty in Artificial Intelligence (UAI 21). Virtual conference, July 2021.
Near-Optimal MNL Bandits Under Risk Criteria
G. Xi, C. Tao, and Y. Zhou
Proc. AAAI Conference on Artificial Intelligence (AAAI 21). Virtual conference, February 2021.
Batched Coarse Ranking in Multi-Armed Bandits
with N. Karpov
Proc. Annual Conference on Neural Information Processing Systems (NeurIPS 20). Virtual conference, December 2020.
Collaborative Top Distribution Identifications with Limited Interaction (preliminary full version, 47 pages)
with N. Karpov and Y. Zhou
Proc. IEEE Symposium on Foundations of Computer Science (FOCS 20). Virtual conference, November 2020.
- We have shown a strong separation between top-1 arm identification and top-k arm identification in the collaborative learning model.
Multinomial Logit Bandit with Low Switching Cost (preliminary full version, 25 pages)
with K. Dong, Y. Li, and Y. Zhou
Proc. International Conference on Machine Learning (ICML 20). Virtual conference, July 2020.

Educational and Other Development

Internships through Global Talent Attraction Program at IU Bloomington

A Python library for bandit algorithms:

banditpylib

Personnel

PI: Qin Zhang
PhD student: Nikolai Karpov
PhD student: Chao Tao
PhD student: Boli Fang