Collaborative Research: AF: Small: Parallel Reinforcement Learning with Communication and Adaptivity Constraints

PI: Qin Zhang
NSF CCF-2006591 (October 2020 - September 2023)


Reinforcement learning has advanced rapidly in recent years and has achieved success in many practical applications. However, reinforcement-learning algorithms also have a reputation for being data- and computation-hungry in large-scale applications. This project will address this issue by studying how to make reinforcement-learning algorithms scalable by introducing multiple learning agents that collect data and learn optimal strategies collaboratively. The outcomes of this project will have an impact on numerous areas where reinforcement learning is used at scale, e.g., multi-phase clinical trials, training autonomous-driving algorithms, crowdsourcing tasks, pricing, and assortment optimization for stores at different locations.


  1. Batched Mean-Variance Bandits
    B. Fang
    Proc. International Conference on Pattern Recognition (ICPR 22). Hybrid conference, August 2022.

  2. Meta Proximal Policy Optimization for Cooperative Multi-Agent Continuous Control
    B. Fang, Z. Peng, H. Sun, and Q. Zhang
    Proc. International Joint Conference on Neural Networks (IJCNN 22). Hybrid conference, July 2022.

  3. Instance-Sensitive Algorithms for Pure Exploration in Multinomial Logit Bandit
    with N. Karpov
    Proc. AAAI Conference on Artificial Intelligence (AAAI 22). Virtual conference, February 2022.

  4. Variance-Dependent Best Arm Identification
    P. Lu, C. Tao, and X. Zhang
    Proc. Conference on Uncertainty in Artificial Intelligence (UAI 21). Virtual conference, July 2021.

  5. Near-Optimal MNL Bandits Under Risk Criteria
    G. Xi, C. Tao, and Y. Zhou
    Proc. AAAI Conference on Artificial Intelligence (AAAI 21). Virtual conference, February 2021.

  6. Batched Coarse Ranking in Multi-Armed Bandits
    with N. Karpov
    Proc. Annual Conference on Neural Information Processing Systems (NeurIPS 20). Virtual conference, December 2020.

  7. Collaborative Top Distribution Identifications with Limited Interaction (preliminary full version, 47 pages)
    with N. Karpov and Y. Zhou
    Proc. IEEE Symposium on Foundations of Computer Science (FOCS 20). Virtual conference, November 2020.

    • We have shown a strong separation between top-1 arm identification and top-k arm identification in the collaborative learning model.

  8. Multinomial Logit Bandit with Low Switching Cost (preliminary full version, 25 pages)
    with K. Dong, Y. Li, and Y. Zhou
    Proc. International Conference on Machine Learning (ICML 20). Virtual conference, July 2020.

Educational and Other Development

Internships through Global Talent Attraction Program at IU Bloomington

A Python library for bandit algorithms: