Papers and Preprints

  1. GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond [Slide]
    Han Zhong*, Wei Xiong*, Sirui Zheng, Liwei Wang, Zhaoran Wang, Zhuoran Yang and Tong Zhang, Preprint.

  2. Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes
    Chenlu Ye, Wei Xiong, Quanquan Gu and Tong Zhang, Preprint.

  3. Nearly Minimax Optimal Offline Reinforcement Learning with Linear Function Approximation: Single-Agent MDP and Markov Game [Slide]
    Wei Xiong*, Han Zhong*, Chengshuai Shi, Cong Shen, Liwei Wang, and Tong Zhang, ICLR 2023.

  4. A Self-Play Posterior Sampling Algorithm for Zero-Sum Markov Game
    Wei Xiong, Han Zhong, Chengshuai Shi, Cong Shen, and Tong Zhang, ICML 2022.

  5. Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets
    Han Zhong*, Wei Xiong*, Jiyuan Tan*, Liwei Wang, Tong Zhang, Zhaoran Wang, and Zhuoran Yang, ICML 2022.

  6. An Alternative Analysis of High-Probability Generalization Bound for Uniformly Stable Algorithms
    Wei Xiong, Yong Lin, and Tong Zhang, Project Report, not intended for publication.

  7. PMGT-VR: A decentralized proximal-gradient algorithmic framework with variance reduction [Slide]
    Haishan Ye*, Wei Xiong*, and Tong Zhang, Under Minor Revision at TPAMI.

  8. Heterogeneous Multi-player Multi-armed Bandits: Closing the Gap and Generalization [Code]
    Chengshuai Shi, Wei Xiong, Cong Shen, and Jing Yang, NeurIPS 2021.

  9. (Almost) Free Incentivized Exploration from Decentralized Learning Agents [Code]
    Chengshuai Shi, Haifeng Xu, Wei Xiong, and Cong Shen, NeurIPS 2021.

  10. Distributional Reinforcement Learning for Multi-Dimensional Reward Functions
    Pushi Zhang, Xiaoyu Chen, Li Zhao, Wei Xiong, Tao Qin, and Tie-Yan Liu, NeurIPS 2021.

  11. Decentralized multi-player multi-armed bandits with no collision information [Code*]
    Chengshuai Shi, Wei Xiong, Cong Shen, and Jing Yang, AISTATS 2020.

(* equal contribution or alphabetical order)

The Code* repository implements many SOTA and baseline MPMAB algorithms; it is the nice work of Cindy Trinh.