Foundation of Communication-Efficient Distributed Computation and Monitoring

CAREER:Foundation of Communication-Efficient Distributed Computation and Monitoring

PI : Qin Zhang
NSF CCF-1844234 (June 2019 - May 2024)

Abstract

Through the massive use of mobile devices, data clouds, and the rise of the Internet of Things, large amounts of data have been generated, digitized, and analyzed for the benefit of society. As data are often collected and maintained at different sites, communication has become necessary for nearly every computational task. Moreover, decision makers naturally want to maintain a centralized view of all the data in a timely manner, which requires frequent queries on the distributed data and, in the extreme, continuous monitoring of the query output. The cost of communication has naturally become the bottleneck for such applications. This project aims to develop communication-efficient solutions for distributed computation and monitoring.

This project targets three fundamental aspects of distributed computation: (1) the tradeoffs between the communication cost and the number of rounds of the computation in distributed one-shot computation, (2) the power of data partitioning, and (3) the connections between distributed one-shot computation and continuous monitoring.

Papers

Robust Statistical Analysis on Streaming Data with Near-Duplicates in General Metric Spaces
Q. Zhang
Proc. ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS 25), to appear. Berlin, Germany, June 2025.
Breaking the \(\log\left({1}/{\Delta_2}\right)\) Barrier: Better Batched Best Arm Identification with Adaptive Grids
with T. Jin and D. Zhou
Proc. International Conference on Learning Representations (ICLR 25), to appear. Singapore, April 2025.
Quantum Data Sketches
Q. Zhang and M. Heidari (by contribution)
Proc. International Conference on Database Theory (ICDT 25), to appear. Barcelona, Spain, March 2025.
- We propose succinct quantum data sketches to support basic database operations such as search and selection.
- We view our work as an initial step towards the development of quantum data management model.
Parallel Best Arm Identification in Heterogeneous Environments
with N. Karpov
Proc. ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 24), pages to appear. Nantes, France, June 2024.
- We have shown a separation between the homogeneous CL model and the heterogeneous CL model for adaptive algorithms .
Communication-Efficient Collaborative Regret Minimization in Multi-Armed Bandits
with N. Karpov
Proc. AAAI Conference on Artificial Intelligence (AAAI 24). Vancouver, Canada. February 2024.
MinJoin++: A Fast Algorithm for String Similarity Joins under Edit Distance
with N. Karpov and H. Zhang
VLDB Journal. Published in August 2023.
Communication-Efficient Collaborative Best Arm Identification
with N. Karpov
Proc. AAAI Conference on Artificial Intelligence (AAAI 23). Washington, D.C., USA. February 2023.
SyncSignature: A Simple, Efficient, Parallelizable Framework for Tree Similarity Joins.
with N. Karpov
Proc. International Conference on Very Large Databases (VLDB 23). Vancouver, Canada. August-September 2023.
Collaborative Top Distribution Identifications with Limited Interaction (preliminary full version, 47 pages)
with N. Karpov and Y. Zhou
Proc. IEEE Symposium on Foundations of Computer Science (FOCS 20). Virtual conference, November 2020.
- We have shown a strong separation on the top-1 arm identification and top-k arm identifications in the collaborative learning model.
Collaborative Learning with Limited Interaction: Tight Bounds for Distributed Exploration in Multi-Armed Bandits (preliminary full version, 37 pages) [conf. talk]
with C. Tao and Y. Zhou
Proc. IEEE Symposium on Foundations of Computer Science (FOCS 19), pages 126-146. Baltimore, MD, U.S.A., November 2019.
- We conduct a systematic study of the best arm identification problem in the setting of collaborative learning with limited interaction.
- We obtain almost tight round-speedup tradeoffs for both fixed-time and fixed-confidence settings, and show a complexity separation for the two variants.
- We develop two new techniques for proving round lower bounds for multi-agent collaborative learning.
MinJoin: Efficient Edit Similarity Joins via Local Hash Minima
with H. Zhang
Proc. ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 19) (oral presentation), pages 1093-1103. Anchorage, AK, U.S.A., August 2019.
Distributed and Streaming Linear Programming in Low Dimensions
with S. Assadi and N. Karpov
Proc. ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS 19), pages 236-253. Amsterdam, The Netherlands, June-July, 2019.
Invited to the special issue for PODS 2019 in ACM Transactions on Database Systems (TODS).

Educational and Other Development

A Trilogy of Courses:

Code library for distributed algorithms:

KSummaries

Personnel

PI: Qin Zhang
PhD student: Nikolai Karpov
PhD student: Chao Tao
PhD student: Haoyu Zhang