Computer Science Department,
Indiana University Bloomington,
Luddy Hall, RM 3044,
700 North Woodlawn Avenue,
Bloomington, IN 47408-3901, USA
Before joining IU, I spent a couple of great years at Theory Group, IBM Almaden Research Center,
and Center for Massive Data Algorithmics, Aarhus University.
I obtained my PhD at Department of Computer Science and Engineering, HKUST.
[Home] [CV] [Publication] [Activities]
We are recruiting! For prospective students: If you are interested in working with me as a PhD student, in Algorithms, Databases, and/or Machine Learning,
please send an email to email@example.com. Fully funded RA positions available.
Internship for international undergraduate students. Please contact me first by email.
I am organizing Theory Seminar. If you wish to give a talk at our seminar, or to join our mailing list please contact me.
The 67th Midwest Theory Day at IU Bloomington (April 15-16, 2017).
Theoretical Foundations for Big Data: communication-efficient distributed computation/monitoring, streaming/sketching algorithms, lower bounds.
Applied Areas: algorithms for fundamental problems in databases, data mining, machine learning and bioinformatics.
- Program Committees (selected):
- Yan Song (PhD candidate since 2018)
- Boli Fang (PhD candidate since 2017; co-advising with Prof. David Crandall)
- Nikolai Karpov (PhD candidate since 2017; co-advising with Prof. Cenk Sahinalp)
- Chao Tao (PhD candidate since 2015; co-advising with Prof. Yuan Zhou)
- Haoyu Zhang (PhD 2020; first employment Facebook)
- Jiecao Chen (PhD 2019; first employment Google AI)
Some Papers [ Full List ][ DBLP ]
Collaborative Top Distribution Identifications with Limited Interaction (preliminary full version, 47 pages)
with N. Karpov and Y. Zhou
Proc. IEEE Symposium on Foundations of Computer Science (FOCS 20), to appear. Durham, NC, U.S.A., November 2020.
- We have shown a strong separation on the top-1 arm identification and top-k arm identifications in the collaborative learning model.
Collaborative Learning with Limited Interaction:
Tight Bounds for Distributed Exploration in Multi-Armed Bandits (preliminary full version, 37 pages) [conf. talk]
with C. Tao and Y. Zhou
Proc. IEEE Symposium on Foundations of Computer Science (FOCS 19), to appear. Baltimore, MD, U.S.A., November 2019.
- We conduct a systematic study of the best arm identification problem in the setting of collaborative learning with limited interaction.
- We obtain almost tight round-speedup tradeoffs for both fixed-time and fixed-confidence settings, and show a complexity separation for the two variants.
- We develop two new techniques for proving round lower bounds for multi-agent collaborative learning.
Distributed Partial Clustering (preliminary full version, 20 pages) [conf. talk]
with S. Guha and Y. Li
Proc. ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 17), pages 143-152. Washington D.C., U.S.A., July 2017.
Invited to special issue for SPAA 2017 papers in ACM Transactions on Parallel Computing (TOPC)
Best Paper Award
- We give the first systematic study of communication/round-efficient distributed partial clustering, by providing almost tight bounds for partial k-center, k-median, and k-means in both deterministic and stochastic settings.
- As a byproduct, we develop the first algorithms for the partial k-median and k-means objectives that run in subquadratic time.
This follow-up work (NIPS 18) gives a more practical algorithm for distributed clustering with outliers.
Edit Distance: Sketching, Streaming and Document Exchange (preliminary full version, 30 pages) [conf. talk]
with D. Belazzougui
Proc. IEEE Symposium on Foundations of Computer Science (FOCS 16), pages 51-60. New Brunswick, NJ, U.S.A., October, 2016.
- We achieve nearly optimal communication with almost linear time recovery for document exchange.
- We obtain the first exact sketching and streaming algorithms for edit distance with sublinear size/space under small distance thresholds.
- Our technique of aligning strings using multiple random walks may be of independent interest.
Subspace Embeddings and Lp Regression Using Exponential Random Variables (preliminary full version, 27 pages) [conf. talk]
with D. P. Woodruff
Proc. Conference on Learning Theory (COLT 13), pages 546-567. Princeton, NJ, U.S.A., June 2013.
- We propose a unified algorithmic framework that improves the running time of Lp-regression for all p = [1, ∞)\2.
Tight Bounds for Distributed Functional Monitoring (preliminary full version, 50 pages) [conf. talk]
with D. P. Woodruff
Proc. ACM Symposium on Theory of Computing (STOC 12), pages 941-960. New York, NY, U.S.A., May 2012.
- We resolve several fundamental questions in the area of distributed functional monitoring (a.k.a. distributed streaming).
- Surprisingly, the total communication required to keep monitoring a function is often similar to the corresponding one-shot computation!
- A new technique called "composition" is proposed for randomized multiparty communication complexity.
This follow-up work (SODA 14) resolves the communication complexity of distinct elements in all parameters.
Lower Bounds for Number-in-Hand Multiparty Communication Complexity, Made Easy (preliminary full version, 22 pages) [conf. talk]
with J. M. Phillips and E. Verbin
Proc. ACM-SIAM Symposium on Discrete Algorithms (SODA 12), pages 486-501. Kyoto, Japan, January 2012.
Journal version in SIAM Journal of Computing (SICOMP), volume 45, issue 1, pages 174-196, February 2016
- A new technique called "symmetrization" is proposed for randomized multiparty communication complexity.
Optimal Sampling From Distributed Streams (preliminary full version, 25 pages) [talk]
with G. Cormode, S. Muthukrishnan and K. Yi
Proc. ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS 10), pages 77-86. Indianapolis, IN, U.S.A., June 2010.
Invited to Journal of the ACM (JACM), volume 59, issue 2, pages 10:1-10:25, April 2012 [Link].
- In the distributed streaming model, random sampling is even easier than counting.
The Limits of Buffering: A Tight Lower Bound for Dynamic Membership in the External Memory Model (preliminary full version, 18 pages) [talk]
with E. Verbin
Proc. ACM Symposium on Theory of Computing (STOC 10), pages 447-456. Cambridge, MA, U.S.A., June 2010.
Journal version in SIAM Journal of Computing (SICOMP), volume 42, issue 1, pages 212-229, January 2013
Note: In all papers above, authors are ordered alphabetically.
- We resolve several fundamental problems in the area of external memory data structures.
- For many basic problems, buffering is impossible to achieve in the external memory model with sublogarithmic query time.
- The external memory model is clearner than the RAM model in certain perspectives.