Research Interests: Overview


My main research areas are databases (DB) and data mining (DM), and I also work in machine learning (ML) and high-performance computing (HPC). When I introduce myself to non-CS collaborators, colleagues, and the general public, I say that I work on Big Data (volume, velocity, C++) and on Data Science and AI (value, Python libraries), but these buzzwords say much less about what I do than the established area divisions within Computer Science. My research group builds on solid CS fundamentals: (1) strong programming skills, including parallel and distributed programming, (2) advanced algorithm design, and (3) machine learning model design and multivariate optimization.

In general, my research falls into three broad directions: systems, algorithms, and machine learning. Below, I review some selected research topics.



Systems


New Hardware: Supported by my NSF RII Track-4 award and DOE ECRP (Early Career Research Program) award, I am exploring algorithms and systems that use new hardware such as GPUs, the ALCF AI Testbed (e.g., Cerebras CS-2), and AWS Trainium to speed up data processing.
    • k-Core Decomposition with GPU [ICDE'23]


T-thinker: Big data programming frameworks such as Hadoop and Spark are predominantly designed for IO-bound execution and are only suitable for operations with low time complexity. However, many real problems have very high time complexity, and IO-bound systems perform very poorly on them. I pioneered the T-thinker (or think-like-a-task) programming paradigm to address compute-heavy problems efficiently. The key design of T-thinker is to expose an explicit task-based divide-and-conquer API to users, in contrast to the existing iterative computation paradigms.
    • Programming Paradigm & Tutorial [PPoPP'19], [IEEE BigData'20]
    • Dense Subgraph Mining (G-thinker) [ICDE'22], [VLDB Journal'22], [VLDB Journal'22], [PVLDB'21], [ICDE'20], [ER'19], [EuroSys'18]
    • General-Purpose Frequent Pattern Mining (Transaction DB) [ACM TODS'22], [VLDB Journal'22], [ICDE'20], [LSGDA@VLDB'20]
    • Frequent Subgraph Pattern Mining (Single Graph) [SIGMOD'23]
    • Decision Trees [ICDE'22]
    • Tutorial [IEEE BigData'20]
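
To make the task-based divide-and-conquer idea concrete, here is a minimal single-threaded sketch. It is not the actual T-thinker or G-thinker API: the function names, the `split_threshold` parameter, and the choice of maximum clique as the example problem are all assumptions made for illustration. Each task carries a partial clique and its candidate vertices; a large task is divided into independent subtasks, and a small one is solved locally.

```python
from collections import deque


def solve_locally(adj, clique, cand):
    """Conquer step: plain recursive search for the largest clique extending `clique`."""
    best = clique
    for i, v in enumerate(cand):
        # Only candidates after v that are adjacent to v can still extend the clique.
        sub = solve_locally(adj, clique + [v], [u for u in cand[i + 1:] if u in adj[v]])
        if len(sub) > len(best):
            best = sub
    return best


def max_clique_by_tasks(adj, split_threshold=3):
    """Find a maximum clique by processing independent tasks from a queue.

    `adj` maps each vertex to the set of its neighbors (undirected graph).
    `split_threshold` is a hypothetical knob: tasks with more candidates are divided.
    """
    best = []
    # One initial task per vertex: (partial clique, candidates that can extend it).
    tasks = deque(([v], sorted(u for u in adj[v] if u > v)) for v in sorted(adj))
    while tasks:  # in a real system, many workers would pull tasks concurrently
        clique, cand = tasks.popleft()
        if len(clique) + len(cand) <= len(best):
            continue  # prune: this task cannot beat the best clique found so far
        if len(clique) > len(best):
            best = clique
        if len(cand) > split_threshold:
            # Divide: spawn one independent subtask per candidate vertex.
            for i, v in enumerate(cand):
                tasks.append((clique + [v], [u for u in cand[i + 1:] if u in adj[v]]))
        else:
            # Conquer: the task is small enough to finish without further splitting.
            found = solve_locally(adj, clique, cand)
            if len(found) > len(best):
                best = found
    return best


# Example graph: a 4-clique {0, 1, 2, 3} with a short tail 3-4-5.
graph = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4}, 4: {3, 5}, 5: {4}}
print(max_clique_by_tasks(graph))
```

Because tasks do not depend on each other, the queue in this sketch could in principle be served by many threads or machines, which is the property that distinguishes this style from iteration-by-iteration computation.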

     


Pregel-Like Systems (BigGraph@CUHK): Following Google's Pregel (SIGMOD'10), many Pregel-like systems have been developed, featuring a think-like-a-vertex programming paradigm and a vertex-centric message-passing model for iterative graph computing. My work on Pregel-like systems is among the earliest and most impactful, and our systems have been widely used by other researchers.
    • Cost Model and Algorithm Design [PVLDB'14]; Applications in de novo Genome Assembly [ICDE'18], [IEEE/ACM TCBB'21]
    • Novel Computing Model (Block-Centric) [PVLDB'14]
    • Message Reduction Techniques [WWW'15]
    • Online Query Answering [PVLDB'16], [SIGMOD'16]
    • Out-of-Core Execution [TPDS'18]
    • Fault Tolerance [ICPP'19]
    • System Performance Comparison [PVLDB'15], [SoCC'17]
    • Comprehensive Survey & Tutorial [Communications of the ACM], [Foundations and Trends® in Databases], [SIGMOD'16], [SpringerBriefs in Computer Science], [Encyclopedia of Big Data Technologies]
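
The think-like-a-vertex model described above can be illustrated with a toy, single-machine sketch. It does not reproduce the API of Pregel or of any system listed here; the function name and the in-memory inbox/outbox dictionaries are assumptions for illustration. The example labels connected components by min-label propagation: in each superstep, a vertex reads its incoming messages, lowers its label if a smaller one arrived, and only then messages its neighbors.

```python
def pregel_min_label(adj):
    """Label each vertex with the smallest vertex ID in its connected component.

    `adj` maps every vertex to the set of its neighbors (undirected graph).
    """
    label = {v: v for v in adj}

    # Superstep 0: every vertex sends its own label to all of its neighbors.
    inbox = {v: [] for v in adj}
    for v in adj:
        for u in adj[v]:
            inbox[u].append(label[v])

    # Run further supersteps until no messages are in flight.
    while any(inbox.values()):
        outbox = {v: [] for v in adj}
        for v, msgs in inbox.items():  # the per-vertex "compute" logic
            if msgs and min(msgs) < label[v]:
                label[v] = min(msgs)
                for u in adj[v]:
                    outbox[u].append(label[v])
            # Otherwise the vertex stays silent, i.e. it votes to halt.
        inbox = outbox  # synchronization barrier between supersteps
    return label


# Two components: a path 0-1-2 and an edge 3-4.
graph = {0: {1}, 1: {0, 2}, 2: {1}, 3: {4}, 4: {3}}
print(pregel_min_label(graph))
```

Each vertex only sees its own state and its incoming messages, so the per-vertex logic could be distributed across machines; the number of supersteps grows with the graph diameter, which is one reason cost models and message reduction matter for such systems.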



Algorithms & Machine Learning


Geospatial Data Management
    • Euclidean Space. Querying/Searching: [CIKM'11], [EDBT'12], [TKDE'14], [KAIS'15]; Mining (co-location patterns) [ICDE'19], [TKDE'23 (to appear)]
    • Road Network (Querying) [PVLDB'11], [ICDE'13], [KAIS'15]
    • Land Surface (Querying) [CIKM'12]
    • Trajectory. Querying: [KAIS'15]; Deep Learning: [KDD'21], [ICDM'21]
    • Transportation Simulation [SIGSPATIAL'22], [IEEE BigData'21], [BTSD@IEEE BigData'21], [IEEE BigData'19], [BTSD@IEEE BigData'19]
    • Flood Extent Mapping on Earth Imagery (Machine Learning) [SDM'23], [SIGSPATIAL'22], [KDD'22], [TIST'22], [KDD'21], [SDM'21], [AAAI'20]


Graph Data Management
    • Subgraph Matching/Counting [SIGMOD'23], [ICDE'23], [VLDB Journal'22], [ICDE'20]
    • Dense Subgraphs. Quasi-cliques: [ICDE'22], [VLDB Journal'22], [PVLDB'21]; k-plexes: [SIGMOD'22]
    • Frequent Subgraph Patterns [SIGMOD'23], [VLDB Journal'22], [ICDE'20]
    • Temporal/Dynamic Graphs [ICDE'23], [DASFAA'17], [KDD'16]
    • Network Alignment [TKDD'22], [IEEE BigData'22], [EMNLP'21]


Uncertain Data Management
    • Top-k & Ranking Queries [DASFAA'11]
    • Data Mining [ACM TODS'22], [ICDE'19], [TKDE'14], [EDBT'12]
    • Spatial Queries [EDBT'12], [TKDE'15]