My main areas are databases (DB) and data mining (DM), and I also work in machine learning (ML) and high-performance computing (HPC). When I introduce myself to non-CS collaborators, colleagues, and the general public, I say I work on Big Data (volume, velocity, C++) and on Data Science and AI (value, Python libraries). These buzzwords say little about what I actually do compared with the standard area divisions of Computer Science. My research group is grounded in fundamental CS techniques: (1) strong programming skills, including parallel/distributed programming; (2) advanced algorithm design; and (3) machine learning model design and multivariate optimization.
In general, my research falls into three broad directions: systems, algorithms, and machine learning. Below, I review some selected research topics.
Systems
New Hardware: Supported by my NSF RII Track-4 award and DOE ECRP (Early Career Research Program) award, I am actively exploring algorithms and systems that use new hardware, such as GPUs, the ALCF AI Testbed (e.g., Cerebras CS-2), and AWS Trainium, to speed up data processing.
• k-Core Decomposition with GPU [ICDE'23]
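The core number of a vertex v is the largest k such that v belongs to the k-core, i.e., the maximal subgraph in which every vertex has degree at least k. One common way to compute all core numbers is peeling: repeatedly remove vertices whose remaining degree is at most k, and raise k once none are left. The sketch below is a minimal, sequential Python version of that idea, not the ICDE'23 GPU algorithm; the function name `core_numbers` is hypothetical. It removes a whole frontier of low-degree vertices per round, which is the kind of step that is a natural target for GPU parallelism.

```python
from collections import defaultdict

def core_numbers(edges):
    """Hypothetical helper: core number of every vertex via level-synchronous peeling."""
    # Build an undirected adjacency list, ignoring self-loops.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    degree = {v: len(ns) for v, ns in adj.items()}
    alive = set(adj)
    core = {}
    k = 0
    while alive:
        # Frontier: every remaining vertex whose current degree is at most k.
        frontier = [v for v in alive if degree[v] <= k]
        if not frontier:
            k += 1  # the remaining graph is the (k+1)-core; move to the next level
            continue
        while frontier:
            # Peel the whole frontier at once; these vertices have core number k.
            for v in frontier:
                alive.discard(v)
                core[v] = k
            # Decrement the degrees of surviving neighbors; some may join the next frontier.
            next_frontier = set()
            for v in frontier:
                for u in adj[v]:
                    if u in alive:
                        degree[u] -= 1
                        if degree[u] <= k:
                            next_frontier.add(u)
            frontier = list(next_frontier)
    return core
```

For a triangle {0, 1, 2} with a pendant vertex 3 attached to 2, `core_numbers([(0, 1), (1, 2), (2, 0), (2, 3)])` assigns core number 2 to the triangle vertices and 1 to vertex 3.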
T-thinker: Big data programming frameworks such as Hadoop and Spark are predominantly designed for IO-bound execution and are suitable only for operations with low time complexity. However, many real problems have very high time complexity, and IO-bound systems perform catastrophically on them. I pioneered the T-thinker (think-like-a-task) programming paradigm to solve such compute-intensive problems efficiently. The key design of T-thinker is to expose an explicit task-based divide-and-conquer API to users, in contrast to existing iterative computation paradigms.
• Programming Paradigm & Tutorial [PPoPP'19], [IEEE BigData'20]
• Dense Subgraph Mining (G-thinker) [ICDE'22], [VLDB Journal'22], [VLDB Journal'22], [PVLDB'21], [ICDE'20], [ER'19], [EuroSys'18]
• General-Purpose Frequent Pattern Mining (Transaction DB) [ACM TODS'22], [VLDB Journal'22], [ICDE'20], [LSGDA@VLDB'20]
• Frequent Subgraph Pattern Mining (Single Graph) [SIGMOD'23]
• Decision Trees [ICDE'22]
• Tutorial [IEEE BigData'20]
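To make the think-like-a-task idea concrete, the sketch below shows task-based divide-and-conquer in a simplified, single-threaded Python form. It uses maximal clique enumeration, a representative dense subgraph mining problem, as the workload. It is not the T-thinker or G-thinker API: the functions `run_tasks` and `bron_kerbosch`, the `(R, P, X)` task tuple, and the `split_threshold` parameter are all hypothetical. A task whose candidate set `P` is larger than `split_threshold` is divided into one subtask per Bron-Kerbosch branch and pushed back onto the task queue. A small task is conquered directly by plain recursion.

```python
from collections import deque

def bron_kerbosch(R, P, X, adj, emit):
    # Plain recursive Bron-Kerbosch (no pivoting), used to conquer small tasks.
    # R: current clique, P: candidates that can extend R, X: already-explored vertices.
    if not P and not X:
        emit(frozenset(R))  # R cannot be extended, so it is a maximal clique
        return
    for v in list(P):
        bron_kerbosch(R | {v}, P & adj[v], X & adj[v], adj, emit)
        P = P - {v}
        X = X | {v}

def run_tasks(adj, split_threshold=2):
    """Hypothetical task engine: adj maps each vertex to the set of its neighbors."""
    results = []
    # Root task: empty clique, every vertex is a candidate, nothing explored yet.
    queue = deque([(frozenset(), frozenset(adj), frozenset())])
    while queue:
        R, P, X = queue.popleft()
        if len(P) > split_threshold:
            # Divide: a big task spawns one subtask per branch instead of recursing.
            for v in sorted(P):
                queue.append((R | {v}, P & adj[v], X & adj[v]))
                P = P - {v}
                X = X | {v}
        else:
            # Conquer: a small task is solved directly.
            bron_kerbosch(set(R), set(P), set(X), adj, results.append)
    return results
```

For example, with `adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}`, `run_tasks(adj, split_threshold=1)` returns the maximal cliques {0, 1, 2} and {2, 3}. In a real engine, the task queue would be shared by many threads or machines, which is where the parallelism comes from. Lowering `split_threshold` yields more, smaller tasks that are easier to balance across workers, at the cost of more queue overhead.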