
Kai-Cheng Yang


Hi! I'm Kai-Cheng Yang (杨凯程), pronounced KY-cheng YAHNG. I also go by Kevin.

I'm a third-year Ph.D. student in Informatics at the School of Informatics, Computing, and Engineering at Indiana University Bloomington. I mainly work with Filippo Menczer, Yong-Yeol Ahn, and Brea L. Perry. Check out the Projects and Publications sections to see what I have been working on.

Before joining the Ph.D. program at IU, I received my bachelor's and master's degrees in theoretical physics from Lanzhou University in China.


  • Nov 10, 2019: Our paper Scalable and Generalizable Social Bot Detection through Data Selection has been accepted by AAAI-20.
  • Oct 28, 2019: Our paper Co-prescription network reveals social dynamics of opioid doctor shopping has been published by PLOS ONE.
  • Oct 18, 2019: Our paper BotSlayer: real-time detection of bot amplification on Twitter has been published by The Journal of Open Source Software.
  • Sep 26, 2019: Our Editors' Suggestion paper Advantage of being multicomponent and spatial: Multipartite viruses colonize structured populations with lower thresholds has been published by Physical Review Letters.
  • Sep 13, 2019: BotSlayer is now under public Beta testing.
  • Feb 26, 2019: Our paper Bot Electioneering Volume: visualizing social bot activity during elections has been accepted by CyberSafety 2019. I will give a short presentation at the workshop, hosted in San Francisco this May.
  • Jan 7, 2019: Our paper Arming the public with artificial intelligence to counter social bots has been accepted by Human Behavior and Emerging Technologies.
  • Nov 20, 2018: Our paper The spread of low-credibility content by social bots has been published in Nature Communications.


  1. Yang, K. C., Varol, O., Hui, P. M., & Menczer, F. (2020). Scalable and Generalizable Social Bot Detection through Data Selection. Accepted by AAAI-20.
  2. Perry, B., Yang, K. C., Kaminski, P., Odabas, M., Park, J., Martel, M., Oser, C., Freeman, P., Ahn, Y. Y., & Talbert, J. (2019). Co-prescription network reveals social dynamics of opioid doctor shopping. PLOS ONE, 14(10), p.e0223849.
  3. Hui, P. M., Yang, K. C., Torres-Lugo, C., Monroe, Z., McCarty, M., Serrette, B. D., Pentchev, V., & Menczer, F. (2019). BotSlayer: real-time detection of bot amplification on Twitter. The Journal of Open Source Software, 01706.
    Media Coverage: IU news | IDS news | Science Node
    DOI | GitHub
  4. Zhang, Y. J., Wu, Z. X., Holme, P., & Yang, K. C. (2019). Advantage of being multicomponent and spatial: Multipartite viruses colonize structured populations with lower thresholds. Physical Review Letters, 123(13), 138101. (Editors' Suggestion)
  5. Yang, K. C., Hui, P. M., & Menczer, F. (2019). Bot Electioneering Volume: visualizing social bot activity during elections. Companion Proceedings of WWW ’19. pp. 214–217.
    DOI | arXiv
  6. Yang, K. C., Varol, O., Davis, C. A., Ferrara, E., Flammini, A., & Menczer, F. (2019). Arming the public with artificial intelligence to counter social bots. Human Behavior and Emerging Technologies. 2019;e115.
    DOI | arXiv
  7. Shao, C., Ciampaglia, G. L., Varol, O., Yang, K. C., Flammini, A., & Menczer, F. (2018). The spread of low-credibility content by social bots. Nature Communications, 9(1), 4787.
    Media Coverage: IU news | ScienceNews | MIT Technology Review
  8. Yang, K. C., Wu, Z. X., Holme, P., & Nonaka, E. (2017). Expansion of cooperatively growing populations: Optimal migration rates and habitat network structures. Physical Review E, 95(1), 012306.


1. Social bots

Social bots are algorithm-controlled social media accounts that automatically post and share content and initiate interactions with other users. This project aims to build publicly available bot detection tools and to study the behavior and impact of social bots in various social events.


Botometer is a machine learning tool that extracts over 1,000 features from a Twitter account and estimates the likelihood that the account is a social bot. Botometer currently handles over 250,000 requests every day and serves as the foundation for many research studies.

Contribution: maintenance, training data annotation, and model retraining

Botometer | Paper | Dataset


BotometerLite is a lightweight version of Botometer. Using a minimal set of features, BotometerLite can perform bot detection at the volume of the Twitter Firehose in real time on a single desktop machine. With a novel evaluation system and model selection method, BotometerLite achieves accuracy comparable to that of Botometer. Thanks to its simplified design, BotometerLite's results can also be interpreted.

Paper | Dataset

Bot Electioneering Volume

BEV is a tool that visualizes the activity of likely bots on Twitter around the 2018 US midterm elections. It lets users explore how active bots are on a daily basis in their efforts to influence online discourse about the elections. It also shows which topics likely bots are targeting.

BEV | Paper

Human perception of social bots

So far, studies of social bots have largely been conducted from a computational perspective. How social media users perceive social bots, along with a series of related questions, remains unclear. In this human-subjects research project, we use experimental designs to understand social media users' perceptions of social bots. We also characterize the effect of human biases on the efficacy of social bot detection.

2. Bad actors on social media

This project aims to study various malicious behaviors on social media.


BotSlayer is an application that helps track and detect potential manipulation of information spreading on Twitter. Equipped with BotometerLite and newly developed algorithms, BotSlayer can detect coordinated amplification by likely bots in real time. BotSlayer is free and easy to install. With some simple configuration, anyone can have a customized instance running in the cloud. BotSlayer is now under public Beta testing.
We also provide an open-source version called BotSlayer-CE.

BotSlayer | Paper | BotSlayer-CE | IU news | IDS news | Science Node


Hoaxy is a tool that visualizes the spread of fake news and related fact-checking articles on Twitter. By incorporating Botometer, Hoaxy can also visualize the bot-like activity involved in the spread of these articles.

Contribution: maintenance and development of an API for Hoaxy to fetch Botometer scores

Hoaxy | Paper

3. Opioid drug doctor shopping

Doctor shopping refers to visiting multiple physicians to obtain controlled substances. In response to the severe opioid crisis in the US, this project aims to characterize doctor shopping for opioid drugs using 11 years of longitudinal medical records from over 20 million patients.

Network prominence indicates drug seeking behavior

Traditional methods for identifying drug-seeking behavior examine each patient's medical history individually. Typical criteria include the number of different prescribers, the number of different pharmacies visited, and the total drug dose within a certain time period. Our analysis shows that such methods have become less useful because patients intentionally alter their behavior to avoid detection. This project uses social network analysis to identify drug-seeking behavior, an approach that has proven very effective and harder to trick.
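
To make the traditional per-patient criteria concrete, here is a minimal Python sketch. It is not the method used in this project: the record layout, the function names `summarize_patients` and `flag_patients`, the 90-day window, and the thresholds are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta


def summarize_patients(records, window_start, window_days=90):
    """Count distinct prescribers and pharmacies and sum the dose per patient.

    `records` is an iterable of hypothetical tuples:
    (patient_id, prescriber_id, pharmacy_id, dose, fill_date).
    Only fills inside [window_start, window_start + window_days) are counted.
    """
    window_end = window_start + timedelta(days=window_days)
    prescribers = defaultdict(set)
    pharmacies = defaultdict(set)
    total_dose = defaultdict(float)

    for patient, prescriber, pharmacy, dose, fill_date in records:
        if window_start <= fill_date < window_end:
            prescribers[patient].add(prescriber)
            pharmacies[patient].add(pharmacy)
            total_dose[patient] += dose

    return {
        patient: {
            "n_prescribers": len(prescribers[patient]),
            "n_pharmacies": len(pharmacies[patient]),
            "total_dose": total_dose[patient],
        }
        for patient in prescribers
    }


def flag_patients(summary, min_prescribers=4, min_pharmacies=4):
    """Return the sorted patient IDs that meet both (hypothetical) thresholds."""
    return sorted(
        patient
        for patient, stats in summary.items()
        if stats["n_prescribers"] >= min_prescribers
        and stats["n_pharmacies"] >= min_pharmacies
    )
```

Because each patient is scored in isolation against fixed thresholds, a patient who stays just under them goes unflagged. This is the weakness that motivates looking at the prescription network as a whole.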


Geographical characterization of doctor shoppers

Doctor shoppers are people who visit multiple physicians to obtain multiple prescriptions of controlled substances. Opioid doctor shoppers have been found to be more likely to overdose, contributing to the increasingly severe opioid crisis in the US. This project intends to apply computational methods to over 9 years of longitudinal medical records from a large group of patients to characterize the geographic behavior of doctor shoppers.

Medical diagnosis embedding

Word2vec is applied to a large collection of medical records to learn a distributed representation of diagnoses. The embedding effectively reduces the number of dimensions needed to encode all the diagnoses and can therefore serve as a preprocessing step for other machine learning tasks. In addition, the embedding itself can reveal interesting relationships between diagnoses.

4. Spread of population

The Spread of Multipartite Viruses on Complex Networks

Multipartite viruses have a genome divided into different disconnected viral particles. A majority of multipartite viruses infect plants; very few target animals. To understand why, we utilize a simple network-based susceptible-latent-infectious-recovered model. We show both analytically and numerically that, provided that the average degree of contact exceeds a critical value, even in the absence of explicit microscopic advantage, multipartite viruses have a lower threshold for colonizing network-structured populations than for colonizing a well-mixed population. We corroborate this finding further on two-dimensional lattice networks, which better represent contact structures typical of plants. Our work therefore provides a potentially promising perspective, from the point of view of network epidemiology, for understanding the factors that promote multipartitism among plant viruses.



Site created by Kai-Cheng Yang. Powered by Bootstrap, Vue.js, Vue ScrollTo, Hover.css, Font Awesome and Academicons.