Abstract
Machine learning has become one of the most active research areas of the last decade and drives the current state of the art in many fields. In this study, we used machine learning techniques to integrate telemetry data recorded at different frequencies with transcriptomic data in order to predict and understand malaria parasite infection. A Python API was developed to query and download the different data types stored in a cloud-based relational database. A deep neural network (DNN) model was built to classify pre-infection versus the liver stage of malaria infection using raw electrocardiogram (ECG) data. The ECG data were grouped by the hour of day in which they were recorded (24 groups), and each group was fed into the DNN model separately. The classification results varied widely across ECG groups recorded during different hours. The ECG recordings from H21 yielded the highest F1 scores: 0.831 on the validation data set and 0.781 on the independent testing data set. In contrast, classification accuracy dropped to about 60% for ECG data recorded during daytime hours. In general, the DNN model performed much better on ECG data recorded during nighttime hours than on data recorded during the daytime. By integrating accelerometer data into the DNN model, we further demonstrated that there were true ECG signal changes in the early liver stage. We found that the differences in prediction accuracy among ECG data groups were highly associated with the corresponding accelerometer data of the hosts, which indicates that activity could affect the stability of ECG signals and thus the prediction performance of the DNN model. Furthermore, differential expression and gene set enrichment analyses of the transcriptomic data showed disruption of circadian rhythms in the liver stage. This disruption of circadian rhythms provides an intrinsic explanation for cardiac involvement in malaria parasite infections.
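The hourly grouping and per-group F1 evaluation described above can be sketched as follows. This is only an illustration under stated assumptions, not the study's actual pipeline: the function names `group_by_hour` and `f1_score`, the `(timestamp, samples, label)` tuple layout, the zero-padded group keys, and the label coding (0 for pre-infection, 1 for liver stage) are all hypothetical. Only the `H21`-style group name comes from the abstract.

```python
from collections import defaultdict
from datetime import datetime


def group_by_hour(segments):
    """Group ECG segments by the hour of day in which they were recorded.

    `segments` is an iterable of (timestamp, samples, label) tuples
    (a hypothetical layout). Keys look like 'H00' .. 'H23'.
    """
    groups = defaultdict(list)
    for timestamp, samples, label in segments:
        groups[f"H{timestamp.hour:02d}"].append((samples, label))
    return dict(groups)


def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero/undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data: assumed coding 0 = pre-infection, 1 = liver stage.
segments = [
    (datetime(2020, 1, 1, 21, 5), [0.10, 0.20], 0),
    (datetime(2020, 1, 2, 21, 40), [0.30, 0.10], 1),
    (datetime(2020, 1, 2, 9, 15), [0.20, 0.20], 1),
]
groups = group_by_hour(segments)
```

In a setup like this, a separate model would be trained and scored on each hourly group, so that F1 scores can be compared across hours of the day.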