Comparison of machine learning techniques to predict missing cyanobacteria data and trophic states of lakes

Luthra, Priyanka

2017

Abstract

Cyanobacteria are harmful blue-green algae that deplete oxygen levels in lakes and release toxins that adversely affect aquatic and human life. Hence, monitoring cyanobacteria is crucial. Scientists use satellite data to track cyanobacteria concentrations in lakes. However, cloud cover and fog hinder satellite data collection, creating a need to predict the missing data values. We investigate machine learning approaches on 10 years of satellite data for 99 lakes in the southeastern US. We formulate the missing data problem as a classification problem and compare the performance of various classifiers that use historical data from other lakes. In addition, we conduct a spatio-temporal analysis in which we leverage matrix factorization techniques to predict missing data. We achieve 88.9% accuracy with Random Forests when 5% of the data is missing from the target lake, and we observe that Random Forests and k-Nearest Neighbors are highly effective at combating the missing data problem.
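
As an illustration of the classification framing described in the abstract, the following minimal sketch fills gaps in a target lake's class labels by k-nearest-neighbor voting over the readings of other lakes at the same time step. It is not the thesis implementation: the function names (`knn_predict`, `fill_missing`), the two-lake feature layout, the "low"/"high" labels, and all numbers are hypothetical toy values chosen for the example.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training rows closest to `query`."""
    # Euclidean distance from the query to every training row.
    neighbors = sorted(
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


def fill_missing(other_lakes, target, k=3):
    """Replace None entries in `target` with kNN predictions.

    other_lakes[i] holds the readings of the other lakes at time step i
    (the features); target[i] is the target lake's class at that step,
    or None when the satellite observation is missing.
    """
    observed = [i for i, label in enumerate(target) if label is not None]
    train_X = [other_lakes[i] for i in observed]
    train_y = [target[i] for i in observed]
    return [
        label if label is not None
        else knn_predict(train_X, train_y, other_lakes[i], k)
        for i, label in enumerate(target)
    ]


# Hypothetical toy data: 7 time steps, readings from 2 other lakes.
other_lakes = [
    [0.10, 0.20],
    [0.15, 0.25],
    [0.90, 0.80],
    [0.85, 0.95],
    [0.12, 0.22],  # target lake missing at this step
    [0.88, 0.90],  # target lake missing at this step
    [0.80, 0.85],
]
target = ["low", "low", "high", "high", None, None, "high"]

filled = fill_missing(other_lakes, target, k=3)
print(filled)
```

On this toy input the two gaps are assigned the class of the time steps whose readings from the other lakes look most similar. A Random Forest, which the abstract reports as the most accurate classifier, could be swapped in for `knn_predict` under the same framing.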

Details

Record ID

18063

Record Created

2024-12-05

Title

Comparison of machine learning techniques to predict missing cyanobacteria data and trophic states of lakes

Author

Luthra, Priyanka

Contributor

Ramaswamy, Lakshmish Advisor
Bhandarkar, Suchendra M. Committee Member
Rasheed, Khaled Committee Member

College or School

Computer Sciences

Date

2017

Publisher

University of Georgia

Content Type

Thesis

Language

English

Dissertation/Thesis Note

Graduate

Degree Type

Master of Science (MS)

Name of Granting Institution

University of Georgia, Winter 2017

Year Degree Granted

2017

Keywords

Machine learning; Classification; Missing data; Cyanobacteria; Non-negative Matrix Factorization; kNN; Random Forests

Record Appears in

Electronic Theses and Dissertations > Graduate Thesis
All Resources

System Control Number

9949333401902959
