Abstract
Dataset sizes are growing rapidly, which makes it increasingly important to model large datasets efficiently and to transfer them efficiently over the network. This thesis addresses some of the issues involved by presenting the GlycoVault Data Transfer Module (GDaTM). GDaTM is implemented with recent technologies for modeling and transferring large datasets effectively. Transferred data must be stored somewhere, so the transfer of large datasets goes hand in hand with data storage. We therefore present a meta-analysis comparing different types of database technologies. We also present experiments comparing the performance of database insertions and retrievals for two types of data stores. Further experiments compare different means of data transfer: multi-part versus streaming, and with versus without compression. In addition, we compare two well-known data serialization and deserialization APIs. Finally, we analyze alternative data stores, including scalable SQL, NoSQL, and combinations of the two.