Abstract
Complete processing of extremely large inputs can rapidly become intractable, so sampling is often employed to improve efficiency. In many such applications, sampling is used to estimate features of a population by enumerating those features for a subset of that population. This work introduces sampling for large input proxies (SLIP) for problems whose solutions depend on problem-specific features (PSF). Random sampling schemes are introduced that retain such features for several applications, so that approximating the original input's features takes significantly less time. In addition, for some tasks these proxies can be used as substitute inputs for more efficient computation. SLIP is applied to two sorting-related problems and two data mining tasks (classification and clustering). Across different types of inputs from different domains, SLIP provides a general framework for improving performance in several applications with little loss in accuracy.