Abstract

Large-scale scientific applications can take a significant amount of time when run sequentially. Parallelism is one way to reduce their execution time. Two key problems in efficiently running a program in parallel on a distributed-memory architecture are (1) scheduling the application and (2) distributing the work so as to simultaneously minimize the time spent in communication and in local processing on each node. The input to these programs can be very large, and they are increasingly likely to run on a heterogeneous architecture; both situations make it likely that they will need to access the disk while running, incurring I/O costs. Ignoring these costs can greatly degrade performance because they are typically an order of magnitude more expensive than message-passing latencies or cache misses.

We determined using a simulator that I/O-awareness in gang scheduling can increase throughput and reduce turnaround times of applications by as much as a factor of three. This result motivated us to develop a runtime system that solves the data distribution problem for applications running on a heterogeneous cluster of machines. This dissertation describes two components of this system that we implemented: (1) the search mechanism, and (2) the computational model it uses to evaluate each candidate distribution. An exhaustive search is computationally intractable, so we simplified the problem by assuming a monotonic relationship between a data distribution and the resulting application execution time. Our search algorithm, which we call GBS, is optimal in solving this simplified problem. We also developed a computational model called MHETA, which is used as an evaluation function by the search algorithm. The model integrates an application's structural information and instrumented measurements to generate a predicted execution time for a given input distribution. In our experimental test bed consisting of four scientific applications on 17 emulated architectures, GBS on average produces distributions within 5% of the optimal within one second of running. We also show that MHETA is on average at least 97% accurate in its predictions, indicating that the GBS algorithm (with MHETA) successfully finds effective data distributions for a parallel application running on a heterogeneous cluster on the fly.

INDEX WORDS: Computation models, Data distributions, Scheduling, I/O Awareness, Heterogeneous architectures

I/O CONSIDERATIONS IN EFFICIENT HETEROGENEOUS DATA DISTRIBUTIONS
by
MARIO NAKAZAWA
B.A., University of Pennsylvania, 1994
A Dissertation Submitted to the Graduate Faculty of The University of Georgia in Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY
ATHENS, GEORGIA
2005
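The core idea summarized in the abstract, searching over candidate data distributions under a monotonicity assumption and scoring each candidate with a predicted execution time, can be illustrated with a small sketch. The Python below is a hypothetical, simplified illustration of that idea only; it is not the dissertation's actual GBS or MHETA implementation. The linear per-node cost model and the names predict_time, max_rows_within, and find_distribution are assumptions introduced for this example.

def predict_time(node, rows):
    # Stand-in for a MHETA-style evaluation function: predicted execution
    # time for `rows` units of work on `node`. Assumed monotonic in `rows`.
    per_row_compute, per_row_io = node
    return rows * (per_row_compute + per_row_io)

def max_rows_within(node, budget):
    # Largest workload `node` can finish within `budget` seconds, found by
    # binary search; valid only because predict_time is monotonic in rows.
    lo, hi = 0, 10**9
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if predict_time(node, mid) <= budget:
            lo = mid
        else:
            hi = mid - 1
    return lo

def find_distribution(nodes, total_rows, tol=1e-3):
    # Bisect the smallest time budget at which the nodes can collectively
    # absorb all rows; the per-node capacities at that budget give the split.
    lo, hi = 0.0, max(predict_time(n, total_rows) for n in nodes)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sum(max_rows_within(n, mid) for n in nodes) >= total_rows:
            hi = mid
        else:
            lo = mid
    dist, remaining = [], total_rows
    for n in nodes:
        share = min(max_rows_within(n, hi), remaining)
        dist.append(share)
        remaining -= share
    return dist

if __name__ == "__main__":
    # Three heterogeneous nodes: (per-row compute cost, per-row I/O cost) in seconds.
    cluster = [(0.002, 0.001), (0.004, 0.004), (0.001, 0.0)]
    print(find_distribution(cluster, total_rows=100_000))

In this toy setup the cheapest node receives the largest share of rows. In the system described by the dissertation, MHETA's model of structural information and instrumented measurements would take the place of the linear cost function, and GBS's own search strategy would take the place of this simple bisection.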
