Processing large-scale multi-dimensional data in parallel and distributed environments
Title | Processing large-scale multi-dimensional data in parallel and distributed environments |
Publication Type | Journal Articles |
Year of Publication | 2002 |
Authors | Beynon M, Chang C, Catalyurek U, Kurc T, Sussman A, Andrade H, Ferreira R, Saltz J |
Journal | Parallel Computing |
Volume | 28 |
Issue | 5 |
Pagination | 827 - 859 |
Date Published | 2002/05 |
ISSN | 0167-8191 |
Keywords | Data-intensive applications, Distributed computing, Multi-dimensional datasets, Parallel processing, Runtime systems |
Abstract | Analysis of data is an important step in understanding and solving a scientific problem. Analysis involves extracting the data of interest from all the available raw data in a dataset and processing it into a data product. However, in many areas of science and engineering, a scientist's ability to analyze information is increasingly becoming hindered by dataset sizes. The vast amount of data in scientific datasets makes it a difficult task to efficiently access the data of interest, and manage potentially heterogeneous system resources to process the data. Subsetting and aggregation are common operations executed in a wide range of data-intensive applications. We argue that common runtime and programming support can be developed for applications that query and manipulate large datasets. This paper presents a compendium of frameworks and methods we have developed to support efficient execution of subsetting and aggregation operations in applications that query and manipulate large, multi-dimensional datasets in parallel and distributed computing environments. |
URL | http://www.sciencedirect.com/science/article/pii/S0167819102000972 |
DOI | 10.1016/S0167-8191(02)00097-2 |