Motivation for Declarative Parallel Programming
Introduction
As modern multi-core processors evolve, not only are more cores projected to appear on each chip in the near future, but the chips are also expected to contain multiple types of cores, such as GPGPUs. To obtain additional computational power, it is common for multiple such processing nodes to be networked together into a cluster. Programmers are likely to soon find themselves programming massively parallel machines to achieve high performance in their applications, which will be extremely difficult and error prone, if not impossible, with the current software techniques and practices for writing parallel programs.
A Practical Approach
This research is motivated by the observation that it takes years to decades for entirely new languages and programming models to become mainstream, while the problem of making scalable parallel programming accessible is upon us with no silver bullet in sight. A much more pragmatic approach is to build on the parallel programming models that are already well established and proven. What these models lack today is maintainability and performance portability across rapidly changing parallel architectures, which is what this research attempts to address.
A great amount of effort in writing parallel programs goes into carefully tuning them for a specific architecture. Unfortunately, no sooner is an application optimized for a specific platform than it is time to move to a more modern and completely different parallel machine. Consider parallel programs written with MPI, the dominant way to write parallel programs for distributed address spaces. Tuning MPI programs involves carefully overlapping communication and computation by strategically placing sends and receives. To ensure the best performance, the programmer must use the asynchronous variety of communication primitives, placing the communication calls far from where the data will be used, which makes the programs much harder to debug and maintain. Moreover, the amount of overlap is platform dependent. Another optimization technique is to coalesce multiple small messages into larger ones. Once again, the tradeoff between the amount of coalescing and the risk of overloading the send/receive queues depends on the platform as well as the MPI implementation. Further complications arise when the MPI program needs to be optimized for a cluster of multi-core nodes, an increasingly common occurrence.
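To make the tuning burden concrete, the sketch below illustrates the kind of manual overlap described above, using a generic ring-style exchange that is not taken from any particular application: the nonblocking MPI_Irecv and MPI_Isend are posted early, computation that does not depend on the incoming data fills the gap, and MPI_Waitall is placed only where the received values are finally needed. The buffer size and neighbor pattern are illustrative assumptions.

```c
/* Minimal sketch of hand-tuned communication/computation overlap with
 * nonblocking MPI calls. Ring neighbors and buffer sizes are illustrative. */
#include <mpi.h>
#include <stdio.h>

#define N 1000000

static double local[N], halo[N];

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int right = (rank + 1) % size;          /* ring neighbors */
    int left  = (rank - 1 + size) % size;

    for (int i = 0; i < N; i++) local[i] = rank;

    MPI_Request reqs[2];

    /* Post communication early, far from where the data is used. */
    MPI_Irecv(halo,  N, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(local, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

    /* Computation that does not depend on the incoming message
     * proceeds while the messages are (hopefully) in flight. */
    double sum = 0.0;
    for (int i = 0; i < N; i++) sum += local[i] * local[i];

    /* Block only where the received data is actually required. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    for (int i = 0; i < N; i++) sum += halo[i];

    printf("rank %d: sum = %f\n", rank, sum);
    MPI_Finalize();
    return 0;
}
```

Even in this simple form, the placement of the communication calls relative to the computation is a hand-made, platform-dependent decision; the declarative approach described next aims to let a compiler make such decisions instead.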
This research aims to develop a parallel programming technique that effectively separates the computation and communication specifications, thus providing a declarative approach to parallel programming while retaining the benefits of the most optimized and highly tuned communication libraries, such as MPI. One way to think about this approach is by analogy with the parser-generation tool yacc, which cleanly separates the specification of the language from the actions to be taken upon parsing. Such a separation makes it possible to map a parallel program effectively and efficiently onto a distributed-memory cluster, a shared-memory machine, or a hybrid cluster containing both. Since parallelism is explicitly specified, programmers, who have knowledge at the algorithmic level, can control data and task distribution. At the same time, they are relieved of platform-specific optimizations, which can be performed by a compiler. Furthermore, the declarative specification makes it possible to equip the compiler with techniques to partially verify the correctness of the program automatically, e.g., by ensuring that the specifications do not lead to deadlocks.
Goals
- Developing a declarative syntax to specify communication and computation separately.
- Developing a compiler to automatically translate the declarative specification into code that uses lower-level libraries, such as MPI or threads.
- Developing automatic verification techniques.