Introduction

Computer Languages

To date, programming languages remain the most effective way to communicate with a computer. A language is, in fact, an elegant means of communication: humans have developed and refined languages over hundreds of years, and human languages have reached a high level of sophistication and artistry. Computer languages, on the other hand, are (still) utilitarian. Part of the reason is that computer languages must be unambiguous, a requirement dictated by the current limitations of software and hardware technologies.

If you talk to most computer scientists, they will argue for computer languages that are as close to mathematical descriptions of algorithms as possible. Computer scientists tend to have a predilection for languages built from a minimal set of constructs, which is often equated with a certain “elegance”. This is a different elegance from the kind you find in natural languages; it is a direct descendant of the elegance found in mathematics, and a consequence of scientists' fascination with Occam's Razor.

Computers are no longer tools only for scientists and engineers. One could therefore argue in favor of computer languages that are not necessarily rooted in math, but are designed to make tasks easier to express for specific types of users. For example, imagine a language that lets you talk about scores, tones, chords, and so on. Musicians would be at home using such a language to interact with computers, rather than having to use, say, Scheme. Such languages are called “domain-specific languages.” A good example of a domain-specific language that has proved extremely popular is SQL.

We could go on and on discussing the relative merits of different languages, but one fact is inescapable: we need a tool to translate a language into a form that a computer can understand. The need for such a tool is obvious. Computer hardware is designed to understand very simple, low-level instructions, precisely so that it can serve the widest possible range of uses. The tool that carries out this translation is a compiler. Indeed, a compiler is responsible for instilling meaning into the constructs of a language. In this sense, compilation is at the heart of computing.

In this course we will study compilation techniques for procedural languages. What is the difference between procedural and declarative languages? Why would you prefer one over the other? Almost all contemporary computer hardware implements procedural architectures. This means that irrespective of the language you start with, at some point in the translation process the compiler must make the transition to procedural instructions, and compilers often make this transition sooner rather than later. A good understanding of compilation techniques is essential to understanding the link between software and hardware, and hence between software and its performance.

Compilers

In its simplest form, a compiler translates a source program into a target program. Typically the source language is a programming language such as Fortran, C++, Java, or ML, and the target language is the instruction set of a computer system, although this need not always be the case.

The basic compilation technology has been around since the 1950s, when Fortran was developed along with the first Fortran compiler. If all you care about is correctness of translation, the technology is well established and widely available. Things get interesting when you care not only about correctness but also about performance.

The definition of performance varies with context. Most often, performance means the running time of an application. It can also refer to the memory footprint, especially when the memory requirements of the computation exceed the available main memory. It is often the case that decreasing an application's memory footprint also improves its running time. Why? A smaller footprint means more of the working set fits in the faster levels of the memory hierarchy, reducing cache misses and page faults. Almost all current research in language translation, i.e., compilation, is concerned with improving performance.

We can ask the following question: what are the obstacles to achieving maximal performance? To answer it, we need to understand what we mean by “maximal performance” and what prevents an application from achieving it.

  1. Focus on scientific applications
  2. Causes of performance problems: memory hierarchy, redundancies
  3. Solving the memory-hierarchy problem: register allocation, rewriting loops for cache reuse
  4. Eliminating computational redundancies: common sub-expressions, redundant memory accesses, dead code (spatial redundancy)
  5. Replacing an operation by an equivalent cheaper one: strength reduction
  6. Why compilers are impossible to avoid: VLIW instruction sets, complex architectures (out-of-order execution, non-uniform execution times, deep memory hierarchies, multi-core processors)

Parallel Processing

One way to improve the performance of an application is parallel processing. Clearly, if we can execute parts of an application concurrently on multiple processors, we can potentially gain in running time. If the multiple processors also bring their own main memories, we gain a way of accommodating larger amounts of data in memory.

There is another reason parallel processing is of increasing interest: the CPU clock-speed wall.

A graph published in an article by Herb Sutter (“The Free Lunch Is Over”) makes this point vividly: after sustained exponential growth, clock frequencies suddenly flattened. Had the exponential growth continued, current processors would be running at somewhere around 10-20 GHz; instead, the fastest current processors run at no more than 4 GHz. As a result, even mainstream processors now have multiple cores (multiple identical CPUs) on a single chip.

Of course, one form or another of parallelism has existed for years, even decades. What is new is that parallelism now pervades the entire hierarchy of computational units, from the single chip up to wide-area computational grids. This pervasiveness introduces several challenges in programming such highly complex systems.

Challenges

The pervasiveness of parallelism poses several challenges. From the perspective of writing, or generating, parallel programs, the two main issues are the amount of parallelism in an application and its locality. The amount of parallelism is a measure of how many computations can be executed in parallel without changing the meaning of the program. “Locality” refers to the quick availability of data when it is needed. The issue of locality arises at several levels: registers, cache memory, and main memory when several independent machines cooperate to execute a task.

The technology developed for automatic parallelization over the past several years has been quite successful in discovering and extracting parallelism, but its success in maintaining good locality has been far less spectacular. Additionally, several other challenges remain open.

Road-map

We will cover the following topics in this course, although not necessarily in this order.

Arun Chauhan