We come to Pipelining. Pipelining is a way to improve the efficiency of a processor.
The idea is to execute several commands in parallel, thus using the resources more effectively. As mentioned previously, the command consists of several phases: Fetch, Read, Execute, Write.
In theory, four commands could be pipelines through the processor at once, each in a different phase.
Problems: Structure hazards (a command requires exlusive hardware access), data hazards (two commands share a data dependancy) and control hazards (conditional jumps that depend on earlier commands).
Solution for Control Hazard
- Since the problem hinges on conditional statements not being ready before the next command requires them, why not fill the intervening space with several independent commands? These commands may do something useful, avoiding the waste of cycles, while the condition test cycle is waiting for completion. Only when it is complete, the actual jump is performer.
- Other CPUs may try to predict the conditional jump - or, in case of multiple processors, even execute both at once. In the case of a long loop, for example, it may save much time to just assume the condition will return true and start the next cycle before the condition has actually been tested. The extra cycles that have to be discarded are less of a performance hit than it would have been to wait for the condition test in each cycle of the loop.
These solutions are implemented by the compiler as it translates the program to machine code.
Vector Computer
Vector Processors are optimized for, strangely enough, vector computing. A vector is like an array: It is a set of values of equal type. Calculations on a vector usually have to be performed on each element in turn - independently.
In a normal processor, this would require loading each cell of the vector into the register, loading the command that changes it, and making the change.
In a vector processor, the whole vector can be processed at once with a single command. The processor will see to it that the command is executed on all elements of the vector. Needless to say, this is much quicker.
Vectors are loaded into a "vector register" - all elements at once. Using pipelining, the independent operations are then applied to each element in turn.
We have four problems.
Backwards dependency and forwards anti-dependency.
1. A cell is dependent on previous cells that are not yet updated. This happens when several commands are in the vectorized loop, and an earlier command relies on previous values set later in the cycle, but in earlier cycles.
Think of vectorization as a reordering from
CYCLE 1 CMD A
CYCLE 1 CMD B
CYCLE 1 CMD C
CYCLE 2 CMD A
CYCLE 2 CMD B
CYCLE 2 CMD C
CYCLE 3 CMD A
CYCLE 3 CMD B
CYCLE 3 CMD C
into
CYCLE 1 CMD A
CYCLE 2 CMD A
CYCLE 3 CMD A
CYCLE 1 CMD B
CYCLE 2 CMD B
CYCLE 3 CMD B
CYCLE 1 CMD C
CYCLE 2 CMD C
CYCLE 3 CMD C
Suddenly, it becomes very important at what
position in the cycle the command was. All earlier commands are executed before all later commands, and may not depend on them.
The two problems therefore are when a command assumes that another command later in the cycle was already executed for previous cycles (backward dependency) or when a command assumes that another command earlier in the cycle was not yet executed for later cycles (forward anti-dependancy).
Symmetric Multiprocessing
Several processors share the same memory and cache.
NUMA
Each CPU has its own memory.
Flynn
We finally come to the classification according to Flynn. This was developed back in '66 and is still the most popular thing today.
We describe only command and data streams.
Each can be Single or Multiple.
SISD (single instruction, single data) is the traditional von Neumann principle.
SIMD. Multiple data means that the processor can perform several calculations at once from the same command. A vector computer does this.
MISD. Effectively paradoxical, never implemented yet. The idea is to perform several instructions on the same set of data, which is operated on in a kind of assembly line. This may be used in:
MIMD. Several data and several instruction streams. Dual core or other multiple processor machines where the processors work independently, processing both their own commands and their own data. Highly scalable. Done by connecting several SISD processors.