Computer architecture
File Management
Submitted by Arancaytar on Tue, 07/11/2006 - 09:40 – No commentsFile Systems save their data in files.
They consist of:
- Files (duh)
- Folders
- Partitions
File contains data or program code. Folder groups files, partitions is a physical space on the hard drive, like a virtual drive.
Physical: One disk
Logical: A partition on a disk
File is the smallest unit that may be written to the peripheral memory.
Attributes:
Name, type, size, location
Additionally:
Accessed/modified/changed timestamps.
Access rights. owner.
Operations: open, close, create, delete, read, write, seek.
Folder groups files. A 1:n relation, can be implemented forwards (folder contains references to files) as in UNIX, DOS. Or backwards (each file contains reference to folder like in MySQL) as in BeOS, BFS.
A folder acts like a pointer, referencing certain physical locations on the drive.
Operations: open, close, create, delete (only empty, otherwise recursive), read, write. Has attributes like a files does.
Partitions: Up to now, we referenced a hard drive by cylinder, track and sector. Now we use logical partitioning.
The first partition is the MBR, Master Boot Record, telling BIOS where to look for the OS. Partitions usually consist of cylinders.
Partitions are used for dual-booting, logical structuring. Use of several file systems on one disk - partitions are completely independent of file systems, acting "outside" the system.
Maximal number of paritions: 63 on IDE, 15 on SCSI. Linux Loader LILO. Boot loader that can start several OSs and also boot from Logical Partitions. Usually, only primary partitions can be used.
BIOS loads MBR. MBR points to OS loader. (Boot sector).
There may be only one file system per partition - in fact, a partition's main attribute is its file system. File system is administrated by OS: OS thus needs a driver to handle a certain file system (Linux can't write NTFS usually).
Addendum: Had to leave because the exam was half an hour earlier than I thought. This section, fortunately, was not on the exam.
- Arancaytar's blog
- Add new comment
- 536 reads
Memory Management
Submitted by Arancaytar on Tue, 07/11/2006 - 06:49 – No commentsTwo more topics: Memory Management and File Management.
Memory Management can use real/physical and virtual memory. MM is responsible for allocating the available memory and for controlling each task's access to the address space.
Real memory management
- means only the RAM memory physically available.- "swapping" export.
- whole programs are exported at once to the swap file (Hard Drive)
- any task currently running is completely present in RAM, any that isn't is completely on the HD.
- Programs obviously can't be bigger than the RAM.
Virtual Memory Management
- The physical address space is extended with virtual memory on the hard drive (page file).Problem with PMM: Fragmentation of the RAM.
Allocation/search for empty space: First Fit (the first free space gets taken), Next Fit (start searching on the spot where the last program was written (doesn't try to write time and again to a system part before), Best Fit (avoid fragmentation: Use the smallest memory block possible).
Disadvantages: Must have room for one whole program. A non-working process must be chucked out completely.
Virtual Memory Management
Along comes VMM. We make more space on the hard drive and then do the same thing a cache does, only in reverse. The cache is used to preload certain things we think will soon be needed into the small fast memory. The pagefile is used to save stuff we think won't be needed soon into the large slow memory (HD). Locality principles (spatial and temporal) still apply.The process is known as "paging" because pages at one time are transferred. This is done by the OS without User Input.
Memory literally begins at address 0 and is technically infinite.
In VMM, the memory is like a new Level 4 cache lying below the other caches. Here, virtual memory is loaded into the DRAM cache. Uses a page table and pages and page frames.
Pre-paging: Preloads pages that are likely to be used soon.
Demand paging: Only when necessary.
Random depaging: Chuck out a random page when we need more physical space.
FIFO: First in first out. Oldest page is chucked out. Problem: Things that are constantly used have to be reloaded in regular intervals.
Least Recently Used. Problem: We need to keep track of access times, slowing down the whole thing. So you see, it's really like caching.
Page fault: A page is needed and missing. We first block the process, then use a page replacement algorithm to find the page to chuck. If it was changed, we need to writeback changes (remember writeback?) We load the page into the L4 Cache. Then we restart the blocked process.
Look-Ahead: Try to guess which page will be accessed latest and replace it. In practice, we are not clairvoyant unfortunately.
Look-Back: Much easier. Except we need to keep track, as mentioned.
Pageflutter
A process with not enough pages may cause very many pagefaults. Worst is trying to access a page just chucked out. Processes with no locality do this. A lot of overhead results.Page buffer: We constantly keep a certain number of free pages. Export is done ahead of time. We have won some time when pages must be rewritten.
Working set: The immediate pages that the process needs most often. Hard to predict. Must not be too large (overlap) or too small (pagefluttering). Also requires timetracking.
Pages can be replaced locally or globally, ie from one process to another.
Local allocation means one space for each program. Can be either equal or dynamic.
Advantage: More flexible use of resources, no fragmentation, no limit on process size.
Disadvantage: A lot of overhead for pagetables. CPU overhead too.
UNIX uses demand paging. Swapping is also used on occasion.
- Arancaytar's blog
- Add new comment
- 605 reads
Operating Systems
Submitted by Arancaytar on Tue, 07/11/2006 - 02:05 – No commentsYay yay yay.
This might be the absolutely last entry. It takes up the entire second section of the course, but it's far shorter.
An operating system is the software that controls the execution of other programs on the hardware and also administers the various resources.
This includes service programs (txt editor etc)
The hierarchy looks like this:
Application
|
V
Compiler
|
V
Operating System
|
V
Firmware
|
V
Hardware
The OS forms the connection between the higher, more abstract levels of the machine and the lower, more physical ones.
The OS is responsible for ensuring correct data, robust applications (ie, avoiding crashes). It must be easily maintainable and usable. It also should use the hardware resources efficiently, allowing high speed.
DIN 44330: OS forms the foundation of the system, controls and monitors processes.
Process Handling
Implicit/literal addressing: The load command contains no target register (accumulaor). Or: All addresses are explicitly named.Relative/absolute addressing: We either handle offsets or absolute addresses.
Indirect addressing. The cell we address contains another address that is then used to get the cell we want.
One process is also called a Task.
A job is a single action that is performed once - like a mouse click on a certain button.
Multitasking: Several processes run at once. Preemptive: OS decides when to switch. Cooperative: Process decides when to hand over the reins. Latter very susceptible, of course.
Multiuser: Like multitasking, but different users may initiate different tasks that also can't see each other.
Process hierarchies: A process spawns a child process. In UNIX, these tasks are handled with multitasking. In DOS, the parent process waits for the child to finish.
The different states of a process: Calculating, blocked, ready. A process can go from calculating to either, but it cannot begin calculating from blocked, nor can it be blocked while ready.
Thread: A logical, single unit within a task. A thread shares the memory of its task parent, but is handled as its own task with a certain CPU allotment. Several threads of the same task may communicate. Processes may not (directly that is).
Multiprocessing/Multithreading: Several threads may run at once (not shared, but at the same time on multiple CPUs). ASMP, SMP. Asymmetric dedicates one CPU to the system. Symmetric shares program code and system processes on different CPUs.
The process table tracks information about different processes.
Synchronizing processes isn't easy. Dependancies can crop up unexpectedly.
-----
Synchronizing with Lock
If several processes run the same Shellscript at once, there may be a part of the Shellscript that has to run "alone" without any interference.Thus, we invent what MySQL calls the Table Lock: We first check if the lock is active. If it isn't, we activate it. Then we run, then we remove the lock.
UNIX uses the lockfile command for this.
Busy waiting: If the file is locked, we keep checking till it isn't.
Only one process may be in its critical block at one time. No assumptions about the processor (speed etc). Processes not in their critical block may not preempt other processes. Processes must not wait indefinitely before entering critical phase.
Semaphore bits are used to signal the entry and exit from critical blocks.
Deadlocks: Several processes somehow manage to wait for each other indefinitely.
Think of two philosophers next to each other. One grabs the fork, the other grabs the knife. Both are now waiting for the other to relinquish their grip on the cutlery implement, but neither is willing to let go (they're philosophers, they're stubborn).
Deadlocks can be ignored, recognized or avoided. Ignoring is the easiest of course.
Batch jobs must declare their required resources prior to starting. Further resources may not be locked down.
Round Robin: fair, but dumb algorithm. Small packets of time increase overhead, but larger pockets increase response delay.
Priority Scheduling: each process gets a priority score. This is allocated by the system either statically (at the beginning) or dynamically.
Processes with much user i/o get preference (improving user response time), high cpu usage is frowned on, waiting processes are ignored entirely. Priorities must be updated constantly.
Static priority works thuswise: The process gets a weight. When it has gotten a time share, its score is decreased, but it increases the faster the higher the time share is.
Shortest job first. If you know how long a batch job will take, execute the shortest first.
First come first served: May be bad.
Think of the supermarket queue. Have you had this situation where the housewife getting weekly supplies for her eight family members is before you in line, but waves you past because you only have a chocolate bar - or vice versa? Likewise. Utilitarianism here: Rather have the long job wait a short while than have the short job wait a long while.
- Arancaytar's blog
- Add new comment
- 930 reads
Onward with the Processor!
Submitted by Arancaytar on Tue, 07/11/2006 - 01:10 – No commentsWe come to Pipelining. Pipelining is a way to improve the efficiency of a processor.
The idea is to execute several commands in parallel, thus using the resources more effectively. As mentioned previously, the command consists of several phases: Fetch, Read, Execute, Write.
In theory, four commands could be pipelines through the processor at once, each in a different phase.
Problems: Structure hazards (a command requires exlusive hardware access), data hazards (two commands share a data dependancy) and control hazards (conditional jumps that depend on earlier commands).
Solution for Control Hazard
- Since the problem hinges on conditional statements not being ready before the next command requires them, why not fill the intervening space with several independent commands? These commands may do something useful, avoiding the waste of cycles, while the condition test cycle is waiting for completion. Only when it is complete, the actual jump is performer.
- Other CPUs may try to predict the conditional jump - or, in case of multiple processors, even execute both at once. In the case of a long loop, for example, it may save much time to just assume the condition will return true and start the next cycle before the condition has actually been tested. The extra cycles that have to be discarded are less of a performance hit than it would have been to wait for the condition test in each cycle of the loop.
These solutions are implemented by the compiler as it translates the program to machine code.
Vector Computer
Vector Processors are optimized for, strangely enough, vector computing. A vector is like an array: It is a set of values of equal type. Calculations on a vector usually have to be performed on each element in turn - independently.
In a normal processor, this would require loading each cell of the vector into the register, loading the command that changes it, and making the change.
In a vector processor, the whole vector can be processed at once with a single command. The processor will see to it that the command is executed on all elements of the vector. Needless to say, this is much quicker.
Vectors are loaded into a "vector register" - all elements at once. Using pipelining, the independent operations are then applied to each element in turn.
We have four problems.
Backwards dependency and forwards anti-dependency.
1. A cell is dependent on previous cells that are not yet updated. This happens when several commands are in the vectorized loop, and an earlier command relies on previous values set later in the cycle, but in earlier cycles.
Think of vectorization as a reordering from
CYCLE 1 CMD A
CYCLE 1 CMD B
CYCLE 1 CMD C
CYCLE 2 CMD A
CYCLE 2 CMD B
CYCLE 2 CMD C
CYCLE 3 CMD A
CYCLE 3 CMD B
CYCLE 3 CMD C
into
CYCLE 1 CMD A
CYCLE 2 CMD A
CYCLE 3 CMD A
CYCLE 1 CMD B
CYCLE 2 CMD B
CYCLE 3 CMD B
CYCLE 1 CMD C
CYCLE 2 CMD C
CYCLE 3 CMD C
Suddenly, it becomes very important at what position in the cycle the command was. All earlier commands are executed before all later commands, and may not depend on them.
The two problems therefore are when a command assumes that another command later in the cycle was already executed for previous cycles (backward dependency) or when a command assumes that another command earlier in the cycle was not yet executed for later cycles (forward anti-dependancy).
Symmetric Multiprocessing
Several processors share the same memory and cache.
NUMA
Each CPU has its own memory.
Flynn
We finally come to the classification according to Flynn. This was developed back in '66 and is still the most popular thing today.
We describe only command and data streams.
Each can be Single or Multiple.
SISD (single instruction, single data) is the traditional von Neumann principle.
SIMD. Multiple data means that the processor can perform several calculations at once from the same command. A vector computer does this.
MISD. Effectively paradoxical, never implemented yet. The idea is to perform several instructions on the same set of data, which is operated on in a kind of assembly line. This may be used in:
MIMD. Several data and several instruction streams. Dual core or other multiple processor machines where the processors work independently, processing both their own commands and their own data. Highly scalable. Done by connecting several SISD processors.
- Arancaytar's blog
- Add new comment
- 510 reads
The von Neumann Computer
Submitted by Arancaytar on Mon, 07/10/2006 - 23:30 – No commentsNote to Google users:
Thanks to Drupal's search-engine friendly set-up, this page is receiving fairly good rankings for any searches on the von-Neumann computer, and accordingly a lot of traffic. Please be aware that I am not an expert on this topic, and the text below is a haphazard summary of the syllabus I wrote here to revise for my examination.
If you actually want a good source, you'd be better off with Von Neumann architecture on Wikipedia.
We're getting close to the end. Note that I'm occasionally adding the German terms for some devices; this will help me remember them since the exam is, in fact, in German.
The von Neumann computer is divided into two distinct system parts: The CPU and the RAM memory. The rest (HD, input and output devices) are peripheral devices. CPU and memory are connected via two cables called "buses". The first is the address bus. It goes only in one direction; from the CPU to the memory. Here, the CPU sends requests for reading or writing certain memory cells.
The second is the data bus. It goes in two directions. Here, the memory sends read data back to the CPU, or the CPU sends data to write to the memory. The data bus is also connected to the peripheral devices.
CPU
The CPU is divided into the Control Unit (Steuerwerk), the Arithmetic Logic Unit (Rechenwerk) and the registers. Registers are high speed memory cells within the CPU that serve different functions. The Befehlszähler contains only the memory address of the next command. It is either incremented or - in case of a jump instruction like a loop or an if - is changed to a different value. The Befehlsregister contains the Word of Command itself (note that a "Word", in techspeak, is a code of a certain fixed bit length, in this case usually 16 bit long).
ALU
The ALU has two input registers which can be used to perform a certain arithmetic or boolean operation. The result is stored in the output register.
Commands
The CPU knows two types of command: Those involving only register operations, and those involving memory operations.
Three examples to illustrate:
Command Protocol
This is what the processor does with a command:
- Load the command addressed in the Befehlszähler from the memory into the Befehlsregister.
- Update the Befehlszähler (when exactly this is done with a conditional jump is another matter. Does it take another command cycle after the condition is tested to perform the jump?)
- Decode the command currently in the command register.
- If the command requires a memory cell (most do), find its address.
- Read the required cell into a register if necessary.
- Execute the command.
- Load the command addressed in... and so forth. This is done many millions of times each second. The life of a computer is a monotonous and stressful one.
Interpreter
The thing to note about these commands is that they are not hardwired into the computer. They are byte codes of data, which can be parsed and interpreted by any program. In effect, you can use Java to write a "virtual CPU" that interprets a certain program's code, looks up its memory accesses in a virtual sandbox memory (or allows it to access the physical memory) and acts in all ways like a hardware processor would, except not as fast due to the overhead.
Interpreters can be used at a much lower level too - at the operating system or even below it, extending the processor with "macros" that it interprets and passes on to the physical CPU. This allows for complex commands in the machine code that are executed by the interpreter without the user realizing the difference.
The advantage is the option of building a simpler (thus faster) CPU with fewer hardwired commands, while still allowing for a large and complex command set. The overhead is negligible - around 100%, which is very much balanced by the advantages.
RISKY KISS - er, RISC vs CISC
Can you tell it's 1 AM and I'm beginning to get tired?
There's a definite difference between
and
The latter is a so-called "high language", one that allows for very complex commands (for loops, arrays, even object-oriented programming). The first does only what you tell it to - step by tiny step, using only register cells and memory addresses. There is no reason why this should be any different - let the compilers worry about the machine code, let the humans write in a way that does not fry their brains.
This thought was not always prevalent.
In the late 70s, processors were made very complex. The mission was to close the gap between the high language and machine code, making it possible to write Assembly in the same way one would code Java.
Then, a new processor was developed. This implemented what was dubbed the RISC architecture, making use of only very few, simple commands as opposed to the CISC processors with their interpreters.
Naturally, the RISC processor required several times as many cycles to do what a CISC processor had a single command for. But the RISC processor needed no interpreter, which allowed it to decode commands up to ten times as fast.
- Arancaytar's blog
- Add new comment
- 2217 reads
The final challenge
Submitted by Arancaytar on Mon, 07/10/2006 - 10:44 – No commentsIn 22.5 hours, I will be in the last exam this year - Computer Architecture. The topic is quite interesting, but hopelessly extensive. It also requires gratuitous memorizing, which isn't my forté.
Distinguish the four bus systems PCI, SCSI, USB and Firewire, naming three distinctive properties of each.
In other words, what I'm expected to memorize from the script is that:
- PCI has a 32 bit parallel transmission
- PCI has a signal frequency of 133 MHz
- PCI transmits 133 Mbit/s, version 2.0 twice that
- There are eight versions of the SCSI bus: SCSI-I, SCSI-II, Fast SCSI, Wide SCSI, Ultra SCSI, Ultra Wide SCSI, Ultra 2 SCSI and Ultra 160 SCSI. It starts at 5 Mbit/s and doubles for each version except II and Ultra, ending in 160 Mbit/s.
- SCSI reaches 3 meters, but as of Ultra Wide SCSI this has changed to 1.5 meters.
- USB cable can reach 5 meters
- A single USB host can service two devices, but by interconnecting hubs it is possible to connect up to 127 devices.
- A SCSI bus must be terminated (with a terminating resistor), and the distance between devices must be over 10 cm (of cable).
- USB can transmit up to 500 Mbit/s
- Firewire can reach 4.5 meters
- Firewire can currently transmit 400 Mbit/s, the new specification up to 1600 Mbit/s, and a new spec for 4800 Mbit/s is in planning
- Firewire can service 63 devices.
It's literally a jumble of trivia knowledge. Which is definitely going to be on the exam - it was on the practice exam as well.
Yay.
Edit on 2006-07-13: It was on the exam. In the exact same wording as it was phrased in the practice test. I'm one lucky SOB for learning this. And a paranoid one. Everyone else assumed this question was a random exception, and the real exam would make much more sense.
- Arancaytar's blog
- Add new comment
- 483 reads



