Computer Organization and Design: The Hardware / Software Interface

I want to know more about the hardware/software interface of computers and computer architecture, so I am reading this textbook. I was also failing to really understand some of what I was reading about operating systems, so I hope this helps me with that. This is just an overview of the topic - I don't care about the details right now.


Definitions


  • personal computer (PC) - a computer designed for use by an individual, usually incorporating a graphics display, a keyboard, and a mouse
  • server - a computer used for running larger programs for multiple users, often simultaneously, and typically accessed only via a network.
  • supercomputer - a class of computers with the highest performance and cost; they are configured as servers and typically cost tens to hundreds of millions of dollars.
  • terabyte (TB) - originally $2^{40}$ (1,099,511,627,776) bytes, although the term is also commonly used to mean $10^{12}$ bytes
  • embedded computer - a computer inside another device for running one predetermined application or collection of software.
  • Personal Mobile Device (PMD) - small wireless devices to connect to the Internet; they rely on batteries for power, and software is installed by downloading apps. Conventional examples are smart phones and tablets.
  • cloud computing - refers to large collections of servers that provide services over the Internet; some providers rent dynamically varying numbers of servers as a utility.
  • Software as a Service (SaaS) - delivers software and data as a service over the Internet, usually via a thin program such as a browser that runs on local client devices, instead of binary code that must be installed, and runs wholly on that device. Examples include web search and social networking.
  • Systems software - software that provides services that are commonly useful, including operating systems, compilers, loaders, and assemblers
  • operating system - supervising program that manages the resources of a computer for the benefit of the programs that run on that computer.
  • Compiler - a program that translates high-level language statements into assembly language statements
  • Binary Digit - also called a bit. One of the two numbers in base 2 that are the components of information.
  • Instruction - A command that computer hardware understands and obeys.
  • assembler - a program that translates a symbolic version of instructions into the binary version
  • assembly language - a symbolic representation of machine instructions
  • machine language - a binary representation of machine instructions
  • high-level programming language - a portable language such as C or C++ that is composed of words and algebraic notation and can be translated by a compiler into assembly language
  • input device - a mechanism through which the computer is fed information, such as a keyboard
  • output device - a mechanism that conveys the result of a computation to a user, such as a display, or to another computer
  • Liquid crystal display - a display technology using a thin layer of liquid polymers that can be used to transmit or block light according to whether a charge is applied
  • Active matrix display - a liquid crystal display using a transistor to control the transmission of light at each individual pixel
  • pixel - the smallest individual picture element. Screens are composed of hundreds of thousands to millions of pixels, organized in a matrix.
  • integrated circuit - also called a chip. A device combining dozens to millions of transistors.
  • central processing unit (CPU) - also called the processor. The active part of the computer, which contains the datapath and control and which adds numbers, tests numbers, signals I/O devices to activate, and so on.
  • datapath - The component of the processor that performs arithmetic operations
  • control - The component of the processor that commands the datapath, memory, and I/O devices according to the instructions of the program
  • memory - The storage area in which programs are kept when they are running and that contains the data needed by the running programs.
  • dynamic random access memory (DRAM) - memory built as an integrated circuit; it provides random access to any location. Access times are about 50 nanoseconds, and cost per gigabyte in 2012 was $5 to $10.
  • cache memory - a small, fast memory that acts as a buffer for a slower, larger memory.
  • static random access memory - also memory built as an integrated circuit, but faster and less dense than DRAM
  • instruction set architecture - also called architecture. An abstract interface between the hardware and the lowest-level software that encompasses all the information necessary to write a machine language program that will run correctly, including instructions, registers, memory access, I/O, and so on.
  • application binary interface (ABI) - The user portion of the instruction set plus the operating system interfaces used by application programmers. It defines a standard for binary portability across computers
  • implementation - hardware that obeys the architecture abstraction
  • volatile memory - storage, such as DRAM, that retains data only while it is receiving power
  • nonvolatile memory - a form of memory that retains data even in the absence of a power source and that is used to store programs between runs. A DVD disk is nonvolatile.
  • main memory - also called primary memory; memory used to hold programs while they are running. Typically consists of DRAM in today's computers.
  • secondary memory - nonvolatile memory used to store programs and data between runs; typically consists of flash memory in PMDs and magnetic disks in servers
  • magnetic disk - also called hard disk, a form of nonvolatile secondary memory composed of rotating platters coated with magnetic recording material. Because they are rotating mechanical devices, access times are about 5 to 20 milliseconds, and cost per gigabyte in 2012 was about $0.05 to $0.10.
  • flash memory - a nonvolatile semiconductor memory. It is cheaper and slower than DRAM but more expensive per bit and faster than magnetic disks. Access times are about 5 to 50 microseconds, and cost per gigabyte was about $0.75 in 2012.
  • local Area Network (LAN) - a network designed to carry data within a geographically confined area, typically within a single building
  • Wide Area Network (WAN) - a network extended over hundreds of kilometers that can span a continent
  • transistor - an on/off switch controlled by an electric signal
  • very large scale integrated circuit (VLSI) - a device containing hundreds of thousands to millions of transistors
  • silicon - a natural element that is a semiconductor
  • semiconductor - A substance that does not conduct electricity well
  • silicon crystal ingot - a rod composed of silicon crystal that is between 8 and 12 inches in diameter and about 12 to 24 inches long
  • wafer - A slice from a silicon ingot no more than 0.1 inches thick, used to create chips
  • defect - a microscopic flaw in a wafer or in patterning steps that can result in the failure of the die containing the defect
  • die - the individual rectangular sections that are cut from a wafer, more informally known as chips
  • yield - the percentage of good dies from the total number of dies on the wafer
  • response time - also called execution time - the total time required for the computer to complete a task, including disk access, memory accesses, I/O activities, operating system overhead, CPU execution time, and so on.
  • throughput - also called bandwidth - another measure of performance, it is the number of tasks completed per unit time
  • CPU execution time - also called CPU time - the actual time the CPU spends computing for a specific task
  • user CPU time - the CPU time spent in a program itself
  • system CPU time - the CPU time spent in the operating system performing tasks on behalf of the program
  • clock cycle - also called tick, clock tick, clock period, clock, or cycle - the time for one clock period, usually of the processor clock, which runs at a constant rate
  • clock period - the length of each clock cycle
  • Clock cycles per instruction (CPI) - average number of clock cycles per instruction for a program or program fragment
  • instruction count - the number of instructions executed by the program
  • instruction mix - a measure of the dynamic frequency of instructions across one or many programs
  • workload - a set of programs run on a computer that is either the actual collection of applications run by a user or constructed from real programs to approximate such a mix. A typical workload specifies both the programs and the relative frequencies.
  • benchmarks - programs specifically chosen to measure performance
  • Amdahl's Law - a rule stating that the performance enhancement possible with a given improvement is limited by the amount that the improved feature is used. It is a quantitative version of the law of diminishing returns.
  • million instructions per second (MIPS) - a measurement of program execution speed based on the number of millions of instructions executed; computed as $\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6}$
  • instruction set - the vocabulary of commands understood by a given architecture
  • stored-program concept - the idea that instructions and data of many types can be stored in memory as numbers, leading to the stored-program computer
  • word - the natural unit of access in a computer, usually a group of 32 bits; corresponds to the size of a register in the MIPS architecture
  • data transfer instruction - a command that moves data between memory and registers
  • address - a value used to delineate the location of a specific data element within a memory array
  • alignment restriction - a requirement that data be aligned in memory on natural boundaries
  • least significant bit - the rightmost bit in a MIPS word
  • most significant bit - the leftmost bit in a MIPS word
  • instruction format - a form or representation of an instruction composed of fields of binary numbers
  • machine language - binary representation used for communication within a computer system
  • hexadecimal - numbers in base 16
  • opcode - the field that denotes the operation and format of an instruction
  • AND - a logical bit by bit operation with two operands that calculates a 1 only if there is a 1 in both operands
  • OR - a logical bit-by-bit operation with two operands that calculates a 1 if there is a 1 in either operand
  • NOT -- a logical bit-by-bit operation with one operand that inverts the bits; that is, it replaces every 1 with a 0, and every 0 with a 1
  • NOR - A logical bit-by-bit operation that calculates the NOT of the OR of the two operands. That is, it calculates a 1 only if there is a 0 in both operands
  • conditional branch - an instruction that requires the comparison of two values and that allows for a subsequent transfer of control to a new address in the program based on the outcome of the comparison
  • basic block - a sequence of instructions without branches (except possibly at the end) and without branch targets or branch labels (except possibly at the beginning)
  • jump address table - also called a jump table - a table of addresses of alternative instruction sequences
  • procedure - a stored subroutine that performs a specific task based on the parameters with which it is provided
  • jump and link instruction - an instruction that jumps to an address and simultaneously saves the address of the following instruction in a register
  • return address - a link to the calling site that allows a procedure to return to the proper address, in MIPS it is stored in register $ra
  • caller - the program that instigates a procedure and provides the necessary parameter values
  • callee - a procedure that executes a series of stored instructions based on parameters provided by the caller and then returns control to the caller
  • program counter (PC) - the register containing the address of the instruction in the program being executed
  • stack - a data structure for spilling registers, organized as a last-in, first-out queue
  • stack pointer - a value denoting the most recently allocated address in a stack that shows where registers should be spilled or where old register values can be found. In MIPS, it is register $sp
  • push - add an element to the stack
  • pop - remove an element from the stack
  • global pointer - the register that is reserved to point to the static area
  • procedure frame -also called activation record - the segment of the stack containing a procedure's saved registers and local variables
  • frame pointer - a value denoting the location of the saved registers and local variables for a given procedure
  • text segment - the segment of a UNIX object file that contains the machine language code
  • PC-relative addressing - an addressing regime in which the address is the sum of the program counter (PC) and a constant in the instruction
  • addressing mode - one of several addressing regimes delimited by their varied use of operands and/or addresses
  • data race - two memory accesses form a data race if they are from different threads to the same location, at least one is a write, and they occur one after another
  • pseudo instruction - a common variation of assembly language instructions often treated as if it were an instruction in its own right
  • symbol table - a table that matches names of labels to the addresses of the memory words that instructions occupy
  • linker - also called link editor - a systems program that combines independently assembled machine language programs and resolves all undefined labels into an executable file
  • executable file - a functional program in the format of an object file that contains no unresolved references. It can contain symbol tables and debugging information. A "stripped executable" does not contain that information. Relocation information may be included for the loader.
  • loader - a systems program that places an object program in main memory so that it is ready to execute
  • dynamically linked libraries (DLLs) - library routines that are linked to a program during execution
  • just in time compiler - the name commonly given to a compiler that operates at runtime, translating the interpreted code segments into the native code of the computer
  • dividend - a number being divided
  • divisor - a number that the dividend is divided by
  • quotient - the primary result of a division; a number that when multiplied by the divisor and added to the remainder produces the dividend
  • remainder - the secondary result of a division; a number that when added to the product of the quotient and the divisor produces the dividend
  • scientific notation - a notation that renders numbers with a single digit to the left of the decimal point
  • normalized - a number in floating-point notation that has no leading 0s
  • floating point - computer arithmetic that represents numbers in which the binary point is not fixed
  • fraction - the value, generally between 0 and 1, placed in the fraction field. The fraction is also called the mantissa.
  • exponent - In the numeric representation system of floating point arithmetic, the value that is placed in the exponent field
  • overflow (floating-point) - a situation in which a positive exponent becomes too large to fit in the exponent field
  • underflow (floating-point) - a situation in which a negative exponent becomes too large to fit in the exponent field
  • double precision - a floating point value represented in two 32-bit words
  • single precision - A floating point value represented in a single 32-bit word
  • guard - the first of two extra bits kept on the right during intermediate calculations of floating-point numbers; used to improve accuracy
  • round - a method to make the intermediate floating-point result fit the floating-point format; the goal is typically to find the nearest number that can be represented in the format
  • units in the last place (ulp) - the number of bits in error in the least significant bits of the significand between the actual number and the number that can be represented
  • sticky bit - a bit used in rounding in addition to guard and round that is set whenever there are nonzero bits to the right of the round bit
  • fused multiply add - a floating point instruction that performs both a multiply and an add, but rounds only once after the add
  • clocking methodology - the approach used to determine when data is valid and stable relative to the clock
  • edge-triggered clocking - a clocking scheme in which all state changes occur on a clock edge
  • control signal - a signal used for multiplexor selection or for directing the operation of a functional unit; contrasts with a data signal, which contains information that is operated on by a functional unit
  • asserted - the signal is logically high or true
  • deasserted - the signal is logically low or false
  • datapath element - a unit used to operate on or hold data within a processor. In the MIPS implementation, the datapath elements include the instruction and data memories, the register file, the ALU, and adders
  • register file - a state element that consists of a set of registers that can be read or written by supplying a register number to be accessed.
  • sign-extend - to increase the size of a data item by replicating the high-order sign bit of the original data item in the high-order bits of the larger destination data item
  • branch target address - the address specified in a branch, which becomes the new program counter (PC), if the branch is taken. In the MIPS architecture the branch target is given by the sum of the offset field of the instruction and the address of the instruction following the branch
  • branch taken - a branch where condition is satisfied and the program counter (PC) becomes the branch target. All unconditional jumps are taken branches.
  • branch not taken - a branch where the branch condition is false and the program counter becomes the address of the instruction that sequentially follows the branch


Computer Abstractions and Technology


  • Eight Great Ideas:
    • Moore's law states that integrated circuit resources double every 18-24 months.
    • Use abstractions to represent the design at different levels of representation; lower-level details are hidden to offer a simpler model at higher levels
    • Making the common case fast will tend to enhance performance better than optimizing the rare case.
    • Performance via parallelism - get more performance by performing operations in parallel
    • Pipelining - a particular pattern of parallelism that is very prevalent in computer architecture
    • Prediction - sometimes it is faster on average to guess and start working than wait until you know for sure, assuming that the mechanism to recover from misprediction is not too expensive, and your prediction is relatively accurate
    • Hierarchy of memories - the fastest, smallest, and most expensive memory per bit sits at the top of the hierarchy, and the slowest, largest, and cheapest per bit sits at the bottom.
    • Dependability via Redundancy - make systems dependable by including redundant components that can take over when a failure occurs and to help detect failures.
  • The underlying hardware in any computer performs the same basic functions: inputting data, outputting data, processing data, and storing data. The five classic components of a computer are input, output, memory, datapath, and control, with the last two sometimes combined and called the processor.

  • Pixels can be represented as a matrix of bits, called a bit map. The computer hardware support for graphics consists mainly of a raster refresh buffer or frame buffer to store the bit map. The goal of the bit map is to faithfully represent what is on the screen.
  • The processor is the active part of the computer, following the instructions of a program to the letter. It adds numbers, tests numbers, signals I/O devices to activate, and so on. The processor logically comprises two main components: datapath and control, the respective brawn and brain of the processor. The datapath performs the arithmetic operations, and the control tells the datapath, memory, and I/O devices what to do according to the wishes of the instructions of the program. The memory is where the programs are kept when they are running; it also contains the data needed by the running programs. The memory is built from DRAM chips. In contrast to sequential access memories, such as magnetic tapes, the RAM portion of the term DRAM means that memory accesses take basically the same amount of time no matter what portion of the memory is used.
  • Cache memory acts as a buffer for DRAM memory. Cache is built using a different memory technology, static random access memory (SRAM). SRAM is faster but less dense, and hence more expensive, than DRAM. SRAM and DRAM are two layers of the memory hierarchy.
  • One of the most important abstractions is the interface between the hardware and the lowest-level software
  • Performance of computers can be measured in a variety of ways - but we are often talking about response time or execution time; the classic CPU performance equation is sketched after this list.

  • Energy efficiency has replaced die area as the most critical resource of microprocessor design. Conserving power while trying to increase performance has forced the hardware industry to switch to multicore microprocessors, thereby forcing the software industry to switch to programming parallel hardware. Parallelism is now required for performance.
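The classic CPU performance equation mentioned above ties together three of the terms defined earlier (instruction count, CPI, and clock cycle time):

$$\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time} = \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}}$$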

Instructions: Language of the Computer


  • We will be using MIPS assembly language in this book

When you see arm-asm in a code block, that means that we are writing MIPS assembly language; it doesn't look like PrismJS supports MIPS.

  • Every computer must be able to perform arithmetic
add a, b, c # The sum of b and c is placed in a
add a, a, d # The sum of b, c, and d is now in a
add a, a, e # The sum of b, c, d, and e is now in a
# $ followed by two characters represents a register
add $t0,$s1,$s2 # register $t0 contains g + h
add $t1,$s3,$s4 # register $t1 contains i + j
sub $s0,$t0,$t1 # f gets $t0 - $t1, which is (g + h) - (i + j)
# add a constant (immediate)
addi $s3,$s3,4 # $s3 = $s3 + 4

  • The operands of arithmetic instructions are restricted; they must be from a limited number of special locations built directly in hardware called registers. Registers are primitives used in hardware design that are also visible to the programmer when the computer is completed, so you can think of registers as the bricks of computer construction. The size of a register is the size of a word.
  • MIPS must include instructions that transfer data between memory and registers. Such instructions are called data transfer instructions. To access a word in memory, the instruction must supply the memory address. Memory is just a large, single-dimensional array, with the address acting as the index to that array, starting at 0.

  • The data transfer instruction that copies data from memory to a register is traditionally called load.
  • Virtually all architectures today address individual bytes. Therefore, the address of a word matches the address of one of the X bytes within the word, and addresses of sequential words differ by X, where X is the number of bytes per word (the number of bits in a word divided by 8). In MIPS, a word is 4 bytes, so sequential word addresses differ by 4.

  • The instruction complementary to load is traditionally called store; it copies data from a register to memory. The format of a store is similar to that of a load: the name of the operation, followed by the register to be stored, then the offset to select the array element, and finally the base register. A short load/store sketch appears after this list.
  • The process of putting less commonly used variables (or those needed later) into memory is called spilling registers.
  • Data access is faster if the data is in registers instead of memory. Registers take less time to access and have higher throughput than memory, making data in registers both faster to access and simpler to use.
  • Constant operands occur frequently, and by including constants inside arithmetic instructions, operations are much faster and use less energy than if constants were loaded from memory
  • If the number that is the proper result of an operation cannot be represented by the rightmost hardware bits, overflow is said to have occurred.
  • Signed and unsigned - leading 0s mean positive, and leading 1s mean negative. The convention for representing signed binary numbers is called two's complement representation, and the leftmost bit is often called the sign bit. For example, in 8 bits, $+6 = 0000\,0110_2$ and $-6 = 1111\,1010_2$ (invert every bit and add 1).
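Here is the load/store sketch promised above - a minimal example, assuming (purely for illustration) that $s3 holds the base address of a word array A and $s2 holds the variable h:

lw $t0, 32($s3)   # load word: $t0 = A[8] (byte offset = 8 words x 4 bytes = 32)
add $t0, $s2, $t0 # $t0 = h + A[8]
sw $t0, 48($s3)   # store word: A[12] = h + A[8] (byte offset = 12 x 4 = 48)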

  • Today's computers are built on two key principles:
    • Instructions are represented as numbers
    • Programs are stored in memory to be read or written, just like data
What distinguishes a computer from a simple calculator is its ability to make decisions. Based on the input data and the values created during computation, different instructions execute. Decision making is commonly represented in programming languages using the if statement, sometimes combined with go to statements and labels. MIPS assembly language includes two decision-making instructions, similar to an if statement with a go to: beq (branch if equal) and bne (branch if not equal).
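A minimal if/else sketch using those instructions, with register assignments assumed for illustration (f, g, h, i, j in $s0 through $s4):

bne $s3, $s4, Else # go to Else if i != j
add $s0, $s1, $s2  # f = g + h (skipped if i != j)
j Exit             # unconditional jump over the else clause
Else: sub $s0, $s1, $s2 # f = g - h
Exit: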
  • In the execution of a procedure, the program must follow these six steps (a MIPS sketch follows the list):
  1. Put parameters in a place where the procedure can access them
  2. Transfer control to the procedure
  3. Acquire the storage resources needed for the procedure
  4. Perform the desired task
  5. Put the result value in a place where the calling program can access it
  6. Return control to the point of origin, since a procedure can be called from several points in a program
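A minimal MIPS sketch of the six steps (the procedure name and parameter value are made up for illustration):

addi $a0, $zero, 5  # step 1: put the parameter in argument register $a0
jal double_it       # step 2: transfer control; $ra gets the return address

double_it:
addi $sp, $sp, -4   # step 3: acquire stack space to spill a register
sw $s0, 0($sp)      #         save $s0, which this callee is about to use
add $s0, $a0, $a0   # step 4: perform the task ($s0 = 2 * parameter)
add $v0, $s0, $zero # step 5: put the result in value register $v0
lw $s0, 0($sp)      #         restore $s0
addi $sp, $sp, 4    #         release the stack space
jr $ra              # step 6: return to the instruction after the jal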
  • MIPS memory allocation for program and data: the text segment (machine code) sits above a reserved area at the bottom of the address space, followed by the static data segment; the heap (dynamic data) grows upward from above static data, while the stack starts at the top of memory and grows downward toward the heap.

  • Parallel execution is easier when tasks are independent, but they often need to cooperate. Cooperation usually means some tasks are writing new values that others must read.
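Cooperation therefore requires synchronization. MIPS supports this with the load linked (ll) and store conditional (sc) pair; the sketch below is the usual atomic-exchange pattern (registers assumed for illustration), retrying until no other processor writes the location between the load and the store:

try: add $t0, $zero, $s4 # copy the exchange value into $t0
ll $t1, 0($s1)           # load linked from the shared location
sc $t0, 0($s1)           # store conditional: $t0 becomes 0 if the store fails
beq $t0, $zero, try      # retry if another write intervened
add $s4, $zero, $t1      # the originally loaded value ends up in $s4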


Design Principles of Hardware Technology


  1. Simplicity favors regularity
    1. e.g., MIPS arithmetic instructions always have exactly three operands - you can only add two numbers at a time
  2. Smaller is faster
    1. e.g., MIPS limits itself to 32 registers
  3. Good design demands good compromises


Arithmetic for Computers


  • Addition and subtraction are pretty simple
Multiplication hardware simply shifts and adds, as derived from the paper-and-pencil method learned in grammar school. Compilers even use shift instructions for multiplications by powers of two. With much more hardware we can do the adds in parallel and do them much faster.
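For example, a compiler can replace a multiply by a power of two with a shift left logical (sll):

sll $t0, $s0, 3 # $t0 = $s0 << 3, which equals $s0 * 8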

  • The finite word size of computers means that arithmetic operations can create results that are too large to fit in this fixed word size. It's easy to detect overflow in unsigned numbers, although these are almost always ignored because programs don't want to detect overflow for address arithmetic, the most common use of natural numbers.
  • Floating point numbers are of the form $(-1)^S \times F \times 2^E$, where F involves the value in the fraction field, E involves the value in the exponent field, and S is the value of the sign bit.

MIPS Representation of Double Precision Floating Point: a double precision value occupies two 32-bit words, with 1 sign bit, 11 exponent bits, and 52 fraction bits (20 in the first word and all 32 bits of the second word).
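As a worked single precision example (exponent bias 127): $-0.75_{10} = -1.1_2 \times 2^{-1}$, so the sign bit is 1, the stored exponent is $-1 + 127 = 126 = 0111\,1110_2$, and the fraction field is $1000\,0000\ldots0_2$ (the leading 1 is implicit and not stored).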

  • Computer arithmetic has become largely standardized over time. Two's complement binary integer arithmetic is found in every computer sold today, and if it includes floating point support, it offers the IEEE 754 binary floating-point arithmetic.
  • Computer arithmetic is distinguished from paper-and-pencil arithmetic by the constraints of limited precision. This limit may result in invalid operations through calculating numbers larger or smaller than the predefined limits. Such anomalies, called "overflow" or "underflow," may result in exceptions or interrupts, emergency events similar to unplanned subroutine calls.
  • Floating-point arithmetic has the added challenge of being an approximation of real numbers, and care needs to be taken to ensure that the computer number selected is the representation closest to the actual number. The challenges of imprecision and limited representation of floating point are part of the inspiration for the field of numerical analysis. The recent switch to parallelism shines the searchlight on numerical analysis again, as solutions that were long considered safe on sequential computers must be reconsidered when trying to find the fastest algorithm for parallel computers that still achieves a correct result.
  • Data-level parallelism, specifically subword parallelism, offers a simple path to higher performance for programs that are intensive in arithmetic operations for either integer or floating-point data. We showed that we could speed up matrix multiply nearly fourfold by using instructions that could execute four floating point operations at a time.

I am really only going to be giving a high level overview from here on out (higher than what I already have given). I may come back to this in the future, but for now, I just want to know enough to understand some concepts about OSs.

The Processor


  • Both the datapath and control for a processor can be designed starting with the instruction set architecture and an understanding of the basic characteristics of the technology. A datapath for a MIPS processor could be constructed based on the architecture and the decision to build a single-cycle implementation. Of course, the underlying technology also affects many design decisions by dictating what components can be used in the datapath, as well as whether a single-cycle implementation even makes sense.
Pipelining improves throughput but not the inherent execution time, or instruction latency, of instructions; for some instructions, the latency is similar in length to the single-cycle approach. Multiple instruction issue adds additional datapath hardware to allow multiple instructions to begin every clock cycle, but at an increase in effective latency. Pipelining was presented as reducing the clock cycle time of the simple single-cycle datapath. Multiple instruction issue, in comparison, clearly focuses on reducing clock cycles per instruction (CPI).
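Under ideal conditions (balanced stages, no hazards), the potential speedup from pipelining is captured by:

$$\text{Time between instructions}_{\text{pipelined}} = \frac{\text{Time between instructions}_{\text{nonpipelined}}}{\text{Number of pipe stages}}$$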
  • Scheduling and speculation via prediction both in hardware and software are the primary techniques used to reduce the performance impact of dependencies

Large and Fast: Exploiting Memory Hierarchy


  • The difficulty of building a memory system to keep pace with faster processors is underscored by the fact that the raw material for main memory, DRAMs, is essentially the same in the fastest computers as it is in the slowest and cheapest.
  • It is the principle of locality that gives us a chance to overcome the long latency of memory access - and the soundness of this strategy is demonstrated at all levels of the memory hierarchy.
  • Multilevel caches make it easier to pursue more cache optimizations, for two reasons: the design parameters of a lower-level cache differ from those of a first-level cache (it can be larger, for example, and so use bigger blocks and higher associativity), and the lower-level cache is not constantly being used by the processor, so it can be kept busy with work such as prefetching while the higher-level cache services the processor.
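A standard way to quantify each level of the hierarchy is average memory access time (AMAT); with multilevel caches, the miss penalty of an upper level is effectively the AMAT of the level below it:

$$\text{AMAT} = \text{Hit time} + \text{Miss rate} \times \text{Miss penalty}$$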

Parallel Processors from Client to Cloud


  • Despite this long and checkered past, the information technology industry has now tied its future to parallel computing. Although it is easy to make the case that this effort will fail like many in the past, there are reasons to be hopeful:
    • The SaaS industry is growing in importance
    • Parallelism is key to the economics of software
    • Parallel programming is important for multimedia applications
    • The use of parallel processing in scientific and engineering computation is popular
    • All desktop and server microprocessor manufacturers are building multiprocessors to achieve higher performance, so, unlike in the past, there is no easy path to higher performance for sequential applications.
