Copyright © 2019, Elsevier Inc. All rights reserved.
Advanced Computer Architecture
Session 1
In the name of the Lord of boundless kindness
Chapter 1
Fundamentals of Quantitative
Design and Analysis
Computer Architecture
A Quantitative Approach, Sixth Edition
Computer Technology
 Performance improvements:
 Improvements in semiconductor technology

Feature size, clock speed
 Improvements in computer architectures

Enabled by HLL compilers, UNIX

Lead to RISC architectures
 Together have enabled:

Lightweight computers

Productivity-based managed/interpreted
programming languages
Single Processor Performance
Current Trends in Architecture
 Cannot continue to leverage Instruction-Level
parallelism (ILP)
 Single processor performance improvement ended in
2003
 New models for performance:
 Data-level parallelism (DLP)
 Thread-level parallelism (TLP)
 Request-level parallelism (RLP)
 These require explicit restructuring of the
application
Classes of Computers
 Personal Mobile Device (PMD)
 e.g. smartphones, tablet computers
 Emphasis on energy efficiency and real-time
 Desktop Computing
 Emphasis on price-performance
 Servers
 Emphasis on availability, scalability, throughput
 Clusters / Warehouse Scale Computers
 Used for “Software as a Service (SaaS)”
 Emphasis on availability and price-performance
 Sub-class: Supercomputers, emphasis: floating-point
performance and fast internal networks
 Internet of Things/Embedded Computers
 Emphasis: price
Parallelism
 Classes of parallelism in applications:
 Data-Level Parallelism (DLP)
 Task-Level Parallelism (TLP)
 Classes of architectural parallelism:
 Instruction-Level Parallelism (ILP)
 Vector architectures/Graphic Processor Units (GPUs)
 Thread-Level Parallelism
 Request-Level Parallelism
Flynn’s Taxonomy
 Single instruction stream, single data stream (SISD)
 Single instruction stream, multiple data streams (SIMD)
 Vector architectures
 Multimedia extensions
 Graphics processor units
 Multiple instruction streams, single data stream (MISD)
 No commercial implementation
 Multiple instruction streams, multiple data streams
(MIMD)
 Tightly-coupled MIMD
 Loosely-coupled MIMD
1- Single Instruction Single Data (SISD)
 This category is the uniprocessor.
 The programmer thinks of it as the standard sequential computer, but it can exploit ILP.
2- Single Instruction Multiple Data (SIMD)
 The same instruction is executed by multiple processors using different data streams.
 SIMD computers exploit data-level parallelism by applying the same operations to multiple items of data in parallel.
 Each processor has its own data memory, but there is a single instruction memory and control processor, which fetches and dispatches instructions.
 Examples: vector architectures, multimedia extensions to standard instruction sets, and GPUs.
3- Multiple Instruction Single Data (MISD)
 No commercial multiprocessor of this type has been built to date, but it rounds out this simple classification.
4- Multiple Instruction Multiple Data (MIMD)
 Each processor fetches its own instructions and operates on its own data, and it targets task-level parallelism (TLP).
 Can also exploit DLP, but at a higher cost than SIMD
 Tightly coupled MIMD architectures: TLP
 Loosely coupled MIMD architectures: RLP
 Clusters
 Warehouse-scale computers
Defining Computer Architecture
 “Old” view of computer architecture:
 Instruction Set Architecture (ISA) design
 i.e. decisions regarding:

registers, memory addressing, addressing modes,
instruction operands, available operations, control flow
instructions, instruction encoding
 “Real” computer architecture:
 Specific requirements of the target machine
 Design to maximize performance within constraints:
cost, power, and availability
 Includes ISA, microarchitecture, hardware
Instruction Set Architecture
 Class of ISA
 General-purpose registers
 Register-memory vs load-store
 RISC-V registers
 32 g.p., 32 f.p.
Register   Name      Use                  Saver
x0         zero      constant 0           n/a
x1         ra        return address       caller
x2         sp        stack pointer        callee
x3         gp        global pointer       -
x4         tp        thread pointer       -
x5-x7      t0-t2     temporaries          caller
x8         s0/fp     saved/frame pointer  callee
x9         s1        saved                callee
x10-x17    a0-a7     arguments            caller
x18-x27    s2-s11    saved                callee
x28-x31    t3-t6     temporaries          caller
f0-f7      ft0-ft7   FP temporaries       caller
f8-f9      fs0-fs1   FP saved             callee
f10-f17    fa0-fa7   FP arguments         caller
f18-f27    fs2-fs11  FP saved             callee
f28-f31    ft8-ft11  FP temporaries       caller
Instruction Set Architecture
 Memory addressing
 RISC-V: byte addressed, aligned accesses faster

An access to an object of size s bytes at byte address A is aligned if
A mod s=0.
 Addressing modes
 RISC-V: Register, immediate, displacement (base+offset)
 Other examples: autoincrement, indexed, PC-relative
 Types and size of operands
 RISC-V: 8-bit, 32-bit, 64-bit
 IEEE 754 floating point in 32-bit (single precision) and 64-bit
(double precision).
 The 80x86 also supports 80-bit floating point (extended
double precision).
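The alignment rule on this slide (an access of size s bytes at byte address A is aligned if A mod s = 0) can be checked with a few lines of Python. This is only an illustrative sketch; the function name `is_aligned` is an assumption made for the example and is not part of RISC-V or any library.

```python
def is_aligned(address: int, size: int) -> bool:
    """Return True if an access of `size` bytes at byte `address` is aligned.

    Implements the rule from the slide: aligned if address mod size == 0.
    `is_aligned` is a hypothetical helper name used only for illustration.
    """
    if size <= 0:
        raise ValueError("size must be a positive number of bytes")
    return address % size == 0


# An 8-byte doubleword at address 16 is aligned; at address 12 it is not.
doubleword_at_16 = is_aligned(16, 8)
doubleword_at_12 = is_aligned(12, 8)
```

Byte accesses (size 1) are aligned at every address, which is why alignment only matters for halfword and larger operands.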
Floating point instructions for RISC-V.
IEEE 754 Format
Instruction Set Architecture
 Operations
 RISC-V: data transfer, arithmetic, logical, control,
floating point
 See Fig. 1.5 in text
 Control flow instructions
 Use content of registers (RISC-V) vs. status bits (x86,
ARMv7, ARMv8)
 Return address in register (RISC-V, ARMv7, ARMv8)
vs. on stack (x86)
 Encoding
 Fixed (RISC-V, ARMv7/v8 except compact instruction
set) vs. variable length (x86)
Encoding
Advanced Computer Architecture
Session 2
Chapter 1
Fundamentals of Quantitative
Design and Analysis…(Cont.)
Computer Architecture
A Quantitative Approach, Sixth Edition
Genuine Computer Architecture
 The implementation of a computer
has two components:
 organization
 hardware
 Organization
 Includes the high-level aspects of a computer’s design, such as the memory system, the memory interconnect, and the design of the internal processor or CPU (central processing unit, where arithmetic, logic, branching, and data transfer are implemented).
 The term microarchitecture is also used instead of organization.
 Two processors with the same instruction set
architectures but different organizations are
the AMD Opteron and the Intel Core i7.

Both processors implement the 80x86 instruction
set, but they have very different pipeline and cache
organizations.
 Hardware
 Refers to the specifics of a computer, including the detailed logic design and the packaging technology of the computer.
 Often a line of computers contains computers with identical instruction set architectures and very similar organizations, but they differ in the detailed hardware implementation.
 Example: the Intel Core i7 and the Intel Xeon E7 are nearly identical but offer different clock rates and different memory systems, making the Xeon E7 more effective for server computers.
 Computer architects must design a computer to meet functional requirements as well as price, power, performance, and availability goals.
 Architects also must determine what the functional requirements are, which can be a major task.
 The requirements may be specific features inspired by the market.
 Application software typically drives the choice of certain functional requirements by determining how the computer will be used.
Figure: Summary of some of the most important functional requirements an architect faces
Trends in Technology
 Integrated circuit technology (Moore’s Law)
 Transistor density: 35%/year
 Die size: 10-20%/year
 Integration overall: 40-55%/year
 DRAM capacity: 25-40%/year (slowing)
 8 Gb (2014), 16 Gb (2019), possibly no 32 Gb
 Flash capacity: 50-60%/year
 8-10X cheaper/bit than DRAM
 Magnetic disk capacity: recently slowed to 5%/year
 Density increases may no longer be possible, maybe increase from 7 to 9 platters
 8-10X cheaper/bit than Flash
 200-300X cheaper/bit than DRAM
 Network technology
 Network Performance depends both on the performance of switches and on the
performance of the transmission system.

Designers often design for the next
technology.

Cost has decreased at about the rate
at which density increases.
Bandwidth and Latency
 Bandwidth or throughput
 Total work done in a given time
 32,000-40,000X improvement for processors
 300-1200X improvement for memory and disks
 Latency or response time
 Time between start and completion of an event
 50-90X improvement for processors
 6-8X improvement for memory and disks
Bandwidth and Latency…
 Performance is the primary differentiator
for microprocessors and networks.
 The greatest gains: 32,000–40,000X in bandwidth and 50–90X in latency.
 Capacity is generally more important than performance for memory and disks.
 Capacity has improved more,
 with bandwidth advances of 400–2400X
 and gains in latency of 8–9X.
Figures: Performance milestones over 25–40 years for microprocessors, memory, networks, and disks
Bandwidth and Latency
Log-log plot of bandwidth and latency milestones relative to the first milestone.
Latency improved 8–91X, while bandwidth improved about 400–32,000X.
Except for networking, there were modest improvements in latency and bandwidth in the other three technologies in the six years (2011-2017): 0%–23% in latency and 23%–70% in bandwidth.
Advanced Computer Architecture
Session 3
Chapter 1
Fundamentals of Quantitative
Design and Analysis…(Cont.)
Computer Architecture
A Quantitative Approach, Sixth Edition
Transistors and Wires
 Feature size
 Minimum size of transistor or wire in x or y
dimension
 10 microns in 1971 to .011 microns in 2017
 Transistor performance scales linearly

Wire delay does not improve with feature size!
 Integration density scales quadratically
 Larger and larger fractions of the clock cycle have been
consumed by the propagation delay of signals on wires .
 but power now plays an even greater role than wire delay.
Power and Energy
Power and Energy concerns
1. What is the maximum power a processor ever requires?
 Voltage indexing methods allow the processor to slow down and regulate voltage within a wider margin.
2. What is the sustained power consumption (thermal design power, TDP)? It determines the cooling requirement.
3. Which metric is the right one for comparing processors: energy or power?
Power and Energy
 Problem: Get power in, get power out
 Thermal Design Power (TDP)
 Characterizes sustained power consumption
 Used as target for power supply and cooling system
 Lower than peak power (which is often 1.5X higher), higher than average power consumption
 Clock rate can be reduced dynamically to limit power consumption
 Energy per task is often a better measurement
Power and Energy
 Power: energy per unit time
 1 watt = 1 joule per second
 Energy = Power × Time
 Which metric is the right one for comparing processors: energy or power?
 In general, energy is always a better metric because it is tied to a specific task and the time required for that task.
 if we want to know which of two
processors is more efficient for a given
task, we should compare energy
consumption (not power) for executing the
task.
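The comparison described above follows directly from Energy = Power × Time. The sketch below uses made-up illustrative ratios (a processor A drawing 20% more average power than processor B, but finishing the task in 70% of B's time); the function name `relative_energy` is an assumption for this example only.

```python
def relative_energy(power_ratio: float, time_ratio: float) -> float:
    """Energy of processor A relative to processor B for the same task.

    Uses Energy = Power x Time, so the ratios simply multiply.
    power_ratio = P_A / P_B, time_ratio = T_A / T_B.
    """
    return power_ratio * time_ratio


# Illustrative numbers only: A draws 20% more power, but needs 70% of the time.
energy_a_vs_b = relative_energy(power_ratio=1.2, time_ratio=0.7)
a_is_more_efficient = energy_a_vs_b < 1.0
```

With these inputs A uses about 84% of B's energy for the task, even though its power draw is higher, which is why energy rather than power is the metric to compare for a fixed workload.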
 When is power consumption a useful
measure?
 as a constraint.

for example, a chip might be limited to 100 watts.
 Static power
 Dynamic power
Dynamic Energy and Power
 Dynamic energy
 Transistor switch from 0 -> 1 or 1 -> 0
 $\tfrac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2$
 Dynamic power
 $\tfrac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency switched}$
 Reducing clock rate reduces power, not energy
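The two expressions on this slide can be turned into a small calculation to see why lowering the clock rate alone cuts power but not energy per transition, while lowering the voltage cuts both. This is a sketch under assumed values: the capacitive load, voltage, and frequency below are hypothetical, and the 15% scaling factor is chosen only for illustration.

```python
def dynamic_energy(capacitive_load: float, voltage: float) -> float:
    """Energy of one 0->1 or 1->0 transition: 1/2 x C x V^2 (joules)."""
    return 0.5 * capacitive_load * voltage ** 2


def dynamic_power(capacitive_load: float, voltage: float, frequency: float) -> float:
    """Power per transistor: 1/2 x C x V^2 x frequency switched (watts)."""
    return dynamic_energy(capacitive_load, voltage) * frequency


# Hypothetical operating point, used only to form ratios.
C, V, F = 1e-9, 1.0, 3.3e9

# Halving the clock rate alone: power halves, energy per transition is unchanged.
power_ratio_half_clock = dynamic_power(C, V, F / 2) / dynamic_power(C, V, F)

# Reducing both voltage and frequency by 15%.
energy_ratio_scaled = dynamic_energy(C, 0.85 * V) / dynamic_energy(C, V)
power_ratio_scaled = dynamic_power(C, 0.85 * V, 0.85 * F) / dynamic_power(C, V, F)
```

Because voltage enters squared, a 15% reduction in voltage brings energy down to roughly 0.72 of the original, and combining it with a 15% lower frequency brings power down to roughly 0.61.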
Power
 Intel 80386
consumed ~ 2 W
 3.3 GHz Intel
Core i7 consumes
130 W
 Heat must be
dissipated from
1.5 x 1.5 cm chip
 This is the limit of
what can be
cooled by air
Reducing Power
 Techniques for reducing power:
 Do nothing well (clock gating)
 Most microprocessors today turn off the clock of inactive modules to save energy and dynamic power.
 Dynamic Voltage-Frequency Scaling (DVFS)
 Personal mobile devices, laptops, and even servers have periods of low activity where there is no need to operate at the highest clock frequency and voltages.
 Low power state for DRAM, disks
 Given that PMDs and laptops are often idle, memory and storage offer low power modes to save energy.
 Overclocking, turning off cores
 The 3.3 GHz Core i7 can run in short bursts at 3.6 GHz.
 For single-threaded code, these microprocessors can turn off all cores but one and run it at an even higher clock rate.
Static Power
 Static power consumption
 25-50% of total power
 $\text{Current}_{static} \times \text{Voltage}$
 Scales with number of transistors
 To reduce: power gating
 Large SRAM caches need power to maintain the stored values. (The S in SRAM is for static.)
 The only hope to stop leakage is to turn off power to subsets of the chip.
Race-to-halt
 Because the processor is just a portion of the whole energy cost of a system, it can make sense to use a faster, less energy-efficient processor to allow the rest of the system to go into a sleep mode.
 This strategy is known as race-to-halt.
Domain-specific processors
A computer will consist of
 standard processors to run conventional large programs such as operating systems
 domain-specific processors that do only a narrow range of tasks, but do them extremely well
 Such computers will be much more heterogeneous than the homogeneous multicore chips of the past.
Advanced Computer Architecture
Session 4
Chapter 1
Fundamentals of Quantitative
Design and Analysis…(Cont.)
Computer Architecture
A Quantitative Approach, Sixth Edition
Trends in Cost
 Although costs tend to be less important in some computer designs (specifically supercomputers), cost-sensitive designs are of growing significance.
 Learning curve: manufacturing costs decrease over time.
 Example
 Price per megabyte of DRAM has dropped over the long term. Price and cost of DRAM track closely.
 Microprocessor prices also drop over time, but because they are less standardized than DRAMs, the relationship between price and cost is more complex.
 Cost driven down by learning curve
 Yield
 DRAM: price closely tracks cost
 Microprocessors: price depends on
volume
 10% less for each doubling of volume
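The "10% less for each doubling of volume" rule of thumb compounds with every doubling. The sketch below assumes the rule holds smoothly for any volume ratio, which is a simplification; `relative_price` and its parameters are names invented for this illustration.

```python
import math


def relative_price(volume_ratio: float, reduction_per_doubling: float = 0.10) -> float:
    """Price relative to the baseline after volume grows by `volume_ratio`.

    Assumes price falls by `reduction_per_doubling` (10% by default, as on
    the slide) each time volume doubles, applied continuously in log2 space.
    """
    if volume_ratio <= 0:
        raise ValueError("volume_ratio must be positive")
    doublings = math.log2(volume_ratio)
    return (1.0 - reduction_per_doubling) ** doublings


price_at_2x = relative_price(2)   # one doubling
price_at_8x = relative_price(8)   # three doublings
```

Under this assumption one doubling gives 0.9 of the baseline price and three doublings give 0.9 cubed, about 0.73.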
 key factor in determining cost:
Cost of an Integrated Circuit
 standard parts—disks, Flash memory, DRAMs,
and so on—are becoming a significant portion of
any system’s cost.
 With PMDs’ increasing reliance on whole systems on a chip (SOC), the cost of the integrated circuits is much of the cost of the PMD.
Integrated Circuit Cost
 Integrated circuit
 Bose-Einstein formula:
 Defects per unit area = 0.016-0.057 defects per square cm (2010)
 N = process-complexity factor = 11.5-15.5 (40 nm, 2010)
 For 28 nm processes in 2017, N is 7.5–9.5. For a 16 nm process, N ranges from 10 to 14
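The slide names the Bose-Einstein formula but the equation itself did not survive extraction. As presented in the Hennessy and Patterson textbook, the die yield model is: die yield = wafer yield × 1 / (1 + defects per unit area × die area)^N. The sketch below implements that form; the example inputs (0.047 defects per square cm, N = 12, 100% wafer yield) are assumptions picked from within the ranges quoted on the slide.

```python
def die_yield(defects_per_cm2: float, die_area_cm2: float,
              complexity_factor: float, wafer_yield: float = 1.0) -> float:
    """Bose-Einstein die yield model.

    die yield = wafer yield / (1 + defects per unit area x die area) ** N
    where N is the process-complexity factor.
    """
    return wafer_yield / (1.0 + defects_per_cm2 * die_area_cm2) ** complexity_factor


# Illustrative inputs only: 0.047 defects/cm^2 and N = 12.
yield_small_die = die_yield(0.047, 1.00, 12)   # 1.0 cm x 1.0 cm die
yield_large_die = die_yield(0.047, 2.25, 12)   # 1.5 cm x 1.5 cm die
```

With these inputs the 1.0 cm² die yields roughly 58% and the 2.25 cm² die roughly 30%, showing how quickly yield drops as die area grows.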
Integrated Circuit Cost: redundancy as a way to raise yield
 Given the tremendous price pressures on commodity products such
as DRAM and SRAM, designers have included redundancy as a
way to raise yield.
 DRAMs have regularly included some redundant memory cells so
that a certain number of flaws can be accommodated.
 Designers have used similar techniques in both standard SRAMs
and in large SRAM arrays used for caches within microprocessors.
 GPUs have 4 redundant processors out of 84 for the same reason.
Obviously, the presence of redundant entries can be used to boost
the yield significantly.
Cost Versus Price
 Margin between the cost to manufacture a
product and the price the product sells for has
been shrinking.
 Those margins pay for
 company’s research and development (R&D),
 marketing,
 sales,
 manufacturing equipment maintenance,
 building rental,
 cost of financing,
 Pretax profits, and taxes.
Cost of Manufacturing Versus Cost of Operation
 Before
 cost meant the cost to build a computer
 price meant price to purchase a computer.
 With the advent of WSCs,
 capital expenses (CAPEX):

tens of thousands of servers,
 operational expenses (OPEX):

the cost to operate the computers
Dependability
 Before:
 ICs were one of the most reliable components of a computer.
 Although their pins may be vulnerable, and faults may occur over communication channels, the failure rate inside the chip was very low.
 Now:
 Because of feature sizes of 16 nm and smaller, transient faults and permanent faults are becoming more commonplace.
 Service level agreements (SLAs)

an SLA could be used to decide whether
the system was up or down.
 Systems alternate between two states:
1. Service accomplishment:
where the service is delivered as specified.
2. Service interruption:
where the delivered service is different from the SLA
 Transitions between these two states are
caused by

Failures (from state 1 to state 2)

Restorations (2 to 1).
 Quantifying these transitions leads to the
two main measures of dependability:
 Module reliability
 a measure of the continuous service accomplishment

the time to failure from a reference initial instant.
 Module availability
 a measure of the service accomplishment with respect
to the alternation between the two states of
accomplishment and interruption.
 Module reliability
 Mean time to failure (MTTF)
 Failures in time (FIT) = 1/MTTF
 Rate of failures, generally reported as failures per billion hours of operation
 Mean time to repair (MTTR)
 Mean time between failures (MTBF) = MTTF + MTTR
 Module Availability = MTTF / MTBF
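The definitions above combine into a one-line availability calculation. The sketch below uses hypothetical values (a 1,000,000-hour MTTF and a 24-hour MTTR); the function names are chosen for this example only.

```python
def mtbf(mttf_hours: float, mttr_hours: float) -> float:
    """Mean time between failures = MTTF + MTTR."""
    return mttf_hours + mttr_hours


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Module availability = MTTF / MTBF = MTTF / (MTTF + MTTR)."""
    return mttf_hours / mtbf(mttf_hours, mttr_hours)


def fit(mttf_hours: float) -> float:
    """Failures in time: failures per billion (1e9) hours of operation."""
    return 1e9 / mttf_hours


# Hypothetical module: fails on average every 1,000,000 hours, repaired in 24 hours.
module_availability = availability(1_000_000, 24)
module_fit = fit(1_000_000)
```

For these inputs availability is about 0.999976 and the failure rate is 1000 FIT.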
 Assume a disk subsystem with the following components
and MTTF:
 10 disks, each rated at 1,000,000-hour MTTF
 1 ATA controller, 500,000-hour MTTF
 1 power supply, 200,000-hour MTTF
 1 fan, 200,000-hour MTTF
 1 ATA cable, 1,000,000-hour MTTF
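The system MTTF for this component list can be worked out by summing failure rates. The sketch assumes, as the textbook treatment of this example does, that lifetimes are exponentially distributed and that failures are independent, so the system failure rate is the sum of the component rates. The variable names are made up for illustration.

```python
# (count, MTTF in hours) for each component type listed on the slide.
components = {
    "disk": (10, 1_000_000),
    "ATA controller": (1, 500_000),
    "power supply": (1, 200_000),
    "fan": (1, 200_000),
    "ATA cable": (1, 1_000_000),
}

# Failure rate of the whole subsystem (failures per hour):
# sum over components of count / MTTF, assuming independent failures.
system_failure_rate = sum(count / mttf for count, mttf in components.values())

# Express as FIT (failures per billion hours) and invert to get system MTTF.
system_fit = system_failure_rate * 1e9
system_mttf_hours = 1.0 / system_failure_rate
```

The sum is 23 failures per million hours (23,000 FIT), which corresponds to a system MTTF of roughly 43,500 hours, or just under 5 years.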
 Redundancy
 The primary way to cope with failure

in time (repeat the operation to see if it still
is erroneous)

in resources (have other components to
take over from the one that failed).
 Redundancy example
 Assume that one power supply is sufficient to run the disk subsystem and that we are adding one redundant power supply.
 2 power supplies and independent failures
 MTTF for redundant power supplies
 Mean time until one of the two supplies fails: $\text{MTTF}_{power\ supply} / 2$
 $\text{MTTF}_{pair}$: the mean time until one power supply fails divided by the chance that the other will fail before the first one is replaced.
 The probability of a second failure is MTTR over the mean time until the other power supply fails.
 Assume 24 hours to notice that a power supply has failed and to replace it.
 4150 times more reliable than a single power supply
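The redundant-pair reasoning can be written out numerically. The sketch assumes the 200,000-hour power supply MTTF from the earlier disk subsystem example and the 24-hour repair time stated above; `mttf_redundant_pair` is a name invented for this illustration.

```python
def mttf_redundant_pair(mttf_single: float, mttr: float) -> float:
    """MTTF of two redundant units when one is sufficient to run the system.

    Mean time until the first of the two fails: mttf_single / 2.
    Probability the second fails before the repair completes: mttr / mttf_single.
    MTTF_pair = (mttf_single / 2) / (mttr / mttf_single)
              = mttf_single ** 2 / (2 * mttr)
    Assumes independent failures.
    """
    return (mttf_single / 2.0) / (mttr / mttf_single)


MTTF_POWER_SUPPLY = 200_000  # hours, from the disk subsystem example
MTTR_POWER_SUPPLY = 24       # hours to notice the failure and replace the unit

mttf_pair = mttf_redundant_pair(MTTF_POWER_SUPPLY, MTTR_POWER_SUPPLY)
improvement = mttf_pair / MTTF_POWER_SUPPLY
```

The unrounded result is about 833 million hours, roughly 4167 times a single supply; the slide's figure of 4150 comes from rounding the pair MTTF to 830,000,000 hours before dividing.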
Measuring Performance
 Typical performance metrics:
 Response time (execution time)
 Throughput
 Speedup of X relative to Y
 $\text{Execution time}_Y / \text{Execution time}_X$
 Execution time
 The time between the start and the completion of an event
 Wall clock time: includes all system overheads (storage accesses, memory accesses, input/output activities, operating system, …)
 CPU time: only computation time
Benchmarks
 Kernels (e.g. matrix multiply)
 Toy programs (e.g. sorting)
 Synthetic benchmarks (e.g. Dhrystone)
 Benchmark suites (e.g. SPEC06fp, TPC-C)
 Standard test suites
 CPU tests: mathematical operations, compression, encryption, physics
 2D graphics tests: vectors, bitmaps, fonts, text, and GUI elements
 3D graphics tests: DirectX 9 to DirectX 12 in 4K resolution, DirectCompute and OpenCL
 Disk tests: reading, writing, and seeking within disk files, plus IOPS
 Memory tests: memory access speeds and latency
Principles of Computer Design
 Take Advantage of Parallelism
 e.g. multiple processors, disks, memory banks,
pipelining, multiple functional units
 ILP, DLP, TLP, RLP
 Principle of Locality
 Reuse of data and instructions
 A program spends 90% of its execution time in only 10% of the code.
 Focus on the Common Case: applies to energy, resource allocation, and performance.
 The instruction fetch and decode unit of a processor may be used much more frequently than a multiplier, so optimize it first.
 Amdahl’s Law
Amdahl’s Law
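The equation for this slide did not survive extraction. In its standard form, Amdahl's Law gives the overall speedup as 1 / ((1 − fraction enhanced) + fraction enhanced / speedup enhanced). The sketch below implements that form; the example inputs (40% of execution time sped up by 10X) are illustrative assumptions, and the function name is made up for this example.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when only part of the execution time is improved.

    fraction_enhanced: fraction of original execution time that benefits (0..1).
    speedup_enhanced:  how much faster that fraction runs (> 0).
    """
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Illustrative case: 40% of the time is sped up by a factor of 10.
overall = amdahl_speedup(0.4, 10.0)

# Upper bound as the enhancement becomes arbitrarily fast: 1 / (1 - fraction).
limit = 1.0 / (1.0 - 0.4)
```

Even a 10X improvement to 40% of the run gives only about 1.56X overall, and no enhancement of that portion can exceed the limit of about 1.67X.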
Principles of Computer Design
 The Processor Performance Equation
 Different instruction types having different
CPIs
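The equations for these two slides were images and are missing from the text. In its standard form, the processor performance equation is CPU time = instruction count × CPI × clock cycle time, and with several instruction types the overall CPI is the frequency-weighted average of the per-type CPIs. The sketch below assumes those standard forms; the function names and the sample workload are hypothetical.

```python
from typing import Iterable, Tuple


def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds = instruction count x CPI x clock cycle time.

    Clock cycle time is 1 / clock rate, so this divides by the clock rate.
    """
    return instruction_count * cpi / clock_rate_hz


def average_cpi(mix: Iterable[Tuple[float, float]]) -> float:
    """Overall CPI from (frequency, CPI) pairs, one per instruction type.

    Frequencies are fractions of the dynamic instruction count and should sum to 1.
    """
    return sum(frequency * cpi for frequency, cpi in mix)


# Hypothetical workload: 25% of instructions at CPI 4.0, 75% at CPI 1.33.
overall_cpi = average_cpi([(0.25, 4.0), (0.75, 1.33)])

# Hypothetical run: one billion instructions at CPI 2.0 on a 2 GHz clock.
seconds = cpu_time(1e9, 2.0, 2e9)
```

Separating the three factors makes it clear which one a design change affects: the ISA and compiler influence instruction count, the organization influences CPI, and the hardware technology influences clock rate.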
Principles of Computer Design
 Example: Suppose we made the following measurements:
 Frequency of FP operations=25%
 Average CPI of FP operations=4.0
 Average CPI of other instructions=1.33
 Frequency of FSQRT=2%
 CPI of FSQRT=20
 Compare these two design alternatives:
 decrease the CPI of FSQRT to 2
 decrease the average CPI of all FP operations to 2.5.
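The comparison can be worked through by computing the overall CPI for each alternative from the measurements above. The sketch assumes instruction count and clock rate are the same for both alternatives, so the design with the lower CPI is the faster one; variable names are chosen for this example.

```python
# Measurements from the example.
FP_FREQ, FP_CPI = 0.25, 4.0
OTHER_CPI = 1.33
FSQRT_FREQ, FSQRT_CPI = 0.02, 20.0

# Original overall CPI: frequency-weighted average of FP and other instructions.
cpi_original = FP_FREQ * FP_CPI + (1.0 - FP_FREQ) * OTHER_CPI

# Alternative 1: FSQRT CPI drops from 20 to 2.
# Subtract the cycles saved on the 2% of instructions that are FSQRT.
cpi_new_fsqrt = cpi_original - FSQRT_FREQ * (FSQRT_CPI - 2.0)

# Alternative 2: average CPI of all FP operations drops to 2.5.
cpi_new_fp = (1.0 - FP_FREQ) * OTHER_CPI + FP_FREQ * 2.5

# With equal instruction count and clock rate, speedup is the CPI ratio.
speedup_new_fp = cpi_original / cpi_new_fp
speedup_new_fsqrt = cpi_original / cpi_new_fsqrt
```

The original CPI is about 2.0, the FSQRT-only enhancement gives about 1.64, and the all-FP enhancement gives about 1.62, so improving all FP operations is marginally better, with a speedup of about 1.23.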
Fallacies and Pitfalls
 All exponential laws must come to an end
 Dennard scaling (constant power density)

Stopped by threshold voltage
 Disk capacity

30-100% per year to 5% per year
 Moore’s Law

Most visible with DRAM capacity

ITRS disbanded

Only four foundries left producing state-of-the-art
logic chips

11 nm, 3 nm might be the limit
 Microprocessors are a silver bullet
 Performance is now a programmer’s burden
 Falling prey to Amdahl’s Law
 A single point of failure
 Hardware enhancements that increase
performance also improve energy
efficiency, or are at worst energy neutral
 Benchmarks remain valid indefinitely
 Compiler optimizations target benchmarks
 The rated mean time to failure of disks is
1,200,000 hours or almost 140 years, so
disks practically never fail
 MTTF values from manufacturers assume regular replacement
 Peak performance tracks observed
performance
 Fault detection can lower availability
 Not all operations are needed for correct
execution
108
Copyright © 2019, Elsevier Inc. All rights reserved.
Ad

More Related Content

What's hot (20)

REAL TIME OPERATING SYSTEM PART 1
REAL TIME OPERATING SYSTEM PART 1REAL TIME OPERATING SYSTEM PART 1
REAL TIME OPERATING SYSTEM PART 1
Embeddedcraft Craft
 
10. compute-part-2
10. compute-part-210. compute-part-2
10. compute-part-2
Muhammad Ahad
 
Arm Processors Architectures
Arm Processors ArchitecturesArm Processors Architectures
Arm Processors Architectures
Mohammed Hilal
 
Multi core processors
Multi core processorsMulti core processors
Multi core processors
Adithya Bhat
 
x86 architecture
x86 architecturex86 architecture
x86 architecture
i i
 
Memory hierarchy
Memory hierarchyMemory hierarchy
Memory hierarchy
Mahesh Kumar Attri
 
Evolution of Microprocessor
Evolution of MicroprocessorEvolution of Microprocessor
Evolution of Microprocessor
Green University of Bangladesh
 
Overclocking
OverclockingOverclocking
Overclocking
Vivek Bajpai
 
Lecture 1 introduction to parallel and distributed computing
Advanced Computer Architecture. Computer Architecture: A Quantitative Approach, Chapter 1: Fundamentals of Quantitative Design and Analysis

  • 1. Advanced Computer Architecture, Session One. In the name of the Lord, boundless in kindness. Copyright © 2019, Elsevier Inc. All rights reserved.
  • 2. Chapter 1: Fundamentals of Quantitative Design and Analysis. Computer Architecture: A Quantitative Approach, Sixth Edition
  • 3. Computer Technology (section: Introduction)
      • Performance improvements:
          • Improvements in semiconductor technology: feature size, clock speed
          • Improvements in computer architectures: enabled by HLL compilers and UNIX; led to RISC architectures
      • Together these have enabled:
          • Lightweight computers
          • Productivity-based managed/interpreted programming languages
  • 4. Single Processor Performance (figure)
  • 5. Current Trends in Architecture
      • Cannot continue to leverage instruction-level parallelism (ILP)
          • Single processor performance improvement ended in 2003
      • New models for performance:
          • Data-level parallelism (DLP)
          • Thread-level parallelism (TLP)
          • Request-level parallelism (RLP)
      • These require explicit restructuring of the application
  • 6. Classes of Computers
      • Personal Mobile Device (PMD)
          • e.g. smart phones, tablet computers
          • Emphasis on energy efficiency and real-time performance
      • Desktop Computing
          • Emphasis on price-performance
      • Servers
          • Emphasis on availability, scalability, throughput
      • Clusters / Warehouse-Scale Computers
          • Used for "Software as a Service (SaaS)"
          • Emphasis on availability and price-performance
          • Sub-class: supercomputers, with emphasis on floating-point performance and fast internal networks
      • Internet of Things / Embedded Computers
          • Emphasis on price
  • 7. Parallelism
      • Classes of parallelism in applications:
          • Data-Level Parallelism (DLP)
          • Task-Level Parallelism (TLP)
      • Classes of architectural parallelism:
          • Instruction-Level Parallelism (ILP)
          • Vector architectures / Graphics Processing Units (GPUs)
          • Thread-Level Parallelism
          • Request-Level Parallelism
  • 8. Flynn's Taxonomy
      • Single instruction stream, single data stream (SISD)
      • Single instruction stream, multiple data streams (SIMD)
          • Vector architectures
          • Multimedia extensions
          • Graphics processing units
      • Multiple instruction streams, single data stream (MISD)
          • No commercial implementation
      • Multiple instruction streams, multiple data streams (MIMD)
          • Tightly coupled MIMD
          • Loosely coupled MIMD
  • 9. 1. Single Instruction, Single Data (SISD)
      • This category is the uniprocessor.
      • The programmer thinks of it as the standard sequential computer, but it can exploit ILP.
  • 10. 2. Single Instruction, Multiple Data (SIMD)
      • The same instruction is executed by multiple processors using different data streams.
      • SIMD computers exploit data-level parallelism by applying the same operations to multiple items of data in parallel.
      • Each processor has its own data memory, but there is a single instruction memory and control processor, which fetches and dispatches instructions.
      • Examples: vector architectures, multimedia extensions to standard instruction sets, and GPUs.
  • 11. 3. Multiple Instruction, Single Data (MISD)
      • No commercial multiprocessor of this type has been built to date, but it rounds out this simple classification.
  • 12. 4. Multiple Instruction, Multiple Data (MIMD)
      • Each processor fetches its own instructions and operates on its own data; MIMD targets task-level parallelism (TLP).
      • It can also exploit DLP, but doing so is more expensive than on SIMD.
      • Tightly coupled MIMD architectures exploit TLP.
      • Loosely coupled MIMD architectures exploit RLP:
          • Clusters
          • Warehouse-scale computers
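The four categories above follow mechanically from two counts: how many instruction streams and how many data streams a machine has. The sketch below encodes that rule; the names `FlynnClass` and `classify` are illustrative and do not come from the slides.

```python
from enum import Enum


class FlynnClass(Enum):
    """The four categories of Flynn's taxonomy."""
    SISD = "single instruction stream, single data stream"
    SIMD = "single instruction stream, multiple data streams"
    MISD = "multiple instruction streams, single data stream"
    MIMD = "multiple instruction streams, multiple data streams"


def classify(instruction_streams: int, data_streams: int) -> FlynnClass:
    """Map stream counts to a Flynn category (hypothetical helper)."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    multiple_instructions = instruction_streams > 1
    multiple_data = data_streams > 1
    if multiple_instructions:
        return FlynnClass.MIMD if multiple_data else FlynnClass.MISD
    return FlynnClass.SIMD if multiple_data else FlynnClass.SISD


if __name__ == "__main__":
    # A uniprocessor, a vector unit with 8 lanes, and a 4-core multiprocessor.
    for streams in [(1, 1), (1, 8), (4, 4)]:
        print(streams, classify(*streams).name)
```

The classification only looks at stream counts; it says nothing about whether a MIMD machine is tightly or loosely coupled, which depends on how the processors share memory.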
  • 13. Defining Computer Architecture
      • "Old" view of computer architecture:
          • Instruction Set Architecture (ISA) design
          • i.e. decisions regarding registers, memory addressing, addressing modes, instruction operands, available operations, control flow instructions, instruction encoding
      • "Real" computer architecture:
          • Specific requirements of the target machine
          • Design to maximize performance within constraints: cost, power, and availability
          • Includes ISA, microarchitecture, hardware
  • 14. Instruction Set Architecture
      • Class of ISA
          • General-purpose registers
          • Register-memory vs. load-store
      • RISC-V registers
          • 32 general-purpose, 32 floating-point

      Register   Name       Use                 Saver
      x0         zero       constant 0          n/a
      x1         ra         return addr         caller
      x2         sp         stack ptr           callee
      x3         gp         gbl ptr
      x4         tp         thread ptr
      x5-x7      t0-t2      temporaries         caller
      x8         s0/fp      saved/frame ptr     callee
      x9         s1         saved               callee
      x10-x17    a0-a7      arguments           caller
      x18-x27    s2-s11     saved               callee
      x28-x31    t3-t6      temporaries         caller
      f0-f7      ft0-ft7    FP temps            caller
      f8-f9      fs0-fs1    FP saved            callee
      f10-f17    fa0-fa7    FP arguments        caller
      f18-f27    fs2-fs11   FP saved            callee
      f28-f31    ft8-ft11   FP temps            caller
  • 15. Instruction Set Architecture
      • Memory addressing
          • RISC-V: byte addressed, aligned accesses are faster
          • An access to an object of size s bytes at byte address A is aligned if A mod s = 0.
      • Addressing modes
          • RISC-V: register, immediate, displacement (base + offset)
          • Other examples: autoincrement, indexed, PC-relative
      • Types and sizes of operands
          • RISC-V: 8-bit, 32-bit, 64-bit
          • IEEE 754 floating point in 32-bit (single precision) and 64-bit (double precision)
          • The 80x86 also supports 80-bit floating point (extended double precision)
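The alignment rule on the slide above (an access of size s at byte address A is aligned if A mod s = 0) can be checked directly. This is a minimal sketch; the function name `is_aligned` is an assumption for illustration.

```python
def is_aligned(address: int, size: int) -> bool:
    """Return True if an access of `size` bytes at byte `address` is aligned.

    Implements the slide's rule: aligned if address mod size == 0.
    """
    if size <= 0:
        raise ValueError("size must be a positive number of bytes")
    if address < 0:
        raise ValueError("address must be non-negative")
    return address % size == 0


if __name__ == "__main__":
    print(is_aligned(8, 4))  # a 4-byte word at address 8
    print(is_aligned(6, 4))  # a 4-byte word at address 6
    print(is_aligned(6, 2))  # a 2-byte half word at address 6
```

A 4-byte access at address 6 fails the test because it straddles two naturally aligned words, while a 2-byte access at the same address passes.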
  • 16. Floating point instructions for RISC-V (figure)
  • 17. IEEE 754 Format (figure)
  • 18. Instruction Set Architecture
      • Operations
          • RISC-V: data transfer, arithmetic, logical, control, floating point
          • See Fig. 1.5 in the text
      • Control flow instructions
          • Conditions tested on register contents (RISC-V) vs. status bits (x86, ARMv7, ARMv8)
          • Return address in a register (RISC-V, ARMv7, ARMv8) vs. on the stack (x86)
      • Encoding
          • Fixed length (RISC-V, ARMv7/v8 except the compact instruction set) vs. variable length (x86)
  • 19. Encoding (figure)
  • 21. Advanced Computer Architecture, Session Two. In the name of the Lord, boundless in kindness.
  • 22. Chapter 1: Fundamentals of Quantitative Design and Analysis (cont.). Computer Architecture: A Quantitative Approach, Sixth Edition
  • 23. Genuine Computer Architecture
      • The implementation of a computer has two components:
          • Organization
          • Hardware
  • 24. Genuine Computer Architecture (cont.)
      • Organization
          • The high-level aspects of a computer's design:
          • the memory system, the memory interconnect, and the design of the internal processor or CPU (central processing unit, where arithmetic, logic, branching, and data transfer are implemented).
          • The term microarchitecture is also used instead of organization.
  • 25. Genuine Computer Architecture (cont.)
      • Two processors with the same instruction set architecture but different organizations are the AMD Opteron and the Intel Core i7.
      • Both processors implement the 80x86 instruction set, but they have very different pipeline and cache organizations.
  • 26. Genuine Computer Architecture (cont.)
      • Hardware refers to the specifics of a computer:
          • the detailed logic design
          • the packaging technology of the computer
      • Often a line of computers contains computers with:
          • identical instruction set architectures
          • very similar organizations
          • differences in the detailed hardware implementation
  • 27. Genuine Computer Architecture (cont.)
      • The Intel Core i7 and the Intel Xeon E7:
          • are nearly identical
          • have different clock rates
          • have different memory systems
          • the Xeon E7 is more effective for server computers
  • 28. Genuine Computer Architecture (cont.)
      • Computer architects must design a computer to meet functional requirements as well as price, power, performance, and availability goals.
      • Architects must also determine what the functional requirements are, which can be a major task.
      • The requirements may be specific features inspired by the market.
      • Application software typically drives the choice of certain functional requirements by determining how the computer will be used.
  • 29. Summary of some of the most important functional requirements an architect faces (figure)
  • 30. Trends in Technology
      • Integrated circuit technology (Moore's Law)
          • Transistor density: 35%/year
          • Die size: 10-20%/year
          • Integration overall: 40-55%/year
      • DRAM capacity: 25-40%/year (slowing)
          • 8 Gb (2014), 16 Gb (2019), possibly no 32 Gb
      • Flash capacity: 50-60%/year
          • 8-10X cheaper/bit than DRAM
      • Magnetic disk capacity: recently slowed to 5%/year
          • Density increases may no longer be possible; the platter count may increase from 7 to 9
          • 8-10X cheaper/bit than Flash
          • 200-300X cheaper/bit than DRAM
      • Network technology
          • Network performance depends both on the performance of switches and on the performance of the transmission system.
      • Designers often design for the next technology.
      • Cost has decreased at about the rate at which density increases.
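The annual rates on the slide above are easier to compare when converted to doubling times. The sketch below assumes steady compound growth, which the slide itself does not claim (it notes that DRAM and disk growth are slowing); the function names are illustrative.

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years needed to double at a steady compound annual growth rate.

    `annual_growth` is a fraction, e.g. 0.35 for 35%/year.
    """
    if annual_growth <= 0:
        raise ValueError("annual_growth must be positive")
    return math.log(2) / math.log(1 + annual_growth)


def growth_factor(annual_growth: float, years: float) -> float:
    """Total improvement factor after `years` of compound growth."""
    return (1 + annual_growth) ** years


if __name__ == "__main__":
    # Rates quoted on the slide; the steady-growth model is an assumption.
    for label, rate in [("transistor density", 0.35),
                        ("flash capacity (low end)", 0.50),
                        ("magnetic disk capacity", 0.05)]:
        print(f"{label}: doubles in about {doubling_time_years(rate):.1f} years")
```

Under this model a 35%/year rate doubles in a little over two years, while the 5%/year disk rate takes roughly fourteen years to double.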
  • 31. Bandwidth and Latency
      • Bandwidth or throughput
          • Total work done in a given time
          • 32,000-40,000X improvement for processors
          • 300-1200X improvement for memory and disks
      • Latency or response time
          • Time between start and completion of an event
          • 50-90X improvement for processors
          • 6-8X improvement for memory and disks
  • 32. Bandwidth and Latency (cont.)
      • Performance is the primary differentiator for microprocessors and networks.
          • They have seen the greatest gains: 32,000–40,000X in bandwidth and 50–90X in latency.
      • Capacity is generally more important than performance for memory and disks.
          • Capacity has improved more,
          • yet bandwidth advances of 400–2400X
          • are still much greater than gains in latency of 8–9X.
  • 33. Performance milestones over 25–40 years for microprocessors (figure)
  • 34. Performance milestones over 25–40 years for memory (figure)
  • 35. Performance milestones over 25–40 years for networks (figure)
  • 36. Performance milestones over 25–40 years for disks (figure)
  • 37. Bandwidth and Latency
      • Figure: log-log plot of bandwidth and latency milestones relative to the first milestone.
      • Latency improved 8–91X, while bandwidth improved about 400–32,000X.
      • Except for networking, there were only modest improvements in latency and bandwidth in the other three technologies in the six years from 2011 to 2017: 0%–23% in latency and 23%–70% in bandwidth.
  • 39. Advanced Computer Architecture, Session Three. In the name of the Lord, boundless in kindness.
  • 40. Chapter 1: Fundamentals of Quantitative Design and Analysis (cont.). Computer Architecture: A Quantitative Approach, Sixth Edition
  • 41. Transistors and Wires
      • Feature size
          • Minimum size of a transistor or wire in the x or y dimension
          • 10 microns in 1971 to 0.011 microns in 2017
          • Transistor performance scales linearly
              • Wire delay does not improve with feature size!
          • Integration density scales quadratically
      • Larger and larger fractions of the clock cycle have been consumed by the propagation delay of signals on wires,
      • but power now plays an even greater role than wire delay.
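The two scaling statements on the slide above can be put into numbers using its own feature sizes (10 microns in 1971, 0.011 microns in 2017). This is an idealized sketch: it assumes density tracks the square of the linear shrink exactly, which real processes only approximate, and the function names are illustrative.

```python
def linear_shrink(old_feature_um: float, new_feature_um: float) -> float:
    """Factor by which the feature size shrank in one dimension."""
    if old_feature_um <= 0 or new_feature_um <= 0:
        raise ValueError("feature sizes must be positive")
    return old_feature_um / new_feature_um


def ideal_density_gain(old_feature_um: float, new_feature_um: float) -> float:
    """Idealized transistor-density gain: the square of the linear shrink."""
    return linear_shrink(old_feature_um, new_feature_um) ** 2


if __name__ == "__main__":
    shrink = linear_shrink(10.0, 0.011)
    density = ideal_density_gain(10.0, 0.011)
    print(f"linear shrink: about {shrink:,.0f}x")
    print(f"ideal density gain: about {density:,.0f}x")
```

A roughly 900-fold linear shrink corresponds to a density gain on the order of 800,000 under this idealized model, which is why density is said to scale quadratically.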
  • 42. Transistors and Wires
  • 43. Power and Energy
  • 44. Power and Energy concerns
      1. What is the maximum power a processor ever requires?
          • Voltage indexing methods allow the processor to slow down and regulate voltage within a wider margin.
      2. What is the sustained power consumption? This is the thermal design power (TDP), which determines the cooling requirement.
      3. Which metric is the right one for comparing processors: energy or power?
  • 45. Power and Energy (section: Trends in Power and Energy)
      • Problem: get power in, get power out
      • Thermal Design Power (TDP)
          • Characterizes sustained power consumption
          • Used as the target for the power supply and cooling system
          • Lower than peak power (which is often 1.5X higher), higher than average power consumption
      • Clock rate can be reduced dynamically to limit power consumption
      • Energy per task is often a better measurement
  • 46. Power and Energy
      • Power is energy per unit time.
          • 1 watt = 1 joule per second, so E = P × T
      • Which metric is the right one for comparing processors: energy or power?
          • In general, energy is always a better metric,
          • because it is tied to a specific task and the time required for that task.
  • 47. Power and Energy
      • If we want to know which of two processors is more efficient for a given task, we should compare energy consumption (not power) for executing the task.
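The point of the slides above is that a processor drawing more power can still use less energy if it finishes sooner, since E = P × T. The sketch below works one such case; the wattages and run times are hypothetical and chosen so that processor A draws 20% more power than B but needs only 70% of the time.

```python
def energy_joules(average_power_w: float, time_s: float) -> float:
    """Energy for a task: E = P x T (watts x seconds = joules)."""
    if average_power_w < 0 or time_s < 0:
        raise ValueError("power and time must be non-negative")
    return average_power_w * time_s


if __name__ == "__main__":
    # Hypothetical processors running the same task.
    energy_a = energy_joules(average_power_w=120.0, time_s=7.0)
    energy_b = energy_joules(average_power_w=100.0, time_s=10.0)
    print(f"A: {energy_a:.0f} J, B: {energy_b:.0f} J")
    print(f"A uses {energy_a / energy_b:.2f}x the energy of B")
```

Comparing power alone would favor B, while comparing energy per task favors A, which is the comparison the slide recommends.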
  • 48. Power and Energy
      • When is power consumption a useful measure?
          • As a constraint.
          • For example, a chip might be limited to 100 watts.
  • 49. Power and Energy
      • Static power
      • Dynamic power
  • 50. Dynamic Energy and Power
  • 51. Dynamic Energy and Power
      • Dynamic energy
          • Transistor switch from 0 -> 1 or 1 -> 0
          • ½ × Capacitive load × Voltage²
      • Dynamic power
          • ½ × Capacitive load × Voltage² × Frequency switched
      • Reducing clock rate reduces power, not energy
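The two formulas on the slide above explain why lowering voltage is so effective: energy falls with the square of voltage, and power falls further if frequency drops as well. The sketch below evaluates both for a 15% reduction in voltage and frequency; the baseline capacitance, voltage, and frequency are arbitrary placeholder values, since only the ratios matter.

```python
def dynamic_energy(capacitive_load: float, voltage: float) -> float:
    """Energy of one transistor transition: 1/2 x C x V^2."""
    return 0.5 * capacitive_load * voltage ** 2


def dynamic_power(capacitive_load: float, voltage: float,
                  frequency: float) -> float:
    """Dynamic power: 1/2 x C x V^2 x frequency switched."""
    return dynamic_energy(capacitive_load, voltage) * frequency


if __name__ == "__main__":
    # Placeholder baseline values (assumptions, not from the slides).
    c, v, f = 1e-9, 1.0, 3.3e9
    scale = 0.85  # 15% reduction in both voltage and frequency

    energy_ratio = dynamic_energy(c, v * scale) / dynamic_energy(c, v)
    power_ratio = (dynamic_power(c, v * scale, f * scale)
                   / dynamic_power(c, v, f))
    print(f"energy ratio: {energy_ratio:.2f}")
    print(f"power ratio:  {power_ratio:.2f}")

    # Lowering only the clock rate changes power but not energy per transition.
    print(dynamic_energy(c, v) == dynamic_energy(c, v))
```

Because frequency does not appear in the energy formula, slowing the clock alone lowers power without lowering the energy spent per transition, which matches the last bullet on the slide.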
  • 52. 52 Copyright © 2012, Elsevier Inc. All rights reserved. Dynamic Energy and Power
  • 53. 53 Copyright © 2019, Elsevier Inc. All rights reserved. Power  Intel 80386 consumed ~ 2 W  3.3 GHz Intel Core i7 consumes 130 W  Heat must be dissipated from 1.5 x 1.5 cm chip  This is the limit of what can be cooled by air Trends in Power and Energy
  • 54. 54 Copyright © 2012, Elsevier Inc. All rights reserved. Power
  • 55. 55 Copyright © 2012, Elsevier Inc. All rights reserved. Reducing Power
  • 56. 56 Copyright © 2012, Elsevier Inc. All rights reserved. Reducing Power  Techniques for reducing power:  Do nothing well (clock gating):  Most microprocessors today turn off the clock of inactive modules to save energy and dynamic power  Dynamic Voltage-Frequency Scaling (DVFS):  Personal mobile devices, laptops, and even servers have periods of low activity where there is no need to operate at the highest clock frequency and voltages.  Low power state for DRAM, disks:  Given that PMDs and laptops are often idle, memory and storage offer low power modes to save energy  Overclocking, turning off cores:  The 3.3 GHz Core i7 can run in short bursts at 3.6 GHz.  For single-threaded code, these microprocessors can turn off all cores but one and run it at an even higher clock rate. Trends in Power and Energy
  • 57. 57 Copyright © 2019, Elsevier Inc. All rights reserved. Reducing Power  Techniques for reducing power:  Do nothing well  Dynamic Voltage-Frequency Scaling  Low power state for DRAM, disks Trends in Power and Energy
  • 58. 58 Copyright © 2012, Elsevier Inc. All rights reserved. Static Power
  • 59. 59 Copyright © 2019, Elsevier Inc. All rights reserved. Static Power  Static power consumption  25-50% of total power  Current_static x Voltage  Scales with number of transistors  To reduce: power gating Trends in Power and Energy
  • 60. 60 Copyright © 2019, Elsevier Inc. All rights reserved. Static Power  Static power is significant partly because of large SRAM caches, which need power to maintain their stored values. (The S in SRAM is for static.)  The only hope to stop leakage is to turn off power to subsets of the chip.
  • 61. 61 Copyright © 2019, Elsevier Inc. All rights reserved. Race-to-halt  Because the processor is just a portion of the whole energy cost of a system,  it can make sense to use a faster, less energy-efficient processor to allow the rest of the system to go into a sleep mode. This strategy is known as race-to-halt.
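A small model shows why race-to-halt can pay off. Everything here is a hypothetical sketch: the power figures, the fixed 2-second window, and the simplification that the processor draws nothing once it halts are all assumptions, not measurements.

```python
def system_energy(cpu_power, task_time, rest_active, rest_sleep, window):
    """Total energy over a fixed window: the whole system is active while
    the task runs, then the rest of the system sleeps (CPU assumed ~0 W)."""
    active = (cpu_power + rest_active) * task_time
    asleep = rest_sleep * (window - task_time)
    return active + asleep

WINDOW = 2.0        # seconds, assumed deadline for the task
REST_ACTIVE = 30.0  # watts drawn by memory, storage, etc. while awake
REST_SLEEP = 2.0    # watts drawn by the rest of the system while asleep

# Fast processor: 40 W for 1 s (40 J on its own, less efficient).
fast = system_energy(40.0, 1.0, REST_ACTIVE, REST_SLEEP, WINDOW)
# Slow processor: 15 W for 2 s (30 J on its own, more efficient).
slow = system_energy(15.0, 2.0, REST_ACTIVE, REST_SLEEP, WINDOW)

print(f"fast system energy = {fast} J, slow system energy = {slow} J")
```

With these assumed numbers the faster processor uses more energy by itself but less at the system level, because the rest of the system spends half the window asleep.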
  • 62. 62 Copyright © 2019, Elsevier Inc. All rights reserved. Domain-specific processors  A computer will consist of:  Standard processors to run conventional large programs such as operating systems  Domain-specific processors that do only a narrow range of tasks, but do them extremely well  Such computers will be much more heterogeneous than the homogeneous multicore chips of the past.
  • 64. 64 Copyright © 2019, Elsevier Inc. All rights reserved. Advanced Computer Architecture, Session Four. In the name of God, whose kindness is boundless.
  • 66. Copyright © 2019, Elsevier Inc. All rights reserved. 66 Chapter 1 Fundamentals of Quantitative Design and Analysis…(Cont.) Computer Architecture A Quantitative Approach, Sixth Edition
  • 67. 67 Copyright © 2019, Elsevier Inc. All rights reserved. Trends in Cost  Although costs tend to be less important in some computer designs (specifically supercomputers), cost-sensitive designs are of growing significance  Learning curve: manufacturing costs decrease over time  Example  Price per megabyte of DRAM has dropped over the long term; price and cost of DRAM track closely  Microprocessor prices also drop over time, but because they are less standardized than DRAMs, the relationship between price and cost is more complex  Yield: the percentage of manufactured devices that survive testing
  • 68. 68 Copyright © 2019, Elsevier Inc. All rights reserved. Trends in Cost  Cost driven down by learning curve  Yield  DRAM: price closely tracks cost  Microprocessors: price depends on volume  10% less for each doubling of volume Trends in Cost
  • 69. 69 Copyright © 2019, Elsevier Inc. All rights reserved. Trends in Cost  key factor in determining cost:
  • 70. 70 Copyright © 2019, Elsevier Inc. All rights reserved. Cost of an Integrated Circuit  Standard parts (disks, Flash memory, DRAMs, and so on) are becoming a significant portion of any system’s cost.  With PMDs’ increasing reliance on whole systems on a chip (SOC), the cost of the integrated circuits is much of the cost of the PMD.
  • 71. 71 Copyright © 2019, Elsevier Inc. All rights reserved. Trends in Cost
  • 75. 75 Copyright © 2019, Elsevier Inc. All rights reserved. Integrated Circuit Cost  Integrated circuit  Bose-Einstein formula: Die yield = Wafer yield x 1 / (1 + Defects per unit area x Die area)^N  Defects per unit area = 0.016-0.057 defects per square cm (2010)  N = process-complexity factor = 11.5-15.5 (40 nm, 2010)  For 28 nm processes in 2017, N is 7.5-9.5  For a 16 nm process, N ranges from 10 to 14 Trends in Cost
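The die-cost chain can be sketched as follows, assuming the usual textbook relations: dies per wafer = wafer area / die area minus an edge-loss term, die yield from the Bose-Einstein formula, and cost per die = wafer cost / (dies per wafer x die yield). The function names, the 30 cm wafer, the 1.5 cm die, the defect density of 0.047 per square cm, N = 12, and the wafer price are all illustrative assumptions.

```python
import math

def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    """Wafer area over die area, minus dies lost along the round edge."""
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return wafer_area / die_area_cm2 - edge_loss

def die_yield(defects_per_cm2, die_area_cm2, n, wafer_yield=1.0):
    """Bose-Einstein yield model; n is the process-complexity factor."""
    return wafer_yield / (1 + defects_per_cm2 * die_area_cm2) ** n

def cost_per_die(wafer_cost, wafer_diameter_cm, die_area_cm2, defects_per_cm2, n):
    """Wafer cost spread over the dies that are expected to work."""
    good_dies = (dies_per_wafer(wafer_diameter_cm, die_area_cm2)
                 * die_yield(defects_per_cm2, die_area_cm2, n))
    return wafer_cost / good_dies

die_area = 1.5 * 1.5  # cm^2, hypothetical 1.5 cm x 1.5 cm die
print(f"dies per wafer ~ {dies_per_wafer(30, die_area):.0f}")
print(f"die yield      ~ {die_yield(0.047, die_area, 12):.2f}")
print(f"cost per die   ~ {cost_per_die(5000, 30, die_area, 0.047, 12):.2f}")
```

Because yield falls off as a power of die area, cost per good die grows much faster than linearly with die size.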
  • 76. 76 Copyright © 2019, Elsevier Inc. All rights reserved. Integrated Circuit Cost
  • 77. 77 Copyright © 2019, Elsevier Inc. All rights reserved. Integrated Circuit Cost
  • 78. 78 Copyright © 2019, Elsevier Inc. All rights reserved. Integrated Circuit Cost: redundancy as a way to raise yield  Given the tremendous price pressures on commodity products such as DRAM and SRAM, designers have included redundancy as a way to raise yield.  DRAMs have regularly included some redundant memory cells so that a certain number of flaws can be accommodated.  Designers have used similar techniques in both standard SRAMs and in large SRAM arrays used for caches within microprocessors.  GPUs have 4 redundant processors out of 84 for the same reason. Obviously, the presence of redundant entries can be used to boost the yield significantly.
  • 79. 79 Copyright © 2019, Elsevier Inc. All rights reserved. Cost Versus Price  The margin between the cost to manufacture a product and the price the product sells for has been shrinking.  Those margins pay for  company’s research and development (R&D),  marketing,  sales,  manufacturing equipment maintenance,  building rental,  cost of financing,  pretax profits, and taxes.
  • 80. 80 Copyright © 2019, Elsevier Inc. All rights reserved. Cost of Manufacturing Versus Cost of Operation  Before  cost meant the cost to build a computer  price meant price to purchase a computer.  With the advent of WSCs,  capital expenses (CAPEX):  tens of thousands of servers,  operational expenses (OPEX):  the cost to operate the computers
  • 81. 81 Copyright © 2019, Elsevier Inc. All rights reserved. (CAPEX) & (OPEX)
  • 87. 87 Copyright © 2019, Elsevier Inc. All rights reserved. Dependability  Before:  ICs were one of the most reliable components of a computer.  Although their pins may be vulnerable, and faults may occur over communication channels, the failure rate inside the chip was very low.  Now:  Because of feature sizes of 16 nm and smaller,  transient faults and permanent faults are becoming more commonplace.
  • 88. 88 Copyright © 2019, Elsevier Inc. All rights reserved. Dependability  Service level agreements (SLAs)  an SLA could be used to decide whether the system was up or down.
  • 89. 89 Copyright © 2019, Elsevier Inc. All rights reserved. Dependability  Systems alternate between two states: 1. Service accomplishment: where the service is delivered as specified. 2. Service interruption: where the delivered service is different from the SLA  Transitions between these two states are caused by  Failures (from state 1 to state 2)  Restorations (2 to 1).
  • 90. 90 Copyright © 2019, Elsevier Inc. All rights reserved. Dependability  Quantifying these transitions leads to the two main measures of dependability:  Module reliability  a measure of the continuous service accomplishment  the time to failure from a reference initial instant.  Module availability  a measure of the service accomplishment with respect to the alternation between the two states of accomplishment and interruption.
  • 91. 91 Copyright © 2019, Elsevier Inc. All rights reserved. Dependability  Module reliability  Mean time to failure (MTTF)  Failures in time (FIT) = 1/MTTF  Rate of failures, generally reported as failures per billion hours of operation  Mean time to repair (MTTR)  Mean time between failures (MTBF) = MTTF + MTTR  Module availability = MTTF / MTBF = MTTF / (MTTF + MTTR) Dependability
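These definitions translate into two one-line functions. This is a minimal sketch; the function names and the sample module (1,000,000-hour MTTF, 24-hour MTTR) are assumptions for illustration.

```python
def fit(mttf_hours):
    """Failure rate expressed as failures per billion hours of operation."""
    return 1e9 / mttf_hours

def availability(mttf_hours, mttr_hours):
    """Fraction of time the service is delivered: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)

# Hypothetical module: fails once per million hours, takes a day to repair.
module_fit = fit(1_000_000)
module_availability = availability(1_000_000, 24)

print(f"FIT = {module_fit:.0f}, availability = {module_availability:.6f}")
```

Shortening MTTR raises availability without changing reliability, which is why the two measures are reported separately.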
  • 92. 92 Copyright © 2019, Elsevier Inc. All rights reserved. Dependability  Assume a disk subsystem with the following components and MTTF:  10 disks, each rated at 1,000,000-hour MTTF  1 ATA controller, 500,000-hour MTTF  1 power supply, 200,000-hour MTTF  1 fan, 200,000-hour MTTF  1 ATA cable, 1,000,000-hour MTTF
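The MTTF of the whole subsystem can be sketched by summing failure rates. This assumes that lifetimes are exponentially distributed, that failures are independent, and that the failure of any one component fails the subsystem; the variable names are illustrative.

```python
# (count, MTTF in hours) for each component type listed on the slide.
components = [
    (10, 1_000_000),  # disks
    (1, 500_000),     # ATA controller
    (1, 200_000),     # power supply
    (1, 200_000),     # fan
    (1, 1_000_000),   # ATA cable
]

# Under the independence assumption, failure rates simply add.
failure_rate_per_hour = sum(count / mttf for count, mttf in components)

system_fit = failure_rate_per_hour * 1e9  # failures per billion hours
system_mttf_hours = 1 / failure_rate_per_hour

print(f"system FIT  ~ {system_fit:.0f}")
print(f"system MTTF ~ {system_mttf_hours:.0f} hours")
```

The result is roughly 43,500 hours, just under 5 years, which is far below the MTTF of any single component.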
  • 93. 93 Copyright © 2019, Elsevier Inc. All rights reserved. Dependability  Redundancy  The primary way to cope with failure  in time (repeat the operation to see if it still is erroneous)  in resources (have other components to take over from the one that failed).
  • 94. 94 Copyright © 2019, Elsevier Inc. All rights reserved. Dependability  Redundancy example  Assume that one power supply is sufficient to run the disk subsystem and that we are adding one redundant power supply.  2 power supplies and independent failures  MTTF for redundant power supplies  Mean time until one of the two supplies fails: MTTF_power supply / 2  MTTF_pair: the mean time until one power supply fails divided by the chance that the other will fail before the first one is replaced.  The probability of a second failure is MTTR over the mean time until the other power supply fails  Assume 24 hours to notice that a power supply has failed and to replace it  MTTF_pair = (MTTF_power supply / 2) / (MTTR_power supply / MTTF_power supply) = MTTF_power supply² / (2 x MTTR_power supply), about 830,000,000 hours for a 200,000-hour supply  About 4150 times more reliable than a single power supply
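The redundant-pair estimate can be checked numerically. This is a sketch under the slide's assumptions (independent failures, a 24-hour repair time) and takes the single-supply MTTF of 200,000 hours from the earlier disk-subsystem example; the function name is hypothetical.

```python
def mttf_redundant_pair(mttf_single, mttr):
    """Mean time until either supply fails, divided by the chance that
    the second supply fails before the first one is replaced."""
    time_to_first_failure = mttf_single / 2
    prob_second_failure = mttr / mttf_single
    return time_to_first_failure / prob_second_failure

MTTF_SUPPLY = 200_000  # hours, single power supply
MTTR_SUPPLY = 24       # hours to notice the failure and replace the supply

pair_mttf = mttf_redundant_pair(MTTF_SUPPLY, MTTR_SUPPLY)
improvement = pair_mttf / MTTF_SUPPLY

print(f"pair MTTF ~ {pair_mttf:,.0f} hours, {improvement:,.0f}x a single supply")
```

The unrounded ratio comes out slightly above the slide's figure of 4150, which follows from rounding the pair MTTF down to 830,000,000 hours before dividing.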
  • 95. 95 Copyright © 2019, Elsevier Inc. All rights reserved. Measuring Performance  Typical performance metrics:  Response time (execution time)  Throughput  Speedup of X relative to Y  Execution time_Y / Execution time_X  Execution time  The time between the start and the completion of an event  Wall clock time: includes all system overheads  storage accesses, memory accesses, input/output activities, operating system, …  CPU time: only computation time Measuring Performance
  • 96. 96 Copyright © 2019, Elsevier Inc. All rights reserved. Benchmarks  Kernels (e.g. matrix multiply)  Toy programs (e.g. sorting)  Synthetic benchmarks (e.g. Dhrystone)  Benchmark suites (e.g. SPEC06fp, TPC-C)  Standard test suites  CPU tests: mathematical operations, compression, encryption, physics  2D graphics tests: vectors, bitmaps, fonts, text, and GUI elements  3D graphics tests: DirectX 9 to DirectX 12 in 4K resolution, DirectCompute and OpenCL  Disk tests: reading, writing, and seeking within disk files, plus IOPS  Memory tests: memory access speeds and latency
  • 97. 97 Copyright © 2019, Elsevier Inc. All rights reserved. Benchmarks
  • 98. 98 Copyright © 2019, Elsevier Inc. All rights reserved. Principles of Computer Design  Take Advantage of Parallelism  e.g. multiple processors, disks, memory banks, pipelining, multiple functional units  ILP, DLP, TLP, RLP  Principle of Locality  Reuse of data and instructions  Rule of thumb: a program spends 90% of its execution time in only 10% of the code  Focus on the Common Case: applies to energy, resource allocation, and performance  The instruction fetch and decode unit of a processor may be used much more frequently than a multiplier, so optimize it first  Amdahl’s Law Principles
  • 99. 99 Copyright © 2019, Elsevier Inc. All rights reserved. Amdahl’s Law
  • 100. 100 Copyright © 2019, Elsevier Inc. All rights reserved. Amdahl’s Law
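In its standard form, Amdahl's Law gives the overall speedup as 1 / ((1 - f) + f / s), where f is the fraction of the original execution time that can use the enhancement and s is the speedup of that fraction. A minimal sketch, with the function name and the sample values (f = 0.4, s = 10) as assumptions:

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only part of the execution time is improved."""
    unaffected = 1.0 - fraction_enhanced
    improved = fraction_enhanced / speedup_enhanced
    return 1.0 / (unaffected + improved)

# Hypothetical case: 40% of the time is sped up by a factor of 10.
overall = amdahl_speedup(0.4, 10)

# Even an infinitely fast enhancement cannot beat 1 / (1 - f).
upper_bound = 1.0 / (1.0 - 0.4)

print(f"overall speedup = {overall:.4f}, upper bound = {upper_bound:.3f}")
```

The bound shows the law of diminishing returns: the unenhanced fraction limits the gain no matter how large s becomes.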
  • 101. 101 Copyright © 2019, Elsevier Inc. All rights reserved. Principles of Computer Design  The Processor Performance Equation Principles
  • 102. 102 Copyright © 2019, Elsevier Inc. All rights reserved. Principles of Computer Design Principles  Different instruction types having different CPIs
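For reference, the processor performance equation and its per-instruction-type form are usually written as below. The symbols are the conventional ones and are assumed here: IC_i is the number of times instruction type i executes and CPI_i is the average clock cycles for that type.

```latex
\begin{align*}
\text{CPU time} &= \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time} \\
\text{CPU clock cycles} &= \sum_{i=1}^{n} \mathrm{IC}_i \times \mathrm{CPI}_i \\
\text{CPI} &= \sum_{i=1}^{n} \frac{\mathrm{IC}_i}{\text{Instruction count}} \times \mathrm{CPI}_i
\end{align*}
```

The last line is the weighted average used when different instruction types have different CPIs: each type contributes its CPI in proportion to how often it executes.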
  • 103. 103 Copyright © 2019, Elsevier Inc. All rights reserved. Principles of Computer Design  Example: Suppose we made the following measurements:  Frequency of FP operations = 25%  Average CPI of FP operations = 4.0  Average CPI of other instructions = 1.33  Frequency of FSQRT = 2%  CPI of FSQRT = 20  Compare these two design alternatives:  Decrease the CPI of FSQRT to 2  Decrease the average CPI of all FP operations to 2.5
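The comparison can be worked through with the weighted-CPI form of the processor performance equation. This sketch assumes that instruction count and clock rate are the same for both alternatives, so speedup is simply the ratio of CPIs; the variable names are illustrative.

```python
# Measurements from the example.
freq_fp, cpi_fp = 0.25, 4.0
cpi_other = 1.33
freq_fsqrt, cpi_fsqrt = 0.02, 20.0

# Original CPI: weighted average over FP and non-FP instructions.
cpi_original = freq_fp * cpi_fp + (1 - freq_fp) * cpi_other

# Alternative 1: FSQRT drops from 20 cycles to 2, saving 18 cycles on 2% of instructions.
cpi_new_fsqrt = cpi_original - freq_fsqrt * (cpi_fsqrt - 2.0)

# Alternative 2: every FP operation averages 2.5 cycles.
cpi_new_fp = (1 - freq_fp) * cpi_other + freq_fp * 2.5

speedup_fsqrt = cpi_original / cpi_new_fsqrt
speedup_fp = cpi_original / cpi_new_fp

print(f"original CPI = {cpi_original:.4f}")
print(f"FSQRT fix:  CPI = {cpi_new_fsqrt:.4f}, speedup = {speedup_fsqrt:.2f}")
print(f"all-FP fix: CPI = {cpi_new_fp:.4f}, speedup = {speedup_fp:.2f}")
```

Improving all FP operations gives the slightly lower CPI, so it is the marginally better alternative under these measurements.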
  • 105. 105 Copyright © 2019, Elsevier Inc. All rights reserved. Fallacies and Pitfalls  All exponential laws must come to an end  Dennard scaling (constant power density)  Stopped by threshold voltage  Disk capacity  30-100% per year to 5% per year  Moore’s Law  Most visible with DRAM capacity  ITRS disbanded  Only four foundries left producing state-of-the-art logic chips  11 nm, 3 nm might be the limit
  • 106. 106 Copyright © 2019, Elsevier Inc. All rights reserved. Fallacies and Pitfalls  Microprocessors are a silver bullet  Performance is now a programmer’s burden  Falling prey to Amdahl’s Law  A single point of failure  Hardware enhancements that increase performance also improve energy efficiency, or are at worst energy neutral  Benchmarks remain valid indefinitely  Compiler optimizations target benchmarks
  • 107. 107 Copyright © 2019, Elsevier Inc. All rights reserved. Fallacies and Pitfalls  The rated mean time to failure of disks is 1,200,000 hours or almost 140 years, so disks practically never fail  MTTF values from manufacturers assume regular replacement  Peak performance tracks observed performance  Fault detection can lower availability  Not all operations are needed for correct execution