Authors: Jean-Loup Baer
ISBN-13: 9780521769921, ISBN-10: 0521769922
Format: Hardcover
Publisher: Cambridge University Press
Date Published: December 2009
Edition: New Edition
This book gives a comprehensive description of the architecture of microprocessors, from simple in-order, short-pipeline designs to out-of-order superscalars. It discusses topics such as:
The policies and mechanisms needed for out-of-order processing, such as register renaming, reservation stations, and reorder buffers
Optimizations for high performance, such as branch predictors, instruction scheduling, and load-store speculation
Design choices and enhancements to tolerate latency in the cache hierarchy of single and multiple processors
State-of-the-art multithreading and multiprocessing, emphasizing single-chip implementations
Topics are presented as conceptual ideas, with metrics to assess the effects on performance, where appropriate, and examples of realization. The emphasis is on how things work at the black-box and algorithmic levels. The author also provides sufficient detail at the register-transfer level so that readers can appreciate how design features enhance performance as well as the complexity they introduce.
Preface xi
1 Introduction 1
1.1 A Quick View of Technological Advances 2
1.2 Performance Metrics 6
1.3 Performance Evaluation 12
1.4 Summary 22
1.5 Further Reading and Bibliographical Notes 23
Exercises 24
References 28
2 The Basics 29
2.1 Pipelining 29
2.2 Caches 46
2.3 Virtual Memory and Paging 59
2.4 Summary 68
2.5 Further Reading and Bibliographical Notes 68
Exercises 69
References 73
3 Superscalar Processors 75
3.1 From Scalar to Superscalar Processors 75
3.2 Overview of the Instruction Pipeline of the DEC Alpha 21164 78
3.3 Introducing Register Renaming, Reorder Buffer, and Reservation Stations 89
3.4 Overview of the Pentium P6 Microarchitecture 102
3.5 VLIW/EPIC Processors 111
3.6 Summary 121
3.7 Further Reading and Bibliographical Notes 122
Exercises 124
References 126
4 Front-End: Branch Prediction, Instruction Fetching, and Register Renaming 129
4.1 Branch Prediction 130
Sidebar: The DEC Alpha 21264 Branch Predictor 157
4.2 Instruction Fetching 158
4.3 Decoding 164
4.4 Register Renaming (a Second Look) 165
4.5 Summary 170
4.6 Further Reading and Bibliographical Notes 170
Exercises 171
Programming Projects 174
References 174
5 Back-End: Instruction Scheduling, Memory Access Instructions, and Clusters 177
5.1 Instruction Issue and Scheduling (Wakeup and Select) 178
5.2 Memory-Accessing Instructions 184
5.3 Back-End Optimizations 195
5.4 Summary 203
5.5 Further Reading and Bibliographical Notes 204
Exercises 205
Programming Project 206
References 206
6 The Cache Hierarchy 208
6.1 Improving Access to L1 Caches 209
6.2 Hiding Memory Latencies 218
6.3 Design Issues for Large Higher-Level Caches 232
6.4 Main Memory 245
6.5 Summary 253
6.6 Further Reading and Bibliographical Notes 254
Exercises 255
Programming Projects 257
References 258
7 Multiprocessors 260
7.1 Multiprocessor Organization 261
7.2 Cache Coherence 269
7.3 Synchronization 281
7.4 Relaxed Memory Models 290
7.5 Multimedia Instruction Set Extensions 294
7.6 Summary 296
7.7 Further Reading and Bibliographical Notes 297
Exercises 298
References 300
8 Multithreading and (Chip) Multiprocessing 303
8.1 Single-Processor Multithreading 304
8.2 General-Purpose Multithreaded Chip Multiprocessors 318
8.3 Special-Purpose Multithreaded Chip Multiprocessors 324
8.4 Summary 330
8.5 Further Reading and Bibliographical Notes 331
Exercises 332
References 333
9 Current Limitations and Future Challenges 335
9.1 Power and Thermal Management 336
9.2 Technological Limitations: Wire Delays and Pipeline Depths 343
9.3 Challenges for Chip Multiprocessors 346
9.4 Summary 348
9.5 Further Reading and Bibliographical Notes 349
References 349
Bibliography 351
Index 361