Description

ELEC 470 covers advanced topics in computer architecture from a quantitative perspective.  It explores the architectural details essential for effective understanding, application, and performance characterization of modern processors, multiprocessors, clusters, and GPU architectures with hierarchical memory subsystems.  The course first studies the fundamentals of quantitative design and analysis, and then introduces instruction set design through the MIPS instruction set architecture.  An important portion of the course is dedicated to processor design and implementation with a focus on instruction-level parallelism (ILP), including single-issue pipelined processors and multiple-issue (superscalar) processors with static and dynamic scheduling and speculation, along with simulation studies.  The course then discusses multicore processors and shared-memory multiprocessor architectures, with a focus on thread-level parallelism (TLP), cache coherence, and parallel programming, followed by multicore clusters and message-passing systems.  Hierarchical memory subsystems, including multi-level caches and their integration with pipelined processors, and virtual memory with address translation are then covered.  Finally, the course discusses data-level parallelism (DLP) and GPU architectures.

This course builds on and supplements knowledge from other courses, including ELEC 271, ELEC 274, and ELEC 371 as formal prerequisites, along with ELEC 374 (taken only by Computer Engineering students) for additional background.

Objectives
  • Understand computer performance, power, energy, and cost metrics and laws, as well as standard benchmarking techniques, to quantitatively design and analyze computer systems.
  • Understand the principles of reduced instruction set computer (RISC) design through the MIPS instruction set architecture (ISA).
  • Write and interpret simple code sequences in MIPS assembly language.  Understand instruction formats, addressing modes, register usage conventions, and procedure support in MIPS, and translate simple C code to MIPS (see Worked Examples below).
  • Understand the design of a single-issue, non-pipelined datapath with control unit for the MIPS ISA.  Understand how to extend the design for new instructions and features.
  • Understand the design of a single-issue, pipelined datapath with control unit for the MIPS ISA.  Realize how structural, data, and control hazards can affect performance.  Understand how data hazards can be handled statically by code scheduling, or dynamically in hardware by forwarding and/or stalling.  Understand how techniques such as delay slots or static/dynamic branch prediction can resolve control hazards in pipelined processors.  Determine the performance of a pipeline with stalls (see Worked Examples below), and understand how to deal with exceptions.  Understand how to extend the pipeline to support multi-cycle floating-point operations, and how hazards and forwarding are handled in longer-latency pipelines.
  • Understand advanced ILP, including multiple-issue pipelined processors with static scheduling (VLIW) and dynamic scheduling (superscalar), in-order execution, out-of-order execution, and out-of-order execution with speculation and in-order commit.  Learn scheduling, loop unrolling, and register renaming in the static multiple-issue MIPS pipeline (see Worked Examples below).  Understand true dependences, output dependences, and antidependences.  Learn dynamic scheduling with a scoreboard for MIPS, and dynamic scheduling with Tomasulo's algorithm.  Understand how ILP can be increased further by extending Tomasulo's algorithm with hardware-based speculation.  Study the limits of ILP for realizable processors.  Understand how multithreading (MT) can improve superscalar performance: coarse-grained MT, fine-grained MT, and simultaneous MT (SMT).
  • Understand the processor-memory performance gap, and the memory hierarchy as an architectural technique to bridge it.  Understand single-level and multi-level cache architectures, multilevel inclusion vs. multilevel exclusion, cache read and write policies, write-through vs. write-back policies, write buffers, write-allocate vs. no-write-allocate, and split vs. unified caches.  Handle cache misses in the pipeline, and understand the impact of cache performance on the processor pipeline, miss penalty, and out-of-order execution (see Worked Examples below).  Learn basic and advanced cache optimization techniques, including non-blocking and multi-banked caches, and hardware vs. compiler-controlled prefetching.
  • Understand the memory management unit, virtual memory and virtual-to-physical address translation, and the translation look-aside buffer (TLB).  Learn cache, virtual memory, and TLB integration (see Worked Examples below).
  • Use simulation tools to reinforce understanding of pipelining and of the issues it raises, such as data, control, and structural hazards.  Study how forwarding and code scheduling can remove or reduce the stalls caused by different kinds of hazards.
  • Use simulation tools to obtain dynamic instruction execution statistics and understand the differing characteristics of application programs; to understand how branch prediction efficiency affects an application's performance and the trade-offs among different branch prediction strategies; to compare in-order execution with out-of-order execution and speculation, and to understand the relative importance of the various techniques; to gain a better understanding of how various cache parameters affect performance; and to understand the trade-offs between different TLB organizations.
  • Understand multiprocessor architectures, from symmetric multiprocessors (SMP) to distributed shared-memory multiprocessors (DSM) to message-passing multiprocessors (clusters, MPPs).  Understand challenges in parallel processing (partitioning, communication costs, synchronization costs, scheduling, load balancing, and parallel algorithms), Amdahl's Law, Gustafson's Law, and weak vs. strong scaling (see Worked Examples below).
  • Understand shared-memory multiprocessors, and shared-memory programming through examples, process synchronization, lock/unlock mechanisms and hardware primitives in MIPS, spin-lock synchronization, and barriers.  Understand the cache coherence and memory consistency problems; snooping vs. directory-based cache coherence protocols; write-invalidate vs. write-update coherence; deriving protocol state transitions for the MSI, MESI, and MOESI multiprocessor cache coherence protocols; and the impact of false sharing.  Write parallel programs using Pthreads and OpenMP on multicore, multiprocessor nodes (see Worked Examples below).  Understand the principles of message-passing programming, its pros and cons relative to shared-memory programming, the Message Passing Interface (MPI), and mixed-mode programming.
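
Worked Examples

The short sketches below illustrate several of the objectives above.  They are minimal illustrations rather than course material: register choices, unroll factors, miss rates, latencies, and other parameters are assumed for demonstration only.

For the C-to-MIPS translation objective, a small array-summation function with one possible hand translation.  The translation follows the usual MIPS argument/result register conventions; the instruction selection is illustrative, not the only correct answer:

    /* C source: sum the elements of an integer array. */
    int sum(int *a, int n) {
        int s = 0;
        for (int i = 0; i < n; i++)
            s += a[i];
        return s;
    }

    /* One possible hand translation to MIPS assembly, with a in $a0,
       n in $a1, and the result returned in $v0:

       sum:   add  $v0, $zero, $zero   # s = 0
              add  $t0, $zero, $zero   # i = 0
       loop:  slt  $t1, $t0, $a1       # t1 = (i < n) ? 1 : 0
              beq  $t1, $zero, done    # exit when i >= n
              sll  $t2, $t0, 2         # byte offset = i * 4
              add  $t2, $t2, $a0       # address of a[i]
              lw   $t3, 0($t2)         # load a[i]
              add  $v0, $v0, $t3       # s += a[i]
              addi $t0, $t0, 1         # i++
              j    loop
       done:  jr   $ra                 # return
    */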
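
For the objective on pipeline performance with stalls, a worked CPI calculation in which all instruction frequencies and penalties are assumed for illustration:

    % Assume ideal CPI = 1; 30% loads, of which 40% incur a 1-cycle
    % load-use stall; and 15% branches, of which 25% are mispredicted
    % at a 1-cycle penalty.
    \[ \mathrm{CPI} = \mathrm{CPI}_{\text{ideal}} + \text{stall cycles per instruction} \]
    \[ \mathrm{CPI} = 1 + (0.30)(0.40)(1) + (0.15)(0.25)(1) = 1.1575 \]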
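
For the loop-unrolling technique named in the ILP objective, a sketch in C.  An unroll factor of 4 is assumed, and n is assumed to be a multiple of 4 so no cleanup loop is shown; the separate accumulators make the four additions per iteration independent, which is the same effect register renaming achieves in hardware:

    int sum_unrolled(const int *a, int n) {
        int s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        /* Unrolling amortizes the index update and branch over four
           elements and exposes independent operations for the scheduler. */
        for (int i = 0; i < n; i += 4) {
            s0 += a[i];
            s1 += a[i + 1];
            s2 += a[i + 2];
            s3 += a[i + 3];
        }
        return s0 + s1 + s2 + s3;
    }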
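
For the cache-performance objective, the standard average memory access time (AMAT) relation, evaluated for a two-level hierarchy with assumed parameters:

    % Assume: L1 hit time 1 cycle, L1 miss rate 5%; L2 hit time 10
    % cycles, L2 local miss rate 20%; main-memory penalty 100 cycles.
    \[ \mathrm{AMAT} = \text{hit time} + \text{miss rate} \times \text{miss penalty} \]
    \[ \mathrm{AMAT} = 1 + 0.05 \times (10 + 0.20 \times 100) = 2.5 \ \text{cycles} \]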
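
For the address-translation objective, a sketch in C of how a virtual address splits into a virtual page number (VPN) and page offset.  The 32-bit address width, 4 KiB page size, and the mapped physical page number are all assumptions chosen for illustration:

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_OFFSET_BITS 12   /* 4 KiB page = 2^12 bytes */

    int main(void) {
        uint32_t vaddr  = 0x00403ABC;
        uint32_t vpn    = vaddr >> PAGE_OFFSET_BITS;              /* virtual page number */
        uint32_t offset = vaddr & ((1u << PAGE_OFFSET_BITS) - 1); /* byte within page */

        /* The TLB/page table maps VPN -> PPN; the page offset passes
           through unchanged.  A PPN of 0x2F is assumed here. */
        uint32_t ppn   = 0x2F;
        uint32_t paddr = (ppn << PAGE_OFFSET_BITS) | offset;

        printf("VPN=0x%X offset=0x%X -> paddr=0x%X\n",
               (unsigned)vpn, (unsigned)offset, (unsigned)paddr);
        return 0;
    }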
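
For the parallel-processing objective, Amdahl's Law (strong scaling) and Gustafson's Law (weak scaling), evaluated for an assumed parallel fraction f = 0.9 on p = 16 processors:

    \[ S_{\mathrm{Amdahl}}(p) = \frac{1}{(1-f) + f/p} = \frac{1}{0.1 + 0.9/16} = 6.4 \]
    \[ S_{\mathrm{Gustafson}}(p) = (1-f) + f\,p = 0.1 + 0.9 \times 16 = 14.5 \]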
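
Finally, for the shared-memory programming objective, a minimal OpenMP sketch (compile with an OpenMP-enabled compiler, e.g. gcc -fopenmp; the array size is arbitrary).  The reduction clause gives each thread a private partial sum, avoiding the data race a shared accumulator would create:

    #include <omp.h>
    #include <stdio.h>

    int main(void) {
        enum { N = 1000000 };
        static double a[N];
        for (int i = 0; i < N; i++)
            a[i] = 1.0;

        double sum = 0.0;
        /* Iterations are divided among threads; per-thread partial
           sums are combined at the end of the parallel region. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++)
            sum += a[i];

        printf("sum = %.0f using up to %d threads\n",
               sum, omp_get_max_threads());
        return 0;
    }
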
Credit Breakdown

Lecture: 3
Lab: 0
Tutorial: 0.5

Academic Unit Breakdown

Mathematics: 0
Natural Sciences: 0
Complementary Studies: 0
Engineering Science: 11
Engineering Design: 31