Lecture 1: Fundamentals of Quantitative Design and Analysis
- CRA Community White Paper, “21st century computer architecture,” available at http://cra.org/ccc/docs/init/21stcenturyarchitecturewhitepaper.pdf, May 2012.
- J.L. Manferdelli, N.K. Govindaraju, and C. Crall, “Challenges and opportunities in many-core computing,” Proc. IEEE, vol. 96, no. 5, pp. 808–815, May 2008.
- S. Borkar, “Thousand core chips: a technology perspective,” Proc. IEEE/ACM Design Automation Conf. (DAC), 2007, pp. 746–749.
- A. Roy, J. Xu, and M.H. Chowdhury, “Multi-core processors: a new way forward and challenge,” Proc. Int’l Conf. Microelectronics, 2008, pp. 454–457.
Lecture 2: Memory Hierarchy Design
- R. Heald, K. Shin, V. Reddy, I.-F. Kao, M. Khan, W.L. Lynch, G. Lauterbach, and J. Petolino, “64-KByte sum-addressed-memory cache with 1.6-ns cycle and 2.6-ns latency,” IEEE J. Solid-State Circuits, pp. 1682, Nov. 1998.
- B. Jacob and T. Mudge, “Virtual memory in contemporary microprocessors,” IEEE Micro, vol. 18, no. 4, pp. 60–75, Jul./Aug. 1998.
- J. Kim, A.J. Hong, S.M. Kim, K.-S. Shin, et al., “A stacked memory device on logic 3D technology for ultra-high-density data storage,” Nanotechnology, vol. 22, no. 25, Nov. 2011.
- G.H. Loh, “3D-stacked memory architectures for multi-core processors,” Proc. IEEE/ACM Int’l Symp. Computer Architecture (ISCA), 2008, pp. 453–464.
- K. Olukotun, T.N. Mudge, and R.B. Brown, “Multilevel optimization of pipelined caches,” IEEE Trans. Computers, vol. 46, no. 10, pp. 1093–1102, Oct. 1997.
- E. Rotenberg et al., “Trace cache: a low latency approach to high bandwidth instruction fetching,” Proc. 29th Symp. Microarchitecture, Dec. 1996.
- S.P. VanderWiel and D.J. Lilja, “When caches aren’t enough: data prefetching techniques,” IEEE Computer, vol. 30, no. 7, pp. 23–30, Jul. 1997.
- D.H. Woo, N.H. Seong, D.L. Lewis, and H.H.S. Lee, “An optimized 3D-stacked memory architecture by exploiting excessive, high-density TSV bandwidth,” Proc. IEEE 16th Int’l Symp. High-Performance Computer Architecture (HPCA), 2010, pp. 1–12.
Lecture 3: Instruction-Level Parallelism and Its Exploitation
- G. Doshi, “Understanding the IA-64 Architecture,” 1999.
- J. Douglas, “Intel 8xx series and Paxville Xeon-MP Microprocessors,” Proc. Hot Chips, Stanford University, Aug. 2005.
- G. Hinton et al., “The Microarchitecture of the Pentium 4 Processor,” Intel Technology Journal, Q1 2001.
- S.A. Mahlke et al., “A Comparison of Full and Partial Predicated Execution Support for ILP Processors,” Proc. 22nd Annual Int’l Symp. Computer Architecture (ISCA), pp. 138–150, Jun. 1995.
- H.M. Mathis, A.E. Mericas, J.D. McCalpin, R.J. Eickemeyer, and S.R. Kunkel, “Characterization of the multithreading (SMT) efficiency in Power5,” IBM J. Res. & Dev., vol. 49, no. 4/5, pp. 555–564, Jul./Sep. 2005.
- B. Sinharoy, R.N. Kalla, J.M. Tendler, R.J. Eickemeyer, and J.B. Joyner, “POWER5 system microarchitecture,” IBM J. Res. & Dev., vol. 49, no. 4/5, pp. 505–521, 2005.
- J.M. Tendler, J.S. Dodson, J.S. Fields, Jr., H. Le, and B. Sinharoy, “POWER4 system microarchitecture,” IBM J. Res. & Dev., vol. 46, no. 1, pp. 5–26, 2002.
- N. Tuck and D. Tullsen, “Initial observations of the simultaneous multithreading Pentium 4 processor,” Proc. 12th Int’l Conf. Parallel Architectures and Compilation Techniques (PACT), 2003, pp. 26–34.
Lecture 5: Thread-Level Parallelism
- B. Busck, M. Engbom, S. Lee, M. Dubois, and P. Stenstrom, “Loop-level speculative parallelism in embedded applications,” Proc. Int’l Conf. Parallel Processing, 2007.
- M.M. Islam, A. Busck, M. Engbom, S. Lee, M. Dubois, and P. Stenstrom, “Limits on thread-level speculative parallelism in embedded applications,” Proc. IEEE Int’l Symp. High-Performance Computer Architecture, 2007.
- A. Kejariwal, M. Girkar, X. Tian, H. Saito, et al., “Exploitation of nested thread-level speculative parallelism on multi-core system,” Proc. 7th ACM Int’l Conf. Computing Frontiers, 2010.
- D. Koufaty and D.T. Marr, “Hyperthreading technology in the NetBurst microarchitecture,” IEEE Micro, vol. 23, no. 2, Mar.–Apr. 2003.
- J.L. Lo, J.S. Emer, H.M. Levy, R.L. Stamm, D.M. Tullsen, and S.J. Eggers, “Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading,” ACM Trans. Computer Systems (TOCS), vol. 15, no. 3, pp. 322–354, Aug. 1997.
- D.M. Tullsen, S.J. Eggers, J.S. Emer, H.M. Levy, J.L. Lo, and R.L. Stamm, “Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor,” Proc. Symp. Computer Architecture, 1996.
Lecture 6: Data-Level Parallelism
- L.A. Barroso and U. Hölzle, The Datacenter as a Computer: an Introduction to the Design of Warehouse-Scale Machines, Morgan & Claypool Publishers, 2009.
- M. Chu, R. Ravindran, and S. Mahlke, “Data access partitioning for fine-grain parallelism on multicore architecture,” Proc. 40th IEEE/ACM Int’l Symp. Microarchitecture (MICRO), 2007.
- J. Sampson, R. González, J. Collard, et al., “Exploiting fine-grained data parallelism with chip multiprocessors and fast barriers,” Proc. 39th IEEE/ACM Int’l Symp. Microarchitecture (MICRO), 2006, pp. 235–246.
- Y. Yi, W. Han, A. Major, A.T. Erdogan, and T. Arslan, “Exploiting loop-level parallelism on multi-core architectures for the WiMAX physical layer,” Proc. IEEE Int’l SoC Conf., 2008.
- H. Zhong, S. Lieberman, and S.A. Mahlke, “Extending multicore architectures to exploit hybrid parallelism in single-thread applications,” Proc. IEEE Int’l Symp. High-Performance Computer Architecture (HPCA), 2009.
Appendix A: Instruction Set Architecture
- K. Diefendorff et al., “AltiVec Extension to PowerPC Accelerates Media Processing,” IEEE Micro, vol. 20, no. 2, pp. 85–95, Mar./Apr. 2000.
- A. Eden and T. Mudge, “The YAGS branch prediction scheme,” Proc. 31st Annual ACM/IEEE Int’l Symp. Microarchitecture (MICRO), 1998, pp. 69–80.
- J. Huck et al., “Introducing the IA-64 Architecture,” IEEE Micro, vol. 20, no. 5, pp. 12–23, Sept. 2000.
- D.A. Jimenez and C. Lin, “Neural methods for dynamic branch prediction,” ACM Trans. Computer Systems, vol. 20, no. 4, pp. 369–397, Nov. 2002.
- C. McNairy and D. Soltis, “Itanium 2 processor microarchitecture,” IEEE Micro, vol. 23, no. 2, pp. 44–55, Mar.–Apr. 2003.