1161
(Invited) Beyond-CMOS Device and Interconnect Technology Benchmarking Based on a Fast Cross-Layer Optimization Methodology

Tuesday, 31 May 2016: 14:10
Aqua 307 (Hilton San Diego Bayfront)
C. Pan and A. Naeemi (Georgia Institute of Technology)
Moore’s Law scaling has lasted for almost half a century, leading to tremendous performance/area improvements and cost reduction. Unfortunately, over the past decade, CMOS technology has faced severe challenges as the technology node has entered the sub-100nm regime. To sustain the growth in transistor performance and density, many novel device technologies have been proposed to augment or even replace conventional Si CMOS technology with Cu interconnects.

For device innovation, graphene has attracted much interest in the electron-device community ever since its discovery because of its excellent physical and electrical properties, and graphene-based FETs have been introduced. Based on the angle-dependent transmission probability of electrons observed in graphene PN junctions (GPNJs), an enhanced device structure is presented. More elaborate physical models, covering ON resistance, leakage current, contact resistance, and footprint area, are also developed to better evaluate the upper limits on the delay and power consumption of GPNJ circuits. For low-power applications, tunneling FETs (TFETs) show promise in overcoming the power wall faced by thermionic FETs by allowing a significant reduction in the supply voltage. Analytical device-level models for TFETs are developed to efficiently evaluate the overall system-level metrics. It is demonstrated that the I-V characteristics of a TFET resemble those of a Si CMOS switch, but with much lower leakage current and an ultra-low supply voltage at the cost of a low ON current.
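
The supply-voltage argument for TFETs can be made concrete with the textbook thermionic limit on subthreshold swing, ln(10)·kT/q. The sketch below is illustrative only and is not taken from the models in this work; the function names, the 10^4 on/off ratio, and the 30 mV/decade figure for a steeper device are hypothetical, and the swing is assumed constant over the whole current range.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C


def thermionic_swing_mv_per_decade(temperature_k=300.0):
    """Minimum subthreshold swing of a thermionic FET, in mV per decade."""
    return 1e3 * (K_B * temperature_k / Q_E) * math.log(10)


def min_gate_swing_mv(on_off_ratio, swing_mv_per_decade):
    """Gate-voltage swing needed to span a given on/off current ratio,
    assuming a constant swing over the full range (an idealization)."""
    return math.log10(on_off_ratio) * swing_mv_per_decade


ss_thermionic = thermionic_swing_mv_per_decade()
print(f"thermionic limit at 300 K: {ss_thermionic:.1f} mV/decade")
print(f"swing for 1e4 on/off, thermionic: {min_gate_swing_mv(1e4, ss_thermionic):.0f} mV")
# 30 mV/decade is a hypothetical steep-slope value, used only for comparison.
print(f"swing for 1e4 on/off, 30 mV/decade: {min_gate_swing_mv(1e4, 30.0):.0f} mV")
```

A device whose swing is below the thermionic limit needs a proportionally smaller voltage swing for the same on/off ratio, which is the mechanism behind the reduced supply voltage.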

For interconnect technology, as conventional Cu interconnects scale, the resistance per unit length increases dramatically because of 1) the smaller cross-sectional area and 2) the severe size effects at sub-20nm nodes. To alleviate the interconnect challenge, a novel local interconnect structure and a hybrid Al-Cu interconnect architecture are proposed and benchmarked against their copper counterparts in terms of energy and energy-delay product. Alternatively, carbon-based interconnects, such as graphene sheets and carbon nanotubes, are also potential candidates because of their outstanding electrical properties. However, graphene is a two-dimensional structure, and increasing the interconnect pitch does not lower its resistance as fast as it does in copper interconnects. Hence, the comparison between graphene and copper interconnects depends strongly on the interconnect pitch. As a result, system-level analyses are essential to better understand and evaluate the overall benefits of graphene interconnects.
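
The two scaling trends described above can be sketched with a crude first-order model. This is not the compact model used in this work: the copper resistivity uses only the thick-film form of the Fuchs-Sondheimer surface-scattering correction (grain-boundary scattering and barrier liners are ignored), the graphene wire is assumed to have a fixed number of independently conducting layers, and all numerical values and function names are illustrative assumptions. Only the scaling with width is meaningful, not the absolute numbers.

```python
RHO_CU_BULK = 1.68e-8   # ohm*m, bulk copper near room temperature (assumed)
MFP_CU = 39e-9          # m, electron mean free path in copper (assumed)


def cu_resistivity(width_m, specularity=0.0):
    """Bulk resistivity with a first-order surface-scattering penalty."""
    return RHO_CU_BULK * (1.0 + 0.375 * (1.0 - specularity) * MFP_CU / width_m)


def cu_resistance_per_length(width_m, aspect_ratio=2.0):
    """Copper wire whose height scales with its width (fixed aspect ratio)."""
    height_m = aspect_ratio * width_m
    return cu_resistivity(width_m) / (width_m * height_m)


def graphene_resistance_per_length(width_m, sheet_res_ohm_sq=100.0, n_layers=10):
    """Multi-layer graphene ribbon with a fixed layer count.
    The sheet resistance per layer and the layer count are hypothetical."""
    return sheet_res_ohm_sq / (n_layers * width_m)


for width_nm in (10, 20, 40):
    w = width_nm * 1e-9
    print(width_nm,
          f"Cu: {cu_resistance_per_length(w):.3e} ohm/m",
          f"graphene: {graphene_resistance_per_length(w):.3e} ohm/m")
```

Under these assumptions, doubling the width cuts the copper resistance per unit length by more than a factor of four (area plus a smaller size-effect penalty), while the graphene ribbon improves only by a factor of two, which is why the comparison depends on pitch.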

Unlike existing system-level performance simulators, which are based on cycle-accurate simulations, the present methodology employs a three-level hierarchy of compact analytical models at the material, device and interconnect, and system levels. This significantly accelerates simulation, making multi-parameter optimization feasible. Such run-time efficiency is crucial because many novel device concepts have fundamentally different operating principles, and their on/off currents and input capacitances vary drastically with their design parameters. The methodology makes it possible to evaluate various trade-offs among key design parameters for each technology and to maximize the overall chip throughput or energy efficiency of processors in a highly efficient way. To validate the proposed methodology, simulation results are compared with data for eight commercially available Intel multi-core processors across three technology generations, from the 65nm to the 32nm technology node, and show good agreement.
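
The idea of chaining cheap analytical models so that a multi-parameter search becomes affordable can be illustrated with a toy example. The sketch below is not the methodology of this work: it uses a generic alpha-power-law ON current, an exponential leakage model, a fixed lumped load in place of an interconnect model, clock frequency as a stand-in for throughput, and a plain grid search. Every constant and function name is a hypothetical placeholder.

```python
# Hypothetical constants for a toy core; none come from the source.
K_DRIVE = 1e-3      # A / V**ALPHA, drive strength per gate
ALPHA = 1.3         # alpha-power-law exponent
I_LEAK0 = 1e-6      # A, leakage per gate extrapolated to Vth = 0
SWING = 0.1         # V per decade, effective subthreshold swing
C_LOAD = 2e-15      # F, lumped gate plus wire load per stage
LOGIC_DEPTH = 30    # gate delays per clock cycle
N_GATES = 1e7       # gates per core
ACTIVITY = 0.1      # switching activity factor


def device_model(vdd, vth):
    """Device level: ON and OFF currents from supply and threshold voltage."""
    i_on = K_DRIVE * max(vdd - vth, 0.0) ** ALPHA
    i_off = I_LEAK0 * 10.0 ** (-vth / SWING)
    return i_on, i_off


def core_model(vdd, vth):
    """System level: clock frequency (Hz) and total power (W) of the core."""
    i_on, i_off = device_model(vdd, vth)
    if i_on <= 0.0:
        return None
    freq = 1.0 / (LOGIC_DEPTH * C_LOAD * vdd / i_on)
    p_dynamic = ACTIVITY * N_GATES * C_LOAD * vdd ** 2 * freq
    p_leakage = N_GATES * i_off * vdd
    return freq, p_dynamic + p_leakage


def optimize(power_budget_w):
    """Grid search for the (Vdd, Vth) pair that maximizes frequency in budget."""
    best = None
    for i in range(17):                 # Vdd from 0.20 V to 1.00 V
        vdd = 0.20 + 0.05 * i
        for j in range(9):              # Vth from 0.10 V to 0.50 V
            vth = 0.10 + 0.05 * j
            result = core_model(vdd, vth)
            if result is None:
                continue
            freq, power = result
            if power <= power_budget_w and (best is None or freq > best[0]):
                best = (freq, vdd, vth, power)
    return best


for budget in (1.0, 10.0):
    freq, vdd, vth, power = optimize(budget)
    print(f"budget {budget:4.1f} W: f = {freq / 1e9:.2f} GHz, "
          f"Vdd = {vdd:.2f} V, Vth = {vth:.2f} V, P = {power:.2f} W")
```

Because each evaluation is a handful of arithmetic operations, the whole design space can be swept exhaustively, which is the property that makes the same approach practical with more parameters and more detailed compact models.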

For GPNJ-based processors, the proposed design methodology is applied to efficiently perform device-, circuit-, and system-level co-optimization. For given power-density and die-area budgets, various device-level parameters, including supply voltage, control voltage, gap distance, and oxide thickness, are optimized for a GPNJ core, and a 2.1X throughput improvement is observed for a sharp-corner GPNJ core. This advantage stems predominantly from the smaller output resistance, which reduces both device and interconnect delay and saves repeater power. For TFET-based processors, the results indicate that TFETs perform very well in the low power-density range owing to the low supply voltage. The limitations imposed by the interconnects and the large leakage current at high supply voltage restrict the driving current, leading to lower TFET performance at high power density.

For emerging interconnect technology, the proposed Al-Cu hybrid interconnect technology is evaluated by replacing short, narrow local signal interconnects with Al interconnects. Six interconnect architecture options are analyzed, and their optimal aspect ratios and chip frequencies are predicted for five technology generations. The optimization and benchmarking results indicate that the potential improvement in chip clock frequency can be between 50% and 100% for the 7nm technology node. Comprehensive optimization and benchmarking are also performed for multi-layer graphene interconnects. The results show that a single core using graphene interconnects has a higher throughput within the same power-density and die-area budgets because of the power saving offered by the low-capacitance graphene interconnects.