Our implementation uses large crossbar arrays of emerging non-volatile memories (NVM) based on Phase-Change Memory (PCM), which map fully-connected (FC) networks onto arrays in an extremely efficient way, enabling speed and power performance that could potentially outperform current CPUs and GPUs [3]. PCM devices are based on the reversible switching of a chalcogenide layer between crystalline and amorphous states, which exhibit high and low conductance, respectively. The ability to partially crystallize the amorphous dome with voltage pulses enables gradual tuning of the device conductance. This feature is crucial for neuromorphic computing, since deep neural network (DNN) weights are encoded in an analog way, as the difference of the conductances of a pair of PCM devices.
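As a minimal software sketch of this differential encoding (the class, names, and clipped-increment pulse model below are illustrative assumptions, not the device physics or the actual programming scheme), a signed weight is stored as the difference of two conductances, each of which can only be nudged upward gradually:

```python
G_MIN, G_MAX = 0.0, 1.0  # illustrative conductance bounds (arbitrary units)

class PCMPair:
    """Hypothetical model of one synapse built from two PCM devices."""
    def __init__(self):
        self.g_plus = G_MIN
        self.g_minus = G_MIN

    def weight(self):
        # The analog weight is the difference of the two conductances.
        return self.g_plus - self.g_minus

    def potentiate(self, delta):
        # A partial-SET pulse crystallizes more of the amorphous dome,
        # gradually raising conductance (modeled here as a clipped step).
        self.g_plus = min(self.g_plus + delta, G_MAX)

    def depress(self, delta):
        # A weight decrease is obtained by raising the "minus" device.
        self.g_minus = min(self.g_minus + delta, G_MAX)

syn = PCMPair()
syn.potentiate(0.3)
syn.depress(0.1)
assert abs(syn.weight() - 0.2) < 1e-12
```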
In this presentation we review our recent progress towards the design of a PCM-based chip for DNN training. We previously reported a mixed software-hardware demonstration of a 3-layer FC perceptron with 164,885 synapses, each synapse comprising two real PCM devices, trained on the MNIST dataset of handwritten digits [1], [2]. After discussing the limitations that real NVM devices impose on DNN training, we describe the resulting requirements from an architectural point of view [4].
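(For scale, that synapse count is consistent with, for example, layer dimensions of 528-250-125-10 plus one bias synapse per downstream neuron: 528×250 + 250×125 + 125×10 + 250 + 125 + 10 = 164,885; the exact network configuration is given in [1] and [2].)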
A single NVM array-core is surrounded by neuron circuitry which implements the forward-propagate, reverse-propagate and weight-update operations [5]. For maximum performance and parallelism, each row/column needs its own copy of the neuron circuitry, avoiding time-multiplexing. This necessitates several design choices and approximate-computing approaches that enable area-efficient, pitch-matched circuits while still achieving software-equivalent training accuracy. The same approach extends to the weight update, where a crossbar-compatible procedure updates the entire array in parallel, substantially increasing training speed [1]. All of these concepts have recently been implemented in an Array Diagnostic Monitor, providing useful insight into NVM-based hardware training.
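As a software-level sketch of these three array operations (the function names and the idealized rank-1 update below are illustrative assumptions; the actual pulsing schemes are described in [1] and [5]), forward propagation is a matrix-vector multiply realized by Ohm's law and current summation, reverse propagation reuses the same array transposed, and the parallel weight update is effectively an outer product of upstream activations and downstream errors:

```python
import numpy as np

def forward_propagate(G, v_in):
    # Ohm's law per device (I = G * V) and current summation along
    # each column realize a matrix-vector multiply in one step.
    return G.T @ v_in          # G: (n_in, n_out), v_in: (n_in,)

def reverse_propagate(G, delta_out):
    # Backpropagated errors traverse the same array transposed.
    return G @ delta_out       # delta_out: (n_out,)

def parallel_weight_update(G, x, delta, lr=0.01):
    # Crossbar-compatible update: overlapping row pulses (encoding the
    # upstream activations x) and column pulses (encoding the errors
    # delta) yield, in effect, a rank-1 outer-product change applied
    # to the whole array at once.
    return G + lr * np.outer(x, delta)
```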
To be competitive with state-of-the-art GPUs, an eventual chip must be able to train networks with many layers. This is achieved by implementing several NVM array-cores, each mapping a single weight layer. The arrays are connected through a reconfigurable routing network capable of rapidly exchanging information between different network layers [6]; a minimal software sketch of this chaining follows the list below. The entire chip should be able to:
- load images as inputs to the network,
- load image labels for backpropagation,
- export trained weights,
- import weights for multi-chip parallel computation.
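As promised above, here is a minimal sketch of how such array-cores might be chained through the routing fabric (the ArrayCore class, layer sizes, and tanh neuron model are hypothetical placeholders, not the chip's actual datapath [6]):

```python
import numpy as np

class ArrayCore:
    """One NVM crossbar mapping a single weight layer (illustrative)."""
    def __init__(self, n_in, n_out, rng):
        self.G = rng.normal(0.0, 0.1, (n_in, n_out))

    def forward(self, x):
        return np.tanh(x @ self.G)  # array MVM plus neuron nonlinearity

def route_through_cores(cores, image):
    # The routing network forwards each core's neuron outputs to the
    # inputs of the core holding the next weight layer.
    x = image
    for core in cores:
        x = core.forward(x)
    return x

rng = np.random.default_rng(0)
cores = [ArrayCore(784, 250, rng), ArrayCore(250, 10, rng)]
scores = route_through_cores(cores, rng.random(784))  # 10 class scores
```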
We will review our recent progress in these areas and conclude the talk with guidelines for the future development of novel NVM devices for neuromorphic computation.
Abstract references:
[1] G. W. Burr et al., “Experimental demonstration and tolerancing of a large-scale neural network (165,000 synapses), using phase-change memory as the synaptic weight element”, IEDM Tech. Digest, 29.5 (2014).
[2] G. W. Burr et al., “Experimental demonstration and tolerancing of a large-scale neural network (165,000 synapses) using phase-change memory as the synaptic weight element”, IEEE Trans. Elec. Dev., 62(11), 3498 (2015).
[3] G. W. Burr et al., “Large-scale neural networks implemented with non-volatile memory as the synaptic weight element: Comparative performance analysis (accuracy, speed, and power)”, IEDM Tech. Digest, 4.4 (2015).
[4] S. Sidler et al., “Large-scale neural networks implemented with non-volatile memory as the synaptic weight element: impact of conductance response”, ESSDERC Proc., 440 (2016).
[5] P. Narayanan et al., “Reducing circuit design complexity for neuromorphic machine learning systems based on non-volatile memory arrays”, Proc. ISCAS, 1 (2017).
[6] P. Narayanan et al., “Toward on-chip acceleration of the backpropagation algorithm using nonvolatile memory”, IBM J. Res. Dev., 61(4), 1-11 (2017).