Our implementation uses large crossbar arrays of emerging non-volatile memories (NVM) based on Phase-Change Memory (PCM), which map fully-connected (FC) networks onto arrays in an extremely efficient way, enabling speed and power performance that could potentially outperform current CPUs and GPUs [3]. PCM devices are based on the reversible switching of a chalcogenide layer between crystalline and amorphous states, which exhibit high and low conductance, respectively. The ability to partially crystallize the amorphous dome with voltage pulses enables gradual tuning of the device conductance. This feature is crucial for neuromorphic computing, since deep neural network (DNN) weights are encoded in an analog way, as the difference of the conductances of a pair of PCM devices.
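As a minimal software sketch of this differential encoding (the class, names, and clipped-increment pulse model below are illustrative assumptions, not the device physics or the actual programming scheme), a signed weight is stored as the difference of two conductances, each of which can only be nudged upward gradually:

```python
G_MIN, G_MAX = 0.0, 1.0  # illustrative conductance bounds (arbitrary units)

class PCMPair:
    """Hypothetical model of one synapse built from two PCM devices."""
    def __init__(self):
        self.g_plus = G_MIN
        self.g_minus = G_MIN

    def weight(self):
        # The analog weight is the difference of the two conductances.
        return self.g_plus - self.g_minus

    def potentiate(self, delta):
        # A partial-SET pulse crystallizes more of the amorphous dome,
        # gradually raising conductance (modeled here as a clipped step).
        self.g_plus = min(self.g_plus + delta, G_MAX)

    def depress(self, delta):
        # A weight decrease is obtained by raising the "minus" device.
        self.g_minus = min(self.g_minus + delta, G_MAX)

syn = PCMPair()
syn.potentiate(0.3)
syn.depress(0.1)
assert abs(syn.weight() - 0.2) < 1e-12
```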
In this presentation we review our recent progress towards the design of a PCM-based chip for DNN training. We previously reported a mixed software-hardware demonstration of a 3-layer FC perceptron with 164,885 synapses, each synapse comprising two real PCM devices, trained on the MNIST dataset of handwritten digits [1], [2]. After discussing the limitations that real NVM devices impose on DNN training, we describe the resulting requirements from an architectural point of view [4].
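(For scale, that synapse count is consistent with, for example, layer dimensions of 528-250-125-10 plus one bias synapse per downstream neuron: 528×250 + 250×125 + 125×10 + 250 + 125 + 10 = 164,885; the exact network configuration is given in [1] and [2].)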
A single NVM array-core is surrounded by neuron circuitry which implements the forward-propagate, reverse-propagate and weight-update operations [5]. For maximum performance and parallelism, each row/column needs its own copy of the neuron circuitry, avoiding time-multiplexing. This necessitates several design choices and approximate-computing approaches that enable area-efficient, pitch-matched circuits while still achieving software-equivalent training accuracy. The same approach extends to the weight update, where a crossbar-compatible procedure updates the entire array in parallel, substantially increasing training speed [1]. All of these concepts have recently been implemented in an Array Diagnostic Monitor, providing useful insight into NVM-based hardware training.
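As a software-level sketch of these three array operations (the function names and the idealized rank-1 update below are illustrative assumptions; the actual pulsing schemes are described in [1] and [5]), forward propagation is a matrix-vector multiply realized by Ohm's law and current summation, reverse propagation reuses the same array transposed, and the parallel weight update is effectively an outer product of upstream activations and downstream errors:

```python
import numpy as np

def forward_propagate(G, v_in):
    # Ohm's law per device (I = G * V) and current summation along
    # each column realize a matrix-vector multiply in one step.
    return G.T @ v_in          # G: (n_in, n_out), v_in: (n_in,)

def reverse_propagate(G, delta_out):
    # Backpropagated errors traverse the same array transposed.
    return G @ delta_out       # delta_out: (n_out,)

def parallel_weight_update(G, x, delta, lr=0.01):
    # Crossbar-compatible update: overlapping row pulses (encoding the
    # upstream activations x) and column pulses (encoding the errors
    # delta) yield, in effect, a rank-1 outer-product change applied
    # to the whole array at once.
    return G + lr * np.outer(x, delta)
```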
To be competitive with state-of-the-art GPUs, an eventual chip must be able to train networks with many layers. This is achieved by implementing several NVM array-cores, each mapping a single weight layer. The arrays are connected through a reconfigurable routing network capable of rapidly exchanging information between different network layers [6]; a minimal software sketch of this chaining follows the list below. The entire chip should be able to:
- load images as inputs to the network,
- load image labels for backpropagation,
- export trained weights,
- import weights for multi-chip parallel computation.
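As promised above, here is a minimal sketch of how such array-cores might be chained through the routing fabric (the ArrayCore class, layer sizes, and tanh neuron model are hypothetical placeholders, not the chip's actual datapath [6]):

```python
import numpy as np

class ArrayCore:
    """One NVM crossbar mapping a single weight layer (illustrative)."""
    def __init__(self, n_in, n_out, rng):
        self.G = rng.normal(0.0, 0.1, (n_in, n_out))

    def forward(self, x):
        return np.tanh(x @ self.G)  # array MVM plus neuron nonlinearity

def route_through_cores(cores, image):
    # The routing network forwards each core's neuron outputs to the
    # inputs of the core holding the next weight layer.
    x = image
    for core in cores:
        x = core.forward(x)
    return x

rng = np.random.default_rng(0)
cores = [ArrayCore(784, 250, rng), ArrayCore(250, 10, rng)]
scores = route_through_cores(cores, rng.random(784))  # 10 class scores
```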
We will review our recent progress in these areas and conclude the talk with guidelines for the future development of novel NVM devices for neuromorphic computation.
Abstract references:
[1] G. W. Burr et al., “Experimental demonstration and tolerancing of a large-scale neural network (165,000 synapses), using phase-change memory as the synaptic weight element”, IEDM Tech. Digest, 29.5 (2014).
[2] G. W. Burr et al., “Experimental demonstration and tolerancing of a large-scale neural network (165,000 synapses) using phase-change memory as the synaptic weight element”, IEEE Trans. Elec. Dev., 62(11), 3498 (2015).
[3] G. W. Burr et al., “Large-scale neural networks implemented with non-volatile memory as the synaptic weight element: Comparative performance analysis (accuracy, speed, and power)”, IEDM Tech. Digest, 4.4 (2015).
[4] S. Sidler et al., “Large-scale neural networks implemented with non-volatile memory as the synaptic weight element: impact of conductance response”, ESSDERC Proc., 440 (2016).
[5] P. Narayanan et al., “Reducing circuit design complexity for neuromorphic machine learning systems based on non-volatile memory arrays”, Proc. ISCAS, 1 (2017).
[6] P. Narayanan et al., “Toward on-chip acceleration of the backpropagation algorithm using nonvolatile memory”, IBM J. Res. Dev., 61(4), 1-11 (2017).