Machine learning implements backpropagation via abundant training samples. We demonstrate a multi-stage learning system realized by a promising non-volatile memory device, the domain-wall magnetic tunnel junction (DW-MTJ). The system consists of unsupervised (clustering) as well as supervised sub-systems, and generalizes quickly (with few samples). We demonstrate interactions between physical properties of this device and optimal implementation of neuroscience-inspired plasticity learning rules, and highlight performance on a suite of tasks. Our energy analysis confirms the value of the approach, as the learning budget stays below 20µJ even for large tasks used typically in machine learning.
The domain wall-magnetic tunnel junction (DW-MTJ) is a spintronic device that enables efficient logic circuit design because of its low energy consumption, small size, and non-volatility. Furthermore, the DW-MTJ is one of the few spintronic devices for which a direct cascading mechanism is experimentally demonstrated without any extra buffers; this enables potential design and fabrication of a large-scale DW-MTJ logic system. However, DW-MTJ logic relies on the conversion between electrical signals and magnetic states which is sensitive to process imperfection. Therefore, it is important to analyze the robustness of such DW-MTJ devices to anticipate the system reliability before fabrication. Here we propose a new DW-MTJ model that integrates the impacts of process variation to enable the analysis and optimization of DW-MTJ logic. This will allow circuit and device design that enhances the robustness of DW-MTJ logic and advances the development of energy-efficient spintronic computing systems.
There have been recent efforts towards the development of biologically-inspired neuromorphic devices and architecture. Here, we show a synapse circuit that is designed to perform spike-timing-dependent plasticity which works with the leaky, integrate, and fire neuron in a neuromorphic computing architecture. The circuit consists of a three-terminal magnetic tunnel junction with a mobile domain wall between two low-pass filters and has been modeled in SPICE. The results show that the current flowing through the synapse is highly correlated to the timing delay between the pre-synaptic and post-synaptic neurons. Using micromagnetic simulations, we show that introducing notches along the length of the domain wall track pins the domain wall at each successive notch to properly respond to the timing between the input and output current pulses of the circuit, producing a multi-state resistance representing synaptic weights. We show in SPICE that a notch-free ideal magnetic device also shows spike-timing dependent plasticity in response to the circuit current. This work is key progress towards making more bio-realistic artificial synapses with multiple weights, which can be trained online with a promise of CMOS compatibility and energy efficiency.
Scaling arrays of non-volatile memory devices from academic demonstrations to reliable, manufacturable systems requires a better understanding of variability at array and wafer-scale levels. CrossSim models the accuracy of neural networks implemented on an analog resistive memory accelerator using the cycle-to-cycle variability of a single device. In this work, we extend this modeling tool to account for device-to-device variation in a realistic way, and evaluate the impact of this reliability issue in the context of neuromorphic online learning tasks.
Analog crossbars have the potential to reduce the energy and latency required to train a neural network by three orders of magnitude when compared to an optimized digital ASIC. The crossbar simulator, CrossSim, can be used to model device nonidealities and determine what device properties are needed to create an accurate neural network accelerator. Experimentally measured device statistics are used to simulate neural network training accuracy and compare different classes of devices including TaOx ReRAM, Lir-Co-Oz devices, and conventional floating gate SONOS memories. A technique called 'Periodic Carry' can overcomes device nonidealities by using a positional number system while maintaining the benefit of parallel analog matrix operations.
Neuromorphic computers could overcome efficiency bottlenecks inherent to conventional computing through parallel programming and readout of artificial neural network weights in a crossbar memory array. However, selective and linear weight updates and <10-nanoampere read currents are required for learning that surpasses conventional computing efficiency. We introduce an ionic floating-gate memory array based on a polymer redox transistor connected to a conductive-bridge memory (CBM). Selective and linear programming of a redox transistor array is executed in parallel by overcoming the bridging threshold voltage of the CBMs. Synaptic weight readout with currents <10 nanoamperes is achieved by diluting the conductive polymer with an insulator to decrease the conductance. The redox transistors endure >1 billion write-read operations and support >1-megahertz write-read frequencies.
Advances in machine intelligence have sparked interest in hardware accelerators to implement these algorithms, yet embedded electronics have stringent power, area budgets, and speed requirements that may limit non- volatile memory (NVM) integration. In this context, the development of fast nanomagnetic neural networks using minimal training data is attractive. Here, we extend an inference-only proposal using the intrinsic physics of domain-wall MTJ (DW-MTJ) neurons for online learning to implement fully unsupervised pattern recognition operation, using winner-take-all networks that contain either random or plastic synapses (weights). Meanwhile, a read-out layer trains in a supervised fashion. We find our proposed design can approach state-of-the-art success on the task relative to competing memristive neural network proposals, while eliminating much of the area and energy overhead that would typically be required to build the neuronal layers with CMOS devices.
Emerging memory devices, such as resistive crossbars, have the capacity to store large amounts of data in a single array. Acquiring the data stored in large-capacity crossbars in a sequential fashion can become a bottleneck. We present practical methods, based on sparse sampling, to quickly acquire sparse data stored on emerging memory devices that support the basic summation kernel, reducing the acquisition time from linear to sub-linear. The experimental results show that at least an order of magnitude improvement in acquisition time can be achieved when the data are sparse. In addition, we show that the energy cost associated with our approach is competitive to that of the sequential method.
Recently, a Cambrian explosion of a novel, non-volatile memory (NVM) devices known as memristive devices have inspired effort in building hardware neural networks that learn like the brain. Early experimental prototypes built simple perceptrons from nanosynapses, and recently, fully-connected multi-layer perceptron (MLP) learning systems have been realized. However, while backpropagating learning systems pair well with high-precision computer memories and achieve state-of-the-art performances, this typically comes with a massive energy budget. For future Internet of Things/peripheral use cases, system energy footprint will be a major constraint, and emerging NVM devices may fill the gap by sacrificing high bit precision for lower energy. In this paper, we contrast the well-known MLP approach with the extreme learning machine (ELM) or NoProp approach, which uses a large layer of random weights to improve the separability of high-dimensional tasks, and is usually considered inferior in a software context. However, we find that when taking the device non-linearity into account, NoProp manages to equal hardware MLP system in terms of accuracy. While also using a sign-based adaptation of the delta rule for energy-savings, we find that NoProp can learn effectively with four to six 'bits' of device analog capacity, while MLP requires eight-bit capacity with the same rule. This may allow the requirements for memristive devices to be relaxed in the context of online learning. By comparing the energy footprint of these systems for several candidate nanosynapses and realistic peripherals, we confirm that memristive NoProp systems save energy compared with MLP systems. Lastly, we show that ELM/NoProp systems can achieve better generalization abilities than nanosynaptic MLP systems when paired with pre-processing layers (which do not require backpropagated error). Collectively, these advantages make such systems worthy of consideration in future accelerators or embedded hardware.