Publications

Results 51–100 of 342

Analog architectures for neural network acceleration based on non-volatile memory

Applied Physics Reviews

Xiao, T.P.; Bennett, Christopher H.; Feinberg, Benjamin F.; Agarwal, Sapan A.; Marinella, Matthew J.

Analog hardware accelerators, which perform computation within a dense memory array, have the potential to overcome the major bottlenecks faced by digital hardware for data-heavy workloads such as deep learning. Exploiting the intrinsic computational advantages of memory arrays, however, has proven to be challenging principally due to the overhead imposed by the peripheral circuitry and due to the non-ideal properties of memory devices that play the role of the synapse. We review the existing implementations of these accelerators for deep supervised learning, organizing our discussion around the different levels of the accelerator design hierarchy, with an emphasis on circuits and architecture. We explore and consolidate the various approaches that have been proposed to address the critical challenges faced by analog accelerators, for both neural network inference and training, and highlight the key design trade-offs underlying these techniques.

Three Artificial Spintronic Leaky Integrate-and-Fire Neurons

SPIN

Brigner, Wesley H.; Hu, Xuan; Hassan, Naimul; Jiang-Wei, Lucian; Bennett, Christopher H.; Garcia-Sanchez, Felipe; Akinola, Otitoaleke; Pasquale, Massimo; Marinella, Matthew J.; Incorvia, Jean A.; Friedman, Joseph S.

Due to their nonvolatility and intrinsic current integration capabilities, spintronic devices that rely on domain wall (DW) motion through a free ferromagnetic track have garnered significant interest in the field of neuromorphic computing. Although a number of such devices have already been proposed, they require the use of external circuitry to implement several important neuronal behaviors. As such, they are likely to result in either a decrease in energy efficiency, an increase in fabrication complexity, or even both. To resolve this issue, we have proposed three individual neurons that are capable of performing these functionalities without the use of any external circuitry. To implement leaking, the first neuron uses a dipolar coupling field, the second uses an anisotropy gradient and the third uses shape variations of the DW track.
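
As a rough illustration of the three behaviors the abstract refers to (leaking, integrating, and firing), the following is a minimal discrete-time sketch of a generic leaky integrate-and-fire neuron. It is not a model of the spintronic devices themselves: the function name `simulate_lif` and the parameters `leak` and `threshold` are assumptions made for this example. In a DW-based neuron the state variable would loosely correspond to the domain wall position along the track.

```python
def simulate_lif(inputs, leak=0.1, threshold=1.0):
    """Generic leaky integrate-and-fire neuron (illustrative only).

    inputs: sequence of input currents, one per time step.
    Returns a list of 0/1 spikes, one per time step.
    """
    state = 0.0
    spikes = []
    for current in inputs:
        # Leak: the state decays toward zero; integrate: add the input.
        state = (1.0 - leak) * state + current
        if state >= threshold:
            spikes.append(1)
            state = 0.0  # fire and reset
        else:
            spikes.append(0)
    return spikes


if __name__ == "__main__":
    # Two closely spaced inputs cross the threshold on the second step.
    print(simulate_lif([0.6, 0.6, 0.0, 0.0]))
```

With a larger `leak`, widely spaced inputs decay before they can accumulate, so the neuron stays silent; that is the behavior the three proposed devices are said to realize without external circuitry.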

Maximized lateral inhibition in paired magnetic domain wall racetracks for neuromorphic computing

Nanotechnology

Cui, Can; Akinola, Otitoaleke G.; Hassan, Naimul; Bennett, Christopher H.; Marinella, Matthew J.; Friedman, Joseph S.; Incorvia, Jean A.

Lateral inhibition is an important functionality in neuromorphic computing, modeled after the biological behavior in which a firing neuron deactivates its neighbors in the same layer and prevents them from firing. In most neuromorphic hardware platforms lateral inhibition is implemented by external circuitry, thereby decreasing the energy efficiency and increasing the area overhead of such systems. Recently, the domain wall - magnetic tunnel junction (DW-MTJ) artificial neuron was demonstrated in modeling to be intrinsically inhibitory. Without peripheral circuitry, lateral inhibition in DW-MTJ neurons results from magnetostatic interaction between neighboring neuron cells. However, the lateral inhibition mechanism in DW-MTJ neurons has not been studied thoroughly, leading to weak inhibition only in very closely spaced devices. This work approaches these problems by modeling current- and field-driven DW motion in a pair of adjacent DW-MTJ neurons. We maximize the magnitude of lateral inhibition by tuning the magnetic interaction between the neurons. The results are explained by current-driven DW velocity characteristics in response to an external magnetic field and quantified by an analytical model. Dependence of lateral inhibition strength on device parameters is also studied. Finally, lateral inhibition behavior in an array of 1000 DW-MTJ neurons is demonstrated. Our results provide a guideline for the optimization of lateral inhibition implementation in DW-MTJ neurons. With strong lateral inhibition achieved, a path towards competitive learning algorithms such as winner-take-all is made possible on such neuromorphic devices.

Lateral inhibition in magnetic domain wall racetrack arrays for neuromorphic computing

Proceedings of SPIE - The International Society for Optical Engineering

Cui, Can; Akinola, Otitoaleke G.; Hassan, Naimul; Bennett, Christopher H.; Marinella, Matthew J.; Friedman, Joseph S.; Incorvia, Jean A.

Neuromorphic computing captures the quintessential neural behaviors of the brain and is a promising candidate for beyond-von Neumann computer architectures, featuring low power consumption and high parallelism. The neuronal lateral inhibition feature, closely associated with the biological receptive field, is crucial to neuronal competition in the nervous system as well as its neuromorphic hardware counterpart. The domain wall - magnetic tunnel junction (DW-MTJ) neuron is an emerging spintronic artificial neuron device exhibiting intrinsic lateral inhibition. This work discusses the lateral inhibition mechanism of the DW-MTJ neuron and shows by micromagnetic simulation that lateral inhibition is efficiently enhanced by the Dzyaloshinskii-Moriya interaction (DMI).

Plasticity-enhanced domain-wall MTJ neural networks for energy-efficient online learning

Proceedings - IEEE International Symposium on Circuits and Systems

Bennett, Christopher H.; Xiao, T.P.; Cui, Can; Hassan, Naimul; Akinola, Otitoaleke G.; Incorvia, Jean A.; Velasquez, Alvaro; Friedman, Joseph S.; Marinella, Matthew J.

Machine learning implements backpropagation via abundant training samples. We demonstrate a multi-stage learning system realized by a promising non-volatile memory device, the domain-wall magnetic tunnel junction (DW-MTJ). The system consists of unsupervised (clustering) as well as supervised sub-systems, and generalizes quickly (with few samples). We demonstrate interactions between physical properties of this device and optimal implementation of neuroscience-inspired plasticity learning rules, and highlight performance on a suite of tasks. Our energy analysis confirms the value of the approach, as the learning budget stays below 20µJ even for large tasks used typically in machine learning.

Process variation model and analysis for domain wall-magnetic tunnel junction logic

Proceedings - IEEE International Symposium on Circuits and Systems

Hu, Xuan; Edwards, Alexander J.; Xiao, T.P.; Bennett, Christopher H.; Incorvia, Jean A.; Marinella, Matthew J.; Friedman, Joseph S.

The domain wall-magnetic tunnel junction (DW-MTJ) is a spintronic device that enables efficient logic circuit design because of its low energy consumption, small size, and non-volatility. Furthermore, the DW-MTJ is one of the few spintronic devices for which a direct cascading mechanism is experimentally demonstrated without any extra buffers; this enables potential design and fabrication of a large-scale DW-MTJ logic system. However, DW-MTJ logic relies on the conversion between electrical signals and magnetic states which is sensitive to process imperfection. Therefore, it is important to analyze the robustness of such DW-MTJ devices to anticipate the system reliability before fabrication. Here we propose a new DW-MTJ model that integrates the impacts of process variation to enable the analysis and optimization of DW-MTJ logic. This will allow circuit and device design that enhances the robustness of DW-MTJ logic and advances the development of energy-efficient spintronic computing systems.

Three-terminal magnetic tunnel junction synapse circuits showing spike-timing-dependent plasticity

Journal of Physics D: Applied Physics

Akinola, Otitoaleke; Hu, Xuan; Bennett, Christopher H.; Marinella, Matthew J.; Friedman, Joseph S.; Incorvia, Jean A.

There have been recent efforts towards the development of biologically-inspired neuromorphic devices and architecture. Here, we show a synapse circuit that is designed to perform spike-timing-dependent plasticity and that works with the leaky integrate-and-fire neuron in a neuromorphic computing architecture. The circuit consists of a three-terminal magnetic tunnel junction with a mobile domain wall between two low-pass filters and has been modeled in SPICE. The results show that the current flowing through the synapse is highly correlated to the timing delay between the pre-synaptic and post-synaptic neurons. Using micromagnetic simulations, we show that introducing notches along the length of the domain wall track pins the domain wall at each successive notch to properly respond to the timing between the input and output current pulses of the circuit, producing a multi-state resistance representing synaptic weights. We show in SPICE that a notch-free ideal magnetic device also shows spike-timing-dependent plasticity in response to the circuit current. This work is key progress towards making more bio-realistic artificial synapses with multiple weights, which can be trained online with a promise of CMOS compatibility and energy efficiency.
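
For readers unfamiliar with spike-timing-dependent plasticity, the sketch below shows the textbook pair-based exponential rule, in which the weight change depends on the delay between pre- and post-synaptic spikes. It is a generic illustration only, not the SPICE circuit or the micromagnetic model described in the abstract; the function name and the values of `a_plus`, `a_minus`, and `tau` are assumptions chosen for the example.

```python
import math


def stdp_weight_change(dt, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Pair-based STDP rule (illustrative parameter values).

    dt: t_post - t_pre in milliseconds.
    Returns the weight change for one pre/post spike pair.
    """
    if dt > 0:
        # Pre fires before post: potentiation, weaker for longer delays.
        return a_plus * math.exp(-dt / tau)
    if dt < 0:
        # Post fires before pre: depression, weaker for longer delays.
        return -a_minus * math.exp(dt / tau)
    return 0.0


if __name__ == "__main__":
    for delay in (-20.0, -5.0, 5.0, 20.0):
        print(delay, stdp_weight_change(delay))
```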

Wafer-Scale TaOx Device Variability and Implications for Neuromorphic Computing Applications

IEEE International Reliability Physics Symposium Proceedings

Bennett, Christopher H.; Garland, Diana; Jacobs-Gedrim, Robin B.; Agarwal, Sapan A.; Marinella, Matthew J.

Scaling arrays of non-volatile memory devices from academic demonstrations to reliable, manufacturable systems requires a better understanding of variability at array and wafer-scale levels. CrossSim models the accuracy of neural networks implemented on an analog resistive memory accelerator using the cycle-to-cycle variability of a single device. In this work, we extend this modeling tool to account for device-to-device variation in a realistic way, and evaluate the impact of this reliability issue in the context of neuromorphic online learning tasks.

Designing and modeling analog neural network training accelerators

2019 International Symposium on VLSI Technology, Systems and Application, VLSI-TSA 2019

Agarwal, Sapan A.; Jacobs-Gedrim, Robin B.; Bennett, Christopher H.; Hsia, Alexander W.; Adee, Shane M.; Hughart, David R.; Fuller, Elliot J.; Li, Yiyang; Talin, A.A.; Marinella, Matthew J.

Analog crossbars have the potential to reduce the energy and latency required to train a neural network by three orders of magnitude when compared to an optimized digital ASIC. The crossbar simulator, CrossSim, can be used to model device nonidealities and determine what device properties are needed to create an accurate neural network accelerator. Experimentally measured device statistics are used to simulate neural network training accuracy and compare different classes of devices including TaOx ReRAM, Li1-xCoO2 devices, and conventional floating gate SONOS memories. A technique called 'Periodic Carry' can overcome device nonidealities by using a positional number system while maintaining the benefit of parallel analog matrix operations.
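
The positional-number idea behind 'Periodic Carry' can be sketched as follows, under stated assumptions: each weight is spread over several devices that act as digits, training updates touch only the least significant device, and a periodic carry step moves overflow into the next digit before the device saturates. This is a simplified illustration, not CrossSim code; the class name `PeriodicCarryWeight`, the base, and the saturation limit are all hypothetical.

```python
class PeriodicCarryWeight:
    """One weight stored as several analog 'digit' devices (illustrative)."""

    def __init__(self, base=4, num_digits=3):
        self.base = base
        self.limit = 2.0 * base  # assumed saturation range of each device
        self.digits = [0.0] * num_digits  # digits[0] is least significant

    def value(self):
        # Positional number system: digit k carries weight base**k.
        return sum(d * self.base ** k for k, d in enumerate(self.digits))

    def update(self, delta):
        # Training updates go only to the least significant device,
        # which clips at its saturation limit.
        new = self.digits[0] + delta
        self.digits[0] = max(-self.limit, min(self.limit, new))

    def carry(self):
        # Move whole multiples of the base into the next digit.
        # The represented value is unchanged by this step.
        for k in range(len(self.digits) - 1):
            whole = int(self.digits[k] / self.base)  # truncates toward zero
            self.digits[k] -= whole * self.base
            self.digits[k + 1] += whole


if __name__ == "__main__":
    w = PeriodicCarryWeight()
    for _ in range(10):
        w.update(0.5)
    print(w.digits, w.value())
    w.carry()
    print(w.digits, w.value())
```

Without the carry step, repeated updates in one direction would pin the least significant device at its limit and further updates would be lost; carrying periodically keeps that device near the middle of its range.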

Parallel programming of an ionic floating-gate memory array for scalable neuromorphic computing

Science

Fuller, Elliot J.; Keene, Scott T.; Melianas, Armantas; Wang, Zhongrui; Agarwal, Sapan A.; Li, Yiyang; Tuchman, Yaakov; James, Conrad D.; Marinella, Matthew J.; Yang, J.J.; Salleo, Alberto; Talin, A.A.

Neuromorphic computers could overcome efficiency bottlenecks inherent to conventional computing through parallel programming and readout of artificial neural network weights in a crossbar memory array. However, selective and linear weight updates and <10-nanoampere read currents are required for learning that surpasses conventional computing efficiency. We introduce an ionic floating-gate memory array based on a polymer redox transistor connected to a conductive-bridge memory (CBM). Selective and linear programming of a redox transistor array is executed in parallel by overcoming the bridging threshold voltage of the CBMs. Synaptic weight readout with currents <10 nanoamperes is achieved by diluting the conductive polymer with an insulator to decrease the conductance. The redox transistors endure >1 billion write-read operations and support >1-megahertz write-read frequencies.

Semi-supervised learning and inference in domain-wall magnetic tunnel junction (DW-MTJ) neural networks

Proceedings of SPIE - The International Society for Optical Engineering

Bennett, Christopher H.; Hassan, Naimul; Hu, Xuan; Incorvia, Jean A.; Friedman, Joseph S.; Marinella, Matthew J.

Advances in machine intelligence have sparked interest in hardware accelerators to implement these algorithms, yet embedded electronics have stringent power, area budgets, and speed requirements that may limit non-volatile memory (NVM) integration. In this context, the development of fast nanomagnetic neural networks using minimal training data is attractive. Here, we extend an inference-only proposal using the intrinsic physics of domain-wall MTJ (DW-MTJ) neurons for online learning to implement fully unsupervised pattern recognition operation, using winner-take-all networks that contain either random or plastic synapses (weights). Meanwhile, a read-out layer trains in a supervised fashion. We find our proposed design can approach state-of-the-art success on the task relative to competing memristive neural network proposals, while eliminating much of the area and energy overhead that would typically be required to build the neuronal layers with CMOS devices.

Sparse Data Acquisition on Emerging Memory Architectures

IEEE Access

Quach, Tu-Thach Q.; Agarwal, Sapan A.; James, Conrad D.; Marinella, Matthew J.; Aimone, James B.

Emerging memory devices, such as resistive crossbars, have the capacity to store large amounts of data in a single array. Acquiring the data stored in large-capacity crossbars in a sequential fashion can become a bottleneck. We present practical methods, based on sparse sampling, to quickly acquire sparse data stored on emerging memory devices that support the basic summation kernel, reducing the acquisition time from linear to sub-linear. The experimental results show that at least an order of magnitude improvement in acquisition time can be achieved when the data are sparse. In addition, we show that the energy cost associated with our approach is competitive to that of the sequential method.
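
To illustrate how a summation kernel can make acquisition sub-linear when the data are sparse, here is a minimal sketch based on adaptive binary splitting (a form of group testing). It is one possible approach under stated assumptions, not necessarily the sparse-sampling method of the paper: it assumes non-negative stored values, so that a zero range sum implies an all-zero range, and the function name `acquire_sparse` is hypothetical. The Python list stands in for a crossbar whose range sums are read in a single operation.

```python
def acquire_sparse(stored):
    """Recover the nonzero entries of `stored` using range-sum reads.

    Assumes all stored values are non-negative.
    Returns (found, reads): a dict index -> value, and the read count.
    """
    reads = 0
    found = {}

    def range_sum(lo, hi):
        # Stand-in for one analog summation read over cells lo..hi-1.
        nonlocal reads
        reads += 1
        return sum(stored[lo:hi])

    def search(lo, hi, total):
        if total == 0:
            return  # nothing stored in this range; skip it entirely
        if hi - lo == 1:
            found[lo] = total
            return
        mid = (lo + hi) // 2
        left = range_sum(lo, mid)
        search(lo, mid, left)
        # The right-half sum follows by subtraction, without another read.
        search(mid, hi, total - left)

    if stored:
        search(0, len(stored), range_sum(0, len(stored)))
    return found, reads


if __name__ == "__main__":
    data = [0] * 64
    data[5], data[40] = 3, 7
    print(acquire_sparse(data))
```

For k nonzero entries among n cells, this scheme needs on the order of k log n reads rather than n, which is only an advantage when k is small relative to n.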

Contrasting Advantages of Learning With Random Weights and Backpropagation in Non-Volatile Memory Neural Networks

IEEE Access

Bennett, Christopher H.; Parmar, Vivek; Calvet, Laurie E.; Klein, Jacques O.; Suri, Manan; Marinella, Matthew J.; Querlioz, Damien

Recently, a Cambrian explosion of novel non-volatile memory (NVM) devices known as memristive devices has inspired efforts to build hardware neural networks that learn like the brain. Early experimental prototypes built simple perceptrons from nanosynapses, and recently, fully-connected multi-layer perceptron (MLP) learning systems have been realized. However, while backpropagating learning systems pair well with high-precision computer memories and achieve state-of-the-art performances, this typically comes with a massive energy budget. For future Internet of Things/peripheral use cases, system energy footprint will be a major constraint, and emerging NVM devices may fill the gap by sacrificing high bit precision for lower energy. In this paper, we contrast the well-known MLP approach with the extreme learning machine (ELM) or NoProp approach, which uses a large layer of random weights to improve the separability of high-dimensional tasks, and is usually considered inferior in a software context. However, we find that when taking the device non-linearity into account, NoProp manages to equal a hardware MLP system in terms of accuracy. While also using a sign-based adaptation of the delta rule for energy savings, we find that NoProp can learn effectively with four to six 'bits' of device analog capacity, while MLP requires eight-bit capacity with the same rule. This may allow the requirements for memristive devices to be relaxed in the context of online learning. By comparing the energy footprint of these systems for several candidate nanosynapses and realistic peripherals, we confirm that memristive NoProp systems save energy compared with MLP systems. Lastly, we show that ELM/NoProp systems can achieve better generalization abilities than nanosynaptic MLP systems when paired with pre-processing layers (which do not require backpropagated error). Collectively, these advantages make such systems worthy of consideration in future accelerators or embedded hardware.
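
The ELM/NoProp structure described above can be sketched in a few lines: a hidden layer of fixed random weights that is never trained, followed by a readout trained with a sign-based delta rule in which every weight moves by a fixed step. This is a software illustration under assumptions, not the paper's hardware model; all function names, the `tanh` nonlinearity, the step size, and the toy samples are hypothetical.

```python
import math
import random


def sign(x):
    return (x > 0) - (x < 0)


def make_hidden_layer(n_in, n_hidden, seed=0):
    # Random weights, drawn once and never updated (the "NoProp" part).
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]


def hidden_features(x, hidden_weights):
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)))
            for row in hidden_weights]


def sign_delta_update(weights, hidden, error, step):
    # Sign-based delta rule: each weight moves by +step, -step, or 0,
    # so only the direction of the update has to be computed.
    return [w + step * sign(error * h) for w, h in zip(weights, hidden)]


def train_readout(samples, hidden_weights, step=0.25, epochs=5):
    readout = [0.0] * len(hidden_weights)
    for _ in range(epochs):
        for x, target in samples:
            h = hidden_features(x, hidden_weights)
            error = target - sum(w * hi for w, hi in zip(readout, h))
            readout = sign_delta_update(readout, h, error, step)
    return readout


if __name__ == "__main__":
    toy = [([1.0, 0.0], 1.0), ([0.0, 1.0], -1.0)]
    hidden = make_hidden_layer(n_in=2, n_hidden=4)
    print(train_readout(toy, hidden))
```

Because every update is a fixed step, the readout weights stay on a discrete grid, which loosely mirrors the limited number of analog levels available in a memristive device.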
