A Test Platform to Characterize Emerging Nonvolatile Memories for Computing
Abstract not provided.
Abstract not provided.
Abstract not provided.
2024 IEEE Neuro Inspired Computational Elements Conference, NICE 2024 - Proceedings
Neuromorphic computing systems have been applied to spatiotemporal, video-like data, which requires recurrent networks, while minimizing power consumption through binary activation functions. However, previous work on binary activation networks has focused primarily on feed-forward networks because recurrent binary networks are difficult to train. Spiking neural networks, by contrast, have been trained successfully as recurrent networks despite communicating with binary signals. Intrigued by this discrepancy, we design a generalized leaky integrate-and-fire neuron that can be reduced to a binary activation unit, allowing us to investigate the minimal spiking-network dynamics required for binary activation networks to be trainable. We find that a subthreshold integrative membrane potential is the only requirement for an otherwise standard binary activation unit to be trained in a recurrent network. Investigating the trained networks further, we find that these stateful binary networks learn a soft reset mechanism through their recurrent weights, allowing them to approximate the explicit reset of spiking networks.
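As an illustration of the mechanism described above, the following sketch shows a recurrent binary unit whose only stateful element is a leaky, subthreshold membrane potential, with no explicit reset. It is a minimal, illustrative NumPy model; the names and parameters (e.g., `stateful_binary_step`, `beta`, `theta`) are assumptions for this example, not the authors' code.

```python
import numpy as np

def heaviside(x):
    """Binary activation: emit 1 when the input crosses zero, else 0."""
    return (x > 0).astype(np.float32)

def stateful_binary_step(x_t, s_prev, v_prev, w_in, w_rec, beta=0.9, theta=1.0):
    """One time step of a recurrent binary unit with a subthreshold membrane.

    Unlike a plain binary activation, the pre-activation accumulates in a
    leaky membrane potential v; no explicit reset is applied -- the recurrent
    weights are free to learn a soft reset, as the paper observes.
    """
    i_t = x_t @ w_in + s_prev @ w_rec          # feed-forward + recurrent drive
    v_t = beta * v_prev + i_t                  # subthreshold integration
    s_t = heaviside(v_t - theta)               # binary output
    return s_t, v_t

# Toy usage: 4 inputs, 3 hidden units, 10 time steps
rng = np.random.default_rng(0)
w_in, w_rec = rng.normal(size=(4, 3)), rng.normal(size=(3, 3))
s, v = np.zeros(3), np.zeros(3)
for t in range(10):
    s, v = stateful_binary_step(rng.normal(size=4), s, v, w_in, w_rec)
```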
Abstract not provided.
IEEE Journal on Exploratory Solid-State Computational Devices and Circuits
The domain wall-magnetic tunnel junction (DW-MTJ) is a versatile device that can simultaneously store data and perform computations. These three-terminal devices are promising for digital logic due to their nonvolatility, low-energy operation, and radiation hardness. Here, we augment the DW-MTJ logic gate with voltage-controlled magnetic anisotropy (VCMA) to improve the reliability of logical concatenation in the presence of realistic process variations. VCMA creates potential wells that allow for reliable and repeatable localization of domain walls (DWs). The DW-MTJ logic gate supports different fanouts, allowing for multiple inputs and outputs for a single device without affecting the area. We simulate a systolic array of DW-MTJ multiply-accumulate (MAC) units with 4-bit and 8-bit precision, which uses the nonvolatility of DW-MTJ logic gates to enable fine-grained pipelining and high parallelism. The DW-MTJ systolic array provides comparable throughput and efficiency to state-of-the-art CMOS systolic arrays while being radiation-hard. These results improve the feasibility of using DW-based processors, especially for extreme-environment applications such as space.
Abstract not provided.
Abstract not provided.
This project evaluated the use of emerging spintronic memory devices for robust and efficient variational inference schemes. Variational inference (VI), which constrains the distribution of each weight to a Gaussian with a mean and standard deviation, is a tractable method for calculating posterior distributions of the weights in a Bayesian neural network, such that the network can still be trained with the powerful backpropagation algorithm. Our project focuses on domain-wall magnetic tunnel junctions (DW-MTJs), a powerful multi-functional spintronic synapse design that can achieve low-power switching while also opening a pathway toward repeatable, analog operation using fabricated notches. Our initial effort to employ DW-MTJs as an all-in-one stochastic synapse encoding both a mean and a standard deviation did not meet the quality metrics for hardware-friendly VI; new device stacks and methods for expressive anisotropy modification may yet make this approach possible. As a fallback that immediately satisfies our requirements, we devised and detailed how a DW-MTJ synapse encoding the mean, combined with a probabilistic Bayes-MTJ device programmed via a ferroelectric or ionically modifiable layer, can robustly and expressively implement VI. This design includes a physics-informed compact circuit model that was scaled up to demonstrate rigorous uncertainty quantification, from small convolutional networks on a grayscale image classification task up to larger residual networks performing multi-channel image classification. Lastly, because these results assume an inference application in which the weights (spintronic memory states) remain non-volatile, the retention of notched synapses was further interrogated. These investigations revealed the importance of both notch geometry and anisotropy modification for further enhancing the endurance of written spintronic states. In the near future, these results will be mapped to predictions of DW-MTJ memory retention at room and elevated temperatures and verified experimentally when devices become available.
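For context, mean-field Gaussian VI of the kind referenced above is usually trained with the reparameterization trick, so that sampled weights remain differentiable with respect to their mean and standard deviation. The sketch below is a generic, hedged illustration of that scheme in NumPy; it does not model the DW-MTJ or Bayes-MTJ devices, and names such as `sample_variational_weights` and the softplus parameterization are assumptions for this example.

```python
import numpy as np

def sample_variational_weights(mu, rho, rng):
    """Draw one weight sample from a mean-field Gaussian posterior.

    Each weight has a mean mu and a standard deviation parameterized as
    sigma = softplus(rho) so it stays positive during gradient training.
    The reparameterization w = mu + sigma * eps keeps the sample
    differentiable with respect to (mu, rho), enabling backpropagation.
    """
    sigma = np.log1p(np.exp(rho))            # softplus
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

# Toy usage: average the predictions of a 1-layer Bayesian model over samples
rng = np.random.default_rng(1)
mu, rho = rng.normal(size=(8, 2)), -3.0 * np.ones((8, 2))
x = rng.normal(size=(5, 8))
preds = np.mean([x @ sample_variational_weights(mu, rho, rng)
                 for _ in range(32)], axis=0)
```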
Nature Communications
CMOS-based computing systems that employ the von Neumann architecture are relatively limited when it comes to parallel data storage and processing. In contrast, the human brain is a living computational signal processing unit that operates with extreme parallelism and energy efficiency. Although numerous neuromorphic electronic devices have emerged in the last decade, most of them are rigid or contain materials that are toxic to biological systems. In this work, we report on biocompatible bilayer graphene-based artificial synaptic transistors (BLAST) capable of mimicking synaptic behavior. The BLAST devices leverage a dry ion-selective membrane, enabling long-term potentiation, with ~50 aJ/µm² switching energy efficiency, at least an order of magnitude lower than previous reports on two-dimensional material-based artificial synapses. The devices show unique metaplasticity, a useful feature for generalizable deep neural networks, and we demonstrate that metaplastic BLASTs outperform ideal linear synapses in classic image classification tasks. With switching energy well below the 1 fJ energy estimated per biological synapse, the proposed devices are powerful candidates for bio-interfaced online learning, bridging the gap between artificial and biological neural networks.
Abstract not provided.
Abstract not provided.
Neural networks are largely built on matrix computations. During forward inference, the most heavily used compute kernel is the matrix-vector multiplication (MVM): $W\vec{x}$. Inference is the first frontier for deploying next-generation neural network hardware, as it is more readily pushed to edge devices, such as mobile devices or embedded processors with size, weight, and power constraints. Inference is also easier than training to implement in analog systems, since training places more stringent requirements on the devices.
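A minimal sketch of this kernel, assuming a simple fully connected network in NumPy, shows how forward inference reduces to a sequence of MVMs:

```python
import numpy as np

def mvm(W, x):
    """Matrix-vector multiply y = W x, the dominant kernel in forward inference."""
    return W @ x

def dense_layer(W, b, x):
    """One fully connected layer: an MVM followed by a bias add and ReLU."""
    return np.maximum(mvm(W, x) + b, 0.0)

# Toy usage: a 2-layer forward pass built entirely from MVMs
rng = np.random.default_rng(2)
x = rng.normal(size=64)
W1, b1 = rng.normal(size=(128, 64)), np.zeros(128)
W2, b2 = rng.normal(size=(10, 128)), np.zeros(10)
y = dense_layer(W2, b2, dense_layer(W1, b1, x))
```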
IEEE Transactions on Circuits and Systems I: Regular Papers
We demonstrate SONOS (silicon-oxide-nitride-oxide-silicon) analog memory arrays that are optimized for neural network inference. The devices are fabricated in a 40 nm process and operated in the subthreshold regime for in-memory matrix multiplication. Subthreshold operation enables low conductances to be implemented with low error, matching the typical weight distribution of neural networks, which is heavily skewed toward near-zero values. This leads to high accuracy in the presence of programming errors and process variations. We simulate the end-to-end neural network inference accuracy, accounting for the measured programming error, read noise, and retention loss in a fabricated SONOS array. Evaluated on the ImageNet dataset using ResNet50, the accuracy of a SONOS system is within 2.16% of floating-point accuracy without any retraining. The unique error properties and high On/Off ratio of the SONOS device allow scaling to large arrays without bit slicing, and enable an inference architecture that achieves 20 TOPS/W on ResNet50, a >10× gain in energy efficiency over state-of-the-art digital and analog inference accelerators.
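To illustrate how device-level errors of this kind are typically folded into an inference simulation, the sketch below applies a one-time programming error and per-read noise to a weight matrix before an MVM. The error model, magnitudes, and the `noisy_array` helper are illustrative assumptions, not the measured SONOS statistics or the exact methodology of the paper.

```python
import numpy as np

def noisy_array(W, g_max=1.0, prog_err=0.01, read_noise=0.005, rng=None):
    """Return a closure that mimics reading a programmed analog array.

    Weights are assumed to be mapped to conductances in [0, g_max].
    Programming error is applied once, when the array is written; read noise
    is redrawn on every matrix-vector multiply. The magnitudes are stand-ins.
    """
    if rng is None:
        rng = np.random.default_rng(3)
    programmed = W + rng.normal(scale=prog_err * g_max, size=W.shape)
    return lambda: programmed + rng.normal(scale=read_noise * g_max, size=W.shape)

# Toy usage: relative error of a noisy in-memory MVM versus the ideal result
rng = np.random.default_rng(4)
W, x = rng.uniform(0, 1, size=(32, 32)), rng.normal(size=32)
read = noisy_array(W, rng=rng)
rel_err = np.linalg.norm(read() @ x - W @ x) / np.linalg.norm(W @ x)
```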
IEEE Transactions on Nuclear Science
We investigate the sensitivity of silicon-oxide-nitride-oxide-silicon (SONOS) charge-trapping memory technology to heavy-ion-induced single-event effects. Threshold voltage ($V_T$) statistics were collected across multiple test chips containing, in total, 18 Mb of 40-nm SONOS memory arrays. The arrays were irradiated with Kr and Ar ion beams, and the changes in their $V_T$ distributions were analyzed as a function of linear energy transfer (LET), beam fluence, and operating temperature. We observe that heavy-ion irradiation induces a tail of disturbed devices in the 'program' state distribution, as has also been seen in the response of floating-gate (FG) flash cells. However, the $V_T$ distribution of SONOS cells lacks a distinct secondary peak, which is generally attributed to direct ion strikes to the gate stack of FG cells. This property, combined with the observed change in the $V_T$ distribution with LET, suggests that SONOS cells are not particularly sensitive to direct ion strikes, but cells in the proximity of an ion's absorption can still experience a $V_T$ shift. These results shed new light on the physical mechanisms underlying the $V_T$ shift induced by a single heavy ion in scaled charge-trap memory.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Proceedings - 2022 IEEE International Conference on Rebooting Computing, ICRC 2022
Non-volatile memory arrays require select devices to ensure accurate programming. The one-selector one-resistor (1S1R) array, in which a two-terminal nonlinear select device is placed in series with a resistive memory element, is attractive for its high-density data storage; however, the effect of the nonlinear select device on the accuracy of analog in-memory computing has not been explored. This work evaluates the impact of select and memory device properties on the results of analog matrix-vector multiplications. We integrate nonlinear circuit simulations into CrossSim and perform end-to-end neural network inference simulations to study how the select device affects the accuracy of neural network inference. We propose an adjustment to the input voltage that effectively compensates for the electrical load of the select device. Our results show that for deep residual networks trained on CIFAR-10, a compensation that is uniform across all devices in the system can mitigate these effects over a wide range of values for the select device I-V steepness and memory device On/Off ratio. A realistic I-V curve steepness of 60 mV/dec can yield an accuracy on CIFAR-10 that is within 0.44% of the floating-point accuracy.
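The sketch below illustrates the kind of series nonlinearity involved, assuming an idealized exponential selector (fixed mV/dec steepness) in series with a linear memory conductance, and shows how a uniform increase in the input voltage can partially recover the ideal read current. All device parameters and the bisection-based operating-point solve are illustrative assumptions, not the circuit models used in CrossSim.

```python
import numpy as np

def series_current(v_in, g_mem, i0=1e-9, steepness_mv_per_dec=60.0):
    """Current through a 1S1R branch: exponential selector + linear memory element.

    The selector follows I = i0 * 10^(V_sel * 1000 / steepness); the memory
    element has conductance g_mem. The operating point (selector voltage) is
    found by bisection, since the selector current rises and the memory
    current falls as V_sel increases. All parameters are illustrative.
    """
    lo, hi = 0.0, v_in
    for _ in range(60):
        v_sel = 0.5 * (lo + hi)
        i_sel = i0 * 10.0 ** (v_sel * 1e3 / steepness_mv_per_dec)
        i_mem = g_mem * (v_in - v_sel)
        if i_sel < i_mem:
            lo = v_sel        # selector too weak: raise its voltage share
        else:
            hi = v_sel
    return i_sel

# Toy usage: a uniform input-voltage boost partially restores the ideal current
g = 1e-5                                      # 100 kOhm memory state
ideal = g * 0.2                               # current with no selector load
uncompensated = series_current(0.2, g)
compensated = series_current(0.2 + 0.05, g)   # illustrative uniform offset
```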
Abstract not provided.
Semiconductor Science and Technology
To support the increasing demands for efficient deep neural network processing, accelerators based on analog in-memory computation of matrix multiplication have recently gained significant attention for reducing the energy of neural network inference. However, analog processing within memory arrays must contend with parasitic voltage drops across the metal interconnects, which distort the results of the computation and limit the array size. This work analyzes how parasitic resistance affects the end-to-end inference accuracy of state-of-the-art convolutional neural networks, and comprehensively studies how design decisions at the device, circuit, architecture, and algorithm levels affect the system's sensitivity to parasitic resistance effects. A set of guidelines is provided for designing analog accelerator hardware that is intrinsically robust to parasitic resistance, without any explicit compensation or re-training of the network parameters.
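As a rough illustration of the parasitic-resistance effect, the sketch below models each column wire as a resistive chain terminated in a virtual-ground sense amplifier, with ideal row drivers, and finds the self-consistent cell currents by fixed-point iteration. This is a deliberately simplified first-order model, not the full circuit analysis performed in the work; the wire resistance and conductance ranges are illustrative.

```python
import numpy as np

def crossbar_mvm_ir_drop(G, v_in, r_wire=0.5, n_iter=20):
    """Column-current MVM with parasitic resistance along the column wires.

    Rows are treated as ideal voltage sources (a simplification); each column
    is a wire with r_wire ohms between adjacent cells, ending at a
    virtual-ground sense amplifier. Node voltages are found by fixed-point
    iteration on Kirchhoff's current law.
    """
    m, n = G.shape
    v_col = np.zeros((m, n))
    for _ in range(n_iter):
        i_cell = G * (v_in[:, None] - v_col)           # per-cell currents
        i_seg = np.cumsum(i_cell, axis=0)              # current in each wire segment
        v_col = r_wire * np.cumsum(i_seg[::-1], axis=0)[::-1]   # node voltages
    return i_cell.sum(axis=0)                          # sensed column currents

# Toy usage: IR drop shrinks the result relative to the ideal dot product
rng = np.random.default_rng(5)
G = rng.uniform(1e-6, 1e-5, size=(64, 32))             # conductances in siemens
v = rng.uniform(0, 0.5, size=64)                       # input voltages
rel_err = np.abs(crossbar_mvm_ir_drop(G, v) - G.T @ v) / np.abs(G.T @ v)
```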
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Applied Physics Letters
Inspired by the parallelism and efficiency of the brain, several candidate artificial synapse devices have been developed for neuromorphic computing, yet a nonlinear and asymmetric synaptic response curve precludes their use for backpropagation, the foundation of modern supervised learning. Spintronic devices - which benefit from high endurance, low power consumption, low latency, and CMOS compatibility - are a promising technology for memory, and domain-wall magnetic tunnel junction (DW-MTJ) devices have been shown to implement synaptic functions such as long-term potentiation and spike-timing-dependent plasticity. In this work, we propose a notched DW-MTJ synapse as a candidate for supervised learning. Using micromagnetic simulations at room temperature, we show that notched synapses ensure the non-volatility of the synaptic weight and allow for highly linear, symmetric, and reproducible weight updates using either the spin transfer torque (STT) or spin-orbit torque (SOT) mechanism of DW propagation. We use lookup tables constructed from micromagnetic simulations to model the training of neural networks built with DW-MTJ synapses on both the MNIST and Fashion-MNIST image classification tasks. Accounting for thermal noise and realistic process variations, the DW-MTJ devices achieve classification accuracy close to that of ideal floating-point updates using both STT and SOT devices at room temperature and at 400 K. Our work establishes the basis for a magnetic artificial synapse that can eventually lead to hardware neural networks with fully spintronic matrix operations implementing machine learning.
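To show how such lookup tables are typically used in a training simulation, the sketch below steps a normalized device state through illustrative potentiation and depression tables. The table shapes and the `apply_update` helper are hypothetical stand-ins, not the micromagnetics-derived tables from the paper.

```python
import numpy as np

def make_update_table(n_states=64, nonlinearity=0.5):
    """Build illustrative potentiation/depression step tables over device states.

    Real tables come from micromagnetic simulation; here an exponential
    saturation stands in for the (small) nonlinearity of the response.
    """
    s = np.linspace(0.0, 1.0, n_states)
    dG_plus = np.exp(-nonlinearity * s)           # potentiation step vs. state
    dG_minus = np.exp(-nonlinearity * (1 - s))    # depression step vs. state
    return dG_plus, dG_minus

def apply_update(state, grad_sign, dG_plus, dG_minus):
    """Move each synapse up or down by its table entry according to the update sign."""
    n = len(dG_plus)
    idx = np.clip((state * (n - 1)).astype(int), 0, n - 1)
    step = np.where(grad_sign > 0, dG_plus[idx], -dG_minus[idx]) / n
    return np.clip(state + step, 0.0, 1.0)

# Toy usage: repeatedly potentiate four synapses and watch them approach saturation
state = np.zeros(4)
dGp, dGm = make_update_table()
for _ in range(100):
    state = apply_update(state, np.ones(4), dGp, dGm)
```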
IEEE Transactions on Nuclear Science
We evaluate the sensitivity of neuromorphic inference accelerators based on silicon-oxide-nitride-oxide-silicon (SONOS) charge trap memory arrays to total ionizing dose (TID) effects. Data retention statistics were collected for 16 Mbit of 40-nm SONOS digital memory exposed to ionizing radiation from a Co-60 source, showing good retention of the bits up to the maximum dose of 500 krad(Si). Using this data, we formulate a rate-equation-based model for the TID response of trapped charge carriers in the ONO stack and predict the effect of TID on intermediate device states between 'program' and 'erase.' This model is then used to simulate arrays of low-power, analog SONOS devices that store 8-bit neural network weights and support in situ matrix-vector multiplication. We evaluate the accuracy of the irradiated SONOS-based inference accelerator on two image recognition tasks - CIFAR-10 and the challenging ImageNet data set - using state-of-the-art convolutional neural networks, such as ResNet-50. We find that across the data sets and neural networks evaluated, the accelerator tolerates a maximum TID between 10 and 100 krad(Si), with deeper networks being more susceptible to accuracy losses due to TID.
IEEE Transactions on Nuclear Science
We evaluate the resilience of CoFeB/MgO/CoFeB magnetic tunnel junctions (MTJs) with perpendicular magnetic anisotropy (PMA) to displacement damage induced by heavy-ion irradiation. MTJs were exposed to 3-MeV Ta$^{2+}$ ions at different levels of ion beam fluence spanning five orders of magnitude. The devices remained insensitive to beam fluences up to $10^{11}$ ions/cm$^2$, beyond which a gradual degradation in the device magnetoresistance, coercive magnetic field, and spin-transfer-torque (STT) switching voltage was observed, ending with a complete loss of magnetoresistance at very high levels of displacement damage (>0.035 displacements per atom). The loss of magnetoresistance is attributed to structural damage at the MgO interfaces, which allows electrons to scatter among the propagating modes within the tunnel barrier and reduces the net spin polarization. Ion-induced damage to the interface also reduces the PMA. This study clarifies the displacement damage thresholds that lead to significant irreversible changes in the characteristics of STT magnetic random access memory (STT-MRAM) and elucidates the physical mechanisms underlying the deterioration in device properties.
Frontiers in Neuroscience (Online)
In-memory computing based on non-volatile resistive memory can significantly improve the energy efficiency of artificial neural networks. However, accurate in situ training has been challenging due to the nonlinear and stochastic switching of the resistive memory elements. One promising analog memory is the electrochemical random-access memory (ECRAM), also known as the redox transistor. Its low write currents and linear switching properties across hundreds of analog states enable accurate and massively parallel updates of a full crossbar array, which yield rapid and energy-efficient training. While simulations predict that ECRAM-based neural networks achieve high training accuracy at significantly higher energy efficiency than digital implementations, these predictions have not been achieved experimentally. In this work, we train a 3 × 3 array of ECRAM devices that learns to discriminate several elementary logic gates (AND, OR, NAND). We record the evolution of the network's synaptic weights during parallel in situ (on-line) training with outer product updates. Due to the linear and reproducible switching characteristics of the devices, our crossbar simulations not only accurately predict the epochs to convergence, but also quantitatively capture the evolution of the weights in individual devices. This first implementation of parallel in situ training, together with the strong agreement with simulation, is a significant advance toward developing ECRAM into larger crossbar arrays for artificial neural network accelerators, which could enable orders-of-magnitude improvements in the energy efficiency of deep neural networks.
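The outer product update itself is simple to state: every conductance in the array changes in proportion to the product of its row input and column error, which the hardware applies in a single parallel step. A minimal NumPy sketch, using an assumed sigmoid readout and delta-rule error to learn the AND, OR, and NAND gates on a 3 × 3 array, is given below; it is an illustration of the update rule, not the device-level experiment.

```python
import numpy as np

def outer_product_update(G, x, delta, lr=0.1):
    """Parallel crossbar update: every conductance changes by lr * x_i * delta_j.

    In hardware this is realized in one step by pulsing rows with x and
    columns with delta simultaneously; here it is just the rank-1 outer product.
    """
    return G + lr * np.outer(x, delta)

# Toy usage: a 3x3 array (2 inputs + bias, 3 outputs) learning AND, OR, NAND
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
Y = np.array([[0, 0, 1], [0, 1, 1], [0, 1, 1], [1, 1, 0]], dtype=float)
G = np.zeros((3, 3))
for _ in range(200):
    for x, y in zip(X, Y):
        y_hat = 1.0 / (1.0 + np.exp(-(x @ G)))      # sigmoid readout
        G = outer_product_update(G, x, y - y_hat)   # delta-rule error signal
```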
Abstract not provided.
Abstract not provided.
Abstract not provided.
Proceedings - International Symposium on High-Performance Computer Architecture
Over the past decade, as Moore's Law has slowed, the need for new forms of computation that can provide sustainable performance improvements has risen. A new method, called in situ computing, has shown great potential to accelerate matrix-vector multiplication (MVM), an important kernel for a diverse range of applications from neural networks to scientific computing. Existing in situ accelerators for scientific computing, however, have a significant limitation: they provide no acceleration for preconditioning, a key bottleneck in linear solvers and in scientific computing workflows. This paper enables in situ acceleration for state-of-the-art linear solvers by demonstrating how to use a new in situ matrix inversion accelerator for analog preconditioning. As existing techniques that enable high precision and scalability for in situ MVM are inapplicable to in situ matrix inversion, new techniques to compensate for circuit non-idealities are proposed. Additionally, a new approach to bit slicing is proposed that enables splitting operands across multiple devices without external digital logic. For scalability, this paper demonstrates how in situ matrix inversion kernels can work in tandem with existing domain decomposition techniques to accelerate the solution of arbitrarily large linear systems. The analog kernel can be directly integrated into existing preconditioning workflows, leveraging several well-optimized numerical linear algebra tools to improve the behavior of the circuit. The result is an analog preconditioner that is more effective (up to 50% fewer iterations) than the widely used incomplete LU factorization preconditioner, ILU(0), while also reducing the energy and execution time of each approximate solve operation by 1025x and 105x, respectively.
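As a conceptual illustration of preconditioning with an inexact (analog) inverse, the sketch below builds a block-Jacobi preconditioner from noisy block inverses and uses it inside a preconditioned Richardson iteration. The block size, noise level, and test matrix are illustrative assumptions and do not reflect the accelerator's circuits or the domain decomposition methods used in the paper.

```python
import numpy as np

def analog_block_inverse(A, block, noise=0.02, rng=np.random.default_rng(6)):
    """Build a block-Jacobi preconditioner from noisy block inverses.

    Each diagonal block of A is inverted and perturbed with Gaussian relative
    error to mimic the finite precision of an analog inversion kernel.
    """
    n = A.shape[0]
    M = np.zeros_like(A)
    for start in range(0, n, block):
        sl = slice(start, min(start + block, n))
        inv = np.linalg.inv(A[sl, sl])
        M[sl, sl] = inv * (1.0 + noise * rng.standard_normal(inv.shape))
    return M

def richardson(A, b, M, tol=1e-8, max_iter=500):
    """Preconditioned Richardson iteration: x <- x + M (b - A x)."""
    x = np.zeros_like(b)
    for k in range(max_iter):
        r = b - A @ x
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        x = x + M @ r
    return x, k

# Toy usage: a well-conditioned random system solved with the noisy preconditioner
rng = np.random.default_rng(7)
n = 128
A = np.eye(n) * 4.0 + 0.5 * rng.standard_normal((n, n)) / np.sqrt(n)
b = rng.standard_normal(n)
x, iters = richardson(A, b, analog_block_inverse(A, block=16))
```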
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Applied Physics Reviews
Analog hardware accelerators, which perform computation within a dense memory array, have the potential to overcome the major bottlenecks faced by digital hardware for data-heavy workloads such as deep learning. Exploiting the intrinsic computational advantages of memory arrays, however, has proven to be challenging principally due to the overhead imposed by the peripheral circuitry and due to the non-ideal properties of memory devices that play the role of the synapse. We review the existing implementations of these accelerators for deep supervised learning, organizing our discussion around the different levels of the accelerator design hierarchy, with an emphasis on circuits and architecture. We explore and consolidate the various approaches that have been proposed to address the critical challenges faced by analog accelerators, for both neural network inference and training, and highlight the key design trade-offs underlying these techniques.
Abstract not provided.
Neuromorphic architectures have seen a resurgence of interest in the past decade owing to 100x-1000x efficiency gains over conventional von Neumann architectures. Digital neuromorphic chips like Intel's Loihi have shown efficiency gains over GPUs and CPUs and can be scaled to build larger systems. Analog neuromorphic architectures promise even further savings in energy, area, and latency than their digital counterparts. Neuromorphic analog and digital technologies provide both low-power and configurable acceleration of challenging artificial intelligence (AI) algorithms. We present a hybrid analog-digital neuromorphic architecture that amplifies the advantages of both high-density analog memory and spike-based digital communication while mitigating the limitations of each approach.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
IEEE International Reliability Physics Symposium Proceedings
Non-volatile memory arrays can deploy pre-trained neural network models for edge inference. However, these systems are affected by device-level noise and retention issues. Here, we examine the damage caused by these effects, introduce a mitigation strategy, and demonstrate its use in a fabricated array of SONOS (silicon-oxide-nitride-oxide-silicon) devices. On MNIST, Fashion-MNIST, and CIFAR-10 tasks, our approach increases resilience to synaptic noise and drift. We also show that strong performance can be realized with ADCs of 5-8 bit precision.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Proceedings - IEEE International Symposium on Circuits and Systems
Machine learning typically implements backpropagation using abundant training samples. We demonstrate a multi-stage learning system realized with a promising non-volatile memory device, the domain-wall magnetic tunnel junction (DW-MTJ). The system consists of unsupervised (clustering) as well as supervised sub-systems, and generalizes quickly from few samples. We demonstrate interactions between the physical properties of this device and the optimal implementation of neuroscience-inspired plasticity learning rules, and highlight performance on a suite of tasks. Our energy analysis confirms the value of the approach, as the learning budget stays below 20 µJ even for large tasks typically used in machine learning.
Abstract not provided.
IEEE Journal on Exploratory Solid-State Computational Devices and Circuits
The domain-wall (DW)-magnetic tunnel junction (MTJ) device implements universal Boolean logic in a manner that is naturally compact and cascadable. However, an evaluation of the energy efficiency of this emerging technology for standard logic applications is still lacking. In this article, we use a previously developed compact model to construct and benchmark a 32-bit adder built entirely from DW-MTJ devices that communicates with DW-MTJ registers. The results of this large-scale design and simulation indicate that while the energy cost of systems driven by spin-transfer torque (STT) DW motion is significantly higher than previously predicted, the same concept using spin-orbit torque (SOT) switching benefits from an improvement in the energy per operation by multiple orders of magnitude, attaining competitive energy values relative to a comparable CMOS subprocessor component. This result clarifies the path toward practical implementations of an all-magnetic processor system.