The ASC program seeks to use machine learning to improve efficiencies in its stockpile stewardship mission. Moreover, there is a growing market for technologies dedicated to accelerating AI workloads. Many of these emerging architectures promise to provide savings in energy efficiency, area, and latency when compared to traditional CPUs for these types of applications — neuromorphic analog and digital technologies provide both low-power and configurable acceleration of challenging artificial intelligence (AI) algorithms. If designed into a heterogeneous system with other accelerators and conventional compute nodes, these technologies have the potential to augment the capabilities of traditional High Performance Computing (HPC) platforms [5]. This expanded computation space requires not only a new approach to physics simulation, but the ability to evaluate and analyze next-generation architectures specialized for AI/ML workloads in both traditional HPC and embedded ND applications. Developing this capability will enable ASC to understand how this hardware performs in both HPC and ND environments, improve our ability to port our applications, guide the development of computing hardware, and inform vendor interactions, leading them toward solutions that address ASC’s unique requirements.
This project evaluated the use of emerging spintronic memory devices for robust and efficient variational inference schemes. Variational inference (VI) schemes, which constrain the distribution for each weight to be a Gaussian distribution with a mean and standard deviation, are a tractable method for calculating posterior distributions of weights in a Bayesian neural network such that this neural network can also be trained using the powerful backpropagation algorithm. Our project focuses on domain-wall magnetic tunnel junctions (DW-MTJs), a powerful multi-functional spintronic synapse design that can achieve low-power switching while also opening the pathway towards repeatable, analog operation using fabricated notches. Our initial efforts to employ DW-MTJs as an all-in-one stochastic synapse with both a mean and standard deviation did not meet the quality metrics for hardware-friendly VI. In the future, new device stacks and methods for expressive anisotropy modification may still make this idea possible. However, as a fallback that immediately satisfies our requirements, we invented and detailed how the combination of a DW-MTJ synapse encoding the mean and a probabilistic Bayes-MTJ device, programmed via a ferroelectric or ionically modifiable layer, can robustly and expressively implement VI. This design includes a physics-informed small circuit model that was scaled up to perform and demonstrate rigorous uncertainty quantification applications, up to and including small convolutional networks on a grayscale image classification task and larger (Residual) networks implementing multi-channel image classification. Lastly, as these results and ideas all depend upon an inference application where weights (spintronic memory states) remain non-volatile, the retention of these synapses for the notched case was further interrogated.
These investigations revealed and emphasized the importance of both notch geometry and anisotropy modification in order to further enhance the endurance of written spintronic states. In the near future, these results will be mapped to effective predictions for room temperature and elevated operation DW-MTJ memory retention, and experimentally verified when devices become available.
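The Gaussian-weight form of VI described above can be illustrated with a minimal sketch. It assumes a single linear neuron whose weights each carry a mean and a standard deviation, and it estimates the output uncertainty by repeated sampling. The function names, the example values, and the use of Python's `random.gauss` in place of a stochastic device are illustrative assumptions, not the project's circuit model.

```python
import random
import statistics


def sample_weights(means, stds, rng):
    """Draw one weight vector with w_i ~ N(mean_i, std_i^2)."""
    return [rng.gauss(m, s) for m, s in zip(means, stds)]


def predict_with_uncertainty(x, means, stds, n_samples=1000, seed=0):
    """Monte Carlo estimate of the mean and spread of the neuron output w . x."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_samples):
        w = sample_weights(means, stds, rng)
        outputs.append(sum(wi * xi for wi, xi in zip(w, x)))
    return statistics.mean(outputs), statistics.stdev(outputs)


# Hypothetical values: one device would hold each mean, another each spread.
means = [0.5, -1.0, 2.0]
stds = [0.1, 0.2, 0.05]
x = [1.0, 2.0, 3.0]

mu, sigma = predict_with_uncertainty(x, means, stds)
print(f"output mean ~ {mu:.3f}, output std ~ {sigma:.3f}")
```

For independent Gaussian weights the output spread has the closed form sqrt(sum(std_i^2 * x_i^2)), so the sampled estimate can be checked against it.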
Analog computing has been widely proposed to improve the energy efficiency of multiple important workloads, including neural network operations and other linear algebra kernels. To properly evaluate analog computing and explore more complex workloads, such as systems consisting of multiple analog data paths, system-level simulations are required. Moreover, prior work on system architectures for analog computing often relies on custom simulators, creating significant additional design effort and complicating comparisons between different systems. To remedy these issues, this report describes the design and implementation of a flexible tile-based analog accelerator element for the Structural Simulation Toolkit (SST). The element focuses heavily on the tile controller, an often neglected aspect of prior work, which is sufficiently versatile to simulate a wide range of different tile operations including neural network layers, signal processing kernels, and generic linear algebra operations without major constraints. The tile model also interoperates with existing SST memory and network models to reduce the overall development load and enable future simulation of heterogeneous systems with both conventional digital logic and analog compute tiles. Finally, both the tile and array models are designed to easily support future extensions as new analog operations and applications that can benefit from analog computing are developed.
Neural networks are largely based on matrix computations. During forward inference, the most heavily used compute kernel is the matrix-vector multiplication (MVM), $W\vec{x}$. Inference is a first frontier for the deployment of next-generation hardware for neural network applications, as it is more readily deployed in edge devices, such as mobile devices or embedded processors with size, weight, and power constraints. Inference is also easier to implement in analog systems than training, which has more stringent device requirements.
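As a rough illustration of how an MVM can map onto an analog array, the sketch below assumes each signed weight is stored as a pair of non-negative conductances and that each output is the difference of two summed currents (Ohm's law per device, Kirchhoff's current law per line). The differential-pair mapping, the function names, and the `g_max` parameter are assumptions made for this example and do not come from the text.

```python
def weights_to_conductances(W, g_max=1.0):
    """Map signed weights onto two non-negative conductance matrices."""
    w_max = max(abs(w) for row in W for w in row) or 1.0
    scale = g_max / w_max  # conductance per unit weight
    G_pos = [[max(w, 0.0) * scale for w in row] for row in W]
    G_neg = [[max(-w, 0.0) * scale for w in row] for row in W]
    return G_pos, G_neg, scale


def crossbar_mvm(G_pos, G_neg, scale, x):
    """Compute W x as the difference of two summed currents per output line."""
    out = []
    for gp_row, gn_row in zip(G_pos, G_neg):
        i_pos = sum(g * v for g, v in zip(gp_row, x))  # current from positive devices
        i_neg = sum(g * v for g, v in zip(gn_row, x))  # current from negative devices
        out.append((i_pos - i_neg) / scale)  # convert back to weight units
    return out


W = [[1.0, -2.0], [0.5, 4.0]]
x = [1.0, 1.0]
G_pos, G_neg, scale = weights_to_conductances(W)
y = crossbar_mvm(G_pos, G_neg, scale, x)
print(y)
```

In this idealized sketch the result equals the exact product; device noise, programming error, and parasitic resistance would each perturb the conductances or currents.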
We demonstrate SONOS (silicon-oxide-nitride-oxide-silicon) analog memory arrays that are optimized for neural network inference. The devices are fabricated in a 40 nm process and operated in the subthreshold regime for in-memory matrix multiplication. Subthreshold operation enables low conductances to be implemented with low error, which matches the typical weight distribution of neural networks, which is heavily skewed toward near-zero values. This leads to high accuracy in the presence of programming errors and process variations. We simulate the end-to-end neural network inference accuracy, accounting for the measured programming error, read noise, and retention loss in a fabricated SONOS array. Evaluated on the ImageNet dataset using ResNet50, the accuracy using a SONOS system is within 2.16% of floating-point accuracy without any retraining. The unique error properties and high On/Off ratio of the SONOS device allow scaling to large arrays without bit slicing, and enable an inference architecture that achieves 20 TOPS/W on ResNet50, a >10× gain in energy efficiency over state-of-the-art digital and analog inference accelerators.
To support the increasing demands for efficient deep neural network processing, accelerators based on analog in-memory computation of matrix multiplication have recently gained significant attention for reducing the energy of neural network inference. However, analog processing within memory arrays must contend with the issue of parasitic voltage drops across the metal interconnects, which distort the results of the computation and limit the array size. This work analyzes how parasitic resistance affects the end-to-end inference accuracy of state-of-the-art convolutional neural networks, and comprehensively studies how various design decisions at the device, circuit, architecture, and algorithm levels affect the system's sensitivity to parasitic resistance effects. A set of guidelines is provided for how to design analog accelerator hardware that is intrinsically robust to parasitic resistance, without any explicit compensation or re-training of the network parameters.
Integration-technology feature shrink increases computing-system susceptibility to single-event effects (SEE). While modeling SEE faults will be critical, an integrated processor's scope makes physically correct modeling computationally intractable. Without useful models, presilicon evaluation of fault-tolerance approaches becomes impossible. To incorporate accurate transistor-level effects at a system scope, we present a multiscale simulation framework. Charge collection at the 1) device level determines 2) circuit-level transient duration and state-upset likelihood. Circuit effects, in turn, impact 3) register-transfer-level architecture-state corruption visible at 4) the system level. Thus, the physically accurate effects of SEEs in large-scale systems, executed on a high-performance computing (HPC) simulator, could be used to drive cross-layer radiation hardening by design. We demonstrate the capabilities of this model with two case studies. First, we determine a D flip-flop's sensitivity at the transistor level on 14-nm FinFET technology, validating the model against published cross sections. Second, we track and estimate faults in a microprocessor without interlocked pipelined stages (MIPS) processor for the Adams 90% worst-case environment in an isotropic space environment.
We evaluate the sensitivity of neuromorphic inference accelerators based on silicon-oxide-nitride-oxide-silicon (SONOS) charge trap memory arrays to total ionizing dose (TID) effects. Data retention statistics were collected for 16 Mbit of 40-nm SONOS digital memory exposed to ionizing radiation from a Co-60 source, showing good retention of the bits up to the maximum dose of 500 krad(Si). Using this data, we formulate a rate-equation-based model for the TID response of trapped charge carriers in the ONO stack and predict the effect of TID on intermediate device states between 'program' and 'erase.' This model is then used to simulate arrays of low-power, analog SONOS devices that store 8-bit neural network weights and support in situ matrix-vector multiplication. We evaluate the accuracy of the irradiated SONOS-based inference accelerator on two image recognition tasks - CIFAR-10 and the challenging ImageNet data set - using state-of-the-art convolutional neural networks, such as ResNet-50. We find that across the data sets and neural networks evaluated, the accelerator tolerates a maximum TID between 10 and 100 krad(Si), with deeper networks being more susceptible to accuracy losses due to TID.
This presentation concludes that in situ computation enables new approaches to linear algebra problems that can be both more effective and more efficient than conventional digital systems. Preconditioning is well-suited to analog computation due to the tolerance for approximate solutions. When combined with prior work on in situ MVM for scientific computing, analog preconditioning can enable significant speedups for important linear algebra applications.
Over the past decade as Moore's Law has slowed, the need for new forms of computation that can provide sustainable performance improvements has risen. A new method, called in situ computing, has shown great potential to accelerate matrix vector multiplication (MVM), an important kernel for a diverse range of applications from neural networks to scientific computing. Existing in situ accelerators for scientific computing, however, have a significant limitation: these accelerators provide no acceleration for preconditioning, a key bottleneck in linear solvers and in scientific computing workflows. This paper enables in situ acceleration for state-of-the-art linear solvers by demonstrating how to use a new in situ matrix inversion accelerator for analog preconditioning. As existing techniques that enable high precision and scalability for in situ MVM are inapplicable to in situ matrix inversion, new techniques to compensate for circuit non-idealities are proposed. Additionally, a new approach to bit slicing that enables splitting operands across multiple devices without external digital logic is proposed. For scalability, this paper demonstrates how in situ matrix inversion kernels can work in tandem with existing domain decomposition techniques to accelerate the solutions of arbitrarily large linear systems. The analog kernel can be directly integrated into existing preconditioning workflows, leveraging several well-optimized numerical linear algebra tools to improve the behavior of the circuit. The result is an analog preconditioner that is more effective (up to 50% fewer iterations) than the widely used incomplete LU factorization preconditioner, ILU(0), while also reducing the energy and execution time of each approximate solve operation by 1025x and 105x respectively.
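The reason an approximate inverse is still useful as a preconditioner can be seen in a small sketch. It deliberately swaps in a preconditioned Richardson iteration rather than the Krylov solvers the paper targets, and it mimics analog error by perturbing each entry of the exact inverse by 5%. The 2×2 system, the error pattern, and all names are assumptions chosen for illustration only.

```python
def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def richardson(A, b, M, tol=1e-10, max_iter=1000):
    """Iterate x <- x + M (b - A x) until the max-norm residual drops below tol."""
    x = [0.0] * len(b)
    for k in range(max_iter):
        r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
        if max(abs(ri) for ri in r) < tol:
            return x, k
        dx = matvec(M, r)  # the "approximate solve" step
        x = [xi + di for xi, di in zip(x, dx)]
    return x, max_iter


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]

# Exact inverse of A (determinant 11), then perturbed by +/-5% per entry
# to stand in for an imprecise analog inversion (hypothetical error pattern).
det = 11.0
A_inv = [[3.0 / det, -1.0 / det], [-1.0 / det, 4.0 / det]]
errors = [[1.05, 0.95], [0.95, 1.05]]
M_analog = [[a * e for a, e in zip(ra, re)] for ra, re in zip(A_inv, errors)]

# Baseline: a scaled identity, i.e. no real preconditioning.
M_plain = [[0.25, 0.0], [0.0, 0.25]]

x_pre, iters_pre = richardson(A, b, M_analog)
x_plain, iters_plain = richardson(A, b, M_plain)
print(iters_pre, iters_plain)
```

Both runs converge to the same solution; the imprecise inverse simply gets there in fewer iterations, which is the tolerance for approximate solutions that makes preconditioning a good fit for analog hardware.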
Analog hardware accelerators, which perform computation within a dense memory array, have the potential to overcome the major bottlenecks faced by digital hardware for data-heavy workloads such as deep learning. Exploiting the intrinsic computational advantages of memory arrays, however, has proven to be challenging principally due to the overhead imposed by the peripheral circuitry and due to the non-ideal properties of memory devices that play the role of the synapse. We review the existing implementations of these accelerators for deep supervised learning, organizing our discussion around the different levels of the accelerator design hierarchy, with an emphasis on circuits and architecture. We explore and consolidate the various approaches that have been proposed to address the critical challenges faced by analog accelerators, for both neural network inference and training, and highlight the key design trade-offs underlying these techniques.
An open question in the metal hydride community is whether there are simple, physics-based design rules that dictate the thermodynamic properties of these materials across the variety of structures and chemistry they can exhibit. While black box machine learning-based algorithms can predict these properties with some success, they do not directly provide the basis on which these predictions are made, therefore complicating the a priori design of novel materials exhibiting a desired property value. In this work we demonstrate how feature importance, as identified by a gradient boosting tree regressor, uncovers the strong dependence of the metal hydride equilibrium H2 pressure on a volume-based descriptor that can be computed from just the elemental composition of the intermetallic alloy. Elucidation of this simple structure-property relationship is valid across a range of compositions, metal substitutions, and structural classes exhibited by intermetallic hydrides. This permits rational targeting of novel intermetallics for high-pressure hydrogen storage (low-stability hydrides) by their descriptor values, and we predict a known intermetallic to form a low-stability hydride (as confirmed by density functional theory calculations) that has not yet been experimentally investigated.
Neuromorphic computers based on analogue neural networks aim to substantially lower computing power by reducing the need to shuttle data between memory and logic units. Artificial synapses containing nonvolatile analogue conductance states enable direct computation using memory elements; however, most nonvolatile analogue memories require high write voltages and large current densities and are accompanied by nonlinear and unpredictable weight updates. Here, we develop an inorganic redox transistor based on electrochemical lithium-ion insertion into LixTiO2 that displays linear weight updates at both low current densities and low write voltages. The write voltage, as low as 200 mV at room temperature, is achieved by minimizing the open-circuit voltage and using a low-voltage diffusive memristor selector. We further show that the LixTiO2 redox transistor can achieve an extremely sharp transistor subthreshold slope of just 40 mV/decade when operating in an electrochemically driven phase transformation regime.
Scaling arrays of non-volatile memory devices from academic demonstrations to reliable, manufacturable systems requires a better understanding of variability at array and wafer-scale levels. CrossSim models the accuracy of neural networks implemented on an analog resistive memory accelerator using the cycle-to-cycle variability of a single device. In this work, we extend this modeling tool to account for device-to-device variation in a realistic way, and evaluate the impact of this reliability issue in the context of neuromorphic online learning tasks.
Analog crossbars have the potential to reduce the energy and latency required to train a neural network by three orders of magnitude when compared to an optimized digital ASIC. The crossbar simulator, CrossSim, can be used to model device nonidealities and determine what device properties are needed to create an accurate neural network accelerator. Experimentally measured device statistics are used to simulate neural network training accuracy and compare different classes of devices including TaOx ReRAM, Li1-xCoO2 devices, and conventional floating gate SONOS memories. A technique called 'Periodic Carry' can overcome device nonidealities by using a positional number system while maintaining the benefit of parallel analog matrix operations.
Neuromorphic computers could overcome efficiency bottlenecks inherent to conventional computing through parallel programming and readout of artificial neural network weights in a crossbar memory array. However, selective and linear weight updates and <10-nanoampere read currents are required for learning that surpasses conventional computing efficiency. We introduce an ionic floating-gate memory array based on a polymer redox transistor connected to a conductive-bridge memory (CBM). Selective and linear programming of a redox transistor array is executed in parallel by overcoming the bridging threshold voltage of the CBMs. Synaptic weight readout with currents <10 nanoamperes is achieved by diluting the conductive polymer with an insulator to decrease the conductance. The redox transistors endure >1 billion write-read operations and support >1-megahertz write-read frequencies.
Emerging memory devices, such as resistive crossbars, have the capacity to store large amounts of data in a single array. Acquiring the data stored in large-capacity crossbars in a sequential fashion can become a bottleneck. We present practical methods, based on sparse sampling, to quickly acquire sparse data stored on emerging memory devices that support the basic summation kernel, reducing the acquisition time from linear to sub-linear. The experimental results show that at least an order of magnitude improvement in acquisition time can be achieved when the data are sparse. In addition, we show that the energy cost associated with our approach is competitive with that of the sequential method.
The image classification accuracy of a TaOx ReRAM-based neuromorphic computing accelerator is evaluated after intentionally inducing displacement damage up to a fluence of $10^{14}$ 2.5-MeV Si ions/cm$^2$ on the analog devices that are used to store weights. Results are consistent with a radiation-induced oxygen vacancy production mechanism. When the device is in the high-resistance state during heavy ion radiation, the device resistance, linearity, and accuracy after training are only affected by high fluence levels. The findings in this paper are in accordance with the results of previous studies on TaOx-based digital resistive random access memory. When the device is in the low-resistance state during irradiation, no resistance change was detected, but devices with a 4-kΩ inline resistor did show a reduction in accuracy after training at $10^{14}$ 2.5-MeV Si ions/cm$^2$. This indicates that changes in resistance can only be somewhat correlated with changes to devices' analog properties. This paper demonstrates that TaOx devices are radiation tolerant not only for high radiation environment digital memory applications but also when operated in an analog mode suitable for neuromorphic computation and training on new data sets.
Electronic synaptic devices are important building blocks for neuromorphic computational systems that can go beyond the constraints of von Neumann architecture. Although two-terminal memristive devices are demonstrated to be possible candidates, they suffer from several shortcomings related to the filament formation mechanism, including nonlinear switching, write noise, and high device conductance, all of which limit the accuracy and energy efficiency. Electrochemical three-terminal transistors, in which the channel conductance can be tuned without filament formation, provide an alternative platform for synaptic electronics. Here, an all-solid-state electrochemical transistor made with Li ion–based solid dielectric and 2D α-phase molybdenum oxide (α-MoO3) nanosheets as the channel is demonstrated. These devices achieve nonvolatile conductance modulation in an ultralow conductance regime (<75 nS) by reversible intercalation of Li ions into the α-MoO3 lattice. Based on this operating mechanism, the essential functionalities of synapses, such as short- and long-term synaptic plasticity and bidirectional near-linear analog weight update, are demonstrated. Simulations using the handwritten digit data sets demonstrate high recognition accuracy (94.1%) of the synaptic transistor arrays. These results provide an insight into the application of 2D oxides for large-scale, energy-efficient neuromorphic computing networks.
Resistive memory (ReRAM) shows promise for use as an analog synapse element in energy-efficient neural network algorithm accelerators. A particularly important application is the training of neural networks, as this is the most computationally-intensive procedure in using a neural algorithm. However, training a network with analog ReRAM synapses can significantly reduce the accuracy at the algorithm level. In order to assess this degradation, analog properties of ReRAM devices were measured and hand-written digit recognition accuracy was modeled for the training using backpropagation. Bipolar filamentary devices utilizing three material systems were measured and compared: one oxygen vacancy system, Ta-TaOx, and two conducting metallization systems, Cu-SiO2, and Ag/chalcogenide. Analog properties and conductance ranges of the devices are optimized by measuring the response to varying voltage pulse characteristics. Key analog device properties which degrade the accuracy are update linearity and write noise. Write noise may improve as a function of device manufacturing maturity, but write nonlinearity appears relatively consistent among the different device material systems and is found to be the most significant factor affecting accuracy. As a result, this suggests that new materials and/or fundamentally different resistive switching mechanisms may be required to improve device linearity and achieve higher algorithm training accuracy.
With the end of Dennard scaling and the ever-increasing need for more efficient, faster computation, resistive switching devices (ReRAM), often referred to as memristors, are a promising candidate for next generation computer hardware. These devices show particular promise for use in an analog neuromorphic computing accelerator as they can be tuned to multiple states and be updated like the weights in neuromorphic algorithms. Modeling a ReRAM-based neuromorphic computing accelerator requires a compact model capable of correctly simulating the small weight update behavior associated with neuromorphic training. These small updates have a nonlinear dependence on the initial state, which has a significant impact on neural network training. Consequently, we propose the piecewise empirical model (PEM), an empirically derived general purpose compact model that can accurately capture the nonlinearity of an arbitrary two-terminal device to match pulse measurements important for neuromorphic computing applications. By defining the state of the device to be proportional to its current, the model parameters can be extracted from a series of voltage pulses that mimic the behavior of a device in an analog neuromorphic computing accelerator. This allows for a general, accurate, and intuitive compact circuit model that is applicable to different resistance-switching device technologies. In this work, we explain the details of the model, implement the model in the circuit simulator Xyce, and give an example of its usage to model a specific Ta/TaOx device.
The goal of this LDRD is to develop a quantum nanophotonics capability that will allow practical control over electron (hole) and photon confinement in more than one dimension. We plan to use quantum dots (QDs) to control electrons, and photonic crystals to control photons. InGaN QDs will be fabricated using quantum size control processes, and methods will be developed to add epitaxial layers for hole injection and surface passivation. We will also explore photonic crystal nanofabrication techniques using both additive and subtractive fabrication processes, which can tailor photonic crystal properties. These two efforts will be combined by incorporating the QDs into photonic crystal surface emitting lasers (PCSELs). Modeling will be performed using finite-difference time-domain and gain analysis to optimize QD-PCSEL designs that balance laser performance with the ability to nano-fabricate structures. Finally, we will develop design rules for QD-PCSEL architectures, to understand their performance possibilities and limits.
Analog resistive memories promise to reduce the energy of neural networks by orders of magnitude. However, the write variability and write nonlinearity of current devices prevent neural networks from training to high accuracy. We present a novel periodic carry method that uses a positional number system to overcome this while maintaining the benefit of parallel analog matrix operations. We demonstrate how noisy, nonlinear TaOx devices that could only train to 80% accuracy on MNIST, can now reach 97% accuracy, only 1% away from an ideal numeric accuracy of 98%. On a file type dataset, the TaOx devices achieve ideal numeric accuracy. In addition, low noise, linear Li1-xCoO2 devices train to ideal numeric accuracies using periodic carry on both datasets.
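A heavily simplified sketch of the positional-number idea follows. It assumes one weight is split across two bounded devices, that all training updates land on the low-significance device, and that its contents are periodically transferred to the high-significance device and then reset. The class, the base of 8, the clipping limit, and the carry interval are all hypothetical choices, and the sketch omits the write noise and nonlinearity that the method is meant to overcome.

```python
def clip(value, limit):
    """Keep a device state inside its physical range [-limit, limit]."""
    return max(-limit, min(limit, value))


class TwoDeviceWeight:
    """One weight stored as low + base * high, each device bounded by `limit`."""

    def __init__(self, base=8.0, limit=1.0):
        self.base = base
        self.limit = limit
        self.low = 0.0   # least significant device, receives every update
        self.high = 0.0  # most significant device, written only on carry

    def value(self):
        return self.low + self.base * self.high

    def update(self, delta):
        self.low = clip(self.low + delta, self.limit)

    def carry(self):
        # Move the low device's contents into the high device, then reset it.
        self.high = clip(self.high + self.low / self.base, self.limit)
        self.low = 0.0


def train(weight, n_updates, delta, carry_every=None):
    """Apply n_updates equal steps, optionally carrying every carry_every steps."""
    for step in range(1, n_updates + 1):
        weight.update(delta)
        if carry_every and step % carry_every == 0:
            weight.carry()
    return weight.value()


with_carry = train(TwoDeviceWeight(), 30, 0.1, carry_every=5)
without_carry = train(TwoDeviceWeight(), 30, 0.1)
print(with_carry, without_carry)
```

In this toy setting the carried weight accumulates the full sum of updates, while the single-device weight saturates at its limit. Because the high-significance device is written only occasionally, errors made on individual small updates stay confined to the low-significance device.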
Resistive memory crossbars can reduce the energy required to perform computations in neural algorithms by three orders of magnitude when compared to an optimized digital ASIC [1]. For data intensive applications, the computational energy is dominated by moving data between the processor, SRAM, and DRAM. Analog crossbars overcome this by allowing data to be processed directly at each memory element. Analog crossbars accelerate three key operations that are the bulk of the computation in a neural network as illustrated in Fig 1: vector matrix multiplies (VMM), matrix vector multiplies (MVM), and outer product rank 1 updates (OPU) [2]. For an $N \times N$ crossbar the energy for each operation scales as the number of memory elements, $O(N^2)$ [2]. This is because the crossbar performs its entire computation in one step, charging all the capacitances only once. Thus the $CV^2$ energy of the array scales as array size. This is fundamentally better than trying to read or write a digital memory. Each row of any $N \times N$ digital memory must be accessed one at a time, resulting in $N$ columns of length $O(N)$ being charged $N$ times, requiring $O(N^3)$ energy to read a digital memory. Thus an analog crossbar has a fundamental $O(N)$ energy scaling advantage over a digital system. Furthermore, if the read operation is done at low voltage and is therefore noise limited, the read energy can even be independent of the crossbar size, $O(1)$ [2].
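The scaling argument can be written out term by term. The sketch below assumes a per-element capacitance $c$ and a common operating voltage $V$ for both cases; these symbols are introduced here for illustration and are not defined in the text.

```latex
\begin{align}
E_{\text{analog}} &\propto N^2 \cdot c\,V^2 = O(N^2)
  && \text{(each of the $N^2$ elements is charged once)} \\
E_{\text{digital}} &\propto
  \underbrace{N}_{\text{row accesses}} \cdot
  \underbrace{N}_{\text{columns}} \cdot
  \underbrace{N c}_{\text{column capacitance}} \cdot V^2 = O(N^3) \\
\frac{E_{\text{digital}}}{E_{\text{analog}}} &= O(N)
\end{align}
```

The extra factor of $N$ in the digital case comes from recharging every full-length column on each of the $N$ row accesses.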
Parasitic resistances cause devices in a resistive memory array to experience different read/write voltages depending on the device location, resulting in uneven writes and larger leakage currents. We present a new method to compensate for this by adding extra series resistance to the drivers to equalize the parasitic resistance seen by all the devices. This allows for uniform writes, enabling multi-level cells with greater numbers of distinguishable levels, and reduced write power, enabling larger arrays.
The brain is capable of massively parallel information processing while consuming only ~1-100 fJ per synaptic event [1,2]. Inspired by the efficiency of the brain, CMOS-based neural architectures [3] and memristors [4,5] are being developed for pattern recognition and machine learning. However, the volatility, design complexity and high supply voltages for CMOS architectures, and the stochastic and energy-costly switching of memristors complicate the path to achieve the interconnectivity, information density, and energy efficiency of the brain using either approach. Here we describe an electrochemical neuromorphic organic device (ENODe) operating with a fundamentally different mechanism from existing memristors. ENODe switches at low voltage and energy (<10 pJ for $10^3$ μm$^2$ devices), displays >500 distinct, non-volatile conductance states within a ~1 V range, and achieves high classification accuracy when implemented in neural network simulations. Plastic ENODes are also fabricated on flexible substrates enabling the integration of neuromorphic functionality in stretchable electronic systems [6,7]. Mechanical flexibility makes ENODes compatible with three-dimensional architectures, opening a path towards extreme interconnectivity comparable to the human brain.
We address practical limits of energy efficiency scaling for logic and memory. Scaling of logic will end with unreliable operation, making computers probabilistic as a side effect. The errors can be corrected or tolerated, but overhead will increase with further scaling. We address the tradeoff between scaling and error correction that yields minimum energy per operation, finding new error correction methods with energy consumption limits about 2× below current approaches. The maximum energy efficiency for memory depends on several other factors. Adiabatic and reversible methods applied to logic have promise, but overheads have precluded practical use. However, the regular array structure of memory arrays tends to reduce overhead and makes adiabatic memory a viable option. This paper reports an adiabatic memory that has been tested at about 85× improvement over standard designs for energy efficiency. Combining these approaches could set energy efficiency expectations for processor-in-memory computing systems.
Resistive memories enable dramatic energy reductions for neural algorithms. We propose a general purpose neural architecture that can accelerate many different algorithms and determine the device properties that will be needed to run backpropagation on the neural architecture. To maintain high accuracy, the read noise standard deviation should be less than 5% of the weight range. The write noise standard deviation should be less than 0.4% of the weight range and up to 300% of a characteristic update (for the datasets tested). Asymmetric nonlinearities in the change in conductance vs pulse cause weight decay and significantly reduce the accuracy, while moderate symmetric nonlinearities do not have an effect. In order to allow for parallel reads and writes the write current should be less than 100 nA as well.
Wide band gap semiconductors like AlN typically cannot be efficiently p-doped: acceptor levels are far from the valence band edge, preventing holes from activating. As a result, pn-junctions cannot be created and the semiconductor is less useful, a particular problem for deep ultraviolet (UV) optoelectronics.
The exponential increase in data over the last decade presents a significant challenge to analytics efforts that seek to process and interpret such data for various applications. Neural-inspired computing approaches are being developed in order to leverage the computational properties of the analog, low-power data processing observed in biological systems. Analog resistive memory crossbars can perform a parallel read or a vector-matrix multiplication as well as a parallel write or a rank-1 update with high computational efficiency. For an N × N crossbar, these two kernels can be O(N) more energy efficient than a conventional digital memory-based architecture. If the read operation is noise limited, the energy to read a column can be independent of the crossbar size (O(1)). These two kernels form the basis of many neuromorphic algorithms such as image, text, and speech recognition. For instance, these kernels can be applied to a neural sparse coding algorithm to give an O(N) reduction in energy for the entire algorithm when run with finite precision. Sparse coding is a rich problem with a host of applications including computer vision, object tracking, and more generally unsupervised learning.
Tunneling field-effect transistors (TFETs) have been investigated as a low-voltage replacement for the conventional field-effect transistor with a turn-on response steeper than 60 mV/dec. However, to date no device has achieved a steep turn-on at low voltage with an on-off ratio of 10⁶ or greater. Among the main issues is the finite density of states inside the semiconductor bandgap arising from a large concentration of interface defects [1]. Though these states do not directly conduct current, carriers can be trapped then thermally emitted to the conduction band in a trap-assisted tunneling process, broadening the switching response of the device. Overcoming these effects may not be feasible with current levels of material defects.
As transistors start to approach fundamental limits and Moore's law slows down, new devices and architectures are needed to enable continued performance gains. New approaches based on RRAM (resistive random access memory) or memristor crossbars can enable the processing of large amounts of data [1, 2]. One of the most promising applications for RRAM crossbars is brain-inspired or neuromorphic computing [3, 4].
Tunneling Field Effect Transistors (TFETs) have the potential to achieve a low operating voltage by overcoming the thermally limited subthreshold swing of 60 mV/decade, but results to date have been unsatisfying. TFETs have only shown steep subthreshold swings at low currents of 1 nA/μm or lower, while currents near 1 mA/μm are desired. To understand this we need to consider the two switching mechanisms in a TFET. The gate voltage can be used to modulate the tunneling barrier thickness and thus the tunneling probability, as shown in Fig. 1(a). Alternatively, it is possible to use energy filtering or density of states (DOS) switching, as illustrated in Fig. 1(b). If the conduction and valence bands don't overlap, no current can flow. Once they do overlap, current can flow.
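The 60 mV/decade figure quoted above follows from the standard thermal limit for a conventional transistor, S = (kT/q) ln 10. The short sketch below evaluates it, assuming room temperature (300 K); the function name is illustrative.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C

def thermal_swing_mv_per_decade(temperature_k):
    """Thermal limit of the subthreshold swing, S = (kT/q) * ln(10), in mV/decade."""
    return (K_B * temperature_k / Q_E) * math.log(10) * 1e3

swing_300k = thermal_swing_mv_per_decade(300.0)   # assumed temperature of 300 K
print(f"thermal limit at 300 K: {swing_300k:.1f} mV/decade")
```

Because the limit scales linearly with temperature, a device that switches more steeply than this at room temperature must rely on a mechanism other than thermionic emission, such as the tunneling and energy-filtering mechanisms discussed above.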