Publications

37 Results

Designing and modeling analog neural network training accelerators

2019 International Symposium on VLSI Technology, Systems and Application, VLSI-TSA 2019

Agarwal, Sapan A.; Jacobs-Gedrim, Robin B.; Bennett, Christopher H.; Hsia, Alexander W.; Adee, Shane M.; Hughart, David R.; Fuller, Elliot J.; Li, Yiyang; Talin, A.A.; Marinella, Matthew J.

Analog crossbars have the potential to reduce the energy and latency required to train a neural network by three orders of magnitude when compared to an optimized digital ASIC. The crossbar simulator, CrossSim, can be used to model device nonidealities and determine what device properties are needed to create an accurate neural network accelerator. Experimentally measured device statistics are used to simulate neural network training accuracy and compare different classes of devices including TaOx ReRAM, Li1-xCoO2 devices, and conventional floating gate SONOS memories. A technique called 'Periodic Carry' can overcome device nonidealities by using a positional number system while maintaining the benefit of parallel analog matrix operations.

Piecewise empirical model (PEM) of resistive memory for pulsed analog and neuromorphic applications

Journal of Computational Electronics

Niroula, John N.; Agarwal, Sapan A.; Jacobs-Gedrim, Robin B.; Schiek, Richard L.; Hughart, David R.; Hsia, Alexander W.; James, Conrad D.; Marinella, Matthew J.

With the end of Dennard scaling and the ever-increasing need for more efficient, faster computation, resistive switching devices (ReRAM), often referred to as memristors, are a promising candidate for next generation computer hardware. These devices show particular promise for use in an analog neuromorphic computing accelerator as they can be tuned to multiple states and be updated like the weights in neuromorphic algorithms. Modeling a ReRAM-based neuromorphic computing accelerator requires a compact model capable of correctly simulating the small weight update behavior associated with neuromorphic training. These small updates have a nonlinear dependence on the initial state, which has a significant impact on neural network training. Consequently, we propose the piecewise empirical model (PEM), an empirically derived general purpose compact model that can accurately capture the nonlinearity of an arbitrary two-terminal device to match pulse measurements important for neuromorphic computing applications. By defining the state of the device to be proportional to its current, the model parameters can be extracted from a series of voltage pulses that mimic the behavior of a device in an analog neuromorphic computing accelerator. This allows for a general, accurate, and intuitive compact circuit model that is applicable to different resistance-switching device technologies. In this work, we explain the details of the model, implement the model in the circuit simulator Xyce, and give an example of its usage to model a specific Ta/TaOx device.
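
The state-dependent update described in this abstract can be pictured with a small sketch. This is not the PEM itself or its Xyce implementation: the table values, the normalized 0 to 1 state, and the names `SET_TABLE`, `delta_per_pulse`, and `apply_pulses` are all hypothetical, chosen only to show how a piecewise-linear fit to pulse data might make each update depend on the current state.

```python
from bisect import bisect_right

# Hypothetical table of (normalized state, change in state per pulse).
# The numbers are invented for illustration, not measured device data.
SET_TABLE = [(0.0, 0.10), (0.5, 0.04), (1.0, 0.00)]

def delta_per_pulse(state, table):
    """Piecewise-linear interpolation of the per-pulse update at a given state."""
    xs = [s for s, _ in table]
    if state <= xs[0]:
        return table[0][1]
    if state >= xs[-1]:
        return table[-1][1]
    i = bisect_right(xs, state) - 1          # segment that contains `state`
    (x0, y0), (x1, y1) = table[i], table[i + 1]
    return y0 + (y1 - y0) * (state - x0) / (x1 - x0)

def apply_pulses(state, n_pulses, table):
    """Apply n identical pulses; each update depends on the state it starts from."""
    for _ in range(n_pulses):
        state = min(1.0, max(0.0, state + delta_per_pulse(state, table)))
    return state

print(delta_per_pulse(0.25, SET_TABLE))
print(apply_pulses(0.0, 5, SET_TABLE))
```

In this toy table the update shrinks as the state rises, so equal pulses produce unequal weight changes, which is the kind of nonlinearity the abstract says affects training.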

Achieving ideal accuracies in analog neuromorphic computing using periodic carry

Digest of Technical Papers - Symposium on VLSI Technology

Agarwal, Sapan A.; Jacobs-Gedrim, Robin B.; Hsia, Alexander W.; Hughart, David R.; Fuller, Elliot J.; Talin, A.A.; James, Conrad D.; Plimpton, Steven J.; Marinella, Matthew J.

Analog resistive memories promise to reduce the energy of neural networks by orders of magnitude. However, the write variability and write nonlinearity of current devices prevent neural networks from training to high accuracy. We present a novel periodic carry method that uses a positional number system to overcome this while maintaining the benefit of parallel analog matrix operations. We demonstrate how noisy, nonlinear TaOx devices that could only train to 80% accuracy on MNIST can now reach 97% accuracy, only 1% away from an ideal numeric accuracy of 98%. On a file type dataset, the TaOx devices achieve ideal numeric accuracy. In addition, low noise, linear Li1-xCoO2 devices train to ideal numeric accuracies using periodic carry on both datasets.
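
The abstract does not spell out how the carry is performed, so the following is only a generic sketch of carrying in a positional number system, not the authors' method. It assumes a weight stored as two digits, with updates written to the least significant digit and a periodic step that moves whole multiples of the base into the next digit. `BASE`, the two-digit layout, and all function names are hypothetical.

```python
# Hypothetical sketch of a carry step in a positional weight representation.
# BASE and the two-digit layout are assumptions for illustration only.

BASE = 4  # assumed ratio between the significance of adjacent digits

def weight_value(digits):
    """Effective weight: digits[0] is least significant, digits[k] scales by BASE**k."""
    return sum(d * BASE**k for k, d in enumerate(digits))

def apply_update(digits, delta):
    """Training updates are written only to the least significant digit."""
    out = list(digits)
    out[0] += delta
    return out

def periodic_carry(digits):
    """Move whole multiples of BASE from each digit into the next higher digit."""
    out = list(digits)
    for k in range(len(out) - 1):
        carry = int(out[k] / BASE)   # truncates toward zero, so negatives also work
        out[k] -= carry * BASE
        out[k + 1] += carry
    return out

digits = [0.0, 0.0]
for _ in range(10):
    digits = apply_update(digits, 0.5)   # ten small updates accumulate to 5.0
before = weight_value(digits)
digits = periodic_carry(digits)
print(digits, before, weight_value(digits))   # the carry leaves the weight unchanged
```

The point of the sketch is that the carry changes which digit holds the value without changing the weight itself, so the low digit is brought back toward the middle of its range.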

Designing an analog crossbar based neuromorphic accelerator

2017 5th Berkeley Symposium on Energy Efficient Electronic Systems, E3S 2017 - Proceedings

Agarwal, Sapan A.; Hsia, Alexander W.; Jacobs-Gedrim, Robin B.; Hughart, David R.; Plimpton, Steven J.; James, Conrad D.; Marinella, Matthew J.

Resistive memory crossbars can reduce the energy required to perform computations in neural algorithms by three orders of magnitude when compared to an optimized digital ASIC [1]. For data intensive applications, the computational energy is dominated by moving data between the processor, SRAM, and DRAM. Analog crossbars overcome this by allowing data to be processed directly at each memory element. Analog crossbars accelerate three key operations that are the bulk of the computation in a neural network as illustrated in Fig. 1: vector matrix multiplies (VMM), matrix vector multiplies (MVM), and outer product rank 1 updates (OPU) [2]. For an N×N crossbar, the energy for each operation scales as the number of memory elements, O(N^2) [2]. This is because the crossbar performs its entire computation in one step, charging all the capacitances only once. Thus the CV^2 energy of the array scales as array size. This is fundamentally better than trying to read or write a digital memory. Each row of any N×N digital memory must be accessed one at a time, resulting in N columns of length O(N) being charged N times, requiring O(N^3) energy to read a digital memory. Thus an analog crossbar has a fundamental O(N) energy scaling advantage over a digital system. Furthermore, if the read operation is done at low voltage and is therefore noise limited, the read energy can even be independent of the crossbar size, O(1) [2].
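
As a rough illustration of the three operations named in this abstract, the sketch below treats the crossbar as a plain matrix of weights. The function names and the `lr` parameter are hypothetical and are not taken from the paper. The loops only show what is computed, whereas a physical crossbar would perform each operation in a single analog step. `energy_ratio` restates the scaling argument with all constants ignored.

```python
# Minimal sketch: a crossbar modeled as an N x N list of weights (conductances).
# All names here are illustrative assumptions, not from the paper or any simulator.

def vmm(x, W):
    """Vector-matrix multiply: y[j] = sum_i x[i] * W[i][j]."""
    rows, cols = len(W), len(W[0])
    return [sum(x[i] * W[i][j] for i in range(rows)) for j in range(cols)]

def mvm(W, x):
    """Matrix-vector multiply: y[i] = sum_j W[i][j] * x[j]."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def outer_product_update(W, a, b, lr=1.0):
    """Rank-1 update: returns a new matrix with W[i][j] + lr * a[i] * b[j]."""
    return [[W[i][j] + lr * a[i] * b[j] for j in range(len(b))]
            for i in range(len(a))]

def energy_ratio(n):
    """Digital O(N^3) read cost divided by analog O(N^2) cost, constants ignored."""
    return n**3 // n**2

W = [[1.0, 2.0], [3.0, 4.0]]
print(vmm([1.0, 0.0], W))                                # selects the first row of W
print(mvm(W, [1.0, 0.0]))                                # selects the first column of W
print(outer_product_update(W, [1.0, 0.0], [0.0, 1.0]))   # only W[0][1] changes
print(energy_ratio(1024))                                # the O(N) advantage for N = 1024
```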

Resistive memory device requirements for a neural algorithm accelerator

Proceedings of the International Joint Conference on Neural Networks

Agarwal, Sapan A.; Plimpton, Steven J.; Hughart, David R.; Hsia, Alexander W.; Richter, Isaac; Cox, Jonathan A.; James, Conrad D.; Marinella, Matthew J.

Resistive memories enable dramatic energy reductions for neural algorithms. We propose a general purpose neural architecture that can accelerate many different algorithms and determine the device properties that will be needed to run backpropagation on the neural architecture. To maintain high accuracy, the read noise standard deviation should be less than 5% of the weight range. The write noise standard deviation should be less than 0.4% of the weight range and up to 300% of a characteristic update (for the datasets tested). Asymmetric nonlinearities in the change in conductance vs. pulse cause weight decay and significantly reduce the accuracy, while moderate symmetric nonlinearities do not have an effect. To allow for parallel reads and writes, the write current should also be less than 100 nA.
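
The noise limits quoted in this abstract can be turned into a small check. The sketch below is only illustrative: the weight range of -1 to 1, the Gaussian noise model, and the function names are assumptions, and the limit tied to the characteristic update size is left out. Only the 5% and 0.4% figures come from the abstract.

```python
import random

WEIGHT_MIN, WEIGHT_MAX = -1.0, 1.0        # assumed weight range, for illustration
WEIGHT_RANGE = WEIGHT_MAX - WEIGHT_MIN

READ_SIGMA_LIMIT = 0.05 * WEIGHT_RANGE    # abstract: read noise below 5% of the range
WRITE_SIGMA_LIMIT = 0.004 * WEIGHT_RANGE  # abstract: write noise below 0.4% of the range

def noisy_read(weight, sigma, rng):
    """Return the stored weight plus Gaussian read noise (assumed noise model)."""
    return weight + rng.gauss(0.0, sigma)

def noisy_write(weight, delta, sigma, rng):
    """Apply an update with Gaussian write noise, clipped to the weight range."""
    new_weight = weight + delta + rng.gauss(0.0, sigma)
    return min(WEIGHT_MAX, max(WEIGHT_MIN, new_weight))

def within_limits(read_sigma, write_sigma):
    """Check a device's noise figures against the two range-based limits."""
    return read_sigma <= READ_SIGMA_LIMIT and write_sigma <= WRITE_SIGMA_LIMIT

rng = random.Random(0)                    # fixed seed so runs are repeatable
print(within_limits(0.08, 0.005))         # both figures are under their limits
print(within_limits(0.12, 0.005))         # read noise exceeds 5% of the range
print(noisy_write(0.99, 0.5, 0.0, rng))   # the update saturates at WEIGHT_MAX
```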

High voltage with Si series photovoltaics

Patel, Rupal K.; Hsia, Alexander W.; Bennett, Reid S.

A monolithic crystalline Si photovoltaic device, developing a potential of 2,120 Volts, has been demonstrated. The monolithic device consists of 3600 small photovoltaic cells connected in series and fabricated using standard CMOS processing on SOI wafers. The SOI wafers with trenches etched to the buried oxide (BOX) depth are used for cell isolation. The photovoltaic cell is a Si pn junction device with the n surface region forming the front surface diffused region upon which light impinges. Contact is formed to the deeper diffused region at the cell edge. The p+ deep-diffused region forms the contact to the p-type base region. Base regions were 5 or 10 µm thick. Series connection of individual cells is accomplished using standard CMOS interconnects. This allows for the voltage to range from approximately 0.5 Volts for a single cell to above a thousand volts for strings of thousands of cells. The current is determined by cell area. The voltage is limited by dielectric breakdown. Each cell is isolated from the adjacent cells through dielectric-filled trench isolation, the substrate through the SOI buried oxide, and the metal wiring by the deposited pre-metal dielectric. If any of these dielectrics fail (whether due to high electric fields or inherent defects), the photovoltaic device will not produce the desired potential. We have used ultra-thick buried oxide SOI and several novel processes, including an oxynitride trench fill process, to avoid dielectric breakdown.
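
The series-connection arithmetic in this abstract can be checked with a few lines. The sketch assumes every cell contributes the same voltage, which is a simplification. The 2,120 V and 3600-cell figures and the approximate 0.5 V per cell come from the abstract, while the function names are hypothetical.

```python
def string_voltage(n_cells, volts_per_cell):
    """Voltage of a series string, assuming identical cells (a simplification)."""
    return n_cells * volts_per_cell

def implied_cell_voltage(total_volts, n_cells):
    """Average per-cell voltage implied by a measured string voltage."""
    return total_volts / n_cells

# Per-cell voltage implied by the reported 2,120 V across 3600 cells.
print(round(implied_cell_voltage(2120, 3600), 3))   # 0.589

# String voltage if each cell gave only the approximate 0.5 V quoted for one cell.
print(string_voltage(3600, 0.5))                    # 1800.0
```

The implied average of roughly 0.59 V per cell is a little above the approximate single-cell figure quoted in the abstract.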

High voltage series connected Si photovoltaic cells

Patel, Rupal K.; Stein, David J.; Hsia, Alexander W.; Bennett, Reid S.

This report describes the features of monolithic, series connected silicon (Si) photovoltaic (PV) cells which have been developed for applications requiring higher voltages than obtained with conventional single junction solar cells. These devices are intended to play a significant role in micro / mini firing systems and fuzing systems for DOE and DOD applications. They are also appropriate for other applications (such as micro-electro-mechanical-systems (MEMS) actuation as demonstrated by Bellew et al.) where electric power is required in remote regions and electrical connection to the region is unavailable or deemed detrimental for whatever reason. Our monolithic device consists of a large number of small PV cells, combined in series and fabricated using standard CMOS processing on silicon-on-insulator (SOI) wafers with 0.4 to 3 micron thick buried oxide (BOX) and top Si thickness of 5 and 10 microns. Individual cell isolation is achieved using the BOX layer of the SOI wafer on the bottom. Isolation along the sides is produced by trenching the top Si and subsequently filling the trench by deposition of dielectric films such as oxide, silicon nitride, or oxynitride. Multiple electrically isolated PV cells are connected in series to produce voltages ranging from approximately 0.5 volts for a single cell to several thousands of volts for strings of thousands of cells.
