IEEE Transactions on Circuits and Systems II: Express Briefs | Vol. 66, Issue 3 | Pages 477-481
Scaling Up In-Memory-Computing Classifiers via Boosted Feature Subsets in Banked Architectures
In-memory computing is an emerging approach for overcoming memory-access bottlenecks: it eliminates the cost of explicitly moving data from the point of storage to a point of computation outside the array. However, computation increases the dynamic range of signals, so performing it within the existing structure of dense memory substantially squeezes the signal-to-noise ratio (SNR). In this brief, we explore how computations can be scaled up so as to jointly optimize energy/latency/bandwidth gains against SNR requirements. We employ algorithmic techniques to decompose computations so that they can be mapped to multiple parallel memory banks operating at chosen optimal points. Focusing specifically on in-memory classification, we consider a custom IC in 130-nm CMOS and demonstrate an algorithm that combines error-adaptive classifier boosting and multi-armed bandits to enable segmentation of a feature vector into multiple subsets. The measured accuracy of 10-way MNIST digit classification, using images downsampled to $16 \times 16$ pixels (mapped across four separate banks), is 91%, close to that simulated using full unsegmented feature vectors. The energy per classification is 879.7 pJ, $14.3\times$ lower than that of a system based on separate memory and a digital accelerator.
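To make the idea of boosting over feature subsets concrete, the following is a minimal sketch and not the authors' algorithm. It assumes an even, contiguous split of the 256 features ($16 \times 16$ pixels) into four banks, uses plain AdaBoost in place of error-adaptive classifier boosting, handles a binary rather than 10-way problem, runs on synthetic data, and omits the multi-armed-bandit selection of subsets entirely. All function names are hypothetical.

```python
import numpy as np

def split_into_banks(n_features, n_banks):
    # Assumption: features are partitioned evenly and contiguously across banks.
    return np.array_split(np.arange(n_features), n_banks)

def fit_weak(X, y, w):
    # Weighted least-squares linear classifier on one bank's features.
    # Labels y are in {-1, +1}; w are the current boosting sample weights.
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    sw = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(Xb * sw, y * sw[:, 0], rcond=None)
    return coef

def predict_weak(X, coef):
    # Sign of the bank-local dot product plus bias.
    return np.where(X @ coef[:-1] + coef[-1] >= 0, 1, -1)

def boost_over_banks(X, y, banks):
    # Plain AdaBoost: one weak classifier per bank, each seeing only its subset.
    n = X.shape[0]
    w = np.full(n, 1.0 / n)
    model = []
    for idx in banks:
        coef = fit_weak(X[:, idx], y, w)
        pred = predict_weak(X[:, idx], coef)
        err = np.clip(np.sum(w[pred != y]), 1e-12, 1 - 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)   # vote weight of this bank
        w = w * np.exp(-alpha * y * pred)       # emphasize misclassified samples
        w /= w.sum()
        model.append((idx, coef, alpha))
    return model

def predict(model, X):
    # Weighted vote over the per-bank decisions.
    score = sum(a * predict_weak(X[:, idx], c) for idx, c, a in model)
    return np.where(score >= 0, 1, -1)

# Synthetic stand-in data (not MNIST): 256 features, linearly separable labels.
rng = np.random.default_rng(0)
X = rng.standard_normal((400, 256))
y = np.where(X @ rng.standard_normal(256) >= 0, 1, -1)

banks = split_into_banks(256, 4)
model = boost_over_banks(X, y, banks)
acc = float(np.mean(predict(model, X) == y))
print(f"features per bank: {[len(b) for b in banks]}, training accuracy: {acc:.2f}")
```

Each weak classifier only ever reads the features held by its own bank, which is what would let the per-bank dot products run in parallel; only the final weighted vote combines results across banks.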