IEEE Journal of Solid-State Circuits

A 28-nm 142-mW Motion-Control SoC for Autonomous Mobile Robots
I-Ting Lin, Zih-Sing Fu, Wen-Ching Chen, Liang-Yi Lin, Nian-Shyang Chang, Chun-Pin Lin, Chi-Shi Chen, Chia-Hsiang Yang
Keywords: Costs; Trajectory optimization; Physics; Motion control; Cost function; Computational modeling; Energy efficiency; Resilience; Predictive models; Mobile robots; Automated Guided Vehicles; Time Step; Energy Efficiency; Motor Control; Robotic Arm; Memory Usage; Trajectory Optimization; Clock Frequency; Hardware Accelerators; Technology Node; Workload Balance; Maximum Energy Efficiency; Random Number; Cost Function; Number Of Steps; Power Consumption; Random Generation; Graphics Processing Unit; Submodule; Final Sequence; Number Of Time Steps; Noise Sequence; Running Costs; State Trajectories; Maximum Workload; Sequence Memory; Parallel Trajectories; Target Trajectory; Gradient-based Algorithm; Uniform Random Number; Autonomous mobile robot (AMR); CMOS integrated circuits; hardware accelerator; motion control; sampling-based trajectory optimization
Abstract: Autonomous mobile robots (AMRs) have proven useful in various applications. Motion control is essential for AMRs to adjust their trajectories, especially when they operate in fast-changing environments. This work presents a motion-control system-on-chip (SoC) for AMRs that demand low response time and robust control. A sampling-based motion-control algorithm that enables highly parallel hardware acceleration is adopted. Trajectory pruning and physics model transformation are proposed to minimize the computational complexity. The SoC includes a trajectory optimization accelerator consisting of an array of 4 × 4 processing elements (PEs). The PE architecture is optimized for trajectory computations to reduce latency and memory usage. A network-on-chip (NoC) is designed for efficient data movement and workload balancing between PEs. An ARM Cortex-M3 microcontroller unit (MCU) is integrated into the SoC for system configuration and scheduling. Fabricated in a 28-nm CMOS technology, the chip has a 3.56-mm² core area and dissipates 142 mW at a 200-MHz clock frequency from a 1.0-V supply. It achieves a 4935-Hz maximum motion-control rate for 130 trajectory time steps for a 7-degree-of-freedom (7-DoF) robot arm on an AMR, and a maximum energy efficiency of 35 Hz/mW. Compared with the state of the art at the same technology node, this work achieves a 22× higher maximum motion-control rate and 35× higher energy efficiency.
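The abstract identifies the method only as a sampling-based motion-control algorithm. A well-known member of that family is model predictive path integral (MPPI) control. The sketch below applies MPPI to a hypothetical 1-D double integrator, and the cost weights, `sigma`, `lam`, and horizon are made-up values. It is not the SoC's algorithm and leaves out trajectory pruning and physics model transformation. Its purpose is to show why this class of algorithm parallelizes well: every sampled noise sequence is rolled out and costed independently, and only the final softmin-weighted average combines the samples.

```python
import math
import random

def rollout_cost(x0, controls, target, dt=0.05):
    """Roll out a hypothetical 1-D double integrator and accumulate a running cost."""
    pos, vel = x0
    cost = 0.0
    for u in controls:
        vel += u * dt
        pos += vel * dt
        cost += (pos - target) ** 2 + 0.01 * u * u  # tracking error + control effort
    return cost

def mppi_update(x0, nominal, target, num_samples=64, sigma=2.0, lam=1.0, seed=0):
    """One MPPI-style iteration: sample noise sequences, cost each rollout,
    and shift the nominal controls by the softmin-weighted average noise."""
    rng = random.Random(seed)
    noises, costs = [], []
    for _ in range(num_samples):  # rollouts are independent, so they parallelize
        eps = [rng.gauss(0.0, sigma) for _ in nominal]
        costs.append(rollout_cost(x0, [u + e for u, e in zip(nominal, eps)], target))
        noises.append(eps)
    c_min = min(costs)  # subtract the best cost so the best weight is exp(0) = 1
    weights = [math.exp(-(c - c_min) / lam) for c in costs]
    total = sum(weights)
    return [u + sum(w * eps[t] for w, eps in zip(weights, noises)) / total
            for t, u in enumerate(nominal)]

x0, target, horizon = (0.0, 0.0), 1.0, 20
plan = [0.0] * horizon
for it in range(30):
    plan = mppi_update(x0, plan, target, seed=it)
print(rollout_cost(x0, [0.0] * horizon, target), "->", rollout_cost(x0, plan, target))
```

With the all-zero plan the toy cost is 20.0. After a few iterations the plan accelerates toward the target and the cost drops. Hardware designs of this kind usually spend their effort on the inner rollout loop, because it scales with both the number of samples and the number of time steps.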
A 40-nm 131-mW 6.4-Gb/s 256 × 32 Multi-User MIMO OTFS Detector for Next-Gen Communication Systems
Tang Lee, Ting-Yang Chen, I-Hsuan Liu, Chia-Hsiang Yang
Keywords: Detectors; OFDM; Symbols; Doppler effect; Modulation; Vectors; Wireless communication; Bit error rate; Interference; Detection algorithms; Communication Systems; Orthogonal Time Frequency Space; Wireless; Computational Complexity; Low Complexity; Structure Of Matrix; Block Diagonal; Bit Error Rate; Bit Error; Part Of Matrix; Orthogonal Frequency Division Multiplexing; Computing Units; Memory Size; Update Strategy; Gram Matrix; Doppler Spread; Variance Estimates; Channel Model; Submatrix; Discrete Fourier Transform; Bit Error Rate Performance; Delay Spread; Factor Graph; Channel Coefficients; Inverse Discrete Fourier Transform; Soft Information; Ith Row Of Matrix; Time-varying Channel; Bit-width; Received Signal Vector; Algorithm-architecture co-optimization; digital integrated circuit; message-passing (MP) detector; multi-user multi-input multi-output (MU-MIMO) system; orthogonal time frequency space (OTFS) modulation
Abstract: High-mobility communication technology will enable important applications in the near future. In such scenarios, the wireless channel exhibits a high Doppler spread, which causes the orthogonal frequency division multiplexing (OFDM) used in current wireless systems to suffer from severe inter-carrier interference (ICI). Orthogonal time frequency space (OTFS) is a promising modulation technique to address this issue: it is more resilient to Doppler spread than OFDM in terms of bit error rate (BER), at the expense of higher detection complexity. This work presents the first high-throughput multi-user multi-input multi-output (MU-MIMO) detector for OTFS communication systems. A low-complexity message-passing (MP) detection algorithm is proposed that exploits the structure of the Gram matrix to reduce computational complexity by 93%. A memory-efficient residual noise (RN) update scheme is devised that reduces the memory needed to store the partial interference by 94%. The proposed MP detector cuts latency by 60% by employing a mean computation unit (MCU) and a dual-mode multiplier. In addition, layer ordering, partial Gram matrix saving, and block diagonal approximation reduce memory access by 91% and memory size by 89% in the channel memory bank. The chip supports up to 32 users, 256 receive antennas, and 256-QAM modulation. Fabricated in a 40-nm CMOS technology, the chip integrates 6.76 M gates in an area of 6.47 mm² and delivers a maximum throughput of 6.4 Gb/s. The power consumption is 131 mW at 200 MHz from a 0.9-V supply. Compared with state-of-the-art MU-MIMO OFDM detectors, this work achieves 3.3× to 21.3× higher maximum throughput and 2.2× to 67.0× lower normalized energy, in addition to higher resilience to Doppler spread.
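The MP detector's complexity and memory savings depend on the Gram matrix G = H^H H of the channel matrix H. The abstract does not explain how partial Gram matrix saving works. The sketch below shows a single generic property that such a scheme can exploit: G is Hermitian, so only the entries on and above the diagonal, U(U+1)/2 of the U² values, need to be stored. The function names and the random test channel are illustrative assumptions, and the sketch implies nothing about the chip's real storage layout.

```python
import random

def gram_matrix(H):
    """G = H^H H for an M x U channel H (M receive antennas, U users), as lists of rows."""
    m_rx, users = len(H), len(H[0])
    G = [[0j] * users for _ in range(users)]
    for i in range(users):
        for j in range(i, users):  # compute only the upper triangle
            s = sum(H[m][i].conjugate() * H[m][j] for m in range(m_rx))
            G[i][j] = s
            G[j][i] = s.conjugate()  # Hermitian symmetry fills the lower triangle
    return G

def pack_upper(G):
    """Keep only the U(U+1)/2 entries on or above the diagonal."""
    n = len(G)
    return [G[i][j] for i in range(n) for j in range(i, n)]

# Hypothetical i.i.d. Gaussian channel with the chip's maximum dimensions
rng = random.Random(1)
H = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(32)] for _ in range(256)]
G = gram_matrix(H)
print(len(pack_upper(G)), "of", 32 * 32, "entries stored")  # 528 of 1024
```

Symmetry by itself saves just under half. The 89% reduction in the abstract also relies on layer ordering and block diagonal approximation, which this sketch does not model.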
VISTA: A Memory-Efficient CNN Processor for Video and Image Spatial/Temporal Interpolation Acceleration
Kai-Ping Lin, Jia-Han Liu, Hong-Chuan Liao, Jyun-Yi Wu, Tong Wu, Chao-Tsung Huang
Keywords: Frequency modulation; Convolutional neural networks; Convolution; Computational modeling; Interpolation; Image quality; Computational complexity; Streaming media; Kernel; Image reconstruction; Convolutional Neural Network; Computational Complexity; High-resolution Images; Image Quality; Peak Signal-to-noise Ratio; Multiple Frames; Temporal Consistency; Search Window; External Memory; Computational Logic; Deformable Convolution; Static Random Access Memory; Convolutional Layers; Reference Frame; Single Image; Individual Processes; Convolution Operation; Convolutional Neural Network Model; Alignment Module; Input Frames; Temporal Overlap; Block Height; Homogeneous Approach; Adjacent Frames; Spatial Overlap; Super-resolution Task; Group Convolution; Reconstruction Module; Algebraic sparsity (AS); cuboid-based layer fusion (CBLF); deformable convolution (DC); external memory access (EMA); high-resolution tasks; region-of-influence (ROI) pyramids; video convolutional neural network (V-CNN)
Abstract: Video convolutional neural networks (V-CNNs) take multiple frames as input and use temporal information to improve quality and temporal consistency. This makes them promising solutions for high-resolution imaging tasks such as video super-resolution (VSR) and video frame interpolation (VFI). Previous works have proposed CNN accelerators for single-image high-resolution imaging tasks that use layer-fusion (LF) workflows to reduce external memory access (EMA) for intermediate feature maps (FMs). However, V-CNNs demand more EMA and higher computational complexity, which makes them challenging to implement on edge devices. Additionally, deformable convolution (DC), which overcomes the fixed shape of the kernel receptive field, can improve image quality and temporal consistency but requires additional storage and computational logic. In this article, we present VISTA, a memory-efficient V-CNN processor. We introduce a cuboid-based LF (CBLF) workflow for V-CNNs that reuses temporal information from overlapped FMs at different time points, reducing EMA and computational complexity. Moreover, VISTA adopts a heterogeneous reuse-recomputing approach to handle overlaps between region-of-influence (ROI) pyramids and uses reference-frame-first scheduling (RFFS) to reduce memory usage during cross-frame alignment computations. Furthermore, we apply hardware-model co-design to devise tile-based offset-confined DC (TODC), which reduces computational logic and saves line-buffer usage for the search window at the cost of a 0.06–0.18-dB drop in peak signal-to-noise ratio (PSNR). The 12.6-mm² VISTA is fabricated in 40-nm CMOS technology and achieves a peak throughput of 4K-UHD at 60 and 50 frames/s for VSR and VFI, respectively. It reduces input EMA by 33%–53%, activation static random-access memory (SRAM) by 19%, and computational complexity by 19%–42%.
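Layer fusion keeps intermediate feature maps on chip by computing all fused layers tile by tile. Each output tile, however, depends on a larger input region, the region-of-influence (ROI) pyramid. The minimal sketch below computes that pyramid and the overlap overhead that neighboring tiles share. It assumes square tiles and stride-1 k × k convolutions without padding. The temporal (cuboid) dimension of CBLF and deformable convolution are not modeled, and the helper names are hypothetical.

```python
def roi_pyramid(out_tile, kernels):
    """Tile side length required at each layer's input (first entry = network input)
    to produce an out_tile x out_tile output through stride-1 k x k convolutions."""
    sizes = [out_tile]
    for k in reversed(kernels):
        sizes.append(sizes[-1] + (k - 1))  # each k x k layer adds a (k - 1)-pixel halo
    return sizes[::-1]

def overlap_overhead(out_tile, kernels):
    """Extra input pixels per tile, relative to the output tile, that must be either
    re-fetched and recomputed or buffered for reuse between neighboring tiles."""
    in_side = roi_pyramid(out_tile, kernels)[0]
    return (in_side ** 2 - out_tile ** 2) / out_tile ** 2

print(roi_pyramid(32, [3, 3, 3]))                  # [38, 36, 34, 32]
print(round(overlap_overhead(32, [3, 3, 3]), 3))   # 0.41
```

Under these assumptions, three fused 3 × 3 layers already fetch about 41% more input pixels than they output. This overhead is what reuse-versus-recompute strategies trade off against on-chip buffer size.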
A Computing-in-Memory Engine Supporting One-Shot Floating-Point NN Inference and On-Device Fine-Tuning for Edge AI
Haikang Diao, Haoyang Luo, Jiahao Song, Bocheng Xu, Runsheng Wang, Yuan Wang, Xiyuan Tang
Keywords: In-memory computing; Common Information Model (computing); Artificial neural networks; Computer architecture; Computational modeling; Computational efficiency; Engines; Throughput; Edge AI; Costs; Neural Network; Edge AI; Environmental Changes; Throughput; Energy Efficiency; Parallelization; Conversion Process; Inference Accuracy; Floating-point Operations; Digital Circuits; Artificial Neural Network; Power Consumption; Object Detection; Flow Data; Per Cycle; Operating Frequency; Training Algorithm; Negative Weights; Performance Of Neural Networks; Most Significant Bit; Input Alignment; Sparse Weight; Loss Of Precision; Clock Cycles; Area Overhead; Compute-in-memory (CIM); floating-point (FP) operation; on-device fine-tuning; one-shot
Abstract: With the rapid advancement of edge AI, the complexity of tasks on edge devices keeps increasing, demanding higher efficiency and precision from AI accelerators. Pre-aligned floating-point computing-in-memory (FP CIM) has been proposed to achieve high-precision neural network (NN) computations on floating-point (FP) data. However, the complex digital circuitry required for integer (INT) mantissa multiply-accumulate (MAC) computation and exponent alignment severely limits the efficiency and throughput of FP CIM. This work proposes an energy- and area-efficient computing-in-memory (CIM) engine for one-shot FP NN inference and on-device fine-tuning. To improve the throughput of FP CIM, a one-shot compute scheme is proposed that performs an FP operation within one cycle. It adopts a multiply-less NN instead of a multiply-based NN, which reduces the integer mantissa MAC to minimum selection. A customized 8-bit parallel minimum selector is also designed to further reduce the cost of parallel computation. To simplify the FP/INT conversion process, an input-weight co-alignment workflow is proposed that eliminates maximum exponent selection and simplifies the mantissa shifting logic. To minimize the inference accuracy loss caused by environmental changes, a lightweight on-device fine-tuning core (ODFC) is designed to support online weight updates. The fabricated 28-nm chip achieves an energy efficiency of 128 TFLOPS/W and a computational density of 7.02 TFLOPS/mm² at BF16, improvements of 4.1× and 3.4×, respectively, over previous state-of-the-art works.
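The sketch below models the conventional pre-aligned FP dot product that the abstract calls costly, which makes clear what the proposed co-alignment removes. The steps are: select the maximum exponent of each vector, right-shift every mantissa to that exponent, run a pure integer MAC, and rescale once at the end. It assumes BF16-like 8-bit mantissas (7 stored bits plus the hidden bit) and ignores BF16's exponent range, subnormals, and rounding modes. It is not the paper's one-shot or multiply-less scheme. Bits shifted out during alignment are truncated, and this is the precision loss that large exponent spreads cause.

```python
import math

MANT_BITS = 8  # BF16-like: 7 stored fraction bits plus the hidden leading 1

def decompose(x):
    """Return (sign, exponent, integer mantissa) with x ~= sign * mant * 2**(exp - MANT_BITS)."""
    m, e = math.frexp(x)  # x = m * 2**e with 0.5 <= |m| < 1, or m = 0 for x = 0
    mant = round(abs(m) * (1 << MANT_BITS))
    if mant == 1 << MANT_BITS:  # rounding carried out of the mantissa: renormalize
        mant >>= 1
        e += 1
    return (-1 if x < 0 else 1), e, mant

def pre_align(vec):
    """Max-exponent selection followed by truncating mantissa right-shifts."""
    parts = [decompose(v) for v in vec]
    e_max = max((e for _, e, mant in parts if mant), default=0)  # zeros do not vote
    aligned = [s * (mant >> (e_max - e)) if mant else 0 for s, e, mant in parts]
    return e_max, aligned

def pre_aligned_dot(xs, ws):
    """Dot product using only integer mantissa MACs plus one final rescale."""
    ex, ax = pre_align(xs)
    ew, aw = pre_align(ws)
    acc = sum(a * b for a, b in zip(ax, aw))  # the integer MAC a CIM array would run
    return acc * 2.0 ** (ex + ew - 2 * MANT_BITS)

print(pre_aligned_dot([1.5, -0.75, 0.125], [2.0, 0.5, -4.0]))  # 2.125 (exact here)
print(pre_aligned_dot([1.0, 2 ** -10], [1.0, 1.0]))  # 1.0: the 2**-10 term is shifted out
```

In hardware, the `max` and the variable shifts correspond to the comparator tree and barrel shifters that pre-aligned FP CIM adds around the integer array. Those are the circuits the abstract's co-alignment workflow targets.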
Digital In-Memory Compute for Machine Learning Applications With Input and Model Security
Maitreyi Ashok, Saurav Maji, Xin Zhang, John Cohn, Anantha P. Chandrakasan
Keywords: Security; System-on-chip; Neural networks; Computational modeling; Ciphers; Protection; Machine learning; Logic gates; Integrated circuit modeling; Accuracy; Computational Memory; Neural Network; Decoding; Circuitry; Energy Efficiency; Accuracy And Precision; Secret Key; Side-channel; Key Generation; Random Bits; Accuracy Of Neural Network; Power Consumption; Inverter; State Machine; Security Level; Information Bits; Power-of-two; Parallel Operation; Clock Cycles; Total Power Consumption; Physical Unclonable Functions; Area Overhead; Off-chip Memory; Bit-width; Half Adder; Secret Value; Multi-party Computation; Security Guarantees; Pseudo-random Number Generator; Neural Network Approximation; In-memory compute (IMC); machine learning; physically unclonable functions; side-channel security
Abstract: Digital in-memory compute (IMC) architectures balance the high accuracy and precision required by many machine learning applications with high data reuse and parallelism that reduce energy consumption. However, one often overlooked parameter is security, which is necessary to maintain the privacy and integrity of the accelerator. In this work, we propose an IMC macro design that is protected against two types of eavesdropping attacks: passive physical side-channel attacks and memory bus probing. This is achieved through secure compute that eliminates the need for random bits, local model decryption with a lightweight cipher, and secret key generation that reuses existing IMC circuitry. These contributions provide side-channel security against all practical attackers beyond 1 million samples, while the macro operates at an energy efficiency of 8.1 TOPS/W with no effect on neural network accuracy.
A Sub-Nanosecond Pulsed VCSEL Driver With PVT-Compensated Constant Current, Integrated Boost Switching Regulator and Class-1 Laser Eye Safety
Ming Zhong, Yifan Wu, Yuan Li, Wei Chen, Miao Sun, Liujia Song, Shanyu Cui, Xuefeng Chen, Patrick Yin Chiang, Shenglong Zhuo, Tao Xia
Keywords: Vertical cavity surface emitting lasers; Safety; Lasers; Accuracy; Power lasers; Laser beam cutting; Real-time systems; Photonics; Laser tuning; Current control; Vertical-cavity Surface-emitting Lasers; Eye Safety; Laser Pulse; Pulse Width; Current Control; Supply Voltage; Variable Delay; Propagation Delay; Current Loop; Safe Set; Amplitude Error; Duty Cycle; Voltage Drop; Peak Current; Optical Power; Output Current; Optical Pulse; Pulse Generator; Current Pulse; Trigger Signal; Single-photon Avalanche Diode; External Clock; Boost Converter; Time-to-digital Converter; Parasitic Inductance; Phase-locked Loop; System-on-chip; Schottky Diode; Laser Illumination; Gate Driver; Boost converter; class-1 laser eye safety; direct time-of-flight (dToF); laser-diode driver; sub-nanosecond (ns)
Abstract: This article presents a sub-nanosecond pulsed laser diode (LD) driver IC for multi-junction (MJ-) vertical-cavity surface-emitting laser (VCSEL) arrays. An embedded real-time current control loop keeps the laser current amplitude error within ±3% for 900-ps to 32-ns pulses, even under process, voltage, and temperature (PVT) variations of the VCSEL and the driver. With real-time PVT-compensation techniques, the pulsewidth and propagation-delay variations stay below 140 ps across chips and under varying supply voltage. A boost switching regulator is integrated into the IC to convert a low supply voltage into the 5–8.5-V laser anode supply. A complete set of laser safety check blocks is integrated to guarantee class-1 eye safety: when a laser safety hazard is detected, the driver IC stops emission and cuts off the laser supply.
A Dual-Inductor Quad-Path Hybrid Buck (2L4PHB) Converter With Reduced Inductor Current
Wen-Liang Zeng, Guigang Cai, Yan Lu, Sai-Weng Sin, Rui P. Martins, Chi-Seng Lam
Keywords: Inductors; Voltage; Video recording; Stress; Capacitors; Microelectronics; Very large scale integration; Topology; High-voltage techniques; Discharges (electric); Inductor Current; Buck Converter; Hybrid Buck; Conversion Ratio; High Power Density; Load Current; Light Load; Average Current; Voltage Stress; Peak Efficiency; High Power Conversion Efficiency; Duty Ratio; Conduction Mode; Continuous Conduction Mode; High Voltage Stress; Inductor Current Ripple; Transfer Function; Duty Cycle; Transient Response; Output Current; Switches S1; Conduction Loss; Turn Off; Current Ratio; Power Stage; Voltage Ripple; Power Efficiency; Voltage Divider; Current Balance; Conversion Loss; Discontinuous conduction mode (DCM) calibration; double step-down (DSD) converter; dual inductor; high efficiency; high power density; hybrid dc–dc; reduced inductor current; switched-capacitor converter
Abstract: In high-efficiency 12-V-to-1–1.8-V applications, the conventional buck converter's small duty ratio (D ≈ 0.1) and the high voltage stress on its power switches impose a significant efficiency penalty. This article proposes a dual-inductor quad-path hybrid buck (2L4PHB) converter that addresses these issues and achieves both high power conversion efficiency and high power density. Compared with the widely used double step-down (DSD) converter, the proposed 2L4PHB converter reduces the average inductor current by 30% and the inductor current ripple by 18% at a voltage conversion ratio (VCR) of 0.15. In addition to continuous conduction mode (CCM), the design incorporates a discontinuous conduction mode (DCM) calibration loop to improve efficiency at light loads. Fabricated in a 180-nm BCD process, the proposed 2L4PHB converter achieves a maximum current density of 0.18 A/mm² and 256 A/cm³ and a peak efficiency of 93.7%. It maintains efficiencies above 85% for load currents from 0.6 to 4 A while using a compact 2.5 × 2.0 × 1.0-mm³ inductor with a DCR of 48 mΩ.
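The D ≈ 0.1 figure follows from the ideal continuous-conduction buck relation. For comparison, the ideal DSD converter is commonly analyzed with a voltage gain of D/2, so at the same VCR its duty ratio is roughly doubled. Losses, dead time, and the 2L4PHB topology's own relations are not modeled in this worked example.

```latex
\begin{aligned}
\text{Buck (ideal, CCM):}\quad & V_\text{out} = D\,V_\text{in}
  \;\Rightarrow\; D = \frac{V_\text{out}}{V_\text{in}} \\
V_\text{in} = 12\ \text{V}:\quad & D = \frac{1\ \text{V}}{12\ \text{V}} \approx 0.083
  \quad\text{to}\quad D = \frac{1.8\ \text{V}}{12\ \text{V}} = 0.15 \\
\text{DSD (ideal):}\quad & V_\text{out} = \frac{D}{2}\,V_\text{in}
  \;\Rightarrow\; D = 2 \times \text{VCR} = 2 \times 0.15 = 0.30
\end{aligned}
```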
A Photovoltaic Dynamic Vision Sensor
Pablo Fernández-Peramo, Juan Antonio Leñero-Bardallo, Ángel Rodríguez-Vázquez
Keywords: Voltage control; Photovoltaic systems; Lighting; Photoconductivity; Noise; Computer architecture; Vision sensors; Sensitivity; Protocols; Transistors; Dynamic Vision Sensor; Impedance; Power Consumption; Solar Cells; Photodiode; Single Operation; Front End; Static Power Consumption; Thermal Noise; Noise Power; Open-circuit Voltage; Pixel Level; Contrast Sensitivity; Shot Noise; Refractory Period; Architecture For Classification; Neutral Filters; Noise Contribution; Noise Performance; Illumination Levels; Temporal Contrast; Conventional Architecture; Low Illumination; Pixel Pitch; Diffusion Current; High Illumination; Pixel Array; Entire Array; Dark Current; P-n Junction; Address event representation (AER); asynchronous; diode; dynamic vision sensor (DVS) sensor; event-based; photovoltaic; solar cell; vision sensor
Abstract: This article reports a dynamic vision sensor (DVS) proof-of-concept chip that employs an unconventional photo-transduction front end. Instead of the conventional logarithmic transducer comprising a photodiode and a nonlinear load, the proposed pixel architecture uses a single diode operating in the photovoltaic regime. This is the same operating regime used by solar cells, and its voltage-current characteristic gives the sensor remarkable sensitivity to transient illumination variations, particularly in low-light conditions. In addition, the absence of resistive loads improves compactness and lowers static power consumption. Experimental results demonstrate advantages over prior art in noise and latency.
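In the photovoltaic regime the diode is left unbiased, so its terminal voltage settles at the open-circuit voltage. The ideal-diode equation makes that voltage logarithmic in photocurrent without any load: Voc = n·V_T·ln(1 + I_ph/I_s). As a result, a fixed contrast step produces the same voltage step at any light level, which is exactly what a DVS pixel needs. The sketch below assumes the ideal-diode model, hypothetical values for I_s, n, and the event threshold, and a simple reset-on-event rule. It ignores dark current, noise, and the refractory period, and it does not represent the chip's circuit.

```python
import math

V_T = 0.0259  # thermal voltage near 300 K, in volts

def open_circuit_voltage(i_ph, i_s=1e-15, n=1.0):
    """Ideal-diode open-circuit voltage Voc = n * V_T * ln(1 + I_ph / I_s); i_s, n hypothetical."""
    return n * V_T * math.log1p(i_ph / i_s)

def dvs_events(photocurrents, threshold):
    """Emit (sample index, +1 ON / -1 OFF) whenever Voc moves by at least
    `threshold` volts from the level stored at the previous event."""
    events = []
    v_ref = open_circuit_voltage(photocurrents[0])
    for k, i_ph in enumerate(photocurrents[1:], start=1):
        v = open_circuit_voltage(i_ph)
        if v - v_ref >= threshold:
            events.append((k, +1))
            v_ref = v  # reset the stored level after an event
        elif v_ref - v >= threshold:
            events.append((k, -1))
            v_ref = v
    return events

# Doubling the photocurrent gives about the same ~18-mV step at both light levels
print(open_circuit_voltage(2e-12) - open_circuit_voltage(1e-12))
print(open_circuit_voltage(2e-9) - open_circuit_voltage(1e-9))
print(dvs_events([1e-12, 1e-12, 2e-12, 2e-12, 1e-12], threshold=0.01))  # [(2, 1), (4, -1)]
```

Because the ΔVoc for a given contrast ratio does not depend on absolute light level, one comparator threshold sets the pixel's contrast sensitivity across illumination levels, at least within this idealized model.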
A Hybrid Buck-or-Boost Converter for Fast-Transient and Wide-Voltage-Range Applications With Continuous Output Delivery Current
Junyi Ruan, Junmin Jiang, Chenzhou Ding, Kai Yuan, Ka Nang Leung, Zhiyuan Chen, Xiaoyang Zeng, Xun Liu
Keywords: Switches; Inductors; Capacitors; Voltage measurement; Bridge circuits; Voltage control; Transient response; Transient analysis; Bandwidth; Switching circuits; Continuous Delivery; Transient Response; Voltage Range; Input Voltage; Voltage Levels; Load Current; Mode Transition; Peak Efficiency; Reference Tracking; Dynamic Voltage; Start-up Process; Output Voltage Ripple; Output Voltage Range; Transfer Function; Power Loss; Short Pulse; DC Voltage; Maximum Voltage; Dead Time; Power Loss Minimization; Duty Ratio; Power Stage; Boost Converter; LED Driver; Slew Rate; Loading Capability; Equivalent Series Resistance; Volt-second Balance; Virtual Ground; Continuous output delivery current; fast dynamic voltage scaling (DVS) and load transient response; hybrid buck-boost dc-dc converter; Li-ion battery power supply; no right-half-plane (RHP) zero
Abstract: This article proposes a hybrid right-half-plane (RHP)-zero-free buck-or-boost converter for applications that require wide-range input/output voltage levels, fast dynamic voltage scaling (DVS), and fast load transient responses. The converter operates from a Li-ion battery input voltage range of 2.7–4.2 V and provides an output voltage from below 1 V up to 6 V. To ensure continuous output delivery current when the voltage conversion ratio (CR) exceeds 2, a dual flying capacitor mode is proposed, which reduces output voltage ripple and alleviates current capacity limitations. Additionally, reference tracking techniques are introduced to increase tracking speed and ensure transient reliability. The article also presents dc and ac analyses of the proposed converter, along with the driver design, the mode transition design, and the start-up process. Measurement results show a voltage CR of approximately 0.21–2.22 with a peak efficiency of 97.3%. DVS rates of 1.13–2.33 μs/V are obtained with the help of an assisted enhancement time. With a 3.7-V input, output voltages of 2.5 and 5 V, and step-up and step-down load current transitions between 50 and 800 mA in less than 200 ns, the settling time in both buck and boost modes is around 6 μs.
NeuroFlare: An mm³-Scale Wireless Neural Interface Device With Simultaneous Neural Recording and Optical Stimulation
Linran Zhao, Yan Gong, Xiang Liu, Wei Shi, Yiming Han, Wen Li, Yaoyao Jia
Keywords: Capacitors; Light emitting diodes; Stimulated emission; Optical pulses; Wireless communication; Optical recording; Optical saturation; Biomedical optical imaging; Integrated optics; Optical switches; Simultaneous Recordings; Neural Recordings; Optical Stimulation; Simultaneous Stimulation; Neural Devices; Simultaneous Optical Stimulation; Energy Efficiency; Power Consumption; Figure Of Merit; Neural Signals; Current Pulse; Supply Voltage; Voltage Pulses; Front End; Presence Of Artifacts; Miniaturization Of Devices; Charge Efficiency; Miniature Devices; Constant Current; Wireless Power Transfer; Wireless Data; Neural Stimulation; Gate Driver; Wireless Data Transmission; Wireless Power; Power Spectrum Density; Quantization Bits; Device Size; Conduction Loss; Implantable miniature neural interface; neural recording; optical stimulation; wireless data transmission; wireless power transmission
Abstract: This article presents NeuroFlare, a wireless, miniature implantable opto-electro neural interface device capable of simultaneous neural recording and optical stimulation. NeuroFlare features a low-power, dual-modal application-specific integrated circuit (ASIC) fabricated in a 180-nm CMOS process. To support power-intensive optical stimulation, the ASIC employs a novel linear-charging switched-capacitor stimulation (LC-SCS) structure. Operating from only a 1.2-V supply, the LC-SCS can drive the LED with a voltage three times the supply voltage and with current pulses of up to 12 mA while maintaining a high charging efficiency of 86.4%. In addition, the LC-SCS requires only one off-chip capacitor, which greatly facilitates device miniaturization. To record neural signals accurately in the presence of stimulation artifacts, the ASIC employs a delta-sigma modulator (ΔΣM)-based recording front end with a wide dynamic range (DR) that directly digitizes the neural signals. The ΔΣM uses a second-order loop architecture implemented with a linearized transconductance-capacitor (Gm-C) integrator followed by an active noise-shaping (NS) successive approximation register (SAR) quantizer. Consuming 9.8 μW, the ΔΣM provides a peak DR of 83.7 dB, corresponding to a 400-mVpp linear input range. Its 173.8-dB figure of merit (FoM_DR) indicates superior energy efficiency. The ASIC is assembled into a NeuroFlare prototype measuring 2.8 × 3.5 × 0.7 mm³. Recorded light-evoked local field potentials (LFPs) verified the functionality of NeuroFlare in vivo.
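The abstract does not give the recording bandwidth. If FoM_DR follows the widely used Schreier definition, FoM_DR = DR + 10·log10(BW/P), then the reported 173.8 dB, 83.7 dB, and 9.8 μW imply a bandwidth of roughly 10 kHz. This is an inference that depends on that assumption, not a reported value, and the helper names are hypothetical.

```python
import math

def fom_dr(dr_db, bw_hz, power_w):
    """Schreier-style figure of merit based on dynamic range: DR + 10*log10(BW / P)."""
    return dr_db + 10 * math.log10(bw_hz / power_w)

def implied_bandwidth(fom_db, dr_db, power_w):
    """Invert fom_dr() to recover the signal bandwidth in Hz."""
    return power_w * 10 ** ((fom_db - dr_db) / 10)

# Reported figures from the abstract; the FoM definition itself is an assumption
bw = implied_bandwidth(173.8, 83.7, 9.8e-6)
print(f"implied bandwidth: {bw / 1e3:.1f} kHz")
print(f"round trip: {fom_dr(83.7, bw, 9.8e-6):.1f} dB")
```

Every 3 dB of FoM_DR corresponds to a doubling of bandwidth per watt at a fixed DR. This explains why low power consumption, rather than DR alone, drives the reported figure.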