A Hybrid Instruction Prefetching Mechanism for Ultra Low-Power Multicore Clusters
Keywords:Energy efficiency; instruction cache; instruction prefetching; ultralow-power (ULP) embedded multicores;
Abstracts:The instruction memory hierarchy plays a critical role in performance and energy efficiency of ultralow-power (ULP) processors for the Internet-of-Things (IoT) end-nodes. This is mainly due to the extremely tight power envelope and area budgets, which imply small instruction-caches (I-Cache) operating at very low supply voltages (near-threshold). The challenge is aggravated by the fact that multiple processors, fetching in parallel, require plenty of bandwidth from the I-Caches. In this letter, we propose a low-cost and energy-efficient hybrid instruction-prefetching mechanism to be integrated with a ULP multicore cluster. We study its performance for a wide range of IoT applications, from cryptography to computer vision, and show that it can effectively improve the hit-rate of almost all of them to above 95% (average performance improvement of over 2×). In addition, we designed our prefetcher and integrated it in a 4-core cluster in 28 nm fully-depleted silicon-on-insulator (FDSOI) technology. We show that the system's power consumption increases only by about 11% and silicon area by less than 1%. Altogether, a total energy reduction of 1.9× is achieved, thanks to more than 2× performance improvement, enabling a significantly longer battery life.
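The power, performance, and energy figures quoted above can be cross-checked with a short calculation. This is a minimal sketch, assuming energy is average power multiplied by execution time for the same workload; the function names are illustrative and do not come from the letter.

```python
def energy_reduction(speedup: float, power_increase: float) -> float:
    """Ratio of old energy to new energy, assuming energy = power * time.

    speedup        -- old execution time divided by new execution time
    power_increase -- fractional rise in average power (0.11 means +11%)
    """
    return speedup / (1.0 + power_increase)


def implied_speedup(reduction: float, power_increase: float) -> float:
    """Speedup required to reach a given energy reduction at a given power rise."""
    return reduction * (1.0 + power_increase)


# With the abstract's figures: about 11% more power and a 1.9x energy reduction.
print(implied_speedup(1.9, 0.11))  # about 2.11, i.e. "more than 2x"
```

Under this assumption, an 11% power increase and a 1.9× energy reduction imply a speedup of roughly 2.1×, which agrees with the reported "more than 2×" performance improvement.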
A Reactive and Adaptive Data Flow Model for Network-of-System Specification
Keywords:Adaptivity; connected embedded systems; data flow modeling; network losses; reactivity;
Abstracts:With embedded systems being increasingly networked, appropriate models of computation and communication are needed for specification of such networks-of-systems. Traditional dataflow models have shown their usefulness in analyzing isolated systems. However, these models cannot express the inherent requirements of connected applications, such as dynamic behavior associated with network losses and reactivity to external events. This letter proposes a reactive and adaptive data flow (RADF) model that introduces a notion of empty tokens to expose network losses and provide adaptivity at the application level while maintaining overall determinism. Empty tokens, combined with expanded actor semantics, are also used to model reactivity to sporadic external events. We formally define RADF semantics and show efficient methods for analyzing RADF graphs in terms of their worst-case throughput and latency.
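The idea of empty tokens can be illustrated with a toy example. This is only a sketch under stated assumptions, not the RADF semantics defined in the letter: the `EMPTY` sentinel, the loss model, and the hold-last-value actor are all hypothetical choices made for illustration.

```python
EMPTY = None  # hypothetical sentinel standing in for an empty token


def receive_with_empty_tokens(sent, delivered_indices):
    """Replace every lost token with EMPTY so the token count stays fixed.

    Keeping one token per slot means the consumer fires the same number
    of times whether or not the network dropped anything.
    """
    return [tok if i in delivered_indices else EMPTY for i, tok in enumerate(sent)]


def hold_last_actor(tokens, initial):
    """Toy actor: on an empty token, reuse the last valid value (an assumed policy)."""
    outputs = []
    last = initial
    for tok in tokens:
        if tok is not EMPTY:
            last = tok
        outputs.append(last)
    return outputs


received = receive_with_empty_tokens([10, 11, 12, 13], delivered_indices={0, 2, 3})
print(received)                      # [10, None, 12, 13]
print(hold_last_actor(received, 0))  # [10, 10, 12, 13]
```

The point of the sketch is that the loss becomes visible to the application as data, so the actor can adapt while the firing sequence remains the same.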
Mobile Application to Detect Induction Motor Faults
Keywords:Accelerometer; android application; fault diagnosis; induction motor; vibration spectrum;
Abstracts:An Android-based application has been developed that can convert any mobile phone with an inbuilt accelerometer into a squirrel cage induction motor (SCIM) fault diagnosis tool. To detect the faults, the mobile phone needs to be attached to the motor, and the application will record the motor vibration signal using the inbuilt accelerometer. After the recording, the faults are detected by locating the fault frequencies in the motor vibration spectrum. The developed application can also detect the motor faults from any previously recorded files of vibration data. The application has been tested on a 22 kW SCIM using Moto G4 Plus and Moto G5 Plus Android phones. Unlike other SCIM fault diagnosis systems, the proposed method does not require any dedicated sensors, processing platforms, or power supply arrangements.
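Locating a frequency component in a vibration spectrum can be sketched as follows. This is a minimal illustration on a synthetic signal, not the application's implementation: the sample rate, the 120 Hz "fault" component, and the search tolerance are made-up values, and a naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def magnitude_spectrum(samples, sample_rate_hz):
    """Return (frequency_hz, amplitude) pairs for bins 0..N/2 via a naive DFT.

    The 2/N scaling recovers the sinusoid amplitude for bins other than
    DC and Nyquist, which is all this sketch relies on.
    """
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        spectrum.append((k * sample_rate_hz / n, abs(acc) * 2 / n))
    return spectrum


def strongest_peak_near(spectrum, target_hz, tolerance_hz):
    """Return the (frequency, amplitude) of the largest bin within the window."""
    window = [(f, m) for f, m in spectrum if abs(f - target_hz) <= tolerance_hz]
    return max(window, key=lambda fm: fm[1])


# Synthetic vibration: a 50 Hz component plus a weaker 120 Hz component.
fs = 1000.0  # assumed accelerometer sample rate in Hz
n = 500      # gives a 2 Hz bin spacing
signal = [
    math.sin(2 * math.pi * 50 * i / fs) + 0.2 * math.sin(2 * math.pi * 120 * i / fs)
    for i in range(n)
]
freq, amplitude = strongest_peak_near(magnitude_spectrum(signal, fs), 120.0, 6.0)
print(freq, round(amplitude, 3))
```

A practical detector would compare the peak amplitude against a threshold or a healthy-motor baseline; the abstract does not state which criterion the application uses.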
Tactics to Directly Map CNN Graphs on Embedded FPGAs
Keywords:Convolutional neural network (CNN); dataflow; field-programmable gate array (FPGA); VHSIC hardware description language (VHDL);
Abstracts:Deep convolutional neural networks (CNNs) are the state of the art in image classification. Since CNN feed-forward propagation involves highly regular parallel computation, it benefits from a significant speed-up when running on fine-grained parallel programmable logic devices. As a consequence, several studies have proposed field-programmable gate array (FPGA)-based accelerators for CNNs. However, because of the large computational power required by CNNs, none of the previous studies has proposed a direct mapping of the CNN onto the physical resources of an FPGA, allocating each processing actor to its own hardware instance. In this letter, we demonstrate the feasibility of the so-called direct hardware mapping (DHM) and discuss several tactics we explore to make DHM usable in practice. As a proof of concept, we introduce the HADDOC2 open-source tool, which automatically transforms a CNN description into a synthesizable hardware description with platform-independent DHM.
SAM: Software-Assisted Memory Hierarchy for Scalable Manycore Embedded Systems
Keywords:Manycore embedded system (MES); memory hierarchy; software-programmable memory (SPM);
Abstracts:This letter proposes a system architecture for a scalable software-assisted memory (SAM) hierarchy for emerging manycore embedded systems. Our SAM hierarchy overcomes the coherence overhead and inflexibility of purely hardware-managed memory hierarchies in adapting to variable workloads. Our preliminary results show opportunities for energy saving and performance improvement through: 1) a hybrid software-programmable memory (SPM)/cache local memory adaptable to the application's memory characteristics; 2) SPM-based management of shared data; and 3) virtualizing and sharing the on-chip memory space between concurrently running applications. Primary experimental results show opportunities to reduce the execution time and the memory hierarchy energy consumption by up to 23% and 7%, respectively, depending on the workload.
A Flexible Decision-Making Mechanism Targeting Smart Thermostats
Keywords:Decision-making; embedded system; low-complexity; smart thermostats;
Abstracts:Buildings are immensely energy-demanding and are expected to consume even more in the near future. The operation of cooling/heating mechanisms contributes heavily to this demand, since nonoptimal configuration of temperature set-points usually leads to increased energy cost, as well as violations of the occupants' thermal comfort. In this letter, we introduce a flexible decision-making mechanism for supporting the proper configuration of these devices. The competitive advantage of our solution is its remarkably lower computational complexity without any degradation in the quality of the derived decisions.
Data Reduction in Sensor Networks: Performance Evaluation in a Real Environment
Keywords:Adaptation and aggregation; data reduction; Kruskal–Wallis test; real experiments; telosB mote; wireless sensor networks (WSNs);
Abstracts:Data reduction is an effective technique for energy saving in wireless sensor networks. It consists of reducing the amount of sensed and transmitted data while preserving a high quality of collected information. In this letter, we propose an online data reduction model based on the Kruskal-Wallis test that allows sensor nodes to adapt their sensing rates based on the data variance. Then, we propose a local aggregation algorithm to further reduce the data set size before sending it to the sink. Experimentation on a real telosB sensor network testbed shows the effectiveness of our approach in reducing the size of data transmitted over the network and thus saving energy.
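The Kruskal-Wallis test mentioned above ranks all readings together and compares the rank sums of the groups. The following is a minimal sketch, assuming three consecutive sensing periods are compared and that the sampling period is halved or doubled depending on the outcome. That policy, the function names, and the bounds are hypothetical; the letter's actual adaptation rule is not given in the abstract. The statistic omits the tie correction, and 5.991 is the chi-square critical value for 2 degrees of freedom at the 5% level, so the threshold applies only to three groups.

```python
def average_ranks(values):
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1), without tie correction."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * total - 3.0 * (n_total + 1)


CHI2_CRITICAL_DF2_ALPHA_005 = 5.991  # valid for exactly three groups


def next_sampling_period(period_s, groups, min_s=1.0, max_s=60.0):
    """Hypothetical policy: sample faster when the periods differ significantly."""
    if kruskal_wallis_h(groups) > CHI2_CRITICAL_DF2_ALPHA_005:
        return max(min_s, period_s / 2)
    return min(max_s, period_s * 2)


changing = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
stable = [[20.0, 20.1, 20.2], [20.1, 20.0, 20.2], [20.2, 20.1, 20.0]]
print(next_sampling_period(10.0, changing))  # 5.0
print(next_sampling_period(10.0, stable))    # 20.0
```

For the `changing` data the rank sums are 6, 15, and 24, giving H = 7.2, which exceeds the threshold; for the `stable` data every group has the same rank sum and H is zero. With only a few readings per group the chi-square approximation is rough, so the threshold should be treated as indicative.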
A Powerline-Tuned Camera Trigger for AC Illumination Flickering Reduction
Keywords:Dynamic range improvement; embedded camera trigger; flickering reduction;
Abstracts:Camera triggering represents an essential step for synchronizing artificial vision systems (AVSs) and can affect the quality of acquired images. In fact, a proper trigger signal is mandatory to synchronize in time both standalone and multiple cameras covering large environments. In addition, indoor environments with artificial light sources can induce flickering noise in captured frames, affecting the performance of the algorithms that are usually executed to interpret a scene or perform various tasks. In this letter, we describe the design of an embedded system for camera triggering that can be employed to deal with such issues and remove flickering noise while capturing an image with the highest possible dynamic range. Experiments on real data show how the proposed trigger can be effectively employed in AVSs.
Real-Time Multiprocessor Scheduling Algorithm Based on Information Theory Principles
Keywords:Global scheduling; information theory; migration minimization; multiprocessor scheduling;
Abstracts:Reducing job migrations is essential for any global multiprocessor scheduling algorithm. In this letter, we present a global, dynamic-priority, laxity-based algorithm that reduces the number of migrations on multiprocessor embedded systems by leveraging information theory principles. A simplification of the proposed scheduling theory is presented to reduce the overhead caused by using information theory. Our results show that the proposed algorithm is able to reduce the number of migrations by up to 41.21% when compared with other global, dynamic-priority, laxity-based algorithms. As the utilization per task set and the number of processors increase, the simplified information-theoretic scheduling algorithm is able to improve its performance in terms of the number of migrations.
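For readers unfamiliar with laxity-based scheduling, the quantity involved can be sketched briefly. The code below shows a generic least-laxity-first selection, which is the kind of baseline the abstract compares against; it is not the information-theoretic algorithm proposed in the letter, and the `Job` fields and tie-breaking by name are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    deadline: float   # absolute deadline
    remaining: float  # remaining execution time


def laxity(job: Job, now: float) -> float:
    """Slack left before the job must run continuously to meet its deadline."""
    return job.deadline - now - job.remaining


def pick_jobs(jobs, now, processors):
    """Select up to `processors` jobs with the least laxity (ties broken by name)."""
    return sorted(jobs, key=lambda j: (laxity(j, now), j.name))[:processors]


jobs = [Job("A", 10.0, 4.0), Job("B", 8.0, 7.0), Job("C", 12.0, 3.0)]
print([j.name for j in pick_jobs(jobs, now=0.0, processors=2)])  # ['B', 'A']
```

Because laxities change as time advances, the selected set can change at every scheduling point, which is why plain laxity-based policies tend to migrate jobs often and why reducing migrations is the focus of the letter.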