IEEE Journal of Selected Topics in Signal Processing

A Language Model-Based Fine-Grained Address Resolution Framework in UAV Delivery System
Sichun Luo, Yuxuan Yao, Haohan Zhao, Linqi Song
Keywords: Autonomous aerial vehicles; Hidden Markov models; Buildings; Databases; Urban areas; Logistics; Task analysis; Unmanned Aerial Vehicles; Model-based Framework; Unmanned Aerial Vehicles Delivery; Future Systems; Language Model; Textual Information; Online Assessment; User Input; Pressure Test; Input Text; Pre-trained Language Models; Pre-processing Module; Parsing; F1 Score; Internet Of Things; Long Short-term Memory; Exact Match; Transformer Model; Food Delivery; Conditional Random Field; Named Entity Recognition; Shenzhen City; Shanghai City; Longitude Coordinates; Latitude Coordinates; Masked Language Model; Commercial Districts; Successful Matching; Self-attention Mechanism; BERT Model; Language model; Address resolution; UAV delivery system
Abstract: Accurate address resolution plays a vital role in UAV delivery systems. Existing address resolution systems rely heavily on user-provided Point of Interest (POI) information. However, such information often lacks precision, making it challenging to obtain fine-grained details for further processing. In this paper, we present an end-to-end Language Model-based fine-grained Address Resolution framework (LMAR). Instead of relying solely on POI information, we introduce a language model to process the user-provided input text. Specifically, we start by collecting data and constructing two datasets, which are then used to fine-tune a pre-trained language model. Additionally, our pipeline incorporates pre-processing and post-processing modules to handle data processing and regularization. We combine the output of the language model with the POI information to perform a database match and derive the final outcome. To evaluate the proposed LMAR, we conduct offline and online experiments. In both offline and online testing, our model achieves an overall accuracy of over 90%, and in the online pressure test it achieves satisfactory performance, demonstrating its effectiveness and practicality. The proposed LMAR has passed internal testing and will be deployed in the Meituan UAV delivery system in the near future.
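As an illustration of the final matching step this abstract describes (combining the language model's parsed fields with POI information through a database match), here is a minimal sketch. The language model call is stubbed out, and the field names and the tiny POI table are hypothetical, not taken from the paper.

```python
# Minimal sketch: fuse a (stubbed) language-model parse with a POI database lookup.
from difflib import get_close_matches

POI_DB = {
    "Coastal City Tower A": (22.5333, 113.9305),   # hypothetical (lat, lon) entries
    "Coastal City Tower B": (22.5339, 113.9311),
}

def parse_address_with_lm(user_text: str) -> dict:
    """Placeholder for the fine-tuned language model plus post-processing."""
    return {"building": "Coastal City Tower A", "unit": "12F"}

def resolve(user_text: str):
    fields = parse_address_with_lm(user_text)
    # Fuzzy database match between the LM's building field and the POI names.
    match = get_close_matches(fields["building"], POI_DB.keys(), n=1, cutoff=0.6)
    if not match:
        return None
    lat, lon = POI_DB[match[0]]
    return {"poi": match[0], "lat": lat, "lon": lon, **fields}

print(resolve("Please deliver to Tower A of Coastal City, 12th floor"))
```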
Standoff Target Tracking for Networked UAVs With Specified Performance via Deep Reinforcement Learning
Yi Xia, Jun Du, Zekai Zhang, Ziyuan Wang, Jingzehua Xu, Weishi Mi
Keywords: Autonomous aerial vehicles; Target tracking; Quadrotors; Vectors; Energy consumption; Vehicle dynamics; Uncertainty; Unmanned Aerial Vehicles; Deep Reinforcement Learning; Target Tracking; Unmanned Aerial Vehicles Networks; Energy Consumption; Time Constraints; Barrier Function; Tracking Performance; Tracking Accuracy; Cost Control; Unknown Dynamics; Angular Space; Hunting; Unit Vector; Network Parameters; Robust Control; Tracking Error; Evaluation Stage; Actor Network; Optimal Target; Unknown Target; Optimal Tracking; Extended State Observer; Markov Decision Process; Angular Distance; Unknown Uncertainties; Robust Tracking; Reward Function; Target Trajectory; Critic Network; UAVs; deep reinforcement learning; disturbance compensation; standoff target tracking
Abstract: Maintaining rapid and prolonged standoff target tracking for networked unmanned aerial vehicles (UAVs) is challenging, as existing methods fail to improve tracking performance while simultaneously reducing energy consumption. This paper proposes a deep reinforcement learning (DRL)-based tracking scheme that enables UAVs to approach an escaping target while effectively addressing time constraints and guaranteeing low energy expenditure. In the first phase, a coordinated target tracking protocol and a target position estimator are developed using only bearing measurements, which enable the deployment of UAVs along a standoff circle centered at the target with an expected angular spacing. Additionally, an unknown system dynamics estimator (USDE) is devised based on concise filtering operations to mitigate adverse disturbances. In the second phase, multi-agent deep deterministic policy gradient (MADDPG) is employed to strike an optimal balance between tracking accuracy and energy consumption by encoding the time limitations as barrier functions. Simulation results demonstrate that the proposed method outperforms benchmarks in terms of tracking accuracy and control cost.
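To make the trade-off in this abstract concrete, the sketch below shows one way a reward could combine standoff-tracking error, angular-spacing error, control energy, and a time-limit barrier term in the spirit of the MADDPG design described above. It is illustrative only; the weights and the specific barrier form are hypothetical, not the paper's formulation.

```python
# Illustrative reward: tracking accuracy vs. energy, with a log-barrier on the time limit.
import numpy as np

def reward(dist_to_target, standoff_radius, spacing_err, control_u, t, t_max,
           w_track=1.0, w_space=0.5, w_energy=0.01, w_barrier=0.1):
    track_err = (dist_to_target - standoff_radius) ** 2      # stay on the standoff circle
    energy = float(np.sum(np.square(control_u)))             # control effort as energy proxy
    barrier = -np.log(max(t_max - t, 1e-6))                  # grows as the deadline nears
    return -(w_track * track_err + w_space * spacing_err ** 2
             + w_energy * energy + w_barrier * barrier)

print(reward(dist_to_target=12.0, standoff_radius=10.0, spacing_err=0.2,
             control_u=np.array([0.3, -0.1]), t=40.0, t_max=60.0))
```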
DDL: Empowering Delivery Drones With Large-Scale Urban Sensing Capability
Xuecheng Chen, Haoyang Wang, Yuhan Cheng, Haohao Fu, Yuxuan Liu, Fan Dang, Yunhao Liu, Jinqiang Cui, Xinlei Chen
Keywords: Sensors; Drones; Task analysis; Robot sensing systems; Optimization; Resource management; Programming; Drone Delivery; Deep Learning; Optimization Problem; Computational Efficiency; Time Of Delivery; Decision Variables; Efficient Solution; Mixed-integer Programming; Deep Reinforcement Learning; Energy Capacity; Heavy Computation; Delivery Performance; Mixed-integer Programming Problem; Mixed-integer Nonlinear Programming Problem; Energy Consumption; Objective Function; Upper Bound; Running Time; Area Size; Feasible Solution; Efficient Scheduling; Route Planning; Current Solution; Energy Consumption Model; Energy Availability; Delivery Team; Heuristic Algorithm; Time Allocation; Scheduling System; Heuristic Method; Cyber-physical systems; deep reinforcement learning; drone swarm; smart cities
Abstract: Delivery drones provide a promising sensing platform for smart cities thanks to their city-wide infrastructure and large-scale deployment. However, due to limited battery lifetime and available resources, it is challenging to schedule delivery drones to achieve both high sensing and high delivery performance, a highly complicated optimization problem with several coupled decision variables. In this paper, we first propose a delivery drone-based sensing system and formulate a mixed-integer non-linear programming (MINLP) problem that jointly optimizes the sensing utility and delivery time, considering practical factors including energy capacity and available delivery drones. We then provide an efficient solution, DDL, that integrates the strengths of deep reinforcement learning (DRL) and heuristics, decoupling the highly complicated optimization search process and replacing the heavy computation with a rapid approximation. Evaluation results show that DDL improves the scheduling quality by at least 46% on average compared with state-of-the-art baselines. More importantly, the proposed method improves computational efficiency by up to 98 times over the best baseline.
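The joint sensing/delivery objective this abstract formulates can be illustrated with a toy scoring rule: a candidate sensing detour is scored by sensing utility gained against delivery delay, and rejected outright when it would exceed the remaining battery. The weights and energy model below are hypothetical and far simpler than the paper's MINLP.

```python
# Toy sketch of weighing sensing utility against delivery delay under an energy budget.
def detour_score(sensing_utility, extra_delivery_time, extra_energy,
                 battery_left, alpha=1.0, beta=0.5):
    if extra_energy > battery_left:          # infeasible: would exhaust the battery
        return float("-inf")
    return alpha * sensing_utility - beta * extra_delivery_time

candidates = [
    {"utility": 3.0, "delay": 2.0, "energy": 5.0},
    {"utility": 1.5, "delay": 0.5, "energy": 1.0},
]
best = max(candidates, key=lambda c: detour_score(c["utility"], c["delay"],
                                                  c["energy"], battery_left=4.0))
print(best)
```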
Adaptive Top-K in SGD for Communication-Efficient Distributed Learning in Multi-Robot Collaboration
Mengzhe Ruan, Guangfeng Yan, Yuanzhang Xiao, Linqi Song, Weitao Xu
Keywords: Convergence; Training; Quantization (signal); Error compensation; Distance learning; Computer aided instruction; Adaptation models; Stochastic Gradient Descent; Distributed Learning; Multi-robot Collaboration; Loss Function; Model Performance; Upper Bound; Gradient Descent; Convergence Rate; Image Classification; Object Detection; Stochastic Gradient; Multi-agent Systems; Communication Cost; Image Classification Tasks; Convergence Performance; Convergence Of Error; Compression Method; Error Compensation; Gradient Norm; PASCAL VOC Dataset; MNIST Dataset; Compression Level; Theoretical Analysis; Sparsity Level; Compression Ratio; Communication Overhead; Federated Learning; Compressor; Objective Function; Entire Training Process; Distributed learning; communication-efficient; gradient sparsification; error compensation; multi-robot collaboration
Abstract: Distributed stochastic gradient descent (D-SGD) with gradient compression has become a popular communication-efficient solution for accelerating optimization in distributed learning systems such as multi-robot systems. One commonly used method for gradient compression is Top-K sparsification, which sparsifies the gradients by a fixed degree during model training. However, there has been no adaptive approach with a systematic treatment and analysis for adjusting the sparsification degree to maximize model performance or training speed. This paper proposes a novel adaptive Top-K framework for stochastic gradient descent that enables an adaptive degree of sparsification at each gradient descent step, optimizing convergence performance by balancing the trade-off between communication cost and convergence error with respect to the norm of the gradients and the communication budget. First, an upper bound on the convergence error is derived for the adaptive sparsification scheme and the loss function. Second, we consider communication budget constraints and propose an optimization formulation for minimizing the deep model's convergence error under such constraints, obtaining an enhanced compression algorithm that significantly improves model accuracy under a given communication budget. Finally, we conduct numerical experiments on image classification tasks using the MNIST and CIFAR-10 datasets and, for the multi-robot collaboration setting, on the object detection task using the PASCAL VOC dataset. The results demonstrate that the proposed adaptive Top-K algorithm in SGD achieves a significantly better convergence rate than state-of-the-art methods, even after accounting for error compensation.
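The core mechanism in this abstract, Top-K gradient sparsification with error compensation plus a sparsity level that adapts to the gradient norm and communication budget, can be sketched in a few lines. The adaptation rule below is illustrative, not the paper's exact formulation.

```python
# Top-K sparsification with error feedback and a simple budget-aware choice of K.
import numpy as np

def topk_with_error_feedback(grad, residual, k):
    corrected = grad + residual                        # add back previously dropped mass
    idx = np.argpartition(np.abs(corrected), -k)[-k:]  # indices of the k largest magnitudes
    sparse = np.zeros_like(corrected)
    sparse[idx] = corrected[idx]
    new_residual = corrected - sparse                  # error kept for the next round
    return sparse, new_residual

def adaptive_k(grad_norm, budget_left, d, k_min=1, k_max_frac=0.1):
    # Spend more of the budget when gradients are large (early training), less later.
    k = int(min(d * k_max_frac, budget_left) * min(1.0, grad_norm))
    return max(k_min, k)

d = 1000
grad = np.random.randn(d) * 0.01
residual = np.zeros(d)
k = adaptive_k(np.linalg.norm(grad), budget_left=200, d=d)
sparse_grad, residual = topk_with_error_feedback(grad, residual, k)
print(k, np.count_nonzero(sparse_grad))
```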
Topology-Preserving Motion Coordination for Multi-Robot Systems in Adversarial Environments
Zitong Wang, Yushan Li, Xiaoming Duan, Jianping He
Keywords: Topology; Robot kinematics; Signal processing algorithms; Perturbation methods; Inference algorithms; Heuristic algorithms; Robot sensing systems; Motor Coordination; Multi-agent Systems; Convergence Rate; Real-world Experiments; Security Risks; Coordination Of Processes; Interaction Topology; Coordinate Algorithm; Precise Coordination; Inference Attacks; Distortion; System Performance; Changes In Position; Ordinary Least Squares; Control Input; Robotic System; Algorithm Design; Directed Graph; Independent Signals; Laplacian Matrix; Robot State; Second-order Dynamics; Unmanned Ground Vehicles; Ordinary Least Squares Estimates; Position Of The Robot; Velocity Of The Robot; First-order System; Differential Privacy; Distributed Manner; Multi-robot systems; interaction topology; topology preservation; signal processing; inference attack
Abstract: The interaction topology plays a significant role in the distributed motion coordination of multi-robot systems (MRSs) due to its noticeable impact on the information flow between robots. However, recent research has revealed that in adversarial environments the topology can be inferred by external adversaries equipped with advanced sensors, posing severe security risks to MRSs. It is therefore of utmost importance to protect the interaction topology from inference attacks while ensuring coordination performance. To this end, we propose a topology-preserving motion coordination (TPMC) algorithm that strategically introduces perturbation signals during the coordination process together with a compensation design. The major novelty is threefold: i) we focus on the second-order motion coordination model and tackle the coupling of the perturbation signals with the unstable state-updating process; ii) we develop a general framework for distributed compensation of perturbation signals, strategically addressing the challenge of perturbation accumulation while ensuring precise motion coordination; iii) we derive the convergence conditions and a rate characterization for achieving motion coordination under the TPMC algorithm. Extensive simulations and real-world experiments are conducted to verify the performance of the proposed method.
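The basic idea in this abstract, adding perturbation signals to each robot's control input to confound topology inference and then compensating for them so coordination is still reached, can be illustrated with a toy second-order consensus loop. The Laplacian, gains, and the simple one-step compensation rule below are illustrative stand-ins, not the paper's TPMC design.

```python
# Toy second-order consensus with perturbed, later-compensated control inputs.
import numpy as np

np.random.seed(0)
n, steps, dt = 4, 4000, 0.01
L = np.array([[ 2, -1,  0, -1],
              [-1,  2, -1,  0],
              [ 0, -1,  2, -1],
              [-1,  0, -1,  2]], dtype=float)   # Laplacian of a 4-robot ring
x = np.random.randn(n)        # positions
v = np.zeros(n)               # velocities
prev_pert = np.zeros(n)

for _ in range(steps):
    u_nominal = -L @ x - 2.0 * (L @ v) - 0.5 * v   # nominal second-order coordination law
    pert = 0.05 * np.random.randn(n)               # perturbation that masks the topology
    u = u_nominal + pert - prev_pert               # compensate last step's perturbation
    prev_pert = pert
    v += dt * u
    x += dt * v

print(np.round(x, 3))   # positions should be (nearly) agreed upon despite the perturbations
```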
A Two-Stage Audio-Visual Speech Separation Method Without Visual Signals for Testing and Tuples Loss With Dynamic Margin
Yinggang Liu, Yuanjie Deng, Ying Wei
Keywords: Visualization; Feature extraction; Task analysis; Measurement; Testing; Training; Time-frequency analysis; Visual Signals; Speech Separation; Loss Function; Visual Information; Network Training; Visual Features; Change Strategies; Visual Input; Two-stage Method; Separate Networks; Marginal Changes; Gridded Datasets; Intelligent Robots; High-quality Signals; Dynamic Changes; Deep Neural Network; Time Domain; Amount Of Change; Feature Representation; Feature Fusion; Triplet Loss; Separation Performance; Metric Learning; Signal-to-interference Ratio; Audio Information; Metric Learning Methods; Matched Pairs; Extract Visual Features; Bidirectional Long Short-term Memory; Triggering Condition; Speech separation; audio-visual matching; dynamic margin
Abstract: Speech separation, as a fundamental task in signal processing, can be used in many types of intelligent robots, and audio-visual (AV) speech separation has been shown to be superior to audio-only speech separation. In current AV speech separation methods, visual information plays a pivotal role not only during network training but also during testing. However, due to various factors in real environments, it is not always possible for sensors to obtain high-quality visual signals. In this paper, we propose an effective two-stage AV speech separation model that introduces a new approach to visual feature embedding, in which visual information is used to optimize the separation network during training but no visual input is required during testing. Unlike current methods that fuse visual and audio features together as the input of the separation network, in this model visual features are embedded into an AV matching block to compute a cross-modal consistency loss, which is used as part of the loss function for network optimization. A novel tuples loss function with a learnable dynamic margin is proposed for better AV matching, and two margin change strategies are given. The proposed two-stage AV speech separation method is evaluated on the widely used GRID and VoxCeleb2 datasets. Experimental results show that it outperforms current AV speech separation methods.
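The tuples loss with a learnable dynamic margin mentioned in this abstract can be sketched as a hinge loss over matched and mismatched audio-visual pairs whose margin is itself a trainable parameter. The sketch below is illustrative: the exact tuple construction and the paper's two margin change strategies are not reproduced.

```python
# Illustrative audio-visual matching loss with a learnable ("dynamic") margin in PyTorch.
import torch
import torch.nn.functional as F

class DynamicMarginLoss(torch.nn.Module):
    def __init__(self, init_margin=0.2):
        super().__init__()
        self.margin = torch.nn.Parameter(torch.tensor(init_margin))  # learnable margin

    def forward(self, audio_emb, visual_pos, visual_neg):
        margin = self.margin.clamp(min=0.0)   # in practice a margin-change strategy governs this
        sim_pos = F.cosine_similarity(audio_emb, visual_pos, dim=-1)  # matched AV pairs
        sim_neg = F.cosine_similarity(audio_emb, visual_neg, dim=-1)  # mismatched AV pairs
        # Hinge: mismatched similarity must trail matched similarity by at least the margin.
        return F.relu(margin + sim_neg - sim_pos).mean()

loss_fn = DynamicMarginLoss()
a, vp, vn = torch.randn(8, 256), torch.randn(8, 256), torch.randn(8, 256)
print(loss_fn(a, vp, vn))
```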
Incongruity-Aware Cross-Modal Attention for Audio-Visual Fusion in Dimensional Emotion Recognition
R. Gnana Praveen, Jahangir Alam
Keywords: Visualization; Adaptation models; Emotion recognition; Computational modeling; Predictive models; Feature extraction; Noise measurement; Emotion Recognition; Emotional Dimensions; Cross-modal Attention; Dimensional Emotion Recognition; Feature Representation; Extensive Experiments; Attention Mechanism; Visual Modality; Gating Mechanism; Complementary Relationship; Arousal; Validation Set; Deep Learning Models; Visual Features; Long Short-term Memory; Video Clips; Individual Modules; External Dataset; Decision-level Fusion; Gate Layer; Concordance Correlation Coefficient; Attention Model; Fusion Performance; Feature-level Fusion; Graph Convolutional Network; Cross-modal Interactions; Dynamic Video; Mel-frequency Cepstral Coefficients; Audio-visual fusion; cross-attention; emotion recognition; non-complementary relationships
Abstract: Multimodal emotion recognition has immense potential for the comprehensive assessment of human emotions, utilizing multiple modalities that often exhibit complementary relationships. In video-based emotion recognition, audio and visual modalities have emerged as prominent contact-free channels, widely explored in the existing literature. Current approaches typically employ cross-modal attention mechanisms between the audio and visual modalities, assuming a constant state of complementarity. However, this assumption may not always hold, as non-complementary relationships can also manifest, undermining the efficacy of cross-modal feature integration and thereby diminishing the quality of audio-visual feature representations. To tackle this problem, we introduce a novel Incongruity-Aware Cross-Attention (IACA) model, capable of harnessing the benefits of robust complementary relationships while efficiently managing non-complementary scenarios. Specifically, our approach incorporates a two-stage gating mechanism designed to adaptively select semantic features, thereby effectively capturing inter-modal associations. The proposed model can also mitigate the adverse effects of severely corrupted or missing modalities. We rigorously evaluate its performance through extensive experiments on the challenging RECOLA and Aff-Wild2 datasets. The results underscore the efficacy of our approach, which outperforms state-of-the-art methods by adeptly capturing inter-modal relationships and minimizing the influence of missing or heavily corrupted modalities. Furthermore, we show that the proposed model is compatible with various cross-modal attention variants, consistently improving performance on both datasets.
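The mechanism this abstract describes, cross-modal attention whose output is gated so that non-complementary visual information can be suppressed, can be sketched as follows. The dimensions and the single-stage gate are illustrative; the paper uses a two-stage gating design, which is not reproduced here.

```python
# Minimal gated cross-modal attention: audio attends to visual, a gate decides how much to keep.
import torch
import torch.nn as nn

class GatedCrossAttention(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, audio, visual):
        attended, _ = self.attn(query=audio, key=visual, value=visual)
        g = self.gate(torch.cat([audio, attended], dim=-1))   # per-feature gate in [0, 1]
        return g * attended + (1.0 - g) * audio               # fall back to audio if visual is unhelpful

block = GatedCrossAttention()
audio = torch.randn(2, 50, 128)    # (batch, time steps, features)
visual = torch.randn(2, 50, 128)
print(block(audio, visual).shape)  # torch.Size([2, 50, 128])
```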
Brain-Inspired Visual Attention Modeling Based on EEG for Intelligent Robotics
Shuzhan Hu, Yiping Duan, Xiaoming Tao, Jian Chu, Jianhua Lu
Keywords: Electroencephalography; Bit rate; Measurement; Brain modeling; Video compression; Visualization; Resource management; Visual Attention; Intelligent Robots; Prediction Model; Machine Learning; Limited Resources; Measurement Model; Attention Mechanism; Brain Responses; Video Clips; Perception Of Quality; Video Content; Human-robot Interaction; Video Information; Least Significant Bit; Human Attention; Video Compression; Convolutional Neural Network; Convolutional Layers; EEG Data; Latent Space; EEG Signals; Graph Convolutional Network; Short-time Fourier Transform; Rapid Serial Visual Presentation; Optical Flow; EEG Samples; Most Significant Bit; Video Sequences; High Level Of Attention; EEG Experiment; Visual attention; human-robot interaction; intelligent robotics; EEG; video compression
Abstract: Vision, as the primary perceptual mode for intelligent robots, plays a crucial role in various human-robot interaction (HRI) scenarios. In certain situations, it is essential to use visual sensors to capture videos for humans, assisting them in tasks such as exploration missions. However, the increasing amount of video information brings great challenges for data transmission and storage, so there is an urgent need for more efficient video compression strategies. When perceiving a video, humans tend to pay more attention to specific clips, which may occupy a small part of the whole video content but largely affect the perceptual quality. This human visual attention (VA) mechanism provides valuable inspiration for optimizing video compression methods for HRI scenarios. We therefore combine psychophysiological paradigms and machine learning methods to model human VA and introduce it into bitrate allocation to make full use of the limited resources. Specifically, we collect electroencephalographic (EEG) data while humans watch videos, constructing an EEG dataset that reflects VA. Based on this dataset, we propose a VA measurement model to determine the VA states of humans from their underlying brain responses. A brain-inspired VA prediction model is then established to obtain VA metrics directly from the videos. Finally, based on the VA metric, more bitrate is allocated to the clips that humans pay more attention to. The experimental results show that the proposed methods can accurately determine humans' VA states and predict the VA metrics evoked by different video clips. Furthermore, the bitrate allocation method based on the VA metric achieves better perceptual quality at low bitrates.
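The final step this abstract describes, allocating more bitrate to clips with higher predicted visual attention, can be illustrated with a simple proportional rule. The numbers, the per-clip floor, and the proportional scheme are hypothetical, not the paper's allocation method.

```python
# Toy attention-driven bitrate allocation: split a fixed budget in proportion to VA scores,
# keeping a floor so low-attention clips remain decodable.
def allocate_bitrate(va_scores, total_kbps, min_kbps=100.0):
    n = len(va_scores)
    spare = max(total_kbps - min_kbps * n, 0.0)
    total_va = sum(va_scores) or 1.0
    return [min_kbps + spare * s / total_va for s in va_scores]

print(allocate_bitrate([0.9, 0.2, 0.4], total_kbps=1500))
# -> the high-attention clip receives most of the spare budget; the total stays at 1500 kbps
```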
ViT-MDHGR: Cross-Day Reliability and Agility in Dynamic Hand Gesture Prediction via HD-sEMG Signal Decoding
Qin Hu, Golara Ahmadi Azar, Alyson Fletcher, Sundeep Rangan, S. Farokh Atashzar
Keywords: Feature extraction; Testing; Electrodes; Adaptation models; Training; Data models; Solid modeling; Agility; Dynamic Prediction; Hand Gestures; High-density sEMG; Wearable; Test Day; Intraday; Surface Electromyography; Gesture Recognition; Significant Windows; Control Delay; Hand Gesture Recognition; Small Portion Of Data; Convolutional Neural Network; Machine Learning Methods; Window Size; Long Short-term Memory; Recurrent Neural Network; Previous Day; Traditional Machine Learning; Pre-training Stage; Traditional Machine Learning Methods; sEMG Signals; Vision Transformer; Position Embedding; Transformer Encoder; Calibration Stage; Linear Layer; RNN-based Models; Source Domain; Human-robot interactions; surface electromyography (sEMG); vision transformer; hand gesture recognition (HGR); cross-day HGR; minimal calibration
Abstract: Surface electromyography (sEMG) and high-density sEMG (HD-sEMG) biosignals have been extensively investigated for myoelectric control of prosthetic devices, neurorobotics, and, more recently, human-computer interfaces because of their capability for hand gesture recognition/prediction in a wearable and non-invasive manner. High intraday (same-day) performance has been reported. However, interday performance (with training and testing on separate days) is substantially degraded due to the poor generalizability of conventional approaches over time, hindering the application of such techniques in real-life practice. Recent studies on the feasibility of multi-day hand gesture recognition are limited and face a major challenge: the need for long sEMG epochs makes the corresponding neural interfaces impractical due to the delay induced in myoelectric control. This paper proposes a compact ViT-based network for multi-day dynamic hand gesture prediction. We tackle this challenge by relying only on very short HD-sEMG signal windows (i.e., 50 ms, one-sixth of the convention for real-time myoelectric implementation), boosting agility and responsiveness. The proposed model can predict 11 dynamic gestures for 20 subjects with an average accuracy of over 71% on the testing day, 3-25 days after training. Moreover, when calibrated on just a small portion of data from the testing day, it can achieve over 92% accuracy by retraining less than 10% of the parameters for computational efficiency.
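The kind of architecture this abstract describes, a compact ViT-style classifier over a short HD-sEMG window, can be sketched as follows. The channel count, token count, layer sizes, and class-token design are hypothetical, not the paper's ViT-MDHGR configuration.

```python
# Compact ViT-style classifier over a short HD-sEMG window (illustrative configuration).
import torch
import torch.nn as nn

class TinySEMGViT(nn.Module):
    def __init__(self, channels=64, tokens=100, dim=64, heads=4, layers=2, classes=11):
        super().__init__()
        self.embed = nn.Linear(channels, dim)                    # per-time-step token embedding
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))          # class token
        self.pos = nn.Parameter(torch.zeros(1, tokens + 1, dim)) # learned position embedding
        enc_layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
        self.head = nn.Linear(dim, classes)

    def forward(self, x):                       # x: (batch, tokens, channels)
        z = self.embed(x)
        cls = self.cls.expand(x.shape[0], -1, -1)
        z = torch.cat([cls, z], dim=1) + self.pos
        z = self.encoder(z)
        return self.head(z[:, 0])               # classify from the class token

model = TinySEMGViT()
window = torch.randn(8, 100, 64)   # e.g. a 50 ms window sampled into 100 time tokens
print(model(window).shape)         # torch.Size([8, 11]) -> logits over 11 gestures
```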
Cooperative Robotics Visible Light Positioning: An Intelligent Compressed Sensing and GAN-Enabled Framework
Sicong Liu, Xianyao Wang, Jian Song, Zhu Han
Keywords: Robots; Location awareness; Light emitting diodes; Wireless communication; Wireless sensor networks; Robot sensing systems; Service robots; Visible Light; Visible Light Positioning; Autocorrelation; Measurement Model; Grid Points; Generative Adversarial Networks; Position Of The Robot; Intelligent Robots; Multiple Robots; Deep Learning; Internet Of Things; Localization Accuracy; Position Error; Extensive Simulations; Local Vector; Channel Gain; Noise Intensity; Real Vector; Non-line-of-sight; Observation Matrix; Sparse Vector; Industrial Internet Of Things; Independent Equations; Automated Guided Vehicles; Pilot Signals; High-precision Positioning; Indoor Localization; Cramer-Rao Lower Bound; Orthogonal Matching Pursuit; Additive Noise; Robotics sensing; visible light positioning; multi-target localization; compressed sensing; cooperative localization
Abstract: This article presents a compressed sensing (CS) based framework for visible light positioning (VLP), designed to achieve simultaneous and precise localization of multiple intelligent robots within an indoor factory. The framework leverages light-emitting diodes (LEDs) originally intended for illumination as anchors, repurposing them for the localization of robots equipped with photodetectors. By pre-dividing the plane containing the robot positions into a grid, with the number of robots notably fewer than the number of grid points, the inherent sparsity of the arrangement is harnessed. To construct an effective sparse measurement model, a sequence of aggregation, autocorrelation, and cross-correlation operations is applied to the signals. Consequently, the complex task of localizing multiple targets is reformulated as a sparse recovery problem, amenable to resolution through CS-based algorithms. Notably, the localization precision is augmented by inter-target cooperation among the robots and inter-anchor cooperation among distinct LEDs. Furthermore, to strengthen the robustness of localization, a generative adversarial network (GAN) is introduced into the proposed localization framework. Simulation results confirm that the proposed framework can achieve centimeter-level accuracy for simultaneous localization of multiple targets.
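The sparse-recovery view in this abstract (few occupied grid points, many candidate grid points) can be illustrated with orthogonal matching pursuit, one of the keywords listed above. The measurement matrix below is a random stand-in, not the paper's correlation-based VLP model.

```python
# Minimal OMP sketch: recover which few grid cells are occupied by robots.
import numpy as np

def omp(A, y, k):
    residual, support = y.copy(), []
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))       # column most correlated with residual
        support.append(j)
        x_s, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ x_s
    x = np.zeros(A.shape[1])
    x[support] = x_s
    return x

rng = np.random.default_rng(0)
grid_points, measurements, robots = 200, 60, 3
A = rng.standard_normal((measurements, grid_points))      # stand-in observation matrix
true_x = np.zeros(grid_points)
true_x[rng.choice(grid_points, robots, replace=False)] = 1.0   # occupied grid cells
y = A @ true_x
print(np.nonzero(omp(A, y, robots))[0], np.nonzero(true_x)[0])  # recovered vs. true support
```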