IEEE Transactions on Knowledge and Data Engineering

A Similarity-Based Framework for Classification Task
Zhongchen Ma, Songcan Chen
Keywords: Training, Time complexity, Task analysis, Computational modeling, Logistics, Data preprocessing, Training data, Similarity-based learning, multi-class, multi-label, class interdependencies, interpretability
Abstracts: Similarity-based methods have given rise to a new class of methods for multi-label learning and achieve promising performance. In this paper, we generalize this approach into a new framework for classification tasks. Specifically, we unite similarity-based learning and generalized linear models to achieve the best of both worlds. This allows us to capture interdependencies between classes and avoid impairing the performance of noisy classes. Each learned parameter of the model reveals the contribution of one class to another, providing interpretability to some extent. Experimental results show the effectiveness of the proposed approach on multi-class and multi-label datasets.
Verifiable Fuzzy Multi-Keyword Search Over Encrypted Data With Adaptive Security
Qiuyun Tong, Yinbin Miao, Jian Weng, Ximeng Liu, Kim-Kwang Raymond Choo, Robert H. Deng
Keywords: Indexes, Keyword search, Resists, Encryption, Data privacy, Periodic structures, Complexity theory, Adaptive security, fuzzy multi-keyword search, result verification, symmetric searchable encryption
Abstracts: To ensure the security of outsourced data without affecting data availability, one can use Symmetric Searchable Encryption (SSE) to search over encrypted data. Because query users may search with misspelled words, fuzzy search should be supported. However, conventional privacy-preserving fuzzy multi-keyword search schemes cannot achieve result verification and adaptive security. To solve these issues, in this paper we propose a Verifiable Fuzzy multi-keyword Search scheme with Adaptive security (VFSA). VFSA first employs locality-sensitive hashing to hash misspelled and correct keywords to the same positions, then designs a twin Bloom filter for each document to store and mask all keywords contained in the document, next constructs an index tree based on a graph-based keyword partition algorithm to achieve adaptive sublinear retrieval, and finally combines the Merkle hash tree structure with an adapted multiset accumulator to check the correctness and completeness of search results. Our formal security analysis shows that VFSA is secure under the IND-CKA2 model and achieves query authentication. Our empirical experiments on a real-world dataset demonstrate the practicality of VFSA.
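The result-verification idea in the abstract above rests on the Merkle hash tree. The following is a minimal, generic sketch of that data structure only; it is not the VFSA construction, it omits the multiset accumulator, Bloom filters, and encryption entirely, and the function names (`build_levels`, `prove`, `verify`) and the leaf/node domain-separation bytes are assumptions made for illustration.

```python
import hashlib


def _h(data: bytes) -> bytes:
    """SHA-256 digest used for every tree node."""
    return hashlib.sha256(data).digest()


def build_levels(leaves):
    """Return all tree levels, hashed leaves first and the root level last.

    Odd-sized levels are padded by duplicating their last node
    (an illustrative convention, not taken from the paper).
    """
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [_h(b"\x00" + leaf) for leaf in leaves]
    levels = []
    while True:
        if len(level) > 1 and len(level) % 2 == 1:
            level = level + [level[-1]]
        levels.append(level)
        if len(level) == 1:
            return levels
        level = [_h(b"\x01" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]


def prove(levels, index):
    """Collect the sibling hashes on the path from leaf `index` to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # (hash, sibling_is_left)
        index //= 2
    return proof


def verify(leaf, proof, root):
    """Recompute the root from a leaf and its proof, then compare."""
    node = _h(b"\x00" + leaf)
    for sibling, sibling_is_left in proof:
        if sibling_is_left:
            node = _h(b"\x01" + sibling + node)
        else:
            node = _h(b"\x01" + node + sibling)
    return node == root


# Usage: a client holding only the root can check a returned document.
docs = [b"doc1", b"doc2", b"doc3"]
levels = build_levels(docs)
root = levels[-1][0]
print(verify(b"doc3", prove(levels, 2), root))
```

A proof of this kind shows that a returned item belongs to the committed set (correctness). Showing that nothing was omitted (completeness) needs additional machinery, which is the role the abstract assigns to the multiset accumulator.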
Variational Bandwidth Auto-Encoder for Hybrid Recommender Systems
Yaochen Zhu, Zhenzhong Chen
Keywords: Collaboration, Uncertainty, Feature extraction, Bandwidth, Recommender systems, Noise measurement, Measurement uncertainty, Auto-encoders, information bottleneck, recommender systems, uncertainty modeling, variational inference
Abstracts: Hybrid recommendations have recently attracted a lot of attention where user features are utilized as auxiliary information to address the sparsity problem caused by insufficient user-item interactions. However, extracted user features generally contain rich multimodal information, and most of them are irrelevant to the recommendation purpose. Therefore, excessive reliance on these features will make the model overfit on noise and difficult to generalize. In this article, we propose a variational bandwidth auto-encoder (VBAE) for recommendations, aiming to address the sparsity and noise problems simultaneously. VBAE first encodes user collaborative and feature information into Gaussian latent variables via deep neural networks to capture non-linear user similarities. Moreover, by considering the fusion of collaborative and feature variables as a virtual communication channel from an information-theoretic perspective, we introduce a user-dependent channel to dynamically control the information allowed to be accessed from the feature embeddings. A quantum-inspired uncertainty measurement of the hidden rating embeddings is proposed accordingly to infer the channel bandwidth by disentangling the uncertainty information in the ratings from the semantic information. Through this mechanism, VBAE incorporates adequate auxiliary information from user features if collaborative information is insufficient, while avoiding excessive reliance on noisy user features to improve its generalization ability to new users. Extensive experiments conducted on three real-world datasets demonstrate the effectiveness of the proposed method. Codes and datasets are released at https://github.com/yaochenzhu/VBAE.
Towards Automatic Job Description Generation With Capability-Aware Neural Networks
Chuan Qin, Kaichun Yao, Hengshu Zhu, Tong Xu, Dazhong Shen, Enhong Chen, Hui Xiong
Keywords: Recruitment, Data models, Task analysis, Writing, Training, Natural languages, Web and internet services, Job description generation, recruitment analysis, topic model
Abstracts: A job description shows the responsibilities of the job position and the skill requirements for the job. An effective job description will help employers to identify the right talents for the job, and give candidates a clear understanding of what their duties and qualifications for a particular position would be. However, due to the variation in experiences, it is always a challenge for both hiring managers and recruiters to decide what capabilities the job requires and prioritize them accordingly in the job description. Also, tedious and expensive human effort is usually required to prepare a job description. Therefore, in this paper, we investigate how to automate the process of generating job descriptions with less human intervention. To this end, we propose an end-to-end capability-aware neural job description generation framework, namely Cajon, to facilitate the writing of job descriptions. Specifically, we first propose a novel capability-aware neural topic model to distill the various capability information from large-scale recruitment data. Also, an encoder-decoder recurrent neural network is designed to enable job description generation. In particular, a capability-aware attention mechanism and a copy mechanism are proposed to guide the generation process so that the generated job descriptions comprehensively cover relevant and representative capability requirements for the job. Moreover, we propose a capability-aware policy gradient training algorithm to further enhance the rationality of the generated job descriptions. Finally, extensive experiments on real-world recruitment data clearly show that our Cajon framework can help to generate more effective job descriptions in an interpretable way. In particular, our Cajon framework has been deployed in Baidu as an intelligent tool for talent recruitment.
Top-k Socio-Spatial Co-Engaged Location Selection for Social Users
Nur Al Hasan Haldar, Jianxin Li, Mohammed Eunus Ali, Taotao Cai, Yunliang Chen, Timos Sellis, Mark Reynolds
Keywords: Cultural differences, Social networking (online), Organizations, Australia, Advertising, Spatial databases, Social factors, LBSN, location selection in social networks, social graph computing, spatial database
Abstracts: With the advent of location-based social networks, users can tag their daily activities in different locations through check-ins. These check-in locations signify user preferences for various socio-spatial activities and can be used to improve the quality of services in applications such as recommendation systems, advertising, and group formation. To support such applications, in this paper, we formulate a new problem of identifying top-k Socio-Spatial co-engaged Location Selection (SSLS) for users in a social graph, which selects the best set of k locations from a large number of location candidates relating to the user and her friends. The selected locations should be (i) spatially and socially relevant to the user and her friends, and (ii) diversified both spatially and socially to maximize the coverage of friends in the socio-spatial space. This problem has been proved to be NP-hard. To address such a challenging problem, we first develop an Exact solution by designing some pruning strategies based on derived bounds on diversity. To make the solution scalable for large datasets, we also develop an approximate solution by deriving relaxed bounds and advanced termination rules to filter out insignificant intermediate results. To further improve efficiency, we present one fast exact approach and a meta-heuristic approximate approach that avoid the repeated computation of diversity at running time. Finally, we have performed extensive experiments to evaluate the performance of our proposed algorithms against three adapted existing methods using four large real-world datasets.
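The trade-off between relevance and diversity described in the abstract above can be illustrated with a simple greedy heuristic. This is a minimal sketch under stated assumptions and is not the paper's Exact or approximate algorithm: the scoring rule, the `trade_off` parameter, the use of plain Euclidean distance as the only diversity signal (social diversity is ignored), and the function name `greedy_diverse_top_k` are all hypothetical.

```python
import math


def greedy_diverse_top_k(candidates, relevance, k, trade_off=0.5):
    """Greedily pick up to k locations, balancing relevance and spatial spread.

    candidates: dict mapping location name -> (x, y) coordinates
    relevance:  dict mapping location name -> relevance score (higher is better)
    trade_off:  weight on relevance; (1 - trade_off) weights diversity
    """
    selected = []
    remaining = set(candidates)

    def gain(name):
        # Diversity is the distance to the closest already-selected location.
        if selected:
            diversity = min(math.dist(candidates[name], candidates[s])
                            for s in selected)
        else:
            diversity = 0.0
        return trade_off * relevance[name] + (1 - trade_off) * diversity

    while remaining and len(selected) < k:
        # Sorting makes tie-breaking deterministic.
        best = max(sorted(remaining), key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected


# Usage with made-up data: "gym" is relevant but nearly co-located with "cafe",
# so the more distant "park" is preferred as the second pick.
locations = {"cafe": (0.0, 0.0), "gym": (0.1, 0.0), "park": (5.0, 5.0)}
scores = {"cafe": 1.0, "gym": 0.9, "park": 0.5}
print(greedy_diverse_top_k(locations, scores, k=2))
```

A greedy pass like this carries no optimality guarantee for an NP-hard selection problem, which is one reason the abstract pairs an exact method using pruning bounds with separate approximate methods.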
Shortening Passengers’ Travel Time: A Dynamic Metro Train Scheduling Approach Using Deep Reinforcement Learning
Zhaoyuan Wang, Zheyi Pan, Shun Chen, Shenggong Ji, Xiuwen Yi, Junbo Zhang, Jingyuan Wang, Zhiguo Gong, Tianrui Li, Yu Zheng
Keywords: Urban areas, Dynamic scheduling, Reinforcement learning, Correlation, Feature extraction, Schedules, Neural networks, Metro systems, spatio-temporal data, neural network, deep reinforcement learning, urban computing
Abstracts: As travel efficiency matters to the work productivity of cities, shortening passengers' travel time on metros is a pressing need. To this end, we study a strategy that dynamically schedules dwell time for trains. Developing such a strategy is challenging for three reasons: 1) Optimizing the average travel time of passengers requires properly balancing passengers' waiting time at platforms and journey time on trains, as well as considering long-term impacts; 2) Capturing dynamic spatio-temporal (ST) correlations of incoming passengers for metro stations is difficult; and 3) For each train, the dwell time scheduling is affected by other trains, which is hard to measure. To tackle these challenges, we propose a novel deep neural network, entitled AutoDwell. Specifically, AutoDwell optimizes the long-term rewards of dwell time settings in terms of passengers' waiting and journey time by a reinforcement learning framework. Next, AutoDwell employs gated recurrent units and graph attention networks to extract the ST correlations of the passenger flows among metro stations. Moreover, attention mechanisms are leveraged in AutoDwell for capturing the interactions between the trains. Extensive experiments on two real-world datasets demonstrate the superior performance of AutoDwell over several baselines and show that it can significantly reduce passengers' travel time.
Short Text Topic Learning Using Heterogeneous Information Network
Qingren Wang, Chengcheng Zhu, Yiwen Zhang, Hong Zhong, Jinqin Zhong, Victor S. Sheng
Keywords: Semantics, Periodic structures, Optical wavelength conversion, Grammar, Peer-to-peer computing, Speech processing, Electronic mail, Short texts, topic learning, heterogeneous information network, parts of speech, meta structure, natural language processing
Abstracts: With the explosive growth of short texts on users’ interests and preferences, learning discriminative and coherent latent topics from short texts is a critical and significant task, since many practical applications, such as e-commerce and recommendations, require the semantic understanding that short texts convey explicitly and implicitly. However, existing short text topic learning methods face the challenge of fully capturing semantically related co-occurrence phrases. Therefore, this paper proposes a novel Heterogeneous Information Network-based Short Text Topic learning approach (HIN-ShoTT) in terms of parts of speech, without depending on any auxiliary information. Specifically, HIN-ShoTT can be decomposed into three phases: i) seeking semantic relations among words with different parts of speech, where HIN-ShoTT models multiple explicit and implicit semantic relations among words based on a Heterogeneous Information Network (HIN) in terms of parts of speech; ii) extracting co-occurrence phrases and filtering noise, where HIN-ShoTT defines parts-of-speech meta structures to guide co-occurrence phrase extraction and a self-adapting threshold filtering module is proposed for discarding noise; and iii) inferring topics, where HIN-ShoTT directly models the generative process of co-occurrence phrases to make topic learning effective with the abundant corpus-level information. Our experimental results on three real-world datasets not only show that HIN-ShoTT performs well, but also demonstrate that it is feasible to incorporate HIN into short text topic learning for accuracy improvement.
Semi-Supervised Clustering Under a “Compact-Cluster” Assumption
Zhen Jiang, Yongzhao Zhan, Qirong Mao, Yang Du
Keywords: Clustering algorithms, Labeling, Linear programming, Partitioning algorithms, Search problems, Optimization, Transforms, Semi-supervised clustering, partial labeling, compact-cluster assumption, cluster-splitting
Abstracts: Semi-supervised clustering (SSC) aims to improve clustering performance with the support of prior knowledge (i.e., side information). Compared with pairwise constraints, partial labeling information is a more natural way to characterize the data distribution at a high level. However, the natural gap between the class information and the clustering is not adequately taken into account in existing SSC methods when utilizing partial labeling information to guide the clustering procedure. To address this problem, we present a “compact-cluster” assumption for SSC to utilize the partial labeling information via a cluster-splitting technique. Based on this assumption, a general framework, CSSC, is proposed to supervise traditional clustering with an objective function that incorporates a term measuring the compactness of clusters. Furthermore, we provide two effective solutions for K-means and spectral clustering within the CSSC framework and derive the corresponding algorithms to seek the optimum number of clusters and their centroids. Corresponding theoretical analyses demonstrate the feasibility and effectiveness of the proposed method. Finally, extensive experiments on eight real-world datasets demonstrate the superiority of our method over other state-of-the-art SSC methods.
Semi-Supervised Air Quality Forecasting via Self-Supervised Hierarchical Graph Neural Network
Jindong Han, Hao Liu, Haoyi Xiong, Jing Yang
Keywords: Air quality, Urban areas, Spatiotemporal phenomena, Forecasting, Monitoring, Graph neural networks, Atmospheric modeling, Air quality forecasting, graph neural network, self-supervised learning, urban computing
Abstracts: Predicting air quality in fine spatiotemporal granularity is of great importance for air pollution control and urban sustainability. However, existing studies are either focused on predicting station-wise future air quality, or inferring current air quality for unmonitored regions. How to accurately forecast future air quality for these unmonitored regions in a fine granularity remains an unexplored problem. In this paper, we propose the Self-Supervised Hierarchical Graph Neural Network (SSH-GNN), for fine-grained air quality forecasting in a semi-supervised way. Specifically, to augment spatially sparse air quality observations, SSH-GNN first approximates the city-wide air quality distribution based on historical readings and various urban contextual factors (e.g., weather conditions and traffic flows). Then, we propose a hierarchical recurrent graph neural network to make city-wide predictions, which encodes the spatial hierarchy of urban regions for long-range spatiotemporal correlation modeling. Moreover, by leveraging spatiotemporal self-supervision strategies, SSH-GNN exploits both universal topological and contextual patterns to further enhance the forecasting effectiveness. Extensive experiments on two real-world datasets show that SSH-GNN significantly outperforms the state-of-the-art algorithms.
Robust Clustering Model Based on Attention Mechanism and Graph Convolutional Network
Hui Xia, Shushu Shao, Chunqiang Hu, Rui Zhang, Tie Qiu, Fu Xiao
Keywords: Robustness, Data models, Deep learning, Computational modeling, Task analysis, Fuses, Training, Deep clustering, adversarial attack, robustness, defense
Abstracts: GCN-based clustering schemes cannot interactively fuse feature information of nodes and topological structure information of graphs, leading to insufficient accuracy of clustering results. Moreover, deep clustering models based on graph structure are vulnerable to adversarial samples, which reduces the robustness of the model. To solve these two problems, this paper proposes a robust clustering model based on attention mechanism and graph convolutional network (GCN), named AG-cluster. This model first uses a graph attention network and a GCN to learn the feature information of nodes and the topological structure information of graphs, respectively. Then the representations from these two learning modules are interactively fused by the interlayer transfer operator. Finally, the model is trained end-to-end using a self-supervised training module to optimize the clustering results. In particular, an efficient graph purification defense mechanism (GPDM) is designed to resist adversarial attacks on graph data and improve the robustness of the model. Experimental results show that AG-cluster outperforms the other four benchmark methods; specifically, AG-cluster improves Accuracy by 7.6% and NMI by 11.5% compared to the best benchmark method. Besides, the new model still shows higher robustness and stronger transferability under multiple attacks.
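For readers unfamiliar with the GCN building block named in the abstract above, the following sketch shows one standard graph convolution step, ReLU(D^{-1/2} (A + I) D^{-1/2} H W), in plain Python. It illustrates the generic layer only and makes no claim about the AG-cluster architecture, its attention branch, the interlayer transfer operator, or GPDM; the function name `gcn_layer` and the toy inputs are assumptions.

```python
import math


def gcn_layer(adj, features, weights):
    """One graph convolution step with symmetric normalization and ReLU.

    adj:      n x n adjacency matrix (list of lists, 0/1 entries)
    features: n x d node feature matrix H
    weights:  d x m weight matrix W
    """
    n = len(adj)
    # Add self-loops so each node keeps its own features: A_hat = A + I.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    # Node degrees of A_hat (always >= 1 because of the self-loop).
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization: D^{-1/2} A_hat D^{-1/2}.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

    def matmul(a, b):
        return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
                for row in a]

    out = matmul(matmul(norm, features), weights)
    # ReLU non-linearity.
    return [[max(0.0, v) for v in row] for row in out]


# Usage: two connected nodes with one feature each and an identity weight.
adjacency = [[0, 1], [1, 0]]
node_features = [[1.0], [3.0]]
weight = [[1.0]]
print(gcn_layer(adjacency, node_features, weight))
```

With these toy inputs each node ends up with the average of its own and its neighbour's feature, which shows how a single layer mixes node features with graph topology.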