Welcome to the IKCEST
Journal
IEEE Transactions on Knowledge and Data Engineering

IEEE Transactions on Knowledge and Data Engineering

Archives Papers: 579
IEEE Xplore
Please choose volume & issue:
Walking With Attention: Self-Guided Walking for Heterogeneous Graph Embedding
Yunzhi HaoXinchao WangXingen WangXinyu WangChun ChenMingli Song
Keywords:SemanticsTask analysisMatrix decompositionLegged locomotionTrainingTopologyData miningGraph embeddinggraph representation learningheterogeneous graphattention networks
Abstracts:Heterogeneous graph embedding aims at learning low-dimensional representations from a graph featuring nodes and edges of diverse natures, and meanwhile preserving the underlying topology. Existing approaches along this line have largely relied on <italic>meta-paths</italic>, which are by nature hand-crafted and pre-defined transition rules, so as to explore the semantics of a graph. Despite the promising results, defining meta-paths requires domain knowledge, and thus when the test distribution deviates from the priors, such methods are prone to errors. In this paper, we propose a self-learning scheme for heterogeneous graph embedding, termed as self-guided walk (SILK), that bypasses meta-paths and learns adaptive attentions for node walking. SILK assumes no prior knowledge or annotation is provided, and conducts a customized random walk to encode the contexts of the heterogeneous graph of interest. Specifically, this is achieved via maintaining a dynamically-updated <italic>guidance matrix</italic> that records the node-conditioned transition potentials. Experimental results on four real-world datasets demonstrate that SILK significantly outperforms state-of-the-art methods.
Understanding WeChat User Preferences and &#x201C;Wow&#x201D; Diffusion
Fanjin ZhangJie TangXueyi LiuZhenyu HouYuxiao DongJing ZhangXiao LiuRuobing XieKai ZhuangXu ZhangLeyu LinPhilip S. Yu
Keywords:Social networking (online)Message serviceKnowledge engineeringComputer scienceTerminologyInstant messagingIEEE FellowsSocial networkssocial influenceinformation diffusionuser behavioruser modeling
Abstracts:WeChat is the largest social instant messaging platform in China, with 1.1 billion monthly active users. &#x201C;Top Stories&#x201D; is a novel friend-enhanced recommendation engine in WeChat, in which users can read articles based on preferences of both their own and their friends. Specifically, when a user reads an article by opening it, the &#x201C;click&#x201D; behavior is private. Moreover, if the user clicks the &#x201C;wow&#x201D; button, (only) her/his direct connections will be aware of this action/preference. Based on the unique WeChat data, we aim to understand user preferences and &#x201C;wow&#x201D; diffusion in Top Stories at different levels. We have made some interesting discoveries. For instance, the &#x201C;wow&#x201D; probability of one user is negatively correlated with the number of connected components that are formed by her/his active friends, but the click probability is the opposite. We further study to what extent users&#x2019; &#x201C;wow&#x201D; and click behavior can be predicted from their social connections. To address this problem, we present a hierarchical graph representation learning based model DiffuseGNN, which is capable of capturing the structure-based social observations discovered above. Our experiments show that the proposed method can significantly improve the prediction performance compared with alternative methods.
Toward Tweet Entity Linking With Heterogeneous Information Networks
Wei ShenYuwei YinYang YangJiawei HanJianyong WangXiaojie Yuan
Keywords:Joining processesTask analysisInternetEncyclopediasElectronic publishingBlogsSocial networking (online)Tweet entity linkingheterogeneous information networksiterative clustering
Abstracts:Twitter, a microblogging platform, has developed into an increasingly invaluable information source, where millions of users post a great quantity of tweets with various topics per day. Heterogeneous information networks consisting of multi-type objects and relations are becoming more and more prevalent as an organization form of knowledge and information. The task of linking an entity mention in a tweet with its corresponding entity in a heterogeneous information network is of great importance, for the purpose of enriching heterogeneous information networks with the abundant and fresh knowledge embedded in tweets. However, the entity mention is ambiguous. Additionally, tweets are short and informal, making it difficult to mine enough information from a single tweet for entity linking. In this paper, we propose an unsupervised iterative clustering framework TELHIN to link multiple similar tweets with a heterogeneous information network jointly. Our framework takes three dimensions of tweet similarity into consideration: (1) content similarity, (2) temporal similarity, and (3) user similarity. The appropriate weights of different similarity dimensions for each entity mention are learned iteratively based on the metric learning algorithm by leveraging the pairwise constraints generated automatically. Experiments on real data demonstrate the effectiveness of our framework in comparison with the baselines.
Self-Propagation Graph Neural Network for Recommendation
Wenhui YuXiao LinJinfei LiuJunfeng GeWenwu OuZheng Qin
Keywords:Task analysisFeature extractionCollaborationBipartite graphToolsRecommender systemsGraph neural networksGraph neural networkself propagationcollaborative filteringitem recommendation
Abstracts:In recommendation tasks, we model user preferences by learning node representations (i.e., user and item embeddings) based on the observed user-item interaction data, which is a bipartite graph. <bold>G</bold>raph <bold>N</bold>eural <bold>N</bold>etworks (<bold>GNN</bold>s) are widely used to refine the representations by exploring the topology of the graph: embeddings of neighbors are propagated to each node to reconstruct its embeddings. However, the propagation strategy in existing GNNs is empirical and defective: (1) a substantial proportion of links are missed in the sparse observed graph, which causes ineffective and biased propagation; and (2) the propagation weights are determined by a coarse pre-defined rule, which only takes the degree of nodes into consideration. In this paper, we propose a dense and data-driven propagation mechanism for GNNs. Considering the graph we use to propagate embeddings in recommendation tasks is extremely sparse, we complement it and use the predicted graph as the new propagation tool. We learn the propagation matrix from the data, and propose a <bold>S</bold>elf-propagation <bold>G</bold>raph <bold>N</bold>eural <bold>N</bold>etwork (<bold>SGNN</bold>). Since it is very space- and time-consuming to maintain a large and dense propagation matrix, we factorize it for storing and updating. In this paper, we propose three methods to complete the sparse graph and construct the propagation matrix: (1) we complete the graph based on a recommendation model; (2) we measure the node distance based on spectral clustering; (3) we predict missing links of the graph based on predictive embeddings. In SGNN, the embeddings can be propagated to not only the observed neighbors, but also the potential yet unobserved neighbors, and the propagation weights are learned based on the connection strength. Comprehensive experiments on three real-world datasets demonstrate the effectiveness and efficiency of our propo- ed model: SGNN outperforms recent state-of-the-art GNNs significantly. Codes are available on <uri>https://github.com/Wenhui-Yu/LCFN</uri>.
Scalable Label Propagation for Multi-Relational Learning on the Tensor Product of Graphs
Zhuliu LiRaphael PetegrossoShaden SmithDavid SterlingGeorge KarypisRui Kuang
Keywords:TensorsHypertext systemsTask analysisScalabilityPrediction algorithmsOptimizationApproximation algorithmsMulti-relational learningtensor product graphlabel propagationhyperlink predictionmultiple graph alignmenttensor decomposition and completion
Abstracts:Multi-relational learning on knowledge graphs infers high-order relations among the entities across the graphs. This learning task can be solved by label propagation on the tensor product of the knowledge graphs to learn the high-order relations as a tensor. In this paper, we generalize a widely used label propagation model to the normalized tensor product graph, and propose an optimization formulation and the scalable Low-rank Tensor-based Label Propagation algorithm (LowrankTLP) to infer multi-relations for two learning tasks, hyperlink prediction and multiple graph alignment. The optimization formulation minimizes the upper bound of the noisy-tensor estimating error for multiple graph alignment, by learning with a subset of the eigen-pairs in the spectrum of the normalized tensor product graph. We also provide a data-dependent transductive Rademacher bound for binary hyperlink prediction. We accelerate LowrankTLP with parallel tensor computation which enables label propagation on a tensor product of 100 graphs each of size 1000 in less than half hour in the simulation. LowrankTLP was also applied to predicting the author-paper-venue hyperlinks in publication records, alignment of segmented regions across up to 26 CT-scan images and alignment of protein-protein interaction networks across multiple species. The experiments demonstrate that LowrankTLP indeed well approximates the original label propagation with better scalability and accuracy.
Ranking-Based Implicit Regularization for One-Class Collaborative Filtering
Defu LianJin ChenKai ZhengEnhong ChenXiaofang Zhou
Keywords:OptimizationRecommender systemsCollaborationTime complexityComputer scienceTask analysisStochastic processesOne-class collaborative filteringimplicit regularizationitem recommendationcoordinate descentalternating least square
Abstracts:One-class collaborative filtering (OCCF) problems are ubiquitous in real-world recommendation systems, such as news recommendation, but suffer from data sparsity and lack of negative items. To address the challenge, the state-of-the-art algorithm assigns uninteracted items with smaller weights of being negative and performs low-rank approximation over the user-item interaction matrix. However, the prior ratings are usually suggested to be zero but may not be well-defined. To avert the direct utilization of prior ratings for uninteracted items, we propose a novel ranking-based implicit regularizer by hypothesizing that users&#x2019; preference scores for uninteracted items should not deviate a lot from each other. The regularizer is then used in a ranking-based OCCF framework to penalize large differences of preference scores between uninteracted items. To efficiently optimize model parameters in this framework, we develop the scalable alternating least square algorithm and coordinate descent algorithm, whose time complexity is linearly proportional to the data size. Finally, we extensively evaluate the proposed algorithms on six public real-world datasets. The results show that the proposed regularizer significantly improves the recommendation quality of ranking-based OCCF algorithms, such as BPRMF and RankALS. Moreover, the ranking-based framework with the proposed regularizer outperforms the state-of-the-art recommendation algorithms for implicit feedback.
Possibilistic Data Cleaning
Henning KoehlerSebastian Link
Keywords:CleaningUncertaintyMaintenance engineeringPossibility theoryData modelsSemanticsRelational databasesAlgorithmconstraintdata cleaningdatabasefixed-parameter tractableintractabilitypossibility theoryvertex cover
Abstracts:Classical data cleaning performs a minimal set of operations on the data to satisfy the given integrity constraints. Often, this minimization is equivalent to vertex cover, for example when tuples can be removed due to the violation of functional dependencies. Classically, the uncertainty of tuples and constraints is ignored. We propose not to view data as dirty but the uncertainty information about data. Since probabilities are often unavailable and their treatment is limited due to correlations in the data, we investigate a qualitative approach to uncertainty. Tuples are assigned degrees of possibility with which they occur, and constraints are assigned degrees of certainty that say to which tuples they apply. Our approach is non-invasive to the data as we lower the possibility degree of tuples as little as possible. The new resulting qualitative version of vertex cover remains NP-hard. We establish an algorithm that is fixed-parameter tractable in the size of the qualitative vertex cover. Experiments with synthetic and real-world data show that our algorithm outperforms the classical algorithm proportionally to the available number of uncertainty degrees. By mining the certainty degrees with which constraints hold, our framework becomes applicable even when uncertainty information is unavailable.
Point-of-Interest Recommendation for Users-Businesses With Uncertain Check-ins
Zhu SunChen LiYu LeiLu ZhangJie ZhangShunpan Liang
Keywords:BusinessUrban areasSocial networking (online)SunData analysisCorrelationWeb searchNext point-of-interest recommendationuncertain check-inscollective POIsliving environmentneural networks
Abstracts:Most existing studies on next point-of-interest (POI) recommendation assume that users deliver certain check-ins over individual POIs. In reality, we typically obtain uncertain check-ins due to the presence of collective POIs, which are gathering places of multiple individual POIs (e.g., shopping malls). On one hand, such uncertain check-ins over collective POIs hinder more accurate next POI recommendation for users due to the transition vanishing issue; on the other hand, the presence of collective POIs poses the challenge for businesses to select which collective POIs to locate in due to complicated competition and cooperation relations between businesses. As such, these collective POIs bring an unprecedented opportunity and necessity on recommendation for both users and businesses. Therefore, we propose novel solutions of location service beneficial for users-businesses. For users, we propose the STSP equipped with category- and location-aware encoders, to deliver more accurate next POI prediction by fusing rich context features. Regarding businesses, we explore their competition and cooperation relations from check-in records, based on which we derive the <italic>living environment</italic> (LE) of a business. Insight on site selection for businesses is provided by exploiting the LE, aiming to bring in more profits. Extensive empirical studies demonstrate the efficiency of our solutions.
Personalized Route Recommendation With Neural Network Enhanced Search Algorithm<inline-formula><tex-math notation="LaTeX"/><alternatives><mml:math><mml:msup><mml:mi/><mml:mo/></mml:msup></mml:math><inline-graphic xlink:href="wang-ieq1-3068479.gif"/></alternatives></inline-formula>
Jingyuan WangNing WuWayne Xin Zhao
Keywords:Task analysisTrajectoryRoadsHeuristic algorithmsCost functionDeep learningRecurrent neural networksRoute recommendation $A^\star$ A ★ searchgraph neural networksattentiondeep learning
Abstracts:In this work, we study an important task in location-based services, namely <italic>Personalized Route Recommendation (PRR)</italic>. Given a road network, the PRR task aims to generate user-specific route suggestions for replying to users&#x2019; route queries. A classic approach is to adapt search algorithms to construct pathfinding-like solutions. These methods typically focus on reducing search space with suitable heuristic strategies. For these search algorithms, heuristic strategies are often handcrafted, which are not flexible to work in complicated task settings. In addition, it is difficult to utilize useful context information in the search procedure. To develop a more principled solution to the PRR task, we propose to improve search algorithms with neural networks for solving the PRR task based on the widely used <inline-formula><tex-math notation="LaTeX">$A^{*}$</tex-math><alternatives><mml:math><mml:msup><mml:mi>A</mml:mi><mml:mo>*</mml:mo></mml:msup></mml:math><inline-graphic xlink:href="wang-ieq2-3068479.gif"/></alternatives></inline-formula> algorithm. The main idea of our solution is to automatically learn the cost functions in <inline-formula><tex-math notation="LaTeX">$A^{*}$</tex-math><alternatives><mml:math><mml:msup><mml:mi>A</mml:mi><mml:mo>*</mml:mo></mml:msup></mml:math><inline-graphic xlink:href="wang-ieq3-3068479.gif"/></alternatives></inline-formula> algorithms, which is the key of heuristic search algorithms. Our model consists of two main components. First, we employ attention-based Recurrent Neural Networks (RNN) to model the cost from the source to the candidate location by incorporating useful context information. Instead of learning a single cost value, the RNN component is able to learn a time-varying vectorized representation for the moving state of a user. Second, we propose to use an estimation network for predicting the cost from a candidate location to the destination. For capturing structural characteristics, the- estimation network is built on top of position-aware graph attention networks. The two components are integrated in a principled way for deriving a more accurate cost of a candidate location for the <inline-formula><tex-math notation="LaTeX">$A^{*}$</tex-math><alternatives><mml:math><mml:msup><mml:mi>A</mml:mi><mml:mo>*</mml:mo></mml:msup></mml:math><inline-graphic xlink:href="wang-ieq4-3068479.gif"/></alternatives></inline-formula> algorithm. Extensive experiment results on three real-world datasets have shown the effectiveness and robustness of the proposed model.
Periodic Weather-Aware LSTM With Event Mechanism for Parking Behavior Prediction
Feng ZhangYani LiuNingxuan FengCheng YangJidong ZhaiShuhao ZhangBingsheng HeJiazao LinXiao ZhangXiaoyong Du
Keywords:MeteorologyPredictive modelsCOVID-19Logic gatesUrban areasRecurrent neural networksMathematical modelPeriodicweather-awareLSTMevent mechanismparking behavior predictionCOVID-19
Abstracts:There are plenty of parking spaces in big cities, but we often find nowhere to park. For example, New York has 1.4 million cars and 4.4 million on-street parking spaces, but it is still not easy to find a parking place near our destination, especially during peak hours. The reason is the lack of prediction of parking behavior. If we could provide parking behavior in advance, we can ease this parking problem that affects human well-being. We observe that parking lots have periodic parking patterns, which is an important factor for parking behavior prediction. Unfortunately, existing work ignores such periodic parking patterns in parking behavior prediction, and thus incurs low accuracy. To solve this problem, we propose PewLSTM, a novel periodic weather-aware LSTM model that successfully predicts the parking behavior based on historical records, weather, environments, weekdays, and events. PewLSTM includes a periodic weather-aware LSTM prediction module and an event prediction module, for predicting parking behaviors in regular days and events. PewLSTM is extremely useful for drivers and parking lot owners to improve customer experience. For example, the probability of parking space that will be available soon can be provided even if the parking lot is full. Based on 910,477 real parking records in 904 days from 13 parking lots, PewLSTM yields 93.84&#x0025; parking prediction accuracy, which is about 30&#x0025; higher than the state-of-the-art parking behavior prediction method. Additionally, we have analyzed parking behaviors in events like holidays and COVID-19. PewLSTM can handle parking behavior prediction in events and reaches 90.68 percent accuracy.
Hot Journals