IEEE Transactions on Image Processing

Archived papers: 1,386
Multi-View Saliency-Guided Clustering for Image Cosegmentation
Zhiqiang Tao, Hongfu Liu, Huazhu Fu, Yun Fu
Keywords: Clustering algorithms; Robustness; Task analysis; Partitioning algorithms; Optimization; Computational modeling; Visualization; Image cosegmentation; constrained clustering; saliency prior; cosine similarity; multi-view learning
Abstract: Image cosegmentation aims at extracting the common objects from multiple images simultaneously. Existing methods mainly solve cosegmentation via a pre-defined graph, which lacks the flexibility and robustness to handle various visual patterns. Besides, similar backgrounds also confuse the identification of the common foreground. To address these issues, we propose a novel multi-view saliency-guided clustering algorithm (MvSGC) for the image cosegmentation task. In our model, the unsupervised saliency prior is used as partition-level side information to guide the foreground clustering process. To achieve robustness to noise and missing observations, similarities at both the instance level and the partition level are considered. Specifically, a unified clustering model with cosine similarity is proposed to capture the intrinsic structure of the data and keep the partition result consistent with the side information. Moreover, we leverage multi-view weight learning to integrate multiple feature representations and further improve the robustness of our approach. A K-means-like optimization algorithm is developed to perform the constrained clustering efficiently, with theoretical support. Experimental results on three benchmark datasets (the iCoseg, MSRC, and Internet image datasets) and one RGB-D image dataset demonstrate the superiority of applying our clustering method to image cosegmentation.
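As a rough illustration of the partition-level guidance described in this abstract, the sketch below runs a cosine-similarity (spherical) K-means in which assignments that agree with a saliency-derived side partition receive a small reward. The reward form, the lambda weight, and all names are assumptions for illustration, not the authors' MvSGC implementation.

    # Minimal sketch: spherical K-means guided by a saliency-based side partition.
    import numpy as np

    def saliency_guided_kmeans(X, side_labels, k=2, lam=0.5, n_iter=50, seed=0):
        """X: (n, d) superpixel features; side_labels: (n,) 0/1 saliency prior."""
        rng = np.random.default_rng(seed)
        Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
        centers = Xn[rng.choice(len(Xn), k, replace=False)]
        labels = np.zeros(len(Xn), dtype=int)
        for _ in range(n_iter):
            cn = centers / (np.linalg.norm(centers, axis=1, keepdims=True) + 1e-12)
            sim = Xn @ cn.T                      # cosine similarity to each center
            # reward assignments that agree with the saliency-based side partition
            bonus = lam * (np.arange(k)[None, :] == side_labels[:, None])
            new_labels = np.argmax(sim + bonus, axis=1)
            if np.array_equal(new_labels, labels):
                break
            labels = new_labels
            for c in range(k):
                if np.any(labels == c):
                    centers[c] = Xn[labels == c].mean(axis=0)
        return labels

Setting lam to zero recovers plain spherical K-means; larger values pull the partition toward the saliency prior.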
Discrete Multi-Graph Clustering
Minnan Luo, Caixia Yan, Qinghua Zheng, Xiaojun Chang, Ling Chen, Feiping Nie
Keywords: Clustering algorithms; Optimization; Linear programming; Partitioning algorithms; Image segmentation; Task analysis; NP-hard problem; Spectral clustering; multiple feature learning; discrete graph clustering; image segmentation
Abstract: Spectral clustering plays a significant role in applications that rely on multi-view data, thanks to its well-defined mathematical framework and excellent performance on arbitrarily shaped clusters. Unfortunately, directly optimizing spectral clustering results in an NP-hard problem due to the discrete constraints on the clustering labels. Hence, conventional approaches adopt a relax-and-discretize strategy to approximate the original solution. However, nothing in this strategy prevents information loss between the stages of the process, and this uncertainty is aggravated when heterogeneous features must be fused in multi-view spectral clustering. In this paper, we avoid the NP-hard optimization problem and develop a general framework for multi-view discrete graph clustering that directly learns a consensus partition across multiple views, instead of using the relax-and-discretize strategy. An effective re-weighting optimization algorithm is exploited to solve the proposed challenging problem. Further, we provide a theoretical analysis of the model's convergence properties and the computational complexity of the proposed algorithm. Extensive experiments on several benchmark datasets verify the effectiveness and superiority of the proposed algorithm on clustering and image segmentation tasks.
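The re-weighting idea can be conveyed with a tiny, generic sketch: each view's weight is derived from how well its graph agrees with the current consensus partition, using the common self-weighting rule w_v proportional to 1/(2*sqrt(loss_v)). This is a stand-in to illustrate the mechanism, not the paper's exact discrete optimization.

    # Generic view re-weighting and graph fusion sketch (not the paper's solver).
    import numpy as np

    def view_weights_from_losses(losses, eps=1e-12):
        """losses: per-view disagreement with the consensus partition."""
        w = 1.0 / (2.0 * np.sqrt(np.asarray(losses, dtype=float) + eps))
        return w / w.sum()   # normalize so the weights sum to one

    def fuse_graphs(graphs, weights):
        """graphs: list of (n, n) affinity matrices; returns weighted consensus."""
        return sum(w * A for w, A in zip(weights, graphs))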
Adaptive Transform Domain Image Super-Resolution via Orthogonally Regularized Deep Networks
Tiantong Guo, Hojjat Seyed Mousavi, Vishal Monga
Keywords: Discrete cosine transforms; Training; Spatial resolution; Deep learning; Dictionaries; Deep learning; super-resolution; image transform domain; orthogonality constraint; complexity constraint
Abstract: Deep learning methods, in particular trained convolutional neural networks (CNNs), have recently been shown to produce compelling results for single image super-resolution (SR). Invariably, a CNN is learned to map the low resolution (LR) image to its corresponding high resolution (HR) version in the spatial domain. We propose a novel network structure for learning the SR mapping function in an image transform domain, specifically the discrete cosine transform (DCT). As the first contribution, we show that the DCT can be integrated into the network structure as a convolutional DCT (CDCT) layer. With the CDCT layer, we construct the DCT deep SR (DCT-DSR) network. We further extend the DCT-DSR to allow the CDCT layer to become trainable (i.e., optimizable). Because this layer represents an image transform, we enforce pairwise orthogonality constraints and newly formulated complexity order constraints on the individual basis functions/filters. This orthogonally regularized deep SR network (ORDSR) simplifies the SR task by taking advantage of the image transform domain while adapting the design of the transform basis to the training image set. Experimental results show that ORDSR achieves state-of-the-art SR image quality with fewer parameters than most deep CNN methods. A particular success of ORDSR is in overcoming the artifacts introduced by bicubic interpolation. A key burden of deep SR is the requirement of abundant LR and HR training image pairs; ORDSR exhibits much more graceful degradation as the training size is reduced, with significant benefits in the regime of limited training data. Analysis of memory and computation requirements confirms that ORDSR allows for a more efficient network with faster inference.
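To make the CDCT idea more tangible, the following sketch builds the 2D DCT-II basis that could initialize such a convolutional layer and evaluates a pairwise orthogonality penalty ||W W^T - I||_F^2 on the flattened filters. The filter size and penalty weighting are illustrative assumptions; the paper's complexity-order constraint is not reproduced.

    # 2D DCT-II basis filters plus an orthogonality penalty on flattened filters.
    import numpy as np

    def dct2_basis(n=8):
        """Return n*n filters of size (n, n) forming the orthonormal 2D DCT-II basis."""
        k = np.arange(n)
        c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
        c[0, :] = np.sqrt(1.0 / n)              # DC row uses a different scale
        # outer products of 1D basis vectors give the separable 2D basis
        return np.stack([np.outer(c[u], c[v]) for u in range(n) for v in range(n)])

    def orthogonality_penalty(filters):
        """filters: (m, h, w); penalize deviation of W W^T from the identity."""
        W = filters.reshape(len(filters), -1)
        G = W @ W.T
        return np.linalg.norm(G - np.eye(len(W)), 'fro') ** 2

    basis = dct2_basis(8)
    print(orthogonality_penalty(basis))          # ~0 for the exact DCT basis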
Discriminative Feature Learning With Foreground Attention for Person Re-Identification
Sanping Zhou, Jinjun Wang, Deyu Meng, Yudong Liang, Yihong Gong, Nanning Zheng
Keywords: Feature extraction; Measurement; Neural networks; Learning systems; Decoding; Training; Task analysis; Person re-identification; convolutional neural network (CNN); foreground attentive feature learning
Abstract: The performance of person re-identification (Re-ID) is seriously affected by large cross-view appearance variations caused by mutual occlusions and background clutter. Hence, learning a feature representation that can adaptively emphasize the foreground persons becomes critical to solving the person Re-ID problem. In this paper, we propose a simple yet effective foreground attentive neural network (FANN) to learn a discriminative feature representation for person Re-ID, which adaptively enhances the positive contribution of the foreground and weakens the negative contribution of the background. Specifically, a novel foreground attentive subnetwork is designed to drive the network's attention, in which a decoder network reconstructs the binary mask using a novel local regression loss function, and an encoder network is regularized by the decoder network to focus its attention on the foreground persons. The resulting feature maps of the encoder network are further fed into the body part subnetwork and the feature fusion subnetwork to learn discriminative features. Besides, a novel symmetric triplet loss function is introduced to supervise feature learning, in which the intra-class distance is minimized and the inter-class distance is maximized simultaneously in each triplet unit. By training our FANN in a multi-task learning framework, a discriminative feature representation is learned that finds the matched reference for each probe among the various candidates in the gallery. Extensive experiments on several public benchmark datasets show clear improvements of our method over state-of-the-art approaches.
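The symmetric triplet idea can be sketched as follows: the negative sample is pushed away from both the anchor and the positive within each triplet, while the matched pair is pulled together. The exact loss in the paper may differ; this is a generic illustrative variant.

    # Generic "symmetric" triplet loss sketch over L2-normalized embeddings.
    import numpy as np

    def symmetric_triplet_loss(anchor, positive, negative, margin=0.3):
        """Each input: (batch, dim) embeddings; returns the mean hinge loss."""
        d_ap = np.sum((anchor - positive) ** 2, axis=1)     # intra-class distance
        d_an = np.sum((anchor - negative) ** 2, axis=1)     # anchor vs. negative
        d_pn = np.sum((positive - negative) ** 2, axis=1)   # positive vs. negative
        # push the negative away from both sides of the matched pair
        return np.mean(np.maximum(0.0, margin + d_ap - 0.5 * (d_an + d_pn)))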
Accurate Facial Image Parsing at Real-Time Speed
Zhen Wei, Si Liu, Yao Sun, Hefei Ling
Keywords: Face; Task analysis; Deep learning; Real-time systems; Knowledge engineering; Training; Hair; Face parsing; receptive field; metrics learning; distillation; deep learning
Abstract: In this paper, we propose a design scheme for deep learning networks for the face parsing task, with promising accuracy and real-time inference speed. By analyzing the differences between the general image parsing task and the face parsing task, we first revisit the structure of the traditional FCN and improve it to suit the unique properties of face parsing. In particular, the concept of Normalized Receptive Field is proposed to give more insight into designing the network. Then, a novel loss function called Statistical Contextual Loss is introduced, which integrates richer contextual information and regularizes features during training. For further model acceleration, we propose a semi-supervised distillation scheme that effectively transfers the learned knowledge to a lighter network. Extensive experiments on the LFW and Helen datasets demonstrate the significant superiority of the new design scheme in both efficacy and efficiency.
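As a hedged sketch of the distillation step, the snippet below computes a standard per-pixel knowledge-distillation loss in which a lighter student matches the temperature-softened class distribution of the teacher; the paper's semi-supervised scheme and the Statistical Contextual Loss are not reproduced here.

    # Generic per-pixel teacher-to-student distillation loss sketch.
    import numpy as np

    def softmax(logits, T=1.0, axis=-1):
        z = logits / T
        z = z - z.max(axis=axis, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    def pixel_distillation_loss(student_logits, teacher_logits, T=2.0):
        """logits: (H, W, num_classes); KL(teacher || student) averaged over pixels."""
        p_t = softmax(teacher_logits, T)
        p_s = softmax(student_logits, T)
        kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
        return float(kl.mean()) * (T ** 2)      # T^2 keeps the gradient scale comparable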
Deep Manifold Structure Transfer for Action Recognition
Ce Li, Baochang Zhang, Chen Chen, Qixiang Ye, Jungong Han, Guodong Guo, Rongrong Ji
Keywords: Manifolds; Three-dimensional displays; Deep learning; Data structures; Streaming media; Backpropagation; Action recognition; manifold; alternating direction method of multipliers; backward propagation; ADMM-BP
Abstract: While intrinsic data structure in subspace provides useful information for visual recognition, it has not yet been well studied in deep feature learning for action recognition. In this paper, we introduce a new spatio-temporal manifold network (STMN) that leverages data manifold structures to regularize deep action feature learning, aiming at simultaneously minimizing the intra-class variations of the learned deep features and alleviating the over-fitting problem. To this end, the manifold prior is imposed on the top layer of a convolutional neural network (CNN) and propagated across the convolutional layers during forward-backward propagation. The observed correspondence of manifold structures in the data space and the feature space validates that the manifold prior can be transferred across the CNN layers. The STMN theoretically recasts the problem of transferring the data structure prior into deep learning architectures as a projection over the manifold via an embedding method, which can be solved by an alternating direction method of multipliers and backward propagation (ADMM-BP) algorithm. The STMN is generic in the sense that it can be plugged into various backbone architectures to learn more discriminative representations for action recognition. Extensive experimental results show that our method achieves comparable or even better performance than state-of-the-art approaches on four benchmark datasets.
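One common way to impose a manifold prior on learned features, in the spirit of the description above, is a graph-Laplacian penalty tr(F^T L F) built from a k-nearest-neighbor graph in the data space. The sketch below shows only that generic regularizer; the ADMM-BP solver and the layer-wise propagation of STMN are not reproduced.

    # Generic graph-Laplacian manifold regularizer on learned features.
    import numpy as np

    def knn_graph(X, k=5):
        """Binary symmetric k-nearest-neighbor affinity matrix from data X: (n, d)."""
        d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
        np.fill_diagonal(d2, np.inf)
        idx = np.argsort(d2, axis=1)[:, :k]
        W = np.zeros_like(d2)
        rows = np.repeat(np.arange(len(X)), k)
        W[rows, idx.ravel()] = 1.0
        return np.maximum(W, W.T)                # symmetrize

    def manifold_penalty(F, W):
        """F: (n, p) learned features; returns tr(F^T L F) with L = D - W."""
        L = np.diag(W.sum(axis=1)) - W
        return float(np.trace(F.T @ L @ F))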
Transfer Neural Trees: Semi-Supervised Heterogeneous Domain Adaptation and Beyond
Wei-Yu Chen, Tzu-Ming Harry Hsu, Yao-Hung Hubert Tsai, Ming-Syan Chen, Yu-Chiang Frank Wang
Keywords: Task analysis; Artificial neural networks; Deep learning; Forestry; Training; Biological neural networks; Transfer learning; domain adaptation; neural decision forest; neural network; zero-shot learning
Abstract: Heterogeneous domain adaptation (HDA) addresses the task of associating data not only across dissimilar domains but also described by different types of features. Inspired by recent advances in neural networks and deep learning, we propose a deep learning model of transfer neural trees (TNT), which jointly solves cross-domain feature mapping, adaptation, and classification in a unified architecture. As the prediction layer in TNT, we introduce the transfer neural decision forest (transfer-NDF), which is able to learn the neurons in TNT for adaptation by stochastic pruning. To handle semi-supervised HDA, a unique embedding loss term is introduced to TNT for preserving prediction and structural consistency between labeled and unlabeled target-domain data. Furthermore, we show that our TNT can be extended to zero-shot learning for associating image and attribute data with promising performance. Finally, experiments on different classification tasks across features, datasets, and modalities verify the effectiveness of our TNT.
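The prediction layer can be pictured with a single soft decision tree, the building block of a neural decision forest: sigmoid gates route an example down the tree, and the output is the path-probability-weighted mixture of leaf class distributions. Stochastic pruning and the transfer-specific parts of transfer-NDF are omitted; the shapes below are illustrative assumptions.

    # Single soft decision tree prediction, the core of a neural decision forest.
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def soft_tree_predict(x, W, b, leaf_probs, depth=3):
        """x: (d,) feature; W: (2**depth - 1, d) routing weights; b: node biases;
        leaf_probs: (2**depth, n_classes) row-stochastic leaf distributions."""
        n_leaves = 2 ** depth
        gate = sigmoid(W @ x + b)                 # P(go left) at every internal node
        probs = np.zeros(n_leaves)
        for leaf in range(n_leaves):
            node, p = 0, 1.0
            for level in range(depth):
                go_left = ((leaf >> (depth - 1 - level)) & 1) == 0
                p *= gate[node] if go_left else (1.0 - gate[node])
                node = 2 * node + (1 if go_left else 2)
            probs[leaf] = p
        return probs @ leaf_probs                 # mixture of leaf distributions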
Automatic Example-Based Image Colorization Using Location-Aware Cross-Scale Matching
Bo Li, Yu-Kun Lai, Matthew John, Paul L. Rosin
Keywords: Image color analysis; Image edge detection; Gray-scale; Optimization; Semantics; Image segmentation; Feature extraction; Image colorization; cross-scale texture matching; location statistics; graph cuts; sparse; edge preserving
Abstract: Given a reference color image and a destination grayscale image, this paper presents a novel automatic colorization algorithm that transfers color information from the reference image to the destination image. Since the reference and destination images may contain content at different or even varying scales (due to changes of distance between objects and the camera), existing texture matching-based methods can often perform poorly. We propose a novel cross-scale texture matching method to improve the robustness and quality of the colorization results. Suitable matching scales are considered locally and then fused using a global optimization that minimizes both the matching errors and the spatial change of scales. The minimization is solved efficiently using a multi-label graph-cut algorithm. Since only low-level texture features are used, texture matching-based colorization can still produce semantically incorrect results, such as a meadow appearing above the sky. We consider a class of semantic violations in which the statistics of up-down relationships learned from the reference image are violated, and propose an effective method to identify and correct unreasonable colorization. Finally, a novel nonlocal ℓ1 optimization framework is developed to propagate high-confidence micro-scribbles to regions of lower confidence to produce a fully colorized image. Qualitative and quantitative evaluations show that our method outperforms several state-of-the-art methods.
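A toy version of the cross-scale matching cost is sketched below: a grayscale target patch is compared against the reference patch resampled at several candidate scales, and the lowest mean squared difference picks the scale. Raw intensities and nearest-neighbor resampling are simplifications; the paper's texture features, global scale fusion, and graph-cut step are not included.

    # Toy cross-scale patch matching: pick the reference scale with the lowest cost.
    import numpy as np

    def resample(patch, scale):
        """Nearest-neighbor resampling of a square patch by the given scale factor."""
        n = patch.shape[0]
        idx = np.clip((np.arange(n) / scale).astype(int), 0, n - 1)
        return patch[np.ix_(idx, idx)]

    def best_scale_match(target_patch, ref_patch, scales=(0.5, 1.0, 2.0)):
        """Return (best_scale, cost): lowest mean squared intensity difference."""
        costs = []
        for s in scales:
            cand = resample(ref_patch, s)
            costs.append(np.mean((target_patch - cand) ** 2))
        best = int(np.argmin(costs))
        return scales[best], costs[best]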
Multi-Level Semantic Feature Augmentation for One-Shot Learning
Zitian Chen, Yanwei Fu, Yinda Zhang, Yu-Gang Jiang, Xiangyang Xue, Leonid Sigal
Keywords: Semantics; Training; Visualization; Decoding; Task analysis; Training data; Manifolds; One-shot learning; feature augmentation
Abstract: The ability to quickly recognize and learn new visual concepts from limited samples enables humans to adapt quickly to new tasks and environments. This ability rests on the semantic association of novel concepts with those that have already been learned and stored in memory. Computers can begin to acquire similar abilities by utilizing a semantic concept space: a high-dimensional semantic space in which similar abstract concepts appear close together and dissimilar ones far apart. In this paper, we propose a novel approach to one-shot learning that builds on this core idea. Our approach learns to map a novel sample instance to a concept, relates that concept to existing ones in the concept space and, using these relationships, generates new instances by interpolating among the concepts to help learning. Instead of synthesizing new image instances, we propose to directly synthesize instance features by leveraging semantics using a novel auto-encoder network called dual TriNet. The encoder part of the TriNet learns to map multi-layer visual features from a CNN to a semantic vector. In the semantic space, we search for related concepts, which are then projected back into the image feature spaces by the decoder portion of the TriNet. Two strategies in the semantic space are explored. Notably, this seemingly simple strategy results in complex augmented feature distributions in the image feature space, leading to substantially better performance.
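The augmentation loop can be sketched schematically: encode a visual feature into the semantic space, move it toward a related known concept, and decode the result back into the feature space to obtain extra training features. The linear encoder/decoder and Gaussian perturbation below are placeholders for the dual TriNet, and all shapes are assumptions.

    # Schematic semantic-space feature augmentation with placeholder linear maps.
    import numpy as np

    def augment_features(feat, enc_W, dec_W, concept_vecs, n_aug=5, noise=0.1, seed=0):
        """feat: (d,) visual feature; enc_W: (s, d); dec_W: (d, s);
        concept_vecs: (m, s) word/attribute vectors of known concepts."""
        rng = np.random.default_rng(seed)
        z = enc_W @ feat                                   # encode into semantic space
        sims = concept_vecs @ z / (
            np.linalg.norm(concept_vecs, axis=1) * np.linalg.norm(z) + 1e-12)
        nearest = concept_vecs[np.argmax(sims)]            # most related known concept
        augmented = []
        for _ in range(n_aug):
            # interpolate toward the neighbor concept and add small semantic noise
            t = rng.uniform(0.0, 0.5)
            z_aug = (1 - t) * z + t * nearest + noise * rng.standard_normal(z.shape)
            augmented.append(dec_W @ z_aug)                # decode back to feature space
        return np.stack(augmented)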
Saliency Inside: Learning Attentive CNNs for Content-Based Image Retrieval
Shikui Wei, Lixin Liao, Jia Li, Qinjie Zheng, Fei Yang, Yao Zhao
Keywords: Feature extraction; Image retrieval; Visualization; Semantics; Streaming media; Task analysis; Reliability; Visual saliency; content-based image retrieval; bag-of-word; convolutional neural networks
Abstract: In content-based image retrieval (CBIR), one of the most challenging and ambiguous tasks is to correctly understand the human query intention and measure its semantic relevance to images in the database. Given the impressive capability of visual saliency for predicting human visual attention, which is closely related to the query intention, this paper attempts to explicitly discover the essential effect of visual saliency in CBIR via qualitative and quantitative experiments. Toward this end, we first generate fixation density maps of images from a widely used CBIR dataset using an eye-tracking apparatus. These ground-truth saliency maps are then used to measure the influence of visual saliency on the CBIR task by exploring several probable ways of incorporating such saliency cues into the retrieval process. We find that visual saliency is indeed beneficial to the CBIR task, and that the best scheme for incorporating saliency may differ across image retrieval models. Inspired by these findings, this paper presents two-stream attentive convolutional neural networks (CNNs) with saliency embedded inside for CBIR. The proposed network has two streams that simultaneously handle two tasks. The main stream focuses on extracting discriminative visual features that are tightly related to semantic attributes. Meanwhile, the auxiliary stream aims to facilitate the main stream by redirecting feature extraction toward the salient image content that a human would pay attention to. By fusing these two streams into the Main and Auxiliary CNNs (MAC), image similarity can be computed as a human would, by preserving conspicuous content and suppressing irrelevant regions. Extensive experiments show that the proposed model achieves impressive performance in image retrieval on four public datasets.
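One simple way to fold saliency into retrieval features, in the spirit of the two-stream fusion above, is saliency-weighted pooling of convolutional feature maps followed by cosine ranking. The sketch below shows only that fusion idea, not the exact MAC architecture.

    # Saliency-weighted pooling of conv features and cosine ranking for retrieval.
    import numpy as np

    def saliency_weighted_descriptor(feat_map, saliency):
        """feat_map: (H, W, C) conv features; saliency: (H, W) map in [0, 1]."""
        w = saliency / (saliency.sum() + 1e-12)
        desc = np.tensordot(w, feat_map, axes=([0, 1], [0, 1]))   # (C,) pooled descriptor
        return desc / (np.linalg.norm(desc) + 1e-12)

    def cosine_rank(query_desc, gallery_descs):
        """gallery_descs: (N, C) L2-normalized; return indices by descending similarity."""
        sims = gallery_descs @ query_desc
        return np.argsort(-sims)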