IEEE/ACM Transactions on Audio, Speech, and Language Processing


Archives Papers: 757
Relation Classification via Keyword-Attentive Sentence Mechanism and Synthetic Stimulation Loss
Luoqin Li, Jiabing Wang, Jichang Li, Qianli Ma, Jia Wei
Keywords: Feature extraction; Semantics; Task analysis; Syntactics; Adaptation models; Neural networks; Kernel; Relation classification; attention mechanism; loss function; bidirectional gated recurrent unit; mutual learning; imbalanced classification; shortest dependency path
Abstract: Previous studies have shown that attention mechanisms and shortest dependency paths have a positive effect on relation classification. In this paper, a keyword-attentive sentence mechanism is proposed to effectively combine the two methods. Furthermore, to handle the imbalanced classification problem, this paper proposes a new loss function called the <italic>synthetic stimulation loss</italic>, which uses a modulating factor to make the model focus on hard-to-classify samples. The two proposed methods are integrated into a bidirectional gated recurrent unit (BiGRU). Since a single model is not strong in noise immunity, this paper applies the mutual learning method to our model and forces the networks to teach each other. Therefore, we call the final model <italic>SSL-KAS-MuBiGRU</italic>. Experiments on the SemEval-2010 Task 8 data set and the TAC40 data set demonstrate that the keyword-attentive sentence mechanism and the synthetic stimulation loss are useful for relation classification, and that our model achieves state-of-the-art results.
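The modulating factor described in the abstract resembles the focal-loss idea of down-weighting easy samples so that hard-to-classify ones dominate the gradient. A rough illustration (the function name, the exponent `gamma`, and the exact form are assumptions for illustration, not the paper's actual synthetic stimulation loss):

```python
import numpy as np

def modulated_loss(probs, labels, gamma=2.0, eps=1e-12):
    """Cross-entropy scaled by a modulating factor (1 - p_t)^gamma.

    The factor shrinks toward 0 for well-classified samples (p_t near 1),
    so the loss concentrates on hard-to-classify samples.

    probs:  (N, C) predicted class probabilities
    labels: (N,)   integer class indices
    """
    p_t = probs[np.arange(len(labels)), labels]   # probability of true class
    modulator = (1.0 - p_t) ** gamma              # small for easy samples
    return float(np.mean(-modulator * np.log(p_t + eps)))

# An easy sample (p_t = 0.9) contributes far less than a hard one (p_t = 0.2).
easy = modulated_loss(np.array([[0.9, 0.1]]), np.array([0]))
hard = modulated_loss(np.array([[0.2, 0.8]]), np.array([0]))
```

With `gamma = 0` this reduces to plain cross-entropy; larger `gamma` stimulates the model more strongly on the minority or difficult classes.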
AgentGraph: Toward Universal Dialogue Management With Structured Deep Reinforcement Learning
Lu Chen, Zhi Chen, Bowen Tan, Sishan Long, Milica Gašić, Kai Yu
Keywords: Reinforcement learning; Task analysis; Neural networks; Optimization; Ontologies; Computational modeling; Training; Dialogue policy; deep reinforcement learning; graph neural networks; policy adaptation; transfer learning
Abstract: Dialogue policy plays an important role in task-oriented spoken dialogue systems: it determines how to respond to users. Recently proposed deep reinforcement learning (DRL) approaches have been used for policy optimization, but these deep models still face two challenges: first, many DRL-based policies are not sample efficient; second, most models lack the capability of policy transfer between different domains. In this paper, we propose a universal framework, <italic>AgentGraph</italic>, to tackle these two problems. AgentGraph combines a graph neural network (GNN) based architecture with a DRL-based algorithm, and can be regarded as a multi-agent reinforcement learning approach. Each agent corresponds to a node in a graph, which is defined according to the dialogue domain ontology. When making a decision, each agent can communicate with its neighbors on the graph. Under the AgentGraph framework, we further propose a dual GNN-based dialogue policy, which implicitly decomposes the decision in each turn into a high-level global decision and a low-level local decision. Experiments show that AgentGraph models significantly outperform traditional reinforcement learning approaches on most of the 18 tasks of the PyDial benchmark. Moreover, when transferred from a source task to a target task, these models not only have acceptable initial performance but also converge much faster on the target task.
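The neighbor communication step described in the abstract is standard GNN message passing: each agent (node) aggregates its neighbors' features before updating its own state. A minimal sketch under assumed details (sum aggregation, a single shared weight matrix, tanh nonlinearity); AgentGraph's actual architecture (dual GNN, per-node-type parameter sharing) is richer:

```python
import numpy as np

def gnn_message_pass(node_feats, adjacency, weight):
    """One round of neighbor communication between agents on a graph.

    Each node sums its neighbors' features (messages), adds its own
    features, and applies a shared linear map plus nonlinearity.
    """
    messages = adjacency @ node_feats              # sum of neighbors' features
    return np.tanh((node_feats + messages) @ weight)

# 3 agents on a line graph: 0 - 1 - 2
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
feats = np.eye(3)                                  # one-hot initial features
w = np.eye(3)                                      # identity weight for clarity
updated = gnn_message_pass(feats, adj, w)
```

After one round, the middle agent's state reflects both neighbors, which is what lets decisions at one node account for the rest of the ontology graph.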
Multichannel Online Dereverberation Based on Spectral Magnitude Inverse Filtering
Xiaofei Li, Laurent Girin, Sharon Gannot, Radu Horaud
Keywords: Convolution; Microphones; Adaptation models; Speech processing; Reverberation; Indexes; Time-domain analysis; Online speech dereverberation; channel identification; multichannel equalization; inverse filtering
Abstract: This paper addresses the problem of multichannel online dereverberation. The proposed method is carried out in the short-time Fourier transform (STFT) domain, for each frequency band independently. In the STFT domain, the time-domain room impulse response is approximately represented by the convolutive transfer function (CTF). The multichannel CTFs are adaptively identified based on the cross-relation method, using the recursive least squares criterion. Instead of the complex-valued CTF convolution model, we use a nonnegative convolution model between the STFT magnitude of the source signal and the CTF magnitude, which is only a coarse approximation of the former model but is shown to be more robust against CTF perturbations. Based on this nonnegative model, we propose an online STFT magnitude inverse filtering method. The inverse filters of the CTF magnitude are formulated based on the multiple-input/output inverse theorem (MINT), and adaptively estimated by gradient descent. Finally, the inverse filtering is applied to the STFT magnitude of the microphone signals, yielding an estimate of the STFT magnitude of the source signal. Experiments on both speech enhancement and automatic speech recognition demonstrate that the proposed method effectively suppresses reverberation, even for the difficult case of a moving speaker.
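The inverse-filter estimation step can be illustrated with a toy single-channel version: find a filter g such that the convolution of the channel with g approximates a unit impulse, by gradient descent on the squared equalization error. The toy channel values, filter length, learning rate, and iteration count below are illustrative assumptions; the paper works per STFT frequency band with a multichannel MINT formulation and recursive online updates:

```python
import numpy as np

def estimate_inverse_filter(h, filt_len=8, iters=500, lr=0.05):
    """Gradient-descent estimate of an inverse filter g with h * g ~ delta.

    Minimizes 0.5 * ||conv(h, g) - delta||^2 over g of fixed length.
    """
    g = np.zeros(filt_len)
    g[0] = 1.0 / max(h[0], 1e-6)                  # crude initialization
    target = np.zeros(len(h) + filt_len - 1)      # desired impulse response
    target[0] = 1.0
    for _ in range(iters):
        err = np.convolve(h, g) - target          # equalization error
        # gradient of the quadratic cost w.r.t. g is the correlation
        # of the error with the channel: grad_k = sum_n err[n] * h[n-k]
        grad = np.convolve(err, h[::-1])[len(h) - 1:len(h) - 1 + filt_len]
        g -= lr * grad
    return g

h = np.array([1.0, 0.5, 0.25])                    # toy CTF magnitude sequence
g = estimate_inverse_filter(h)
eq = np.convolve(h, g)                            # close to [1, 0, 0, ...]
```

Because the toy channel is minimum-phase and decays quickly, a short truncated filter already equalizes it well; the multichannel MINT formulation in the paper removes the minimum-phase restriction.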
Methods of Extending a Generalized Sidelobe Canceller With External Microphones
Randall Ali, Giuliano Bernardi, Toon van Waterschoot, Marc Moonen
Keywords: Noise reduction; Microphone arrays; Speech enhancement; Wireless sensor networks; Wireless communication; Multi-Microphone Noise Reduction; Speech Enhancement; External Microphone; GSC; beamforming
Abstract: While substantial noise reduction and speech enhancement can be achieved with multiple microphones organized in an array, in some cases, such as when the microphone spacings are quite close, the benefit can also be quite limited. This degradation can, however, be resolved by introducing one or more external microphones (<inline-formula><tex-math notation="LaTeX">$\text{XM}$</tex-math></inline-formula>s) into the same physical space as the local microphone array (<inline-formula><tex-math notation="LaTeX">$\text{LMA}$</tex-math></inline-formula>). In this paper, three methods of extending an <inline-formula><tex-math notation="LaTeX">$\text{LMA}$</tex-math></inline-formula>-based generalized sidelobe canceller (<inline-formula><tex-math notation="LaTeX">$\text{GSC-LMA}$</tex-math></inline-formula>) with multiple <inline-formula><tex-math notation="LaTeX">$\text{XM}$</tex-math></inline-formula>s are proposed, in such a manner that the relative transfer function pertaining to the <inline-formula><tex-math notation="LaTeX">$\text{LMA}$</tex-math></inline-formula> is treated as <italic>a priori</italic> knowledge. Two of these methods involve a procedure for completing an extended blocking matrix, whereas the third uses the speech estimate from the <inline-formula><tex-math notation="LaTeX">$\text{GSC-LMA}$</tex-math></inline-formula> directly, together with an orthogonalized version of the <inline-formula><tex-math notation="LaTeX">$\text{XM}$</tex-math></inline-formula> signals, to obtain an improved speech estimate via a rank-1 generalized eigenvalue decomposition. All three methods were evaluated on data recorded in an office room, and the third method was found to offer the most improvement. It was also shown that, with this method, the speech estimate from the <inline-formula><tex-math notation="LaTeX">$\text{GSC-LMA}$</tex-math></inline-formula> is not compromised and remains available to the listener if so desired, alongside the improved speech estimate that uses both the <inline-formula><tex-math notation="LaTeX">$\text{LMA}$</tex-math></inline-formula> and the <inline-formula><tex-math notation="LaTeX">$\text{XM}$</tex-math></inline-formula>s.
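The rank-1 generalized eigenvalue decomposition at the heart of the third method can be sketched generically: given a speech-plus-noise covariance matrix and a noise covariance matrix, the principal generalized eigenvector identifies the rank-1 speech subspace. The toy covariances below are assumptions for illustration; in the paper the decomposition is applied to the GSC-LMA output stacked with the orthogonalized XM signals:

```python
import numpy as np

def principal_gevd(Ryy, Rnn):
    """Principal generalized eigenpair of (Ryy, Rnn) via noise whitening.

    Solves Ryy q = lam * Rnn q by Cholesky-whitening with Rnn and taking
    the largest eigenpair of the whitened, symmetric matrix.
    """
    L = np.linalg.cholesky(Rnn)
    Linv = np.linalg.inv(L)
    lam, V = np.linalg.eigh(Linv @ Ryy @ Linv.T)  # ascending eigenvalues
    q = Linv.T @ V[:, -1]                         # principal generalized eigvec
    return lam[-1], q

# Toy model: rank-1 speech component a a^T on top of correlated noise Rnn
a = np.array([1.0, 0.8, 0.5])                     # speech steering vector
Rnn = np.eye(3) + 0.1 * np.ones((3, 3))           # noise covariance
Ryy = Rnn + 4.0 * np.outer(a, a)                  # speech + noise covariance
lam, q = principal_gevd(Ryy, Rnn)
```

Only one generalized eigenvalue exceeds 1 here (the speech direction), which is exactly why a rank-1 decomposition suffices for a single target speaker.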