Multi-view self-attention networks
Xu, Mingzhou1; Yang, Baosong2; Wong, Derek F.1; Chao, Lidia S.1
Source Publication: Knowledge-Based Systems

Self-attention networks (SANs) have attracted considerable research attention for their outstanding performance in the machine translation community. Recent studies have shown that SANs can be further improved by exploiting different inductive biases, each of which guides SANs to learn a specific view of the input sentence, e.g., short-term dependencies, forward and backward views, and phrasal patterns. However, few studies have investigated how these inductive techniques complementarily improve the capability of SANs, and this remains an interesting open question. In this paper, we select five inductive biases that are simple and not over-parameterized, and investigate their complementarity. We further propose multi-view self-attention networks, which jointly learn different linguistic aspects of the input sentence under a unified framework. Specifically, we propose and exploit a variety of inductive biases to regularize the conventional attention distribution. The different views are then aggregated by a hybrid attention mechanism that conveniently quantifies and leverages each view and its associated representation. Experiments on various translation tasks demonstrate that the different views progressively improve the performance of SANs, and that the proposed approach outperforms both the strong TRANSFORMER baseline and related models under the TRANSFORMER-BASE and TRANSFORMER-BIG settings. Extensive analyses on 10 linguistic probing tasks verify that the different views indeed tend to extract distinct linguistic features, and that our method integrates them highly effectively.
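The abstract describes biasing a shared attention distribution toward different views (e.g., short-term, forward, backward) and then mixing the views with a gating mechanism. As a minimal illustrative sketch only, not the authors' implementation, the idea could look like the following, where `view_masks`, `multi_view_attention`, and the gate parameter `gate_w` are hypothetical names:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def view_masks(n, window=2):
    """Additive masks (0 = keep, large negative = block) for three example views."""
    neg = -1e9
    idx = np.arange(n)
    dist = np.abs(idx[:, None] - idx[None, :])
    local = np.where(dist <= window, 0.0, neg)                   # short-term dependencies
    forward = np.where(idx[:, None] >= idx[None, :], 0.0, neg)   # attend to past positions
    backward = np.where(idx[:, None] <= idx[None, :], 0.0, neg)  # attend to future positions
    return [local, forward, backward]

def multi_view_attention(q, k, v, gate_w):
    """Compute one attention view per mask, then mix the views with softmax gates."""
    n, d = q.shape
    logits = q @ k.T / np.sqrt(d)                                # shared attention logits
    views = [softmax(logits + m) @ v for m in view_masks(n)]     # one (n, d) output per view
    gates = softmax(gate_w)                                      # one mixing weight per view
    return sum(g * out for g, out in zip(gates, views))
```

With `gate_w = np.zeros(3)` the gates are uniform and the output is the average of the three views; in a trained model the gates (or a content-dependent gate) would learn how much each view contributes.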

Keywords: Linguistics; Machine Translation; Multi-head Attention Mechanism; Multi-pattern; Self-attention Mechanism
Indexed By: SCIE
WOS Research Area: Immunology
WOS Subject: Immunology
WOS ID: WOS:000792018000001
Scopus ID: 2-s2.0-85124215000
Cited Times [WOS]: 1
Document Type: Journal article
Corresponding Author: Wong, Derek F.
Affiliation: 1. NLPCT Laboratory, University of Macau, Macau, China
2. Alibaba Group, Hangzhou, China
First Author Affiliation: University of Macau
Corresponding Author Affiliation: University of Macau
Recommended Citation
GB/T 7714: Xu, Mingzhou, Yang, Baosong, Wong, Derek F., et al. Multi-view self-attention networks[J]. Knowledge-Based Systems, 2022, 241.
APA: Xu, Mingzhou, Yang, Baosong, Wong, Derek F., & Chao, Lidia S. (2022). Multi-view self-attention networks. Knowledge-Based Systems, 241.
MLA: Xu, Mingzhou, et al. "Multi-view self-attention networks." Knowledge-Based Systems 241 (2022).

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.