Status: Forthcoming
Inductive Document Representation Learning for Short Text Clustering
Chen, Junyang (1); Gong, Zhiguo (1); Wang, Wei (1); Dong, Xiao (2); Wang, Wei (3); Liu, Weiwen (4); Wang, Cong (3); Chen, Xian (5)
2021
Source Publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12459 LNAI
Pages: 600-616
Abstract: Short text clustering (STC) is an important task for discovering topics or groups in fast-growing social networks, e.g., Tweets and Google News. STC is more challenging than clustering long texts because the word co-occurrence patterns in short texts are sparse, so traditional methods (e.g., TF-IDF) inevitably generate sparse representations. These sparse representations may in turn lead to inferior clustering performance, since clustering essentially relies on calculating the distances between the representations. To alleviate this problem, recent studies have mostly focused on representation learning approaches that learn compact low-dimensional embeddings. However, most of them, including probabilistic graph models and word embedding models, require all documents in the corpus to be present during training. These methods therefore inherently perform transductive learning, which cannot handle well the representations of unseen documents in which few words have been learned before. Recently, Graph Neural Networks (GNNs) have drawn a lot of attention in various applications. Inspired by the mechanism of vertex information propagation guided by the graph structure in GNNs, we propose an inductive document representation learning model, called IDRL, that maps short text structures into a graph network and recursively aggregates the neighbor information of the words in unseen documents. We can then reconstruct the representations of previously unseen short texts from the limited number of word embeddings learned before. Experimental results show that the proposed method learns more discriminative representations on inductive classification tasks and achieves better clustering performance than state-of-the-art models on four real-world datasets.
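The inductive idea in the abstract can be illustrated with a toy sketch. This is not the authors' IDRL model, which trains a graph neural network; it assumes a single round of unweighted mean aggregation over a word co-occurrence graph, and the names `build_cooccurrence_graph`, `embed_unseen_document`, and the two-dimensional embeddings are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def build_cooccurrence_graph(docs):
    """Link every pair of distinct words that share a training document."""
    graph = defaultdict(set)
    for doc in docs:
        words = set(doc.split())
        for word in words:
            graph[word].update(words - {word})
    return graph


def mean_vector(vectors, dim):
    """Element-wise mean of a list of vectors; zeros if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def embed_unseen_document(doc, embeddings, graph, dim):
    """Embed a document that was absent during training.

    Each word is represented by the mean of its own embedding (if it has
    one) and the embeddings of its neighbors: words linked to it in the
    training graph plus the other words of the new document. A word never
    seen in training therefore borrows its vector from its context.
    The document vector is the mean of these word vectors.
    """
    words = doc.split()
    local_words = set(words)
    word_vectors = []
    for word in words:
        neighbors = (graph.get(word, set()) | local_words) - {word}
        pool = [embeddings[n] for n in neighbors if n in embeddings]
        if word in embeddings:
            pool.append(embeddings[word])
        word_vectors.append(mean_vector(pool, dim))
    return mean_vector(word_vectors, dim)


# Toy (assumed) embeddings learned from two training documents.
train_docs = ["apple fruit", "stock market"]
embeddings = {
    "apple": [1.0, 0.0],
    "fruit": [1.0, 0.0],
    "stock": [0.0, 1.0],
    "market": [0.0, 1.0],
}
graph = build_cooccurrence_graph(train_docs)

# "mango" has no learned embedding, yet the document still gets a vector.
print(embed_unseen_document("mango fruit", embeddings, graph, dim=2))
```

In this sketch the unseen word "mango" inherits its vector from "fruit", so the new document lands next to the fruit-related training words. A transductive method would have no embedding for "mango" at all.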
DOI: 10.1007/978-3-030-67664-3_36
Language: English
Scopus ID: 2-s2.0-85103260063
Document Type: Conference paper
Collection: University of Macau
Affiliation:
1. University of Macau, Macao, Macao
2. The University of Queensland, Brisbane, Australia
3. Dalian University of Technology, Dalian, China
4. The Chinese University of Hong Kong, Hong Kong, China
5. The University of Hong Kong, Hong Kong, Hong Kong
First Author Affiliation: University of Macau
Recommended Citation
GB/T 7714: Chen, Junyang, Gong, Zhiguo, Wang, Wei, et al. Inductive Document Representation Learning for Short Text Clustering[C], 2021: 600-616.
APA: Chen, Junyang, Gong, Zhiguo, Wang, Wei, Dong, Xiao, Wang, Wei, Liu, Weiwen, Wang, Cong, & Chen, Xian. (2021). Inductive Document Representation Learning for Short Text Clustering. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 12459 LNAI, 600-616.
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.