Semantic Attention Networks for Intuitive Information Retrieval
Abstract
In this work we present a siamese LSTM-attention network based on the structured self-attention mechanism of \cite{lin2017structured} and the dependency Tree-LSTM of \cite{tai2015improved}. The network is designed both to learn semantic similarity between sentence pairs and to provide a simple mechanism for comparing text entities through the learned attention weights. We train the model in a supervised setting on the SICK data set, achieving a test accuracy of 87.03\%, an improvement over the 86.76\% reported by \cite{tai2015improved}.
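To make the architecture concrete, the sketch below is a minimal PyTorch illustration, not the authors' released code: it applies the structured self-attention of \cite{lin2017structured} over the hidden states of a weight-shared BiLSTM encoder and scores a sentence pair with the Manhattan-distance similarity used in the siamese recurrent architecture of Mueller and Thyagarajan (2016). All layer sizes, names, and the plain (non-tree) LSTM are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class StructuredSelfAttention(nn.Module):
    """Structured self-attention (Lin et al., 2017): A = softmax(W2 tanh(W1 H))."""
    def __init__(self, hidden_dim, att_dim=350, num_hops=30):
        super().__init__()
        self.w1 = nn.Linear(hidden_dim, att_dim, bias=False)
        self.w2 = nn.Linear(att_dim, num_hops, bias=False)

    def forward(self, H):
        # H: (batch, seq_len, hidden_dim) -- encoder hidden states.
        # Softmax over the token dimension gives one attention
        # distribution per hop.
        A = F.softmax(self.w2(torch.tanh(self.w1(H))), dim=1)  # (batch, seq_len, hops)
        M = torch.bmm(A.transpose(1, 2), H)                    # (batch, hops, hidden_dim)
        return M, A

class SiameseAttentionEncoder(nn.Module):
    """Siamese BiLSTM + structured self-attention; both sentences share weights."""
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=150):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        self.attend = StructuredSelfAttention(2 * hidden_dim)

    def encode(self, tokens):
        H, _ = self.lstm(self.embed(tokens))
        M, A = self.attend(H)
        return M.flatten(1), A  # sentence embedding and attention weights

    def forward(self, sent_a, sent_b):
        emb_a, att_a = self.encode(sent_a)
        emb_b, att_b = self.encode(sent_b)
        # Manhattan similarity in (0, 1], as in Mueller and Thyagarajan (2016).
        sim = torch.exp(-torch.norm(emb_a - emb_b, p=1, dim=1))
        return sim, (att_a, att_b)

# Example: score a batch of four sentence pairs of hypothetical token ids.
model = SiameseAttentionEncoder(vocab_size=10000)
sim, _ = model(torch.randint(10000, (4, 12)), torch.randint(10000, (4, 12)))

In training, Lin et al. additionally penalize $\lVert AA^\top - I \rVert_F^2$ so that the attention hops attend to different parts of the sentence; the returned attention matrices are what enable the per-token comparison of text entities described above.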
References
Marco Marelli, Luisa Bentivogli, Marco Baroni, Raffaella Bernardi, Stefano Menini, and Roberto Zamparelli. SemEval-2014 task 1: Evaluation of compositional distributional semantic models on full sentences through semantic relatedness and textual entailment. In SemEval@COLING, pages 1–8, 2014.
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
Kai Sheng Tai, Richard Socher, and Christopher D Manning. Improved semantic representations from tree-structured long short-term memory networks. arXiv preprint arXiv:1503.00075, 2015.
Quoc Le and Tomas Mikolov. Distributed representations of sentences and documents. In Proceedings of the 31st International Conference on Machine Learning (ICML-14), pages 1188–1196, 2014.
S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997. ISSN 0899-7667. doi: 10.1162/neco.1997.9.8.1735.
Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, and Yoshua Bengio. A structured self-attentive sentence embedding. arXiv preprint arXiv:1703.03130, 2017.
Danqi Chen and Christopher Manning. A fast and accurate dependency parser using neural networks. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pages 740–750, 2014.
Jeffrey Pennington, Richard Socher, and Christopher Manning. GloVe: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pages 1532–1543, 2014.
Matthew D Zeiler. Adadelta: an adaptive learning rate method. arXiv preprint arXiv:1212.5701, 2012.
Alexis Conneau, Douwe Kiela, Holger Schwenk, Loic Barrault, and Antoine Bordes. Supervised learning of universal sentence representations from natural language inference data. arXiv preprint arXiv:1705.02364, 2017.
Martin F Porter. An algorithm for suffix stripping. Program, 14(3):130–137, 1980.
Jonas Mueller and Aditya Thyagarajan. Siamese recurrent architectures for learning sentence similarity. In AAAI, pages 2786–2792, 2016.
An open-source implementation of the paper "A Structured Self-Attentive Sentence Embedding" published by IBM and MILA. https://github.com/ExplorerFreda/Structured-Self-Attentive-Sentence-Embedding. Accessed: 2018-01-10.
Tree-LSTM implementation in PyTorch. https://github.com/dasguptar/treelstm.pytorch. Accessed: 2018-01-10.