Semantic Attention Networks for Intuitive Information Retrieval

Authors

  • Jonathan David La, McMaster University

Abstract

In this work we present a siamese LSTM-attention network based on the structured self-attention mechanism of Lin et al. (2017) and the dependency parse trees of Tai et al. (2015). The network is designed both to learn semantic similarity between sentence pairs and to provide a simple mechanism for comparing text entities through learned attention weights. We train the model in a supervised setting on the SICK dataset, achieving a test accuracy of 87.03%, an improvement over the 86.76% reported by Tai et al. (2015).
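
The paper's code is not reproduced here, but the core mechanism can be sketched: the structured self-attention of Lin et al. (2017) computes an annotation matrix A = softmax(W_s2 tanh(W_s1 H^T)) over the LSTM hidden states H, and a siamese setup applies one shared encoder to both sentences. Below is a minimal PyTorch sketch; all dimensions and the final cosine comparison are illustrative assumptions, not the configuration reported in the paper.

    # Minimal sketch of structured self-attention (Lin et al., 2017) in a
    # siamese setup. Dimensions and the final comparison are illustrative
    # assumptions, not the paper's reported configuration.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SelfAttentiveEncoder(nn.Module):
        def __init__(self, embed_dim=300, hidden_dim=150, att_dim=100, hops=4):
            super().__init__()
            self.lstm = nn.LSTM(embed_dim, hidden_dim,
                                batch_first=True, bidirectional=True)
            # A = softmax(W_s2 tanh(W_s1 H^T)): attention over time steps
            self.w_s1 = nn.Linear(2 * hidden_dim, att_dim, bias=False)
            self.w_s2 = nn.Linear(att_dim, hops, bias=False)

        def forward(self, x):                      # x: (batch, seq, embed_dim)
            h, _ = self.lstm(x)                    # (batch, seq, 2*hidden_dim)
            a = F.softmax(self.w_s2(torch.tanh(self.w_s1(h))), dim=1)
            m = a.transpose(1, 2) @ h              # (batch, hops, 2*hidden_dim)
            return m, a                            # sentence matrix + weights

    # Siamese comparison: one shared encoder applied to two sentences.
    enc = SelfAttentiveEncoder()
    s1, s2 = torch.randn(8, 12, 300), torch.randn(8, 15, 300)
    m1, a1 = enc(s1)
    m2, a2 = enc(s2)
    similarity = F.cosine_similarity(m1.flatten(1), m2.flatten(1))

The returned attention matrices (a1, a2) correspond to the learned attention weights that, per the abstract, provide a simple means of comparing the two text entities.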

Author Biography

Jonathan David La, McMaster University

Fourth-year Engineering Physics student.

References

Marco Marelli, Luisa Bentivogli, Marco Baroni, Raffaella Bernardi, Stefano Menini, and Roberto Zamparelli. SemEval-2014 task 1: Evaluation of compositional distributional semantic models on full sentences through semantic relatedness and textual entailment. In SemEval@COLING, pages 1–8, 2014.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.

Kai Sheng Tai, Richard Socher, and Christopher D Manning. Improved semantic representations from tree-structured long short-term memory networks. arXiv preprint arXiv:1503.00075, 2015.

Quoc Le and Tomas Mikolov. Distributed representations of sentences and documents. In Proceedings of the 31st International Conference on Machine Learning (ICML-14), pages 1188–1196, 2014.

S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997. ISSN 0899-7667. doi: 10.1162/neco.1997.9.8.1735.

Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, and Yoshua Bengio. A structured self-attentive sentence embedding. arXiv preprint arXiv:1703.03130, 2017.

Danqi Chen and Christopher Manning. A fast and accurate dependency parser using neural networks. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pages 740–750, 2014.

Jeffrey Pennington, Richard Socher, and Christopher Manning. GloVe: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pages 1532–1543, 2014.

Matthew D Zeiler. Adadelta: An adaptive learning rate method. arXiv preprint arXiv:1212.5701, 2012.

Alexis Conneau, Douwe Kiela, Holger Schwenk, Loic Barrault, and Antoine Bordes. Supervised learning of universal sentence representations from natural language inference data. arXiv preprint arXiv:1705.02364, 2017.

Martin F Porter. An algorithm for suffix stripping. Program, 14(3):130–137, 1980.

Jonas Mueller and Aditya Thyagarajan. Siamese recurrent architectures for learning sentence similarity. In AAAI, pages 2786–2792, 2016.

An open-source implementation of the paper "A structured self-attentive sentence embedding" published by IBM and MILA. https://github.com/ExplorerFreda/Structured-Self-Attentive-Sentence-Embedding. Accessed: 2018-01-10.

Tree-LSTM implementation in PyTorch. https://github.com/dasguptar/treelstm.pytorch. Accessed: 2018-01-10.

Published

2018-01-16