Semantic Attention Networks for Intuitive Information Retrieval

  • Jonathan David La, McMaster University

Abstract

In this work we present a siamese LSTM-attention network based on the structured self-attention mechanism of \cite{lin2017structured} and the dependency parse trees of \cite{tai2015improved}. The network is designed both to learn semantic similarity between sentence pairs and to provide a simple mechanism for comparing text entities through learned attention weights. We train the model in a supervised setting on the SICK dataset, achieving a test accuracy of 87.03\%, an improvement over the 86.76\% reported by \cite{tai2015improved}.
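
For reference, the structured self-attention of \cite{lin2017structured} on which the network is based can be summarized as follows (a sketch using that paper's notation, not a full description of the present model). Given the matrix of LSTM hidden states $H \in \mathbb{R}^{n \times 2u}$ for a sentence of $n$ tokens, an attention matrix and sentence embedding are computed as

\begin{align*}
A &= \mathrm{softmax}\!\left(W_{s2}\,\tanh\!\left(W_{s1} H^{\top}\right)\right), & M &= A H,
\end{align*}

where $W_{s1} \in \mathbb{R}^{d_a \times 2u}$ and $W_{s2} \in \mathbb{R}^{r \times d_a}$ are learned, $A \in \mathbb{R}^{r \times n}$ holds $r$ attention distributions over tokens, and the penalty term $P = \left\lVert A A^{\top} - I \right\rVert_F^2$ encourages the $r$ attention hops to attend to different parts of the sentence. The rows of $A$ are the learned attention weights the abstract refers to when comparing text entities.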

Author Biography

Jonathan David La, McMaster University
Fourth-year Engineering Physics student.

References

Marco Marelli, Luisa Bentivogli, Marco Baroni, Raffaella Bernardi, Stefano Menini, and Roberto Zamparelli. SemEval-2014 task 1: Evaluation of compositional distributional semantic models on full sentences through semantic relatedness and textual entailment. In SemEval@COLING, pages 1–8, 2014.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.

Kai Sheng Tai, Richard Socher, and Christopher D. Manning. Improved semantic representations from tree-structured long short-term memory networks. arXiv preprint arXiv:1503.00075, 2015.

Quoc Le and Tomas Mikolov. Distributed representations of sentences and documents. In Proceedings of the 31st International Conference on Machine Learning (ICML-14), pages 1188–1196, 2014.

S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997. ISSN 0899-7667. doi: 10.1162/neco.1997.9.8.1735.

Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, and Yoshua Bengio. A structured self-attentive sentence embedding. arXiv preprint arXiv:1703.03130, 2017.

Danqi Chen and Christopher Manning. A fast and accurate dependency parser using neural networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 740–750, 2014.

Jeffrey Pennington, Richard Socher, and Christopher Manning. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, 2014.

Matthew D. Zeiler. ADADELTA: An adaptive learning rate method. arXiv preprint arXiv:1212.5701, 2012.

Alexis Conneau, Douwe Kiela, Holger Schwenk, Loic Barrault, and Antoine Bordes. Supervised learning of universal sentence representations from natural language inference data. arXiv preprint arXiv:1705.02364, 2017.

Martin F. Porter. An algorithm for suffix stripping. Program, 14(3):130–137, 1980.

Jonas Mueller and Aditya Thyagarajan. Siamese recurrent architectures for learning sentence similarity. In AAAI, pages 2786–2792, 2016.

An open-source implementation of the paper "A structured self-attentive sentence embedding" published by IBM and MILA. https://github.com/ExplorerFreda/Structured-Self-Attentive-Sentence-Embedding. Accessed: 2018-01-10.

Tree-LSTM implementation in PyTorch. https://github.com/dasguptar/treelstm.pytorch. Accessed: 2018-01-10.

Published
2018-01-16