Computing word senses by semantic mirroring and spectral graph partitioning
Conference paper (help) | |
---|---|
Computing word senses by semantic mirroring and spectral graph partitioning | |
Authors: | Martin Fagerlund, Magnus Merkel, Lars Eldén, Lars Ahrenberg |
Citation: | Proceedings of the 2010 Workshop on Graph-based Methods for Natural Language Processing : 103-107. 2010 |
Editors: | |
Publisher: | Association for Computational Linguistics, Stroudsburg, PA, USA |
Meeting: | 2010 Workshop on Graph-based Methods for Natural Language Processing |
Database(s): | Google Scholar cites |
DOI: | Define doi. |
Link(s): | http://aclweb.org/anthology//W/W10/W10-2317.pdf |
Search | |
Web: | DuckDuckGo Bing Google Yahoo! — Google PDF |
Article: | Google Scholar PubMed |
Restricted: | DTU Digital Library |
Services | |
Format: | BibTeX |
Computing word senses by semantic mirroring and spectral graph partitioning analyze the bipartite graph among words in a English-Swedish lexicon using the H. Dyvik concept of semantic mirroring. Representing the translation of words by a translation matrix and translating back and forth several times the apply graph partitioning on the resulting graph.
Contents |
[edit] Data
A English-Swedish lexicon of 14850 words represented in a translation matrix B, i.e., a binary matrix where the columns correspond to words in one language and rows words in the other language and a one indicating where a word is translated to another.
A word is represented in a vector with one for the row element corresponding to the word e.
[edit] Method
[edit] Construction of an adjacency matrix
A translation is a matrix-vector multiplication <math>\mathbf{B}' \mathbf{e}</math>
A "mirror operation" is <math>\mathbf{B}\mathbf{B}' \mathbf{e}</math>, e.g., translating from English to Swedish and back to English
They use a modification for the second matrix as they leave out the element corresponding to the word translated.
The modified mirroring operation is repeated one more time to obtain a symmetric adjacency matrix A.
[edit] Graph partitioning
They construct the graph Laplacian
- <math>L = D - A = diag(diag(A)) - A</math>
The then look at the Fiedler vector, cutting it it two and compute the conductance between the two sets of words, choosing the cut with the smallest conductance. The procedure is repeated until the conductance reaches a human decided tolerance.
[edit] Evaluation
They made a small evaluation with two evaluators judged the results independently and coordinated.