Computing word senses by semantic mirroring and spectral graph partitioning

From Brede Wiki
Jump to: navigation, search
Conference paper (help)
Computing word senses by semantic mirroring and spectral graph partitioning
Authors: Martin Fagerlund, Magnus Merkel, Lars Eldén, Lars Ahrenberg
Citation: Proceedings of the 2010 Workshop on Graph-based Methods for Natural Language Processing  : 103-107. 2010
Editors:
Publisher: Association for Computational Linguistics, Stroudsburg, PA, USA
Meeting: 2010 Workshop on Graph-based Methods for Natural Language Processing
Database(s): Google Scholar cites
DOI: Define doi.
Link(s): http://aclweb.org/anthology//W/W10/W10-2317.pdf
Search
Web: DuckDuckGo Bing Google Yahoo!Google PDF
Article: Google Scholar PubMed
Restricted: DTU Digital Library
Services
Format: BibTeX

Computing word senses by semantic mirroring and spectral graph partitioning analyze the bipartite graph among words in a English-Swedish lexicon using the H. Dyvik concept of semantic mirroring. Representing the translation of words by a translation matrix and translating back and forth several times the apply graph partitioning on the resulting graph.

Contents

[edit] Data

A English-Swedish lexicon of 14850 words represented in a translation matrix B, i.e., a binary matrix where the columns correspond to words in one language and rows words in the other language and a one indicating where a word is translated to another.

A word is represented in a vector with one for the row element corresponding to the word e.

[edit] Method

[edit] Construction of an adjacency matrix

A translation is a matrix-vector multiplication <math>\mathbf{B}' \mathbf{e}</math>

A "mirror operation" is <math>\mathbf{B}\mathbf{B}' \mathbf{e}</math>, e.g., translating from English to Swedish and back to English

They use a modification for the second matrix as they leave out the element corresponding to the word translated.

The modified mirroring operation is repeated one more time to obtain a symmetric adjacency matrix A.

[edit] Graph partitioning

They construct the graph Laplacian

<math>L = D - A = diag(diag(A)) - A</math>

The then look at the Fiedler vector, cutting it it two and compute the conductance between the two sets of words, choosing the cut with the smallest conductance. The procedure is repeated until the conductance reaches a human decided tolerance.

[edit] Evaluation

They made a small evaluation with two evaluators judged the results independently and coordinated.

Personal tools