Domain specific affective classification of documents
|Conference paper|
|Authors:||Sara Owsley, Sanjay Sood, Kristian J. Hammond|
|Citation:||AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs : 181-183. 2006|
|Publisher:||American Association for Artificial Intelligence|
Domain specific affective classification of documents describes a system for text sentiment analysis of movie reviews from IMDb. The authors use a naïve Bayes classifier and compare it with an approach based on the ANEW word list.
The authors collected 10,000 movie reviews from IMDb by crawling the site, keeping the text as well as the star rating between 1 and 10. They selected only the reviews with 1 or 10 stars: 1200 1-star reviews and 1200 10-star reviews.
A test set was constructed by crawling 5000 movie reviews with ratings between 1 and 10 stars.
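The training-set selection can be sketched as a simple filter. This is a minimal illustration, not the authors' code: the function name `select_extreme_reviews` and the `(text, stars)` tuple layout are assumptions, as is the mapping of 1 star to "negative" and 10 stars to "positive".

```python
def select_extreme_reviews(reviews):
    """Keep only 1-star and 10-star reviews and label them by polarity.

    `reviews` is an iterable of (text, stars) pairs (hypothetical layout).
    """
    labelled = []
    for text, stars in reviews:
        if stars == 1:
            labelled.append((text, "negative"))
        elif stars == 10:
            labelled.append((text, "positive"))
        # Reviews with 2 to 9 stars are dropped from the training set.
    return labelled


# Toy data, invented for illustration only.
sample = [("Awful film.", 1), ("It was fine.", 6), ("A masterpiece.", 10)]
training = select_extreme_reviews(sample)
```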
- Selecting only adjectives with the Brill tagger for part-of-speech tagging.
- Porter stemmer for word stemming.
- Naïve Bayes classifier
- Comparison with general-purpose affective labelling of the corpus using ANEW.
- The ANEW sentiment score (valence) was collapsed to negative (1 to 4) and positive (6 to 9).
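The classification step listed above can be sketched as a multinomial naïve Bayes classifier over adjective stems. This is a minimal sketch under stated assumptions: part-of-speech filtering and stemming are taken to have happened upstream, add-one (Laplace) smoothing is assumed because the summary does not say which smoothing was used, and the function names and toy stems are invented for illustration.

```python
import math
from collections import Counter


def train_naive_bayes(docs, labels):
    """Count documents per class and token occurrences per class.

    `docs` is a list of token lists (here: stemmed adjectives),
    `labels` is the parallel list of class names.
    """
    class_counts = Counter(labels)
    word_counts = {label: Counter() for label in class_counts}
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def classify(tokens, model):
    """Return the class with the highest log-posterior for `tokens`."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        # Add-one smoothing: every vocabulary word gets one extra count.
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in tokens:
            if w in vocab:  # tokens unseen in training are ignored
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy stemmed adjectives, invented for illustration only.
train_docs = [
    ["great", "wonder", "great"],
    ["brilliant", "great"],
    ["aw", "bore"],
    ["terribl", "bore", "aw"],
]
train_labels = ["positive", "positive", "negative", "negative"]
model = train_naive_bayes(train_docs, train_labels)
prediction = classify(["great", "brilliant"], model)
```

Working in log space avoids numerical underflow when a review contains many adjectives.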
They find 3180 unique adjectives.
On the test set the accuracy was 78.06% for the naïve Bayes classifier approach, while the ANEW approach only scored 57.7%.
- How did they split the test set movie reviews into the positive and the negative set?
- It is not clear how they aggregated the ANEW sentiment of the individual words into a label for the entire movie review. Did they just count the number of "positive" and "negative" words?
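One possible reading of the ANEW baseline is the counting scheme raised in the last question. The sketch below illustrates that guess only; it is not the method confirmed by the paper. The valence thresholds follow the collapse described above (1 to 4 negative, 6 to 9 positive), while the handling of values between 4 and 6, the tie behaviour, the function names, and the toy valence numbers are all assumptions; the numbers are not real ANEW entries.

```python
def collapse_valence(valence):
    """Collapse a 1-9 valence score to a polarity label.

    Values strictly between 4 and 6 are treated as neutral (an assumption).
    """
    if 1 <= valence <= 4:
        return "negative"
    if 6 <= valence <= 9:
        return "positive"
    return None


def classify_by_counts(tokens, lexicon):
    """Label a review by majority vote over its lexicon words.

    Returns None when positive and negative counts are equal.
    """
    counts = {"positive": 0, "negative": 0}
    for token in tokens:
        if token in lexicon:
            polarity = collapse_valence(lexicon[token])
            if polarity is not None:
                counts[polarity] += 1
    if counts["positive"] > counts["negative"]:
        return "positive"
    if counts["negative"] > counts["positive"]:
        return "negative"
    return None


# Hypothetical valence values for illustration; not taken from ANEW.
TOY_LEXICON = {"happy": 8.0, "wonderful": 7.5, "terrible": 2.0, "table": 5.0}
label = classify_by_counts(["happy", "terrible", "wonderful", "table"], TOY_LEXICON)
```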