Sentiment analysis

From Brede Wiki

Sentiment analysis
Variations:

Text sentiment analysis
Opinion mining

Category: Sentiment analysis
Parents:

Text mining
Affective computing

Children:

Twitter sentiment analysis
Wikipedia sentiment analysis


Text sentiment analysis (or usually just sentiment analysis) is a text mining technique for analyzing the sentiment of the writer or the sentiment expressed toward the topic written about.

Bo Pang and Lillian Lee have written a lengthy introduction to sentiment analysis: Opinion mining and sentiment analysis.[1]

Sentiment analysis may be combined with another text-mining technique, topic mining, in what is called topic-sentiment analysis.[2]

Methods

Sentiment analysis may employ machine learning techniques. One often applied method is the naïve Bayes classifier, where the algorithm is trained on a labeled data set. The Python package NLTK includes a classic sentiment analysis data set (movie reviews) as well as general machine learning methods for sentiment classification. Some of the earliest papers on this approach are probably
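
The naïve Bayes approach can be illustrated with a minimal sketch in plain Python. It is not the NLTK implementation: the function names, the add-one smoothing and the four toy training sentences are assumptions made up for illustration only.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Count labels and per-label word frequencies from (tokens, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in labeled_docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return label_counts, word_counts, vocabulary


def classify(tokens, model):
    """Return the label with the highest log-probability for the tokens."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # Log prior of the label
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokens:
            if token not in vocabulary:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, made-up training data (not taken from any real corpus)
TRAINING = [
    ("a wonderful and moving film".split(), "pos"),
    ("great acting and a great story".split(), "pos"),
    ("a dull and boring plot".split(), "neg"),
    ("terrible acting and a boring film".split(), "neg"),
]

model = train_naive_bayes(TRAINING)
print(classify("a great and moving story".split(), model))
print(classify("boring and terrible".split(), model))
```

With a real labeled corpus, such as the movie reviews mentioned above, the toy training list would be replaced by tokenized documents and their labels.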

Another approach is to use a word list where each word has been scored for positivity/negativity or sentiment strength. Several word lists exist: of the three mentioned here, ANEW is the oldest and has around 1,000 words, AFINN is newer and has around 2,500, while labMT has over 10,000 words scored.
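
Word-list scoring can be sketched in a few lines of Python. The miniature word list below is hypothetical: the words and their values are made up for illustration and are not taken from ANEW, AFINN or labMT. The simple tokenizer and the choice of summing scores are also assumptions.

```python
import re

# Hypothetical miniature word list; the values are illustrative only
TOY_WORD_LIST = {
    "good": 3,
    "great": 3,
    "happy": 3,
    "bad": -3,
    "terrible": -3,
    "sad": -2,
}


def score_text(text, word_list=TOY_WORD_LIST):
    """Sum the scores of all words in the text that appear in the word list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(word_list.get(token, 0) for token in tokens)


print(score_text("A great film with a sad ending"))
```

Words missing from the list contribute zero, so the result depends heavily on the coverage of the list. Dividing the sum by the number of matched words would give an average valence instead of a total.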

One way to extend word lists is to use word co-occurrence or a word ontology such as WordNet.[3] The method may go back to 1957.[4]
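
As a rough sketch of the co-occurrence idea, an unscored word can be given the average score of the seed words it appears together with in the same document. This is a deliberately simplified assumption and not the method of the cited works; the function name, the min_cooccurrences threshold, the seed scores and the toy documents are all hypothetical.

```python
from collections import defaultdict


def extend_word_list(documents, seed_scores, min_cooccurrences=2):
    """Score unseen words by the mean score of the seed words they co-occur with."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for tokens in documents:
        unique = set(tokens)
        seeds_here = [seed_scores[t] for t in unique if t in seed_scores]
        if not seeds_here:
            continue  # no seed word in this document, nothing to propagate
        for token in unique:
            if token in seed_scores:
                continue  # keep the original seed scores untouched
            sums[token] += sum(seeds_here)
            counts[token] += len(seeds_here)
    # Only keep words with enough co-occurrences to be somewhat reliable
    return {
        token: sums[token] / counts[token]
        for token in counts
        if counts[token] >= min_cooccurrences
    }


# Toy, made-up documents and seed scores
DOCUMENTS = [
    "good superb service".split(),
    "superb and good food".split(),
    "bad awful service".split(),
    "awful and bad weather".split(),
]
SEEDS = {"good": 3, "bad": -3}

extended = extend_word_list(DOCUMENTS, SEEDS)
print(extended["superb"], extended["awful"], extended["service"])
```

Words that co-occur equally often with positive and negative seeds, such as "service" here, end up near zero, while words seen too rarely are left out.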

Corpora

Affective Text 
"Affective Text: Data Annotated for Emotions and Polarity" Rada Mihalcea [3]
BLOGS06 
[5]
Darmstadt Service Review Corpus 
http://www.ukp.tu-darmstadt.de/data/sentiment-analysis/darmstadt-service-review-corpus/ "consumer reviews annotated with opinion related information at the sentence and expression levels."[6]
EmotiBlog 
Movie reviews 
A classic data set in sentiment analysis. It is included in the NLTK Python package as nltk.corpus.movie_reviews.
Multi-Domain Sentiment Dataset
[4]
RepLab 
Manually-labeled Twitter posts. The data is described here, but it is unclear if the data is publicly available.
SemEval 
[5] In 2013 there was a Twitter sentiment analysis task with several thousand labeled postings and SMS text messages.
Sentiment140 corpora 
Two data sets from Twitter, one of them with 498 labeled tweets [6]. See also the description: [7].
Twitter Sentiment Corpus 
[8] by Niek Sanders "consists of 5513 hand-classified tweets".
TASS Corpus 
[9] consists of 70000 tweets in Spanish, annotated with global polarity.

Several researchers have crawled IMDb and downloaded movie review texts and star ratings.[7]

Affective word lists

Sentiment analysis may use word lists annotated for their arousal and their valence, i.e., whether they are positive or negative. Some word lists are listed and commented on in section 7.3 of the Pang/Lee monograph. Some of the word lists are:

Affective Norms for English Words (ANEW)
An English word list constructed by Bradley and Lang[8] and available from the University of Florida [10]. There are 1034 words rated for valence, arousal and dominance. It is "solely for use in academic, not-for-profit research at recognized educational institutions". (It is associated with a program by Greg Siegle, http://www.sci.sdsu.edu/CAL/wordlist/ ). Adaptations include SPANEW, a Spanish ANEW,[9] and DANEW, a Dutch ANEW.[10]
AFINN 
An English word list with 2477 words (previously 1468 words) constructed by Finn Årup Nielsen for sentiment analysis of Twitter messages, though it has also been used for other texts. It is available under a share-alike license: [11]. Each word is rated with a valence value from -5 to +5. An evaluation of the word list was described in A new ANEW: evaluation of a word list for sentiment analysis in microblogs, and the word list was used in Good friends, bad news - affect and virality in Twitter. For a simple example of using the list with Python see [12].
Balanced Affective Word List ("original")
An older version of the Balanced Affective Word List with 277 English words, associated with the program of Greg Siegle, http://www.sci.sdsu.edu/CAL/wordlist/origwordlist.html (the original URL is gone; an Internet Archive version exists). The valence is coded as 1=positive, 2=negative, 3=anxious, 4=neutral. The words were aggregated from two lists: one list collected by Greg Siegle and Mark Shibley and another list of 240 words by Carolyn H. John from the publication Emotionality ratings and free-association norms of 240 emotional and non-emotional words.[11]
Berlin Affective Word List (BAWL) 
A word list of 2,200 German words with emotional valence and imageability.[12] A research project took some of these words as part of the basis for an annotated word list of 300 English words.[13]
Berlin Affective Word List Reloaded (BAWL-R) 
A newer version of BAWL with arousal ratings added for the words.[14]
Bilingual Finnish Affective Norms 
210 British English and Finnish nouns, including taboo words.[13] [15]
Compass DeRose Guide to Emotion Words 
English emotional words collected by Steven J. DeRose and categorized but without valence or arousal. http://www.derose.net/steve/resources/emotionwords/ewords.html
Dictionary of Affect in Language (DAL)
Constructed by Cynthia M. Whissell. A description of it seems to be available as a chapter in the book Emotion: theory, research, and experience (pp. 113-131), edited by Robert Plutchik and Henry Kellerman and published by Academic Press. One Web service uses DAL: [14]. The list has also been called "Whissell's Dictionary of Affect in Language" (WDAL).[16]
General Inquirer 
Has several dictionaries, e.g., a "positive" list with 1,915 words and a "negative" list with 2,291 words. http://www.wjh.harvard.edu/~inquirer/homecat.htm
Hu-Liu opinion lexicon
Around 6800 words in a negative and a positive list [15]. Collected over the years, starting with the paper Mining and summarizing customer reviews.
LabMT 
A large word list
Leipzig Affective Norms for German (LANG) 
"A list of 1,000 German nouns that have been rated for emotional valence, arousal, and concreteness" http://www.springerlink.com/content/m244118283586754/supplementals/ .[17]
LIWC 
Linguistic Inquiry and Word Count [16]. Commercial ($90) word lists with a computer program to extract basic counts and ratios. Contains dictionaries for English, German, Spanish, Dutch, and Italian. Extracts around 60 different word categories, including "positive emotions" and "negative emotions". The program can be purchased; their site also allows you to analyze texts one by one.
Loughran and McDonald Financial Sentiment Dictionaries 
[17] Dictionaries with negative, positive, uncertainty, litigious and modal words, especially for financial texts, by Tim Loughran and Bill McDonald. The lists are "Not for commercial use without authorization". Described in When is a liability not a liability? textual analysis, dictionaries, and 10-Ks.
NRC Emotion Lexicon 
(EmoLex) A large word list constructed by Saif M. Mohammad through Amazon Mechanical Turk.
NRC Hashtag Sentiment Lexicon 
[18] A large list of words created from 775,310 tweets with a positive or negative hashtag.[18]
NTU Sentiment Dictionary 
(Listed by Pang and Lee)
OpinionFinder's Subjectivity Lexicon 
[19] 8221 words scored for polarity (positive or negative) and subjectivity. Distinguishes between part-of-speech tags.[19]
Pattern 
The Pattern Python package includes the file sentiment.xml, which contains 2888 words scored for polarity, subjectivity, intensity and reliability. The words are mostly adjectives. There are no nouns.
Sentiment140 Lexicon 
[20] Large list built from tweets.[20]
SAMAR/Sifaat 
Subjectivity and Sentiment Analysis of Social Media Arabic by Muhammad Abdul-Mageed and Mona T. Diab. Not clear whether it is available. See also Toward building a large-scale Arabic sentiment lexicon.
SentiSense 
[21] It "consists of 5,496 words and 2,190 synsets labeled with an emotion from a set of 14 emotional categories"[21]
SentiWordNet 
Assigns three sentiment scores to each WordNet synset: positivity, negativity, objectivity. The license was "only for research, non-profit purposes",[22] but has now changed to CC-BY-SA.[23] The 3.0 version was described in 2010.[24] http://sentiwordnet.isti.cnr.it/ See also the Python interface at https://bitbucket.org/jaganadhg/pysentiwn/wiki/Home
Taboada and Grieve's Turney adjective list 
(listed in Pang and Lee) available through Yahoo! sentimentAI group.
TUDADEN 
Turkish[25]
WordNet-Affect 
An English list.[26] Originally "freely available, for research purposes".[27] Now part of WordNet Domains which is distributed under CC-BY.[28] See http://wndomains.fbk.eu/wnaffect.html and http://wndomains.fbk.eu/download.html

For comparison of the different word lists see Enhancing lexicon-based review classification by merging and revising sentiment dictionaries and A new ANEW: evaluation of a word list for sentiment analysis in microblogs.

Tools

  1. SentiStrength
  2. AFINN, an affective word list. Code exists in several programming languages.
  3. Pattern, Python library.
  4. sasa-tool, [22], USC SAIL/AIL sentiment analysis tool.
  5. Senti by CrowdFlower [23], commercial crowd-based service

See also list by Seth Grimes in What are the most powerful open-source sentiment-analysis tools?

Online services

  1. http://sentimentalytics.com - a browser plug-in that automatically analyzes social media content (including sentiment)
  2. http://www.socialmention.com
  3. http://www.viralheat.com
  4. http://neuro.imm.dtu.dk/cgi-bin/brede_str_nmf Sentiment-topic mining
  5. http://www.textmap.com
  6. http://www.sentiment140.com/
  7. http://www.sentigem.com — does this work?
  8. ConveyAPI. As of June 2013 seemingly vaporware-ish: "currently offering free a evaluation of the ConveyAPI to select companies." [24]
  9. Bitext, demo available at http://svc8.bitext.com/API-demo/

Events

  1. Workshop on sentiment and subjectivity in text COLING ACL 2006
    1. Extracting opinions, opinion holders, and topics expressed in online news media text
  2. First International CIKM Workshop on Topic-Sentiment Analysis for Mass Opinion Measurement, 2009.
  3. 1st Workshop on Opinion Mining and Sentiment Analysis, 2009.
  4. ICDM11 workshop on opinion mining and sentiment analysis

Researchers

  1. Bing Liu
  2. Finn Årup Nielsen, AFINN
  3. Mike Thelwall, SentiStrength
  4. Peter D. Turney, unsupervised sentiment analysis
  5. Saif M. Mohammad, NRC Emotion Lexicon, SemEval winner.
  6. ...

Papers

  1. A new ANEW: evaluation of a word list for sentiment analysis in microblogs
  2. Building lexicon for sentiment analysis from massive collection of HTML documents
  3. Combining social network analysis and sentiment analysis to explore the potential for online radicalisation
  4. Crowd sentiment detection during disasters and crises
  5. Determining the sentiment of opinions
  6. Domain specific affective classification of documents
  7. Good friends, bad news - affect and virality in Twitter
  8. Large-scale sentiment analysis for news and blogs
  9. Leveraging textual sentiment analysis with social network modeling
  10. Micro-blogging sentiment detection by collaborative online learning
  11. Mining the peanut gallery: opinion extraction and semantic classification of product reviews
  12. Negative emotions accelerating users activity in BBC Forum
  13. Quantitative analysis of bloggers collective behavior powered by emotions
  14. Robust sentiment detection on Twitter from biased and noisy data
  15. Sentiment analysis with global topics and local dependency
  16. Sentiment strength detection in short informal text
  17. Tweetin' in the rain: exploring societal-scale effects of weather on mood
  18. Using emoticons to reduce dependency in machine learning techniques for sentiment classification
  19. Using verbs and adjectives to automatically classify blog sentiment

See also

  1. Sentiment-based text segmentation

References

  1. Bo Pang, Lillian Lee (2008). "Opinion mining and sentiment analysis". Foundations and Trends in Information Retrieval 2(1-2): 1-135. [1].
  2. Qiaozhu Mei, Xu Ling, Matthew Wondra, Hang Su, ChengXiang Zhai (2007). "Topic sentiment mixture: modeling facets and opinions in weblogs".
  3. Words with attitude
  4. The measurement of meaning
  5. I. Ounis, C. MacDonald, I. Soboroff. "Overview of the TREC-2008 blog track". The TREC 2008 Proceedings.
  6. Sentence and expression level annotation of opinions in user-generated discourse
  7. Domain specific affective classification of documents
  8. Margaret M. Bradley, Peter J. Lang. (1999). Affective norms for English words (ANEW). Gainesville, FL. The NIMH Center for the Study of Emotion and Attention, University of Florida.
  9. The Spanish adaptation of ANEW (affective norms for English words)
  10. The QWERTY effect: how typing shapes the meanings of words
  11. Carolyn H. John (1988). "Emotionality ratings and free-association norms of 240 emotional and non-emotional words". Cognition & Emotion 2(1): 49-70. doi: 10.1080/02699938808415229.
  12. Melissa L.-H. Võ, Arthur M. Jacobs, Markus Conrad (2006). "Cross-validating the Berlin Affective Word List". Behavior Research Methods 38(4): 606-609.
  13. Evaluation of lexical and semantic features for English emotion words
  14. Melissa L.-H. Võ, Markus Conrad, Lars Kuchinke, Karolina Urton, Markus J. Hofmann, Arthur M. Jacobs (2009). "The Berlin Affective Word List Reloaded (BAWL-R)". Behavior Research Methods 41: 534-538. doi: 10.3758/BRM.41.2.534.
  15. Tiina M. Eilola, Jelena Havelka (2010). "Affective norms for 210 British English and Finnish nouns". Behavior Research Methods 42(1): 134-140. PMID: 20160293.
  16. Let me listen to poetry, let me see emotions
  17. P. Kanske, S. A. Kotz (2010). "Leipzig Affective Norms for German: A reliability study". Behav Res Methods 42(4): 987-991. PMID: 21139165.
  18. NRC-Canada: building the state-of-the-art in sentiment analysis of tweets
  19. Theresa Wilson, Janyce Wiebe, Paul Hoffmann (2005). "Recognizing contextual polarity in phrase-level sentiment analysis". Proc. of HLT-EMNLP-2005.
  20. NRC-Canada: building the state-of-the-art in sentiment analysis of Tweets
  21. SentiSense: an easily scalable concept-based affective lexicon for sentiment analysis
  22. Andrea Esuli, Fabrizio Sebastiani. "SentiWordNet: a publicly available lexical resource for opinion mining".
  23. http://sentiwordnet.isti.cnr.it/
  24. Stefano Baccianella, Andrea Esuli, Fabrizio Sebastiani (2010). "SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining". Pages 2200-2204 in Proceedings of LREC-10, 7th Conference on Language Resources and Evaluation.
  25. Gökçay D., Smith MA., "TÜDADEN:Türkçede Duygusal ve Anlamsal Değerlendirmeli Norm Veri Tabanı", Proceedings of Brain-Computer Workshop 4, 2008, Istanbul.
  26. WordNet-Affect: an affective extension of WordNet
  27. A. Valitutti, C. Strapparava, O. Stock (2004). "Developing affective lexical resources". PsychNology Journal 2(1): 61-83. [2].
  28. http://wndomains.fbk.eu/download.html

Other

  1. Carlo Strapparava, Rada Mihalcea (2008). "Learning to identify emotions in text". Pages 1556-1560 in PSAC '08: Proceedings of the 2008 ACM symposium on Applied computing. doi: http://doi.acm.org/10.1145/1363686.1364052. [25]
  2. Understanding sentiment of people from news articles: temporal sentiment analysis of social events