---
abstract: 'The evaluative character of a word is called its semantic orientation. A positive semantic orientation implies desirability (e.g., "honest", "intrepid") and a negative semantic orientation implies undesirability (e.g., "disturbing", "superfluous"). This paper introduces a simple algorithm for unsupervised learning of semantic orientation from extremely large corpora. The method involves issuing queries to a Web search engine and using pointwise mutual information to analyse the results. The algorithm is empirically evaluated using a training corpus of approximately one hundred billion words (the subset of the Web that is indexed by the chosen search engine). Tested with 3,596 words (1,614 positive and 1,982 negative), the algorithm attains an accuracy of 80%. The 3,596 test words include adjectives, adverbs, nouns, and verbs. The accuracy is comparable with the results achieved by Hatzivassiloglou and McKeown (1997), using a complex four-stage supervised learning algorithm that is restricted to determining the semantic orientation of adjectives.'
altloc:
- http://extractor.iit.nrc.ca/reports/ERB-1094.pdf
chapter: ~
commentary: ~
commref: ~
confdates: ~
conference: ~
confloc: ~
contact_email: ~
creators_id: []
creators_name:
- family: Turney
  given: Peter D.
  honourific: ''
  lineage: ''
- family: Littman
  given: Michael L.
  honourific: ''
  lineage: ''
date: 2002
date_type: published
datestamp: 2002-07-15
department: Institute for Information Technology
dir: disk0/00/00/23/22
edit_lock_since: ~
edit_lock_until: ~
edit_lock_user: ~
editors_id: []
editors_name: []
eprint_status: archive
eprintid: 2322
fileinfo: /style/images/fileicons/application_postscript.png;/2322/1/ERB%2D1094.ps|/style/images/fileicons/application_pdf.png;/2322/5/ERB%2D1094.pdf
full_text_status: public
importid: ~
institution: National Research Council Canada
isbn: ~
ispublished: unpub
issn: ~
item_issues_comment: []
item_issues_count: 0
item_issues_description: []
item_issues_id: []
item_issues_reported_by: []
item_issues_resolved_by: []
item_issues_status: []
item_issues_timestamp: []
item_issues_type: []
keywords: ~
lastmod: 2011-03-11 08:54:57
latitude: ~
longitude: ~
metadata_visibility: show
note: ~
number: ~
pagerange: ~
pubdom: FALSE
publication: ~
publisher: ~
refereed: FALSE
referencetext: ~
relation_type: []
relation_uri: []
reportno: NRC Technical Report ERB-1094
rev_number: 14
series: ~
source: ~
status_changed: 2007-09-12 16:44:14
subjects:
- comp-sci-art-intel
- comp-sci-lang
- comp-sci-mach-learn
- comp-sci-stat-model
succeeds: ~
suggestions: ~
sword_depositor: ~
sword_slug: ~
thesistype: ~
title: Unsupervised Learning of Semantic Orientation from a Hundred-Billion-Word Corpus
type: techreport
userid: 2175
volume: ~