Detecting Wikipedia vandalism with active learning and statistical language models

Conference paper
Authors: Si-Chi Chin, W. Nick Street, Padmini Srinivasan, David Eichmann
Citation: Proceedings of the 4th workshop on Information credibility: 3-10. 2010
Publisher: Association for Computing Machinery
Meeting: 4th workshop on Information credibility
DOI: 10.1145/1772938.1772942

Detecting Wikipedia vandalism with active learning and statistical language models reports on a system for vandalism detection on Wikipedia.

The authors' taxonomy of vandalism is:

  • Blanking
  • Large-scale editing
  • Graffiti
  • Misinformation
  • Image attack
  • Link spam
  • Irregular formatting

They examined two Wikipedia articles, "Abraham Lincoln" and "Microsoft", both of which had over 8000 revisions. These pages are among the most vandalized pages [1].

They worked with the CMU toolkit to build bigram statistical language models.
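
To illustrate what a bigram statistical language model does, here is a minimal sketch in plain Python. It is not the CMU toolkit and not the authors' implementation: the function names `train_bigram` and `perplexity`, the whitespace tokenization, and the add-one smoothing are all assumptions made for this example. The idea is that text resembling the training text gets a lower perplexity than text the model has never seen, which is one way a language model could yield a feature for a vandalism classifier.

```python
import math
from collections import Counter


def train_bigram(tokens):
    """Count unigrams and bigrams over a token list padded with boundary markers."""
    padded = ["<s>"] + tokens + ["</s>"]
    unigrams = Counter(padded)
    bigrams = Counter(zip(padded, padded[1:]))
    return unigrams, bigrams


def perplexity(tokens, unigrams, bigrams):
    """Perplexity of a token list under the bigram counts.

    Add-one smoothing is an assumption of this sketch; it keeps unseen
    bigrams from getting zero probability.
    """
    padded = ["<s>"] + tokens + ["</s>"]
    vocab_size = len(unigrams) + 1  # one extra slot for unseen words
    log_prob = 0.0
    for prev, word in zip(padded, padded[1:]):
        p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
        log_prob += math.log(p)
    n_transitions = len(padded) - 1
    return math.exp(-log_prob / n_transitions)
```

For example, after training on the tokens of earlier trusted revisions of an article, one could compare `perplexity(...)` for the text inserted by a new edit against a threshold, or pass it to a classifier as one feature among others.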

Based on the extracted features, they apply three classifiers: boosting with J48, logistic regression, and support vector machines.

Related papers

  1. Automatic vandalism detection in Wikipedia
  2. Detecting Wikipedia vandalism using WikiTrust
  3. Vandalism detection in Wikipedia: a high-performing, feature-rich model and its reduction through Lasso
  4. Wikipedia vandalism detection