Warren Sack, Nik Hanselmann, and Nick Lally

Our Project
The project allows users to search a corpus of articles and to sort it with lexical analysis tools, clustering algorithms, and metric tags in order to find trends. Users can then do a close reading of selected articles. The process is iterative: users can save states along the way and revert to earlier steps. The interface is written in JavaScript; Python with NLTK and BeautifulSoup is used to prepare the corpus and the article metrics.
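
The save-and-revert behavior can be pictured with a minimal sketch. It is written in Python for illustration only and is not the project's JavaScript interface code; the names State, History, save, and revert, and the choice to store a query string with a list of article ids, are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class State:
    """One saved step: the query used and the article ids it selected (assumed fields)."""
    query: str
    article_ids: tuple


@dataclass
class History:
    """Hypothetical container for saved states, oldest first."""
    states: list = field(default_factory=list)

    def save(self, query, article_ids):
        self.states.append(State(query, tuple(article_ids)))
        return len(self.states) - 1  # index the user can revert to later

    def revert(self, index):
        # Drop everything saved after `index` and return that state.
        del self.states[index + 1:]
        return self.states[index]


history = History()
first = history.save("climate", [1, 2, 3, 4])
history.save("climate AND policy", [2, 4])
restored = history.revert(first)
```

Reverting here discards the later states. Keeping them as a branch would be an equally reasonable design; the source does not say which one the interface uses.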

Online Test Version
Online Test Version with Save States

Tools We've Made
CIA Factbook Parser, Download .zip file:
Parses a local copy of the CIA Factbook (downloadable here: https://www.cia.gov/library/publications/the-world-factbook/) into text files that a program can easily read to load country data. The file includes the parser and the parsed files. Requires Python and BeautifulSoup (version 3.0.7 or older).
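
As a rough illustration of turning an HTML page into plain text, the sketch below uses Python's standard-library html.parser in place of BeautifulSoup so that it runs without extra installs. It is not the downloadable parser: the TextExtractor class, the html_to_text helper, and the sample markup are assumptions, and the real Factbook pages are structured differently.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def html_to_text(html):
    """Return the visible text of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Invented sample markup, for illustration only.
page = "<html><body><h1>Iceland</h1><script>var x = 1;</script><p>Capital: Reykjavik</p></body></html>"
text = html_to_text(page)
```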

Lexis Nexis Parser, Download .zip file:
Parses results from LexisNexis and creates a new corpus for the project. Outputs articles as .txt files and a JSON file (with attributes). Requires Python, NLTK, and BeautifulSoup (version 3.0.7 or older).
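
The output layout described above (one .txt file per article plus a JSON file of attributes) might look roughly like the following sketch. The write_corpus function, the file names, and the attribute keys "headline", "date", and "body" are assumptions made for illustration, not the parser's actual names.

```python
import json
import tempfile
from pathlib import Path


def write_corpus(articles, out_dir):
    """Write each article body to its own .txt file plus one JSON index.

    `articles` is a list of dicts; the keys used here are assumed,
    not taken from the real parser.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    index = []
    for number, article in enumerate(articles):
        filename = f"article_{number:04d}.txt"
        (out_dir / filename).write_text(article["body"], encoding="utf-8")
        # Everything except the body goes into the JSON attributes.
        attributes = {k: v for k, v in article.items() if k != "body"}
        attributes["file"] = filename
        index.append(attributes)
    (out_dir / "corpus.json").write_text(json.dumps(index, indent=2), encoding="utf-8")
    return index


# Invented sample articles, written to a temporary directory.
sample = [
    {"headline": "First story", "date": "2010-01-05", "body": "Text of the first story."},
    {"headline": "Second story", "date": "2010-01-06", "body": "Text of the second story."},
]
out_dir = Path(tempfile.mkdtemp())
index = write_corpus(sample, out_dir)
```

Keeping the attributes in a single JSON file means an interface can load the metadata for the whole corpus in one request and fetch article text only when a user opens an article.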

Limited Logic Programming Language (Warren Sack), Download .js file:
This code is a translation and adaptation of the logic programming interpreter in chapter 4 of Abelson and Sussman's book "Structure and Interpretation of Computer Programs." This implementation assumes that you will be running the code in a Firefox browser with Firebug installed. Written in JavaScript.
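
The core operation of the book's query system is unification of patterns containing variables. The sketch below illustrates that idea in Python; it is not a translation of the downloadable JavaScript file. The function names, the use of tuples for compound terms, and the "?name" string convention for variables are assumptions, and the occurs check that the book's version performs is omitted for brevity.

```python
def is_var(term):
    """Variables are strings starting with '?', following the book's notation."""
    return isinstance(term, str) and term.startswith("?")


def walk(term, bindings):
    """Follow variable bindings until reaching a non-variable or an unbound variable."""
    while is_var(term) and term in bindings:
        term = bindings[term]
    return term


def unify(a, b, bindings):
    """Return extended bindings that make a and b equal, or None on failure."""
    a, b = walk(a, bindings), walk(b, bindings)
    if a == b:
        return bindings
    if is_var(a):
        return {**bindings, a: b}
    if is_var(b):
        return {**bindings, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        # Unify element by element, threading the bindings through.
        for x, y in zip(a, b):
            bindings = unify(x, y, bindings)
            if bindings is None:
                return None
        return bindings
    return None


fact = ("job", "Ben", ("computer", "wizard"))
query = ("job", "?who", ("computer", "?role"))
result = unify(query, fact, {})

# The same variable cannot be bound to two different values.
mismatch = unify(("job", "?who", "?who"), ("job", "Ben", "Alyssa"), {})
```

Here result binds ?who to "Ben" and ?role to "wizard", while mismatch is None. Failure is tested with `is None` because an empty bindings dictionary is a valid success.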

Examples
jQuery and the Twitter API, example:
Uses jQuery to load JSON results from the Twitter API and display them.

NLTK, Download .zip file:
Applies NLTK functions to a text of your choice. Requires Python and NLTK.


Resources
Protovis, http://vis.stanford.edu/protovis/:
Used for treemap visualization

Web Workers, http://www.whatwg.org/specs/web-workers/current-work/:
This specification defines an API for running scripts in the background independently of any user interface scripts.

NLTK, http://www.nltk.org/:
Open source Python modules, linguistic data and documentation for research and development in natural language processing and text analytics.

BeautifulSoup 3.0.7, http://www.crummy.com/software/BeautifulSoup/:
Python HTML parser

JSON, http://json.org/:
JSON (JavaScript Object Notation) is a lightweight data-interchange format.

jQuery, http://jquery.com/:
jQuery is a fast and concise JavaScript Library that simplifies HTML document traversing, event handling, animating, and Ajax interactions for rapid web development.

Python, http://www.python.org/:
Programming language

LexisNexis, http://www.lexisnexis.com:
News service

Bottle, http://bottle.paws.de/:
Bottle is a fast and simple WSGI web-framework for Python packed into a single file with no external dependencies.


Texts
Programming Collective Intelligence, http://oreilly.com/catalog/9780596529321
Natural Language Processing with Python, http://www.nltk.org/book


Acknowledgments
This work has been supported by Award #0416353 ("An Interface and Search Engine for Deliberation") from the National Science Foundation, Directorate for Computer and Information Science and Engineering. Any opinions, findings and conclusions or recommendations expressed in this website are those of the authors and do not necessarily reflect those of the National Science Foundation.