Simple Java CLI tool wrapping the Stanford NLP for use with RLetters (no longer in use)

Natural Language Processing for RLetters

N.B.: We have moved to pure-Ruby solutions for NLP in the latest versions of RLetters. This repository is therefore no longer in use, and is not required for running recent versions of RLetters (those released after May 2018).


A simple interface script that calls out to the Stanford Natural Language Processing toolkit, designed to return certain specific kinds of results to RLetters.


Bridge-type interfaces from Ruby to Java are clunky, prone to strange JVM and GC trouble, and hard to debug. It's actually much easier to write this thin Java wrapper, have Maven take care of all the package dependencies, and call out to it from Ruby.


You need to have Apache Maven installed. On Mac OS X, this is just brew install maven, and on Ubuntu you're looking for sudo apt-get install maven. To compile the JAR file, run:

git clone (this repository)
cd (cloned directory)
mvn install
java -jar target/nlp-tool-(VERSION)-jar-with-dependencies.jar

You will probably want a small wrapper shell script (named nlp-tool, say) that calls this JAR file and forwards its command-line arguments, for example:

java -jar (PATH_TO)/nlp-tool-(VERSION)-jar-with-dependencies.jar "$@"
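
On the Ruby side, calling out to the wrapper could look roughly like the sketch below. This is only an illustration, not the code RLetters used: the constant NLP_TOOL and the method run_nlp_tool are hypothetical names, and it assumes a wrapper script called nlp-tool is on your PATH. It feeds the text on standard input and parses the YAML that comes back.

```ruby
require "open3"
require "yaml"

# Hypothetical default command; adjust to wherever your wrapper script lives.
NLP_TOOL = ["nlp-tool"].freeze

# Runs the tool with a single mode flag ("-n", "-p", or "-l"), feeding `text`
# on standard input, and returns the parsed YAML output.
def run_nlp_tool(flag, text, command: NLP_TOOL)
  output, status = Open3.capture2(*command, flag, stdin_data: text)
  raise "nlp-tool exited with status #{status.exitstatus}" unless status.success?

  # safe_load is enough here: the tool only emits plain strings and collections.
  YAML.safe_load(output)
end
```

With that in place, something like run_nlp_tool("-l", "It was the best of times") would return the parsed result for the lemmatization mode. Passing the command as an array avoids going through a shell, so the input text never needs shell quoting.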


The following functionality is included:

  • Named Entity Recognition: Run nlp-tool -n < data and get back a YAML-formatted hash that looks something like this:

        - John Doe
        - Jane Smith
        - London
        - Argentina
        - The Corporation
        - Aperture Science
  • Part-of-Speech Tagging: Run nlp-tool -p < data and get back a YAML-formatted array of words with their part-of-speech tags attached:

    - It_PRP
    - was_VBD
    - the_DT
    - best_JJS
    - of_IN
    - times_NNS
  • Lemmatization: Run nlp-tool -l < data and get back a YAML-formatted array of lemmatized words:

    - it
    - be
    - the
    - best
    - of
    - time
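
To make use of the tagged output in Ruby, each word_TAG entry has to be split back into a word and a tag. The sketch below shows one way to do that. It assumes the underscore separator seen in the sample output above, and the method names split_tagged and parse_pos_output are hypothetical, chosen only for this illustration.

```ruby
require "yaml"

# Splits one "word_TAG" token into [word, tag]. rpartition splits on the LAST
# underscore, so a word that itself contains an underscore stays intact.
def split_tagged(token)
  word, _separator, tag = token.rpartition("_")
  [word, tag]
end

# Parses the YAML array produced by the -p mode into [word, tag] pairs.
def parse_pos_output(yaml_text)
  YAML.safe_load(yaml_text).map { |token| split_tagged(token) }
end

# Sample input mirroring the example output shown above.
sample = <<~YAML
  - "It_PRP"
  - "was_VBD"
  - "the_DT"
  - "best_JJS"
YAML

parse_pos_output(sample).each do |word, tag|
  puts "#{word}\t#{tag}"
end
```

The named entity and lemmatization output need no such post-processing, since parsing the YAML already yields plain strings.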


Copyright (C) 2014 Charles Pence, and released under the MIT license.