Extract the comments from PDF, DOC, or DOCX documents.
Charles Pence 8a452c3f37 Update dependency to fix CVE, even though this is unmaintained. 2 years ago
src/main/java/net/charlespence/get_comments Update PDFBOX to close some CVEs. 3 years ago
.gitignore Fully functional extraction of both DOC and DOCX comments. Printing to console for now. 9 years ago
COPYING Add license, README. 9 years ago
README.md Document the command-line options. 9 years ago
pom.xml Update dependency to fix CVE, even though this is unmaintained. 2 years ago



Extract the comments/annotations from a Word DOC or DOCX document or a PDF file, and dump them to the console (for now).
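To show what "extracting comments" involves for the DOCX case, here is a minimal JDK-only sketch. It is not the tool's own code; the project builds on third-party libraries (PDFBox appears in its commit history). A DOCX file is a ZIP archive, and Word stores comments as `w:comment` elements in a comments part. The sketch assumes that part lives at the conventional path `word/comments.xml` instead of resolving it through the package relationships. It covers neither the older binary DOC format nor PDF annotations. The class name `DocxComments` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DocxComments {
    // WordprocessingML namespace bound to the w: prefix in Word's XML parts
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the plain text of each comment, in the order stored in the file.
    // Assumption: the comments part is at word/comments.xml (Word's usual path).
    public static List<String> extract(InputStream docx) throws Exception {
        ZipInputStream zip = new ZipInputStream(docx);
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            if (entry.getName().equals("word/comments.xml")) {
                // Buffer the entry so the XML parser cannot close the ZIP stream
                byte[] xml = zip.readAllBytes();
                return parseComments(new ByteArrayInputStream(xml));
            }
        }
        return new ArrayList<>(); // no comments part means no comments
    }

    static List<String> parseComments(InputStream xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        // Refuse DOCTYPEs to avoid XML external entity tricks in untrusted files
        f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = f.newDocumentBuilder().parse(xml);

        List<String> result = new ArrayList<>();
        NodeList comments = doc.getElementsByTagNameNS(W_NS, "comment");
        for (int i = 0; i < comments.getLength(); i++) {
            Element comment = (Element) comments.item(i);
            StringBuilder sb = new StringBuilder();
            // Concatenate text runs (w:t) within each paragraph (w:p);
            // separate paragraphs with a newline
            NodeList paras = comment.getElementsByTagNameNS(W_NS, "p");
            for (int p = 0; p < paras.getLength(); p++) {
                if (p > 0) sb.append('\n');
                NodeList runs = ((Element) paras.item(p)).getElementsByTagNameNS(W_NS, "t");
                for (int r = 0; r < runs.getLength(); r++) {
                    sb.append(runs.item(r).getTextContent());
                }
            }
            result.add(sb.toString());
        }
        return result;
    }
}
```

Reading straight from the ZIP stream keeps the sketch dependency-free. A full implementation would also need a DOC parser and a PDF library for annotations, which is where the tool's dependencies come in.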


I wrote this because I grade student papers by putting the grades in Word and PDF comments, and I wanted to extract them from the command line without running Word or Acrobat themselves, or resorting to a hack like AppleScript.


You need to have Apache Maven installed. On Mac OS X, this is just brew install maven, and on Ubuntu you're looking for sudo apt-get install maven. To compile the JAR file, run:

git clone (this repository)
cd (the cloned directory)
mvn install
java -jar target/get_comments-(VERSION)-jar-with-dependencies.jar

For everyday use, you might want to copy the JAR somewhere memorable (say, as get_comments.jar) and write a little shell script that passes along all of its arguments:

#!/bin/sh
java -jar (PATH_TO)/get_comments.jar "$@"


Just call java -jar (PATH_TO_JAR_FILE) [OPTIONS] (FILENAME), and the comments in the file will be printed to standard output. There are two command-line options:

  • --quiet or -q: Only print the comments themselves. By default, each comment will be prefixed by "Comment #N: "; setting this option disables that.
  • --limit N or -l N: Only print the first N comments from the document. By default, all comments in the document will be printed.

For example, to print only the value of the document's first comment, you can call java -jar (PATH_TO_JAR_FILE) --quiet --limit 1 (FILENAME).
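The option handling described above can be sketched as follows. This is an illustration, not the tool's actual parser: the class `CommentPrinter`, its `Options` fields, and the error messages are hypothetical. Two details are also assumptions, namely that comment numbering starts at 1 and that a negative N is rejected.

```java
import java.util.ArrayList;
import java.util.List;

public class CommentPrinter {
    // Hypothetical option holder; the real tool's internals may differ.
    static final class Options {
        boolean quiet = false;
        int limit = Integer.MAX_VALUE; // default: print every comment
        String filename = null;
    }

    // Parses [OPTIONS] FILENAME using the two documented flags.
    static Options parse(String[] args) {
        Options o = new Options();
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                case "--quiet":
                case "-q":
                    o.quiet = true;
                    break;
                case "--limit":
                case "-l":
                    if (i + 1 >= args.length) {
                        throw new IllegalArgumentException(args[i] + " requires a number N");
                    }
                    o.limit = Integer.parseInt(args[++i]);
                    if (o.limit < 0) {
                        throw new IllegalArgumentException("N must not be negative");
                    }
                    break;
                default:
                    o.filename = args[i]; // any other argument is taken as FILENAME
            }
        }
        if (o.filename == null) {
            throw new IllegalArgumentException("missing FILENAME");
        }
        return o;
    }

    // Keeps at most `limit` comments, then adds the "Comment #N: " prefix
    // unless quiet is set. Numbering from 1 is an assumption.
    static List<String> format(List<String> comments, Options o) {
        int n = Math.min(o.limit, comments.size());
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            String text = comments.get(i);
            lines.add(o.quiet ? text : "Comment #" + (i + 1) + ": " + text);
        }
        return lines;
    }
}
```

With the arguments --quiet --limit 1 paper.docx, format returns only the first comment's bare text, which corresponds to the example above. Keeping parsing and formatting separate lets each piece be checked without opening a real document.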


Copyright (C) 2012 Charles Pence. Released under the MIT license.