Extract the comments from PDF, DOC, or DOCX documents.



Extract the comments/annotations from a Word DOC or DOCX document or a PDF file, and dump them to the console (for now).
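For the curious, here is a rough sketch of where DOCX comments actually live. A DOCX file is a ZIP archive, and Word stores its comments in the word/comments.xml part as w:comment elements whose text sits in w:t runs. This is not the tool's own code: the tool relies on its Maven dependencies and also handles DOC and PDF. The class DocxCommentSketch is a hypothetical name, it uses only the JDK, and it keeps plain text only, ignoring authors, dates, and formatting.

```java
import java.io.InputStream;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical illustration using only the JDK; not the tool's actual implementation.
public class DocxCommentSketch {
    // Namespace bound to the "w:" prefix in WordprocessingML parts.
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the plain text of each comment, in the order it appears in word/comments.xml. */
    public static List<String> readComments(Path docx) throws Exception {
        List<String> result = new ArrayList<>();
        try (ZipFile zip = new ZipFile(docx.toFile())) {
            ZipEntry entry = zip.getEntry("word/comments.xml");
            if (entry == null) {
                return result; // no comments part, so no comments
            }
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Student files are untrusted input: refuse DOCTYPEs to block XXE tricks.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document xml;
            try (InputStream in = zip.getInputStream(entry)) {
                xml = factory.newDocumentBuilder().parse(in);
            }
            NodeList comments = xml.getElementsByTagNameNS(W_NS, "comment");
            for (int i = 0; i < comments.getLength(); i++) {
                Element comment = (Element) comments.item(i);
                NodeList paragraphs = comment.getElementsByTagNameNS(W_NS, "p");
                List<String> lines = new ArrayList<>();
                for (int p = 0; p < paragraphs.getLength(); p++) {
                    // Concatenate the <w:t> text runs of one paragraph.
                    NodeList runs = ((Element) paragraphs.item(p)).getElementsByTagNameNS(W_NS, "t");
                    StringBuilder line = new StringBuilder();
                    for (int r = 0; r < runs.getLength(); r++) {
                        line.append(runs.item(r).getTextContent());
                    }
                    lines.add(line.toString());
                }
                // One string per comment, with its paragraphs joined by newlines.
                result.add(String.join("\n", lines));
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        for (String text : readComments(Path.of(args[0]))) {
            System.out.println(text);
        }
    }
}
```

Splitting a comment into runs is Word's business (one sentence can be several w:t elements), which is why the sketch concatenates runs rather than reading a single text node.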


I wrote this because I grade student papers by putting the grades in Word and PDF comments, and I wanted to extract them from the command line without running Word or Acrobat, or resorting to a hack like AppleScript.


You need to have Apache Maven installed. On Mac OS X with Homebrew, this is just brew install maven; on Ubuntu, run sudo apt-get install maven. To compile the JAR file, run:

git clone (this repository)
mvn install
java -jar target/get_comments-(VERSION)-jar-with-dependencies.jar

For everyday use, you might want to copy the JAR somewhere memorable (for example, as get_comments.jar) and write a little shell script:

#!/bin/sh
java -jar (PATH_TO)/get_comments.jar "$@"


Call java -jar (PATH_TO_JAR_FILE) [OPTIONS] (FILENAME), and the comments from the file will be printed to standard output. There are two command-line options:

  • --quiet or -q: Only print the comments themselves. By default, each comment will be prefixed by "Comment #N: "; setting this option disables that.
  • --limit N or -l N: Only print the first N comments from the document. By default, all comments in the document will be printed.
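The two options above are simple enough that a hand-rolled parser captures their behaviour. This is a hedged sketch, not the tool's code (it may well use a command-line library instead): OptionsSketch, Options, and parse are hypothetical names, -1 stands for the default "print every comment", and any argument that is not an option is taken as FILENAME.

```java
// Hypothetical hand-rolled parser for the two documented options; the real tool
// may well use a command-line library instead.
public class OptionsSketch {
    /** Parsed settings. A limit of -1 means "print every comment" (the default). */
    public static final class Options {
        public final boolean quiet;
        public final int limit;
        public final String filename;

        Options(boolean quiet, int limit, String filename) {
            this.quiet = quiet;
            this.limit = limit;
            this.filename = filename;
        }
    }

    public static Options parse(String[] args) {
        boolean quiet = false;
        int limit = -1;
        String filename = null;
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (arg.equals("--quiet") || arg.equals("-q")) {
                quiet = true;
            } else if (arg.equals("--limit") || arg.equals("-l")) {
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException(arg + " needs a number N");
                }
                // NumberFormatException is itself an IllegalArgumentException.
                limit = Integer.parseInt(args[++i]);
                if (limit < 0) {
                    throw new IllegalArgumentException("N must not be negative");
                }
            } else {
                filename = arg; // anything else is taken as FILENAME
            }
        }
        if (filename == null) {
            throw new IllegalArgumentException("missing FILENAME");
        }
        return new Options(quiet, limit, filename);
    }

    public static void main(String[] args) {
        Options o = parse(args);
        System.out.println("quiet=" + o.quiet + " limit=" + o.limit + " file=" + o.filename);
    }
}
```

Note that -l consumes the following argument as N, so options and FILENAME can appear in any order.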

For example, to print only the text of the document's first comment, call java -jar (PATH_TO_JAR_FILE) --quiet --limit 1 (FILENAME).
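The output rules can be sketched as a small formatting step applied to the extracted comments. This assumes numbering starts at Comment #1 (the text above only says "Comment #N") and uses -1 to mean "no limit"; OutputSketch and format are hypothetical names, not the tool's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the --quiet / --limit output rules.
public class OutputSketch {
    /** Formats comments for printing; limit < 0 means "print every comment". */
    public static List<String> format(List<String> comments, boolean quiet, int limit) {
        int count = (limit < 0) ? comments.size() : Math.min(limit, comments.size());
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            String text = comments.get(i);
            // Assumed 1-based numbering for the default "Comment #N: " prefix.
            lines.add(quiet ? text : "Comment #" + (i + 1) + ": " + text);
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> comments = List.of("Grade: A-", "Nice argument in section 2.");
        // Equivalent of --quiet --limit 1: only the first comment's text.
        format(comments, true, 1).forEach(System.out::println);
    }
}
```

With the two sample comments in main, the --quiet --limit 1 case prints just Grade: A-, while format(comments, false, -1) would yield both lines with their prefixes.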


Copyright (C) 2012 Charles Pence. Released under the MIT license.