A command-line tool for managing Sciveyor and its document databases

Sciveyor Tool

This is a swiss-army-knife utility designed to administer Sciveyor installations, including collections of JSON files in Sciveyor format, MongoDB servers containing Sciveyor data, and user-configurable parameters in the Sciveyor installation itself, such as journal categories.

Building

  1. Check out the code, including the submodules: git clone --recurse-submodules ...
  2. Build: go build
  3. Run: ./sciveyor-tool ...

Requirements

  1. A MongoDB server, with a collection of documents that follow the schema spelled out here. FIXME: At some point in the future, this server will become more complex, with support for other collections carrying information about disambiguated authors, journals, and institutions. That support is not currently available in this tool.
  2. A Solr server, pre-loaded with the schema described here. The easiest way to obtain one of these is to spin up the Docker image at this link.

Usage

You can get a list of all the available major commands by running sciveyor-tool --help, and you can get more help on any command by running sciveyor-tool <command> --help.

You can also set persistent configuration flags for sciveyor-tool by creating a configuration file. The default path for the configuration file is ~/.sciveyor.yaml, and you can set a custom path for the configuration file by passing --config <path>.

json import: Import JSON Files to MongoDB

This tool should be used whenever you want to import JSON documents (once again, in the JSON schema specified by Sciveyor) into the MongoDB server.

To use it, call sciveyor-tool as follows:

./sciveyor-tool json import \
  --batch-size NUM \
  --mongo-address mongodb://localhost \
  --mongo-database YourDatabase \
  --mongo-collection documents \
  --mongo-timeout SECS \
  <files> ...

For information about the MongoDB connection flags, see the sync command.

The files argument may either refer to specific files or to glob patterns.

The --batch-size flag can be set to any number of documents (it defaults to 100). The optimal size will depend on your connection to your MongoDB server, your document sizes, and your network configuration, but 100 works for most purposes.
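As a rough illustration of what batching means here, the following Go sketch splits a list of documents into groups of at most --batch-size items. It is not the tool's actual code: the batches function and the use of plain strings as documents are assumptions made for the example.

```go
package main

import "fmt"

// batches splits docs into consecutive slices of at most size elements.
// A non-positive size falls back to 100, the default named above.
// (Hypothetical helper, for illustration only.)
func batches(docs []string, size int) [][]string {
	if size <= 0 {
		size = 100
	}
	var out [][]string
	for start := 0; start < len(docs); start += size {
		end := start + size
		if end > len(docs) {
			end = len(docs)
		}
		out = append(out, docs[start:end])
	}
	return out
}

func main() {
	docs := []string{"a", "b", "c", "d", "e"}
	for i, b := range batches(docs, 2) {
		fmt.Printf("batch %d: %v\n", i, b)
	}
}
```

Each batch would then be sent to the server in one request, so a larger batch size means fewer round trips but larger individual requests.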

Note that no schema validation at all will be done on these documents, though they will be passed through several kinds of essential transformations (for example, converting the dates from JSON string format to MongoDB date format).
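The date transformation mentioned above might look something like the following Go sketch. The accepted layouts (RFC 3339 and a bare YYYY-MM-DD date) and the parseDate function are assumptions for illustration; the formats the tool actually accepts are defined by the Sciveyor schema, not by this example.

```go
package main

import (
	"fmt"
	"time"
)

// parseDate converts a JSON date string into a time.Time value, the kind
// of value a Go MongoDB driver can store as a native date.
// The two layouts tried here are assumptions, not the tool's real list.
func parseDate(s string) (time.Time, error) {
	layouts := []string{time.RFC3339, "2006-01-02"}
	for _, layout := range layouts {
		if t, err := time.Parse(layout, s); err == nil {
			return t.UTC(), nil
		}
	}
	return time.Time{}, fmt.Errorf("unrecognized date %q", s)
}

func main() {
	for _, s := range []string{"2021-03-04T05:06:07Z", "2021-03-04", "yesterday"} {
		t, err := parseDate(s)
		fmt.Println(t, err)
	}
}
```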

json validate: Validate JSON Files

The tool can be used to check whether or not a collection of JSON files on disk conforms to the Sciveyor JSON schema. To use it, call sciveyor-tool as follows:

./sciveyor-tool json validate [--loose] [--unique] /path/to/*.json

The files argument may either refer to specific files or to glob patterns.

For information about the --loose flag, see the mongo validate command. If --unique is passed, then the validation will parse each file, load its ID value, and check to see if there are any duplicate ID values among the JSON files that are passed. This will slow down validation, so it is disabled by default.
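The --unique check can be pictured with the Go sketch below, which parses each document, reads its ID, and reports any ID seen more than once. This is only an illustration: the duplicateIDs function, the top-level field name "id", and the use of in-memory byte slices in place of files on disk are all assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// duplicateIDs parses each named JSON document, reads its "id" field, and
// returns every ID that occurs more than once, mapped to the names of the
// documents carrying it. The field name "id" is an assumption.
func duplicateIDs(docs map[string][]byte) (map[string][]string, error) {
	seen := make(map[string][]string)
	for name, raw := range docs {
		var doc struct {
			ID string `json:"id"`
		}
		if err := json.Unmarshal(raw, &doc); err != nil {
			return nil, fmt.Errorf("%s: %w", name, err)
		}
		seen[doc.ID] = append(seen[doc.ID], name)
	}
	dups := make(map[string][]string)
	for id, names := range seen {
		if len(names) > 1 {
			dups[id] = names
		}
	}
	return dups, nil
}

func main() {
	docs := map[string][]byte{
		"a.json": []byte("{\"id\": \"doc-1\"}"),
		"b.json": []byte("{\"id\": \"doc-2\"}"),
		"c.json": []byte("{\"id\": \"doc-1\"}"),
	}
	dups, err := duplicateIDs(docs)
	fmt.Println(dups, err)
}
```

Because every file has to be parsed and its ID held in memory, a check like this costs more than schema validation alone, which is consistent with it being off by default.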

mongo validate: Validate MongoDB Documents

The tool can be used to check whether or not the contents of a given MongoDB server conform to the Sciveyor JSON schema. To use it, call sciveyor-tool as follows:

./sciveyor-tool mongo validate [--loose] \
  --mongo-address mongodb://localhost \
  --mongo-database YourDatabase \
  --mongo-collection documents \
  --mongo-timeout SECS

For information about the MongoDB connection flags, see the sync command.

By default, the tool operates in "strict mode": it checks that the attributes of each document are valid, and it also prints an error for any field in a document that does not appear in the JSON schema (that is, for any "extra" field). To ignore these errors, deactivate strict mode by passing the --loose flag, in which case sciveyor-tool silently ignores extra fields and prints errors only when a known field contains invalid data.
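The difference between the two modes can be sketched in Go as follows. This is a simplified illustration rather than the tool's validator: the extraFields and check functions and the flat set of known field names are assumptions, and validation of the known fields themselves is left out.

```go
package main

import (
	"fmt"
	"sort"
)

// extraFields returns the keys of doc that are absent from known, sorted
// so that the output is stable. (Hypothetical helper.)
func extraFields(doc map[string]any, known map[string]bool) []string {
	var extra []string
	for key := range doc {
		if !known[key] {
			extra = append(extra, key)
		}
	}
	sort.Strings(extra)
	return extra
}

// check reports the extra-field problems for one document. In loose mode
// extra fields are ignored; checking of known fields is elided here.
func check(doc map[string]any, known map[string]bool, loose bool) []string {
	var problems []string
	if !loose {
		for _, key := range extraFields(doc, known) {
			problems = append(problems, fmt.Sprintf("unexpected field %q", key))
		}
	}
	return problems
}

func main() {
	known := map[string]bool{"id": true, "title": true}
	doc := map[string]any{"id": "doc-1", "title": "A title", "color": "red"}
	fmt.Println("strict:", check(doc, known, false))
	fmt.Println("loose: ", check(doc, known, true))
}
```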

sync: Synchronize MongoDB to Solr

The tool can be used to perform a three-step synchronization of the content from the MongoDB server to the Solr server. This is an extremely simple sync:

  1. For each document in the MongoDB database:
    1. If it is present in the Solr database, but either its version or its dataSourceVersion values have changed, delete and re-create it in the Solr database.
    2. If it is not present in the Solr database, create it.
  2. For each document in the Solr database:
    1. If it is not present in the Mongo database, delete it.

Notably, this is not a field-by-field synchronization. Documents are entirely overwritten, not partially updated (in Solr's terminology, we do not use "atomic updates"). We also do not detect any changes other than in the two version numbers. Version numbers must be bumped to trigger a sync. (This is an intentional policy choice.)
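The comparison described above can be sketched in Go as follows. This is a minimal illustration, not the tool's actual code: the versions and plan types, the planSync function, and the use of integer version numbers are assumptions, and the sketch works on in-memory maps rather than live servers.

```go
package main

import (
	"fmt"
	"sort"
)

// versions holds the two numbers that the sync compares.
// Integer versions are an assumption made for this sketch.
type versions struct {
	Version           int
	DataSourceVersion int
}

// plan lists the document IDs to create, re-create, and delete in Solr.
type plan struct {
	Create, Recreate, Delete []string
}

// planSync compares the MongoDB and Solr views, keyed by document ID.
func planSync(mongo, solr map[string]versions) plan {
	var p plan
	for id, mv := range mongo {
		sv, inSolr := solr[id]
		switch {
		case !inSolr:
			p.Create = append(p.Create, id)
		case sv != mv:
			// Either version number changed: delete and re-create.
			p.Recreate = append(p.Recreate, id)
		}
	}
	for id := range solr {
		if _, inMongo := mongo[id]; !inMongo {
			p.Delete = append(p.Delete, id)
		}
	}
	sort.Strings(p.Create)
	sort.Strings(p.Recreate)
	sort.Strings(p.Delete)
	return p
}

func main() {
	mongo := map[string]versions{"a": {1, 1}, "b": {2, 1}, "c": {1, 1}}
	solr := map[string]versions{"a": {1, 1}, "b": {1, 1}, "d": {1, 1}}
	fmt.Printf("%+v\n", planSync(mongo, solr))
}
```

Note that document "a" appears in neither list: its version numbers match, so it is left alone even if its other fields differ between the two servers.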

To use it, then, call sciveyor-tool as follows:

./sciveyor-tool sync [--force] \
  --mongo-address mongodb://localhost \
  --mongo-database YourDatabase \
  --mongo-collection documents \
  --mongo-timeout SECS \
  --solr-address http://localhost:8983/solr \
  --solr-collection sciveyor

The flags are simply the various connection options for the two servers. The mongo-address is a URL, which can specify username, password, and port (mongodb://user:pass@address:port). The mongo-database flag should be familiar from any connection to MongoDB. In almost all Sciveyor cases, the mongo-collection should be set to documents. The mongo-timeout flag sets how long, in seconds, we will wait on MongoDB before timing out. It defaults to 30, which suits small synchronization jobs. For a very large synchronization (for example, tens of thousands of documents), you will want to set this to a very high number.
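To show how the parts of a mongo-address fit together, here is a small Go sketch that splits one using the standard net/url package. It is an illustration only: the mongoParts type and splitMongoAddress function are hypothetical, the host name in the example is made up, and the sketch handles a single host, which is narrower than what a real MongoDB driver accepts.

```go
package main

import (
	"fmt"
	"net/url"
)

// mongoParts holds the pieces of a single-host MongoDB address.
type mongoParts struct {
	User, Password, Host, Port string
}

// splitMongoAddress pulls the credentials, host, and port out of a
// mongodb:// URL. Missing parts come back as empty strings.
func splitMongoAddress(address string) (mongoParts, error) {
	u, err := url.Parse(address)
	if err != nil {
		return mongoParts{}, err
	}
	if u.Scheme != "mongodb" {
		return mongoParts{}, fmt.Errorf("unexpected scheme %q", u.Scheme)
	}
	password, _ := u.User.Password()
	return mongoParts{
		User:     u.User.Username(),
		Password: password,
		Host:     u.Hostname(),
		Port:     u.Port(),
	}, nil
}

func main() {
	parts, err := splitMongoAddress("mongodb://user:pass@db.example.org:27017")
	fmt.Printf("%+v %v\n", parts, err)
}
```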

The two Solr flags are the URL to the root of the server (which will almost always end with /solr), and the collection or core name currently in use. (The final Solr URLs, then, will append the collection to the address.)
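In Go terms, combining the two Solr flags amounts to something like the sketch below. The collectionURL function is a hypothetical helper; tolerating a trailing slash on the address is a choice made for this example, not documented behavior of the tool.

```go
package main

import (
	"fmt"
	"strings"
)

// collectionURL appends the collection (or core) name to the Solr root
// address, tolerating a trailing slash on the address.
func collectionURL(address, collection string) string {
	return strings.TrimRight(address, "/") + "/" + collection
}

func main() {
	fmt.Println(collectionURL("http://localhost:8983/solr", "sciveyor"))
	// prints "http://localhost:8983/solr/sciveyor"
}
```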

For debugging purposes, it is occasionally helpful to force a sync -- that is, to delete and re-create every document in Solr with the corresponding copy from MongoDB. If this behavior is desired, you can pass --force. We strongly recommend that you do not use this feature.

Configuration File

The following settings may be persistently configured by editing the configuration file, located by default at ~/.sciveyor.yaml:

mongo:
  address: string
  collection: string
  database: string
  timeout: 30
solr:
  address: string
  collection: string
verbose: true

General Options

  • --config <path>: Specify an alternative path to a YAML-format configuration file.
  • --verbose, -v: By default, basic information (and, if on an interactive terminal, progress bars) will be printed to the console. To see more information, pass the --verbose flag.

Glob Patterns

Every file argument also accepts a glob pattern. We use an extended syntax with support for:

  • *: any sequence of non-separator characters
  • **: any sequence of characters, including separators (recursive glob)
  • ?: any single non-separator character
  • [class]: character classes, of the form [abcd] (character list), [a-z] (character range), or [^a-z] (negated class)
  • {alt1,alt2,...}: a finite list of alternatives
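To make the alternatives syntax concrete, the Go sketch below expands {alt1,alt2,...} groups into plain patterns and then matches each with the standard library's path.Match, which covers *, ? and [class]. It is an illustration under assumptions, not the matcher the tool uses: expandBraces and matchAny are hypothetical helpers, nested braces are not handled, and path.Match has no ** operator, so recursive globs are outside this sketch.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// expandBraces rewrites a pattern containing {a,b,...} groups into the
// list of patterns with each alternative substituted. Nested groups are
// not supported; an unclosed brace is left as literal text.
func expandBraces(pattern string) []string {
	open := strings.Index(pattern, "{")
	if open < 0 {
		return []string{pattern}
	}
	end := strings.Index(pattern[open:], "}")
	if end < 0 {
		return []string{pattern}
	}
	end += open
	prefix, suffix := pattern[:open], pattern[end+1:]
	var out []string
	for _, alt := range strings.Split(pattern[open+1:end], ",") {
		out = append(out, expandBraces(prefix+alt+suffix)...)
	}
	return out
}

// matchAny reports whether name matches any expansion of pattern.
// A malformed pattern is treated as a non-match.
func matchAny(pattern, name string) bool {
	for _, p := range expandBraces(pattern) {
		if ok, err := path.Match(p, name); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(expandBraces("data/*.{json,jsonl}"))
	fmt.Println(matchAny("data/*.{json,jsonl}", "data/a.json"))
	fmt.Println(matchAny("data/*.json", "data/sub/a.json"))
}
```

The last call returns false because * stops at path separators, which is the case that the ** pattern exists to cover.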

Changelog

  • v0.8: Implemented a TUI category editor.
  • v0.7: Rewrite mongo-tool as sciveyor-tool, using Cobra and Viper instead of Kong.
  • v0.6: Fix our entirely broken Mongo date handling, and export in a different format to allow for storing them in Solr date objects. Fix a small bug with batched import.
  • v0.5: Add a batch-size flag to import.
  • v0.4: Move glob handling into the app, allowing for a --unique test in validate-files.
  • v0.3: Port command-line handling to Kong, and introduce a robust sub-command interface. Rename from mongo-solr to mongo-tool. Integrate the functionality of schema-tool into mongo-tool.
  • v0.2: Store all the date values in documents as ISODate in MongoDB.
  • v0.1: Initial support for only the fields mentioned in the JSON document schema.

License

The code here is copyright © 2021–2022 Charles H. Pence, and released under the GNU GPL v3.