This is a utility designed to manipulate the MongoDB and Solr servers used to store and search the documents in Sciveyor.
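```sh
# Clone the repository along with its submodules (including the schema)
git clone --recurse-submodules ...
# Fetch rakyll/statik, which embeds static files into Go binaries
go get github.com/rakyll/statik
# Run the package's go:generate directives, then build the binary
go generate
go build
./mongo-tool ...
```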
A number of sub-commands can then be used to perform various maintenance tasks on the MongoDB and Solr servers. You can get a list of all those tasks by running `mongo-tool --help`, and you can get more help on any command by running `mongo-tool <command> --help`.
## `sync`: Synchronize MongoDB to Solr

The tool can be used to perform a three-step synchronization of the content from the MongoDB server to the Solr server. This is an extremely simple sync:

1. If a document is present in Solr but not in MongoDB, delete it from Solr.
2. If a document is present in MongoDB but not in Solr, add it to Solr.
3. If a document is present in both, but its `version` or its `dataSourceVersion` parameters have changed, delete and re-create it in the Solr database.

Notably, this is not a proper atomic synchronization. Documents are deleted and re-created, not partially updated (in Solr's terminology, we do not use "atomic updates"). We also do not detect any changes other than in the two version parameters, so version numbers must be bumped to trigger a sync. (This is an intentional policy choice.)
To use it, then, call `mongo-tool` as follows:
```sh
./mongo-tool sync \
  --mongo-address=mongodb://localhost \
  --mongo-database=YourDatabase \
  --mongo-collection=documents \
  --mongo-timeout=SECS \
  --solr-address=http://localhost:8983/solr \
  --solr-collection=sciveyor
```
The parameters are simply the connection options for the two servers. The `mongo-address` is a URL, which can specify a username, password, and port (`mongodb://user:pass@address:port`). The `mongo-database` parameter should be familiar from any connection to MongoDB. In almost all Sciveyor cases, `mongo-collection` should be set to `documents`. The `mongo-timeout` parameter controls how long, in seconds, we will wait for MongoDB before timing out. It defaults to 30, but might need to be much higher in some applications.

The two Solr parameters are the URL of the server root (which will almost always end with `/solr`) and the name of the collection or core currently in use. The final Solr URLs are formed by appending the collection to the address (for the example above, `http://localhost:8983/solr/sciveyor`).
For debugging purposes, it is occasionally helpful to force a sync: that is, to delete and re-create every document in Solr from the corresponding copy in MongoDB. If this behavior is desired, you can pass `--force`. We strongly recommend that you do not use this feature.
## `import`: Import JSON Files to MongoDB

This tool should be used whenever you want to import JSON documents (once again, in the JSON schema specified by Sciveyor) into the MongoDB server.
To use it, call `mongo-tool` as follows:
```sh
./mongo-tool import \
  --batch-size=NUM \
  --mongo-address=mongodb://localhost \
  --mongo-database=YourDatabase \
  --mongo-collection=documents \
  --mongo-timeout=SECS \
  <files> ...
```
For information about the MongoDB connection parameters, see the `sync` command above.
The `<files>` arguments may refer either to specific files or to glob patterns.
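The glob patterns accepted for file arguments support `{alt1,alt2,...}` alternatives alongside `*`, `?`, and character classes. As a hypothetical sketch of how brace alternatives can be layered on Go's standard `filepath.Match` (which already handles `*`, `?`, and `[class]`, including `[^a-z]`), the code below expands each brace group into separate patterns and tries them in turn. It is not the tool's implementation: it omits the recursive `**` pattern, which needs a directory walk, and does not support nested braces.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// expandBraces turns the first {a,b,...} group in pattern into one pattern
// per alternative and recurses, so later groups are expanded as well.
// Nested braces such as {a,{b,c}} are not supported in this sketch.
func expandBraces(pattern string) []string {
	start := strings.IndexByte(pattern, '{')
	if start < 0 {
		return []string{pattern}
	}
	end := strings.IndexByte(pattern[start:], '}')
	if end < 0 {
		return []string{pattern} // unmatched '{' is matched literally
	}
	end += start
	var out []string
	for _, alt := range strings.Split(pattern[start+1:end], ",") {
		out = append(out, expandBraces(pattern[:start]+alt+pattern[end+1:])...)
	}
	return out
}

// match reports whether name matches pattern, trying each brace alternative
// with filepath.Match.
func match(pattern, name string) (bool, error) {
	for _, p := range expandBraces(pattern) {
		ok, err := filepath.Match(p, name)
		if err != nil {
			return false, err
		}
		if ok {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	for _, name := range []string{"data/2021.json", "data/2021.xml", "data/notes.txt"} {
		ok, err := match("data/*.{json,xml}", name)
		if err != nil {
			panic(err)
		}
		fmt.Println(name, ok)
	}
}
```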
The `--batch-size` parameter sets how many documents are sent to MongoDB in each batch (it defaults to 100). The optimal size will depend on your connection to your MongoDB server, your document sizes, and your network configuration, but 100 works for most purposes.
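A minimal sketch of splitting the parsed documents into groups of `--batch-size` documents; `batches` is a hypothetical helper, not the tool's code, and the assumption is that each batch would then be written to MongoDB in a single bulk insert.

```go
package main

import "fmt"

// batches splits docs into consecutive slices of at most size elements.
// The last batch may be shorter; an empty input yields no batches.
func batches[T any](docs []T, size int) [][]T {
	if size <= 0 {
		panic("batch size must be positive")
	}
	var out [][]T
	for start := 0; start < len(docs); start += size {
		end := start + size
		if end > len(docs) {
			end = len(docs)
		}
		out = append(out, docs[start:end])
	}
	return out
}

func main() {
	docs := make([]string, 250)
	for i := range docs {
		docs[i] = fmt.Sprintf("doc-%d", i)
	}
	// With the default size of 100, 250 documents become 3 batches.
	for i, b := range batches(docs, 100) {
		fmt.Printf("batch %d: %d documents\n", i+1, len(b))
	}
}
```

Larger batches mean fewer round trips to the server but larger individual requests, which is why the best value depends on document size and network conditions.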
Note that no schema validation at all will be done on these documents, though they will be passed through several kinds of essential transformations (for example, converting the dates from JSON string format to MongoDB date format).
## `validate`: Validate MongoDB Documents

The tool can be used to check whether or not the contents of a given MongoDB collection conform to the Sciveyor JSON schema. To use it, call `mongo-tool` as follows:
```sh
./mongo-tool validate [--strict] \
  --mongo-address=mongodb://localhost \
  --mongo-database=YourDatabase \
  --mongo-collection=documents \
  --mongo-timeout=SECS
```
For information about the MongoDB connection parameters, see the `sync` command above.
The `--strict` flag is enabled by default; you may disable it by passing `--strict=false`. In strict mode, the validation checks not only that the attributes of each document are valid, but also prints errors for any fields in a document that do not appear in the JSON schema (that is, for any "extra" fields). Passing `--strict=false` silently ignores the presence of extra fields, printing errors only for fields that contain invalid data.
## `validate-files`: Validate JSON Files

The tool can be used to check whether or not a set of JSON files on disk conforms to the Sciveyor JSON schema. To use it, call `mongo-tool` as follows:
```sh
./mongo-tool validate-files [--strict] [--unique] /path/to/*.json
```
The file arguments may refer either to specific files or to glob patterns.
For information about the `--strict` parameter, see the `validate` command above. If `--unique` is passed, the validation will also parse each file, load its ID value, and check whether any ID values are duplicated among the JSON files passed. This slows down validation, so it is disabled by default.
`--verbose`, `-v`: By default, basic information about the sync will be printed to the console. To see much more information (including printed dumps of the IDs present in both the Mongo and Solr databases), pass the `--verbose` flag.

All file parameters also accept glob patterns. We use an extended syntax with support for:
- `*`: any sequence of non-separator characters
- `**`: any sequence of characters, including separators (recursive glob)
- `?`: any single non-separator character
- `[class]`: character classes, of the form `[abcd]` (character list), `[a-z]` (character range), or `[^a-z]` (negated class)
- `{alt1,alt2,...}`: a finite list of alternatives

- … `import`.
- … `--unique` test in `validate-files`.
- … `mongo-solr` to `mongo-tool`. Integrate the functionality of `schema-tool` into `mongo-tool`.
- … `date` values in documents as `ISODate` in MongoDB.

The code here is copyright © 2021 Charles H. Pence, and released under the GNU GPL v3.