Latest commit: f28213f7a5 by Charles Pence, 2023-11-27 11:54:50 +01:00 ("Fix a missing f-string.")


Assorted Python Utility Code

This is a collection of Python utilities that we use in various scripts across our research group.


We don't publish this module to PyPI, so you need to add it to your project's dependencies by referencing the Git repository directly:

dependencies = [

If you build with Hatch (as most of our projects do), you'll also need to add the following to pyproject.toml:

[tool.hatch.metadata]
allow-direct-references = true

If you're using mypy, it won't be able to find the type information for a package that's installed directly from a Git repository like this. You can silence the resulting warnings by adding the following to pyproject.toml:

[[tool.mypy.overrides]]
module = ["utils", "utils.*"]
ignore_missing_imports = true


All functions in this repository are carefully documented for Sphinx; see the autogenerated documentation for more details. (FIXME: I'm not actually generating and hosting these yet; watch this space.)

Briefly, this repository includes:

  • utils.config:
    • An API for fetching a configuration object shared by all of our utilities, which lets us store things like API keys in one central location. The utils.config.load function returns a dict with the following keys:
      • pubmed_api_key: a PubMed API key, for faster metadata queries
  • utils.core_ext:
    • remove_nones: recursively remove None values from dictionaries (including dictionaries nested inside other dictionaries or lists)
  • utils.corpus:
    • format_document: create a short, formatted representation of a JSON document in our corpus format
    • options: Click options to add to a command so that users can customize format_document
  • utils.data_files:
    • check_source: check whether a dataSource or textSource is present in a document
    • add_source: add a dataSource or textSource, with version and timestamp
    • save_with_backup: save data to a JSON file, first making a backup if this would overwrite an existing file
  • utils.decorators:
    • compose: compose multiple decorators into a single decorator
    • fail_counter: add a failure-counter attribute (essentially a function-level static variable) to a function
  • utils.net:
    • download_file: download a file to the local filesystem, with progress-bar reporting
  • utils.text:
    • full_strip: remove HTML tags and extra whitespace from a string
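To make the remove_nones description concrete, here is a minimal, independent sketch of that behavior; it is not the code in utils.core_ext. It assumes that "remove None values" means dropping dictionary keys whose value is None at any depth (including dictionaries inside lists), while leaving bare None items inside lists alone. The example record is made up.

```python
from typing import Any


def remove_nones(value: Any) -> Any:
    """Return `value` with None-valued dict keys removed at every depth (dicts and lists are rebuilt, not mutated)."""
    if isinstance(value, dict):
        # Drop keys whose value is None; clean every remaining value recursively
        return {key: remove_nones(item) for key, item in value.items() if item is not None}
    if isinstance(value, list):
        # Recurse into list items so that dicts nested in lists are cleaned too
        return [remove_nones(item) for item in value]
    # Scalars (and any other types) are returned unchanged
    return value


record = {"title": "Example", "doi": None, "authors": [{"name": "A", "orcid": None}]}
print(remove_nones(record))  # {'title': 'Example', 'authors': [{'name': 'A'}]}
```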
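The backup-on-overwrite pattern behind save_with_backup can be sketched as follows. This is an illustration only, not the implementation in utils.data_files: the signature, the .bak suffix, the JSON formatting options, and the choice to copy (rather than rename) the old file are all assumptions.

```python
import json
import shutil
from pathlib import Path
from typing import Any, Union


def save_with_backup(data: Any, path: Union[str, Path]) -> None:
    """Write `data` to `path` as JSON, backing up any existing file first (hypothetical sketch)."""
    target = Path(path)
    if target.exists():
        # Hypothetical naming scheme: out.json -> out.json.bak (an older backup is replaced)
        backup = target.with_name(target.name + ".bak")
        # copy2 also preserves the original file's timestamps
        shutil.copy2(target, backup)
    with target.open("w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
```

Copying before writing means the old data survives even if json.dump fails partway through; because a single .bak file is overwritten on every save, only the most recent previous version is kept.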
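Composing decorators amounts to applying them in sequence. The sketch below shows one way to do it; it is not the code in utils.decorators, and it assumes that compose(a, b) should behave like stacking @a above @b, which may not match the argument order the real compose uses. The tag decorator is a hypothetical example.

```python
from typing import Callable, TypeVar

F = TypeVar("F", bound=Callable[..., object])


def compose(*decorators: Callable[[F], F]) -> Callable[[F], F]:
    """Return one decorator equivalent to stacking `decorators` top to bottom."""

    def combined(func: F) -> F:
        # Stacked decorators apply bottom-up, so apply the last one first
        for decorator in reversed(decorators):
            func = decorator(func)
        return func

    return combined


def tag(label: str) -> Callable[[F], F]:
    """Hypothetical example decorator that wraps a function's result in `label(...)`."""

    def decorator(func: F) -> F:
        def wrapper(*args, **kwargs):
            return f"{label}({func(*args, **kwargs)})"

        return wrapper  # type: ignore[return-value]

    return decorator


@compose(tag("outer"), tag("inner"))
def greet() -> str:
    return "hi"


print(greet())  # outer(inner(hi))
```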
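A "function-level static variable" in Python is usually just an attribute stored on the function object. The sketch below assumes the attribute is called fail_count, starts at zero, and is incremented by the function itself; the name and the increment pattern are hypothetical, so check the docstring in utils.decorators for the real interface. The fetch function is a made-up example.

```python
from typing import Callable, TypeVar

F = TypeVar("F", bound=Callable[..., object])


def fail_counter(func: F) -> F:
    """Attach a hypothetical `fail_count` attribute, starting at 0, to `func`."""
    # Function objects can carry arbitrary attributes, which persist between calls
    func.fail_count = 0  # type: ignore[attr-defined]
    return func


@fail_counter
def fetch(record_id: int) -> bool:
    """Made-up example: pretend that odd record IDs fail to download."""
    succeeded = record_id % 2 == 0
    if not succeeded:
        fetch.fail_count += 1  # type: ignore[attr-defined]
    return succeeded


for record_id in range(5):
    fetch(record_id)
print(fetch.fail_count)  # 2
```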
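A rough equivalent of full_strip can be built from two regular expressions, as sketched below. This is an assumption about the behavior, not the code in utils.text; a regex cannot parse arbitrary HTML (a literal "<" in the text, for example, can confuse it), so a real HTML parser is more robust for messy input.

```python
import re

# Matches anything that looks like an HTML tag: "<", then non-">" characters, then ">"
_TAG_RE = re.compile(r"<[^>]+>")
# Matches any run of whitespace (spaces, tabs, newlines)
_SPACE_RE = re.compile(r"\s+")


def full_strip(text: str) -> str:
    """Remove HTML tags, collapse whitespace runs to single spaces, and trim the ends."""
    without_tags = _TAG_RE.sub("", text)
    return _SPACE_RE.sub(" ", without_tags).strip()


print(full_strip("  <p>Hello,\n  <b>world</b>!</p> "))  # Hello, world!
```

Tags are deleted rather than replaced with a space, so adjacent block elements such as `<p>One</p><p>Two</p>` run together as "OneTwo"; substituting " " for the tags avoids that, at the cost of stray spaces before punctuation.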


All code in this repository is released under the GNU GPL v3.