ISH Conference Scheduler
Do you want to run a conference? Do you need to bundle together individual submissions into sessions that both make coherent sense with one another and are different enough from the other sessions happening at the same time to prevent people from bouncing back and forth between rooms as much as possible? Does that sound like something a computer should help you with? This is the software for you.
This code was cobbled together by Charles Pence on the basis of some existing code by Prashanti Manda and an article describing that code's use, available at http://doi.org/10.7717/peerj-cs.234. The goal was to use this system to evaluate sessions for the 2021 meeting of the International Society for the History, Philosophy, and Social Studies of Biology (ISHPSSB).
How to Use
This code is set up to work with python (I developed it with 3.9.4) and virtualenv. You can use these by executing (from this directory):
    virtualenv .venv
    source .venv/bin/activate
    pip install -r requirements.txt
What this Does
The scripts here all have optional support for two kinds of conferences:
- A conference with only "individual papers" – that is, individual talks are submitted, and the algorithm bundles these into sessions of a given size.
- A conference with both "individual papers" and "sessions" – that is, where a session organizer submitted a set of talks that should be kept together.
Also, these scripts have support for "blocks" in your conference – different categories of time-slot, where users can choose which blocks they are willing to present in. (Think, for example, of users picking between different times of day that are consistent with their local timezone for an online conference.)
In general, the algorithm here will perform the following steps:
- Assess the similarity between all of the paper and session abstracts in your meeting. There are a number of ways to do this, including LDA topic-modeling approaches and WordVec (word embedding) based solutions.
- Create an initial randomized schedule, consistent with user block preferences (if you're using those).
- Optimize that random schedule by randomly swapping talks/sessions, with the goals of:
- Maximizing the similarity within a session (i.e., putting papers together that are similar)
- Minimizing the similarity between sessions in the same time-slot (i.e., reducing the incentive for people to jump between sessions)
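The two goals above can be folded into a single number to maximize. What follows is only a minimal sketch, not the scoring used by OptimizeSchedule.py: the function names, the schedule layout (a list of time-slots, each a list of sessions, each a list of talk indices), and the choice of "mean within-session similarity minus mean between-session similarity" are all assumptions made for illustration.

```python
from itertools import combinations


def mean(values):
    """Average of a list, or 0.0 if the list is empty."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def schedule_score(schedule, sim):
    """Hypothetical objective: reward similar talks inside a session and
    penalize similar talks in parallel sessions of the same time-slot.

    schedule: list of time-slots; each time-slot is a list of sessions;
              each session is a list of talk indices.
    sim:      square similarity matrix (list of lists).
    """
    within, between = [], []
    for slot in schedule:
        for session in slot:
            within.extend(sim[a][b] for a, b in combinations(session, 2))
        for s1, s2 in combinations(slot, 2):
            between.extend(sim[a][b] for a in s1 for b in s2)
    return mean(within) - mean(between)


# Made-up similarities: talks 0 and 1 are alike, talks 2 and 3 are alike.
sim = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
good = [[[0, 1], [2, 3]]]  # one time-slot, two coherent sessions
bad = [[[0, 2], [1, 3]]]   # same slot, but the sessions are mixed
print(schedule_score(good, sim), schedule_score(bad, sim))
```

The coherent schedule scores higher than the mixed one, which is the direction the optimizer is meant to push in.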
There are two sets of sample data here – ISH2019 and ISH2021. The former was testing data, and only includes individual paper submissions. The latter was real data for the first schedule of the ISHPSSB 2021 conference, and includes both individual papers and sessions.
Brief details about the scripts found in the src directory can be found here.
In general, all of the parameters that you can configure for these scripts can be passed on the command line, and you can learn about them by calling python <script.py> --help.
Note: For all three of the similarity scripts, if you are using both individual papers and sessions, you should pass all of the individual paper abstract files followed by all of the session abstract files. This will produce the right kind of combined document similarity matrix that is needed for the optimization script later on.
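To make the ordering concrete, here is a small sketch of how a combined matrix could be indexed. It assumes (this is not confirmed by the scripts themselves) that row and column i of the matrix correspond to the i-th file passed on the command line; the helper names and file names are hypothetical.

```python
def combined_index(paper_files, session_files):
    """Assumed layout: papers occupy the first rows, sessions the rest."""
    documents = list(paper_files) + list(session_files)
    index = {name: row for row, name in enumerate(documents)}
    return documents, index


def is_session(row, n_papers):
    """Under the assumed layout, rows at or beyond n_papers are sessions."""
    return row >= n_papers


papers = ["paper-001.txt", "paper-002.txt", "paper-003.txt"]
sessions = ["session-A.txt", "session-B.txt"]
documents, index = combined_index(papers, sessions)
print(index["session-A.txt"], is_session(index["session-A.txt"], len(papers)))
```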
Similarity-LDA.py — Compute similarity with topic models
The first of three different algorithms for computing document similarity, this code creates topic models from the documents in the corpus, and then measures the distance between the representations of each talk or session in terms of those topics.
The major tunable parameters in this script are the number of topics and the number of passes through the corpus (the flag names are listed by --help). The number of topics should be evaluated by examining the metrics calculated for the models that result (there are scripts for looking at these in the two example folders). The number of passes controls model training; in general, higher is better at the expense of more time spent training the models.
By default, the distance between document vectors is computed with cosine similarity; you can switch to Hellinger distance by passing the corresponding flag (listed by --help).
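For readers unfamiliar with the two measures, the sketch below compares them on made-up topic proportions. It is a plain-Python illustration under stated assumptions, not the code in Similarity-LDA.py: the function names and the three example talks are invented, and each talk is assumed to be represented as a probability distribution over topics.

```python
import math


def cosine_distance(p, q):
    """One minus the cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norm


def hellinger_distance(p, q):
    """Hellinger distance between two discrete probability distributions.

    Ranges from 0 (identical) to 1 (no overlap at all).
    """
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(total / 2.0)


# Hypothetical topic proportions over three topics.
talk_a = [0.7, 0.2, 0.1]
talk_b = [0.6, 0.3, 0.1]
talk_c = [0.1, 0.1, 0.8]

for name, dist in [("cosine", cosine_distance), ("hellinger", hellinger_distance)]:
    print(name, dist(talk_a, talk_b), dist(talk_a, talk_c))
```

Both measures place talk_a closer to talk_b than to talk_c; Hellinger distance treats the vectors specifically as probability distributions, which is a natural fit for topic proportions.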
Similarity-WordVecWMD.py — Compute similarity with Word Mover Distance
This script uses the GloVe model for word embeddings to describe the positions of documents in semantic space, then computes the distance between them using Word Mover's Distance (roughly, the amount of effort that would be required to transform the probability distribution of document A into that of document B).
There's no tuning for this algorithm. It is extremely resource intensive, and often CPU-limited; it will run in parallel to the extent possible on your hardware.
Similarity-WordVecSoftCosine.py — Compute similarity with soft cosine distance
This script also uses the GloVe model for word embeddings, but calculates distance between document vectors using soft cosine. It is much faster than WMD, though in my testing it produces lower quality results.
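As a rough illustration of what soft cosine adds over ordinary cosine, the sketch below scores two tiny bag-of-words vectors with a term-similarity matrix. Everything here is an assumption for the example: the three-word vocabulary, the similarity values (which are made up, not taken from GloVe), and the function names. It is not the code in Similarity-WordVecSoftCosine.py.

```python
import math


def bilinear(a, S, b):
    """Compute a^T S b for list-based vectors and a list-of-lists matrix."""
    return sum(a[i] * S[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))


def soft_cosine(a, b, S):
    """Soft cosine similarity: cosine generalized by a term-similarity matrix S."""
    return bilinear(a, S, b) / math.sqrt(bilinear(a, S, a) * bilinear(b, S, b))


# Hypothetical vocabulary: ["gene", "dna", "museum"].
# S[i][j] says how related two words are; the values are invented.
S = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

doc_gene = [1, 0, 0]    # mentions only "gene"
doc_dna = [0, 1, 0]     # mentions only "dna"
doc_museum = [0, 0, 1]  # mentions only "museum"

print(soft_cosine(doc_gene, doc_dna, identity))  # ordinary cosine: no shared words
print(soft_cosine(doc_gene, doc_dna, S))         # soft cosine: related words count
print(soft_cosine(doc_gene, doc_museum, S))
```

With the identity matrix the measure reduces to ordinary cosine, so documents with no words in common score zero; with a term-similarity matrix, documents that use related words still register as similar.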
RandomSchedule.py — Generate random schedule
This script creates a random schedule (possibly taking into account consistency with preferences about blocks). Pass it the structure of blocks, time-slots, and sessions for your conference.
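One way to picture a random schedule that respects block preferences is sketched below. This is a simplified illustration, not what RandomSchedule.py does: the function name, the dictionary-based inputs, the per-block capacity, and the retry-on-dead-end strategy are all assumptions, and time-slots within a block are ignored for brevity.

```python
import random


def random_schedule(preferences, capacity, session_size, seed=None, max_tries=1000):
    """Randomly place talks into blocks they accept, then chunk into sessions.

    preferences: dict mapping talk -> set of acceptable block names.
    capacity:    dict mapping block name -> maximum number of talks.
    Returns a dict mapping block name -> list of sessions (lists of talks).
    """
    rng = random.Random(seed)
    talks = sorted(preferences)
    for _ in range(max_tries):
        rng.shuffle(talks)
        assigned = {block: [] for block in capacity}
        feasible = True
        for talk in talks:
            # Blocks this talk accepts that still have room.
            options = [b for b in sorted(preferences[talk])
                       if len(assigned[b]) < capacity[b]]
            if not options:
                feasible = False  # dead end: reshuffle and try again
                break
            assigned[rng.choice(options)].append(talk)
        if feasible:
            return {
                block: [placed[i:i + session_size]
                        for i in range(0, len(placed), session_size)]
                for block, placed in assigned.items()
            }
    raise RuntimeError("no assignment consistent with the preferences was found")


# Hypothetical talks and blocks.
preferences = {
    "t1": {"morning"},
    "t2": {"morning", "evening"},
    "t3": {"evening"},
    "t4": {"morning", "evening"},
}
capacity = {"morning": 2, "evening": 2}
schedule = random_schedule(preferences, capacity, session_size=2, seed=0)
for block, sessions in schedule.items():
    print(block, sessions)
```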
OptimizeSchedule.py — Optimize schedule
This script performs random swaps to attempt to produce an optimal schedule, using simulated annealing. There are a number of tunable parameters for this algorithm, though in the vast majority of cases the defaults will be acceptable.
Raising the value of the relevant annealing parameter (see --help for the flag name) from 0.99 to, e.g., 0.999 will allow for more exploration of the space away from local optima, but will increase time and may decrease final output quality. Note that --iterations is an upper bound; the algorithm will stop when it fails to produce an increase in solution quality for 1000 consecutive iterations.
The entire algorithm will be run --runs times (default 200), and the final script will print the best solution found among those runs.
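The overall shape of the procedure (random swaps, an iteration cap, early stopping after 1000 iterations without improvement, and repeated runs) can be sketched generically as follows. This is a toy under stated assumptions, not the code in OptimizeSchedule.py: the names anneal, best_of_runs, alpha, patience, and temperature are hypothetical, as are the geometric cooling rule, the flat list-of-talks state, and the toy scoring function.

```python
import math
import random


def anneal(state, score, rng, alpha=0.99, iterations=10000, patience=1000,
           temperature=1.0):
    """Maximize score(state) by random pairwise swaps with simulated annealing."""
    current = list(state)
    current_score = score(current)
    best, best_score = list(current), current_score
    stale = 0  # consecutive iterations without a new best
    for _ in range(iterations):
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]
        delta = score(current) - current_score
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature cools.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current_score += delta
        else:
            current[i], current[j] = current[j], current[i]  # undo the swap
        if current_score > best_score:
            best, best_score = list(current), current_score
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break
        temperature = max(temperature * alpha, 1e-12)  # avoid division by zero
    return best, best_score


def best_of_runs(state, score, runs=200, seed=None, **kwargs):
    """Repeat the annealing and keep the best (state, score) pair."""
    rng = random.Random(seed)
    results = [anneal(state, score, rng, **kwargs) for _ in range(runs)]
    return max(results, key=lambda result: result[1])


# Toy problem: four talks, sessions are consecutive pairs in the list.
sim = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]


def within_session_similarity(order):
    return sum(sim[order[k]][order[k + 1]] for k in range(0, len(order), 2))


best, best_score = best_of_runs([0, 2, 1, 3], within_session_similarity,
                                runs=5, seed=1)
print(best, best_score)
```

In this sketch a larger alpha keeps the temperature high for longer, so worse swaps are accepted for more iterations; that matches the trade-off described above between exploration and running time.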
PrintSchedule.py — Print readable schedule
This script prints a basic, readable version of the optimized schedule to stdout.
License and History
This code is copyright 2021 Charles H. Pence and is released under the GNU GPL v3. See LICENSE.txt.
This code is based upon the Automated Conference Scheduler code by Prashanti Manda, copyright 2014 and released under the GNU GPL. For information about that original project, see README-original.md.