This repository includes code that deals with the machine learning side of the final project.
The model is based on the paper Hierarchical Neural Story Generation: [https://arxiv.org/abs/1805.04833](https://arxiv.org/abs/1805.04833)
## Structure
The _old-repo-rnn_ directory contains the initial, unfinished version of the model. It does not produce very good results and is kept only for reference; the current version does not use it.
## Description and Instructions
The _fairseq-hsg_ directory hosts all files for the hierarchical model: the script that scrapes science fiction stories from a [blog](https://blog.reedsy.com/short-stories/science-fiction/), the collected stories with their writing prompts, a script that analyses and prepares the data for training, and scripts that run jobs on lara using Slurm. The latter include a script that creates a virtual runtime environment, a script that binarises the dataset prior to training, and a training script.
The `.sh` files are scripts meant to be executed on lara via Slurm. They cover binarising the data before training, training the first model (`train-job-01.sh`) and training the fusion model (`train-job-02.sh`). The files in the `environment-setup` folder must be run first to create a virtual runtime environment on Slurm.
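The binarisation step can be pictured with a small sketch. The repository's actual job scripts are not reproduced here, so this only assumes that binarisation goes through fairseq's `fairseq-preprocess` command, as in fairseq's own stories example. The `wp_source` / `wp_target` suffixes and both directory paths are illustrative assumptions, not names taken from this repository.

```python
import shlex


def build_preprocess_command(text_dir: str, dest_dir: str) -> list[str]:
    """Build the argument list for a fairseq-preprocess call.

    Assumes prompts and stories are stored as train/valid/test files with
    the (hypothetical) suffixes .wp_source and .wp_target inside text_dir.
    """
    return [
        "fairseq-preprocess",
        "--source-lang", "wp_source",   # assumed suffix of the prompt files
        "--target-lang", "wp_target",   # assumed suffix of the story files
        "--trainpref", f"{text_dir}/train",
        "--validpref", f"{text_dir}/valid",
        "--testpref", f"{text_dir}/test",
        "--destdir", dest_dir,          # where the binarised data is written
    ]


# Hypothetical paths, for illustration only.
cmd = build_preprocess_command("data/stories", "data-bin/stories")
print(shlex.join(cmd))
```

The printed line could be pasted into a Slurm job script; the function only assembles the command and does not run it.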
## Instructions
The two Python notebooks deal with dataset preparation. `stories-scraping.ipynb` scrapes prompts (inputs) and stories (outputs) from a [blog](https://blog.reedsy.com/short-stories/science-fiction/) where weekly writing contests are held, and saves them into a temporary `raw_stories` directory. `stories-analysis.ipynb` performs statistical analysis on the scraped data and prepares it for training: it separates words from punctuation, shortens the stories to a desired length (1800 words) and splits the data into training, validation and test sets. It also deletes the temporary directories.
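The preparation steps described above could look roughly like the following. This is a minimal sketch, not the notebook's actual code: the function names, the tokenisation regex, the 10%/10% validation and test fractions and the fixed seed are all assumptions, and for simplicity the length limit counts punctuation marks as tokens.

```python
import random
import re

MAX_WORDS = 1800  # story length limit mentioned above


def tokenize(text: str) -> list[str]:
    """Split text into words and standalone punctuation marks."""
    # Words (contractions like "don't" stay together) or single punctuation characters.
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", text)


def prepare_story(text: str, max_words: int = MAX_WORDS) -> str:
    """Space-separate tokens and truncate the story to max_words tokens."""
    return " ".join(tokenize(text)[:max_words])


def split_dataset(pairs, valid_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle (prompt, story) pairs and split them into train/valid/test lists."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)  # seeded so the split is reproducible
    n_valid = int(len(pairs) * valid_frac)
    n_test = int(len(pairs) * test_frac)
    valid = pairs[:n_valid]
    test = pairs[n_valid:n_valid + n_test]
    train = pairs[n_valid + n_test:]
    return train, valid, test
```

For example, `prepare_story("Hello, world!")` returns `"Hello , world !"`, which matches the space-separated token format that fairseq's preprocessing expects.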
TBA
### Note
If you're using a notebook, add an exclamation mark before a `python` command when running it from a cell, e.g.
### Model Reference
[Hierarchical Neural Story Generation Paper](https://arxiv.org/abs/1805.04833)