From c19dc1533d4152f8945a5e9e7173819400db0a14 Mon Sep 17 00:00:00 2001
From: Zala Sesko <zsesk001@gold.ac.uk>
Date: Wed, 4 May 2022 11:27:03 +0000
Subject: [PATCH] =?UTF-8?q?=F0=9F=93=9D=20Update=20README.md?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 README.md | 18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/README.md b/README.md
index ddd12f0..e5b54f7 100644
--- a/README.md
+++ b/README.md
@@ -1,22 +1,18 @@
 # Machine Learning Hierarchical Story Generation
 
-### Model Reference
-
-Hierarchical Neural Story Generation [https://arxiv.org/abs/1805.04833](https://arxiv.org/abs/1805.04833)
+This repository contains the machine learning code for the final project.
 
-## Structure
 
-The _old-repo-rnn_ directory includes the initial, unfinished version of the model - does not produce very good results, I'm only keeping it for reference, not using it in the current version.
+## Description and Instructions
 
-The _fairseq-hsg_ directory hosts all files corresponding to the hierarchical model, including the script to scrape Science fiction stories from a [blog](https://blog.reedsy.com/short-stories/science-fiction/), the collected stories with their writing prompts, a script that analyses and prepares the data for training, and scripts to execute jobs on lara, using slurm. The latter include a script that creates a virtual runtime environment, a script that binarises the dataset prior to training and a training script.
+The `.sh` files are scripts meant to be submitted to lara via Slurm. They cover binarising the data before training, training the first model (`train-job-01.sh`), and training the fusion model (`train-job-02.sh`). The scripts in the `environment-setup` folder must be run first to set up a virtual runtime environment on the cluster.
 
-## Instructions
+The two Python notebooks handle dataset preparation. `stories-scraping.ipynb` scrapes prompts (inputs) and stories (outputs) from a [blog](https://blog.reedsy.com/short-stories/science-fiction/) that hosts weekly writing contests, and saves them into a temporary `raw_stories` directory. `stories-analysis.ipynb` performs statistical analysis on the scraped data and prepares it for training: it separates words from punctuation, truncates stories to a desired length (1800 words), and splits the data into training, validation, and test sets. It also deletes the temporary directories.
 
-TBA
 
-### Note
+### Model Reference
 
-If you're using a notebook, add an exclamation mark before running a python command. e.g.
+[Hierarchical Neural Story Generation Paper](https://arxiv.org/abs/1805.04833)
 
-```!python train.py```
+[fairseq library](https://github.com/pytorch/fairseq/blob/main/examples/stories/README.md)
 
-- 
GitLab