|
|
# Lab 6: Database APIs
|
|
|
|
|
|
This lab exercise does not carry coursework marks directly, but you will be required to build on these techniques in future assessments (the next assessment will be in week 8).
|
|
|
|
|
|
---
|
|
|
|
|
|
## Overview
|
|
|
|
|
|
Last week you began working on the application logic (middleware) of the catflucks application in Python. You also set up a web server that was able to execute these scripts and respond to a client with the resulting HTML.
|
|
|
|
|
|
This week you will go one step further by attaching a backend (database) to the application. This will involve using a database API. The API will provide an interface between your python scripts and a database.
|
|
|
|
|
|
At this point, you should switch to developing your scripts from your own working git repository, and regularly committing back to GitLab. This will also allow you the freedom to deviate from the class example - something that is strongly encouraged! The more experience you have applying the knowledge and skills taught on the module to different situations, the better placed you will be to sit the exam.
|
|
|
|
|
|
Please also take time to review the feedback on your Lab 5 Quiz Question 11, as this was the kind of question that would be asked in an exam, and might carry quite a few marks!
|
|
|
|
|
|
---
|
|
|
|
|
|
## Learning objectives
|
|
|
|
|
|
+ Utilise a database API to query a MySQL or MongoDB database from a Python script
|
|
|
+ Output [dynamic content](https://blog.stackpath.com/glossary/dynamic-content/) in the browser
|
|
|
+ Utilise an aggregation pipeline operator in a mongoDB query
|
|
|
+ Reference mongoDB document objects within Python scripts
|
|
|
|
|
|
---
|
|
|
|
|
|
### Task 1: Check Mongo is installed on your VS
|
|
|
|
|
|
If you were in lecture 6, you will have done this part already and can skip to task 2!
|
|
|
|
|
|
### Pre-requisites:
|
|
|
|
|
|
+ Virtual Server
|
|
|
+ MongoDB 3.2 or higher OR MySQL 5.4 or later
|
|
|
+ pymongo API OR MySql connector API
|
|
|
|
|
|
To progress from this point, you need to have completed Lab 5. In other words, you need to have installed MongoDB 3.2 or later on your virtual server. If you have not done this, you will need to revisit [Lab 5](web-server).
|
|
|
|
|
|
If you cannot remember, try checking the status of your mongo daemon:
|
|
|
|
|
|
sudo systemctl status mongodb
|
|
|
|
|
|
If you don't see some green writing, try starting it:
|
|
|
|
|
|
sudo systemctl start mongodb
|
|
|
|
|
|
If that doesn't work ,it probably isn't installed, so go back to [Lab 5](web-server).
|
|
|
|
|
|
---
|
|
|
|
|
|
## Task 2: Restore newer version of catflucks database
|
|
|
|
|
|
A newer version of the catflucks database has appeared in the lab-6 folder.
|
|
|
|
|
|
+ Connect to your virtual server and pull from lab-exercises (remember there are several options for connection depending what network you're on. See [Lab 5 Task 2](http://gitlab.doc.gold.ac.uk/data-networks-web/lab-exercises/wikis/web-server) for a reminder what these are)
|
|
|
|
|
|
ssh sharr003@myserver.doc.gold.ac.uk -p 2<ID>
|
|
|
cd dnw/lab-exercises
|
|
|
git pull origin master
|
|
|
+ If you already restored a catflucks database last week, you first need to delete the old one from the mongo shell:
|
|
|
|
|
|
mongo
|
|
|
use catflucks
|
|
|
db.dropDatabase()
|
|
|
+ `<Ctrl+c>` to exit the shell
|
|
|
+ `cd` inside the lab-6 folder (the new dump has been added there...be careful not to confuse it with the lab-5 version)
|
|
|
+ From the lab-6 folder, call `mongorestore`
|
|
|
|
|
|
cd lab-6
|
|
|
mongorestore
|
|
|
Assuming MongoDB is set up and running, you should now see some output suggesting that the database has been restored. To check, go inside the mongo interactive shell and explore your databases:
|
|
|
|
|
|
mongo
|
|
|
show dbs
|
|
|
use catflucks
|
|
|
show collections
|
|
|
db.images.findOne({})
|
|
|
|
|
|
+ Try calling the `findOne` collection method on all the collections, and make a note of how the data has been modelled.
|
|
|
+ *What do you notice has changed since last week, and why might this be significant?*
|
|
|
|
|
|
**Note:** The MongoDB interactive shell utilises a JavaScript API, which means it \`looks' like JavaScript (stylistically). The Python version of the API (pymongo) that you will use from your Python scripts provides the same set of functions, but in a Python style. For example, `findOne()` becomes `find_one()`. It is worth knowing this, as it means you can probably guess what some of the functions are, having already used them from the MongoDB interactive shell in [Lab 4](http://gitlab.doc.gold.ac.uk/data-networks-web/lab-exercises/wikis/intro-to-mongodb)!
|
|
|
|
|
|
### MySQL users
|
|
|
|
|
|
If you plan to use MySQL, you will need to repeat part of [Lab 3](intro-to-mysql) (Restore a MySQL database) on your virtual server. Technically, you could just use your existing setup on Igor, but you will then be unable to serve the app and access it from a browser...hence why it is recommended to at least test your app from a virtual server.
|
|
|
|
|
|
---
|
|
|
|
|
|
## Task 3: Establish a connection with your database
|
|
|
|
|
|
The next steps will be to establish a connection with your database from a Python script, and to test that connection. You may find it helpful to refer to the example scripts in the [lab-6/cgi-bin](http://gitlab.doc.gold.ac.uk/data-networks-web/lab-exercises/tree/master/lab-6/cgi-bin) folder.
|
|
|
|
|
|
In the lab-6 example, a mongoDB database is being referenced from the `db_connect()` function, but you could adapt this function for a MySQL database. Support for MySQL is available in this [MySQL Connector Notebook](http://gitlab.doc.gold.ac.uk/data-networks-web/lab-exercises/blob/master/lab-6/mysql_connector.ipynb). Bear in mind you will need the [MySQL connector API](https://dev.mysql.com/doc/connector-python/en/connector-python-installation.html) installed too.
|
|
|
|
|
|
+ Switch to working from *your* GitLab project clone on the Virtual Server. If you haven't already done so, clone your project directory there:
|
|
|
|
|
|
cd ~/dnw
|
|
|
git clone git@gitlab.doc.gold.ac.uk:USERNAME/YOUR-PROJECT-NAME
|
|
|
cd YOUR-PROJECT-NAME
|
|
|
+ Inside your project working directory, make a `cgi-bin` directory to house your CGI scripts:
|
|
|
|
|
|
mkdir cgi-bin
|
|
|
+ Open a new .py script from a text editor of your choice (although the choice is obviously limited unless you decide to [mount your file system over SSH](https://www.digitalocean.com/community/tutorials/how-to-use-sshfs-to-mount-remote-file-systems-over-ssh). In the example, we will call this script \`utils', because it is going to contain some useful \`utility' functions we can reuse in this app, and possibly others too!
|
|
|
|
|
|
vim utils.py
|
|
|
+ Inside the new file, import the MongoClient class of the pymongo API and instantiate an instance of a client. A MongoClient represents a connection with one or more Mongo instances.
|
|
|
|
|
|
from pymongo import MongoClient
|
|
|
|
|
|
client = MongoClient('mongodb://localhost:27017/')
|
|
|
+ Make a db handle to the catflucks database:
|
|
|
|
|
|
db = client.catflucks
|
|
|
+ Execute a practice query and output the result:
|
|
|
|
|
|
result = db.images.find({}).limit(5)
|
|
|
for image in result:
|
|
|
print(image['url'], image['alt'])
|
|
|
In the above, result is a cursor object which requires you to iterate over it in order to pull out individual records from the result set.
|
|
|
|
|
|
If you want only a single document, you can call `find_one()`, which returns only a single document (hence no need for iteration):
|
|
|
|
|
|
image = db.images.find_one({})
|
|
|
print(image['url'], image['alt'])
|
|
|
At this stage in development, you don't actually need the web server to run your scripts; you can run them from bash (terminal) and check the output corresponds with what you were expecting. For example,
|
|
|
|
|
|
python3 cgi-bin/utils.py
|
|
|
|
|
|
You might also notice that, in the example version of this script, the connection functionality has been wrapped up in a function called, `db_connect`.
|
|
|
|
|
|
+ *What do you think the advantages of this are?*
|
|
|
|
|
|
The `db_connect` function utilises a try, except statement, which is a technique used in Python to handle specific errors more gracefully (i.e. to prevent a program from crashing, and to decide what to do in the event of specific types of error). You can [read more about try/except statements here](https://wiki.python.org/moin/HandlingExceptions) (they can be very useful!)
|
|
|
|
|
|
+ Define a function around the part of the code that handles the database connection. It should return a handle to your catflucks database.
|
|
|
|
|
|
---
|
|
|
|
|
|
## Task 4: Import your connection function from another script
|
|
|
|
|
|
The database connection function you defined in your utils.py script can now be reused elsewhere in the app (assuming - for now - that they are in the same location in the file system).
|
|
|
|
|
|
+ Make a script called serve_cat.py (or something else if you aren't making a catflucks app!) This sole purpose of this script will be to serve a random cat image (at least, that's the idea...)
|
|
|
+ At the top of this script, import your connection handling function from the utils script (note that you don't include the .py extension):
|
|
|
|
|
|
from utils import db_connect
|
|
|
|
|
|
+ Call the `db_connect()` function:
|
|
|
|
|
|
db_connect()
|
|
|
+ Test it by cut and pasting your test query there:
|
|
|
|
|
|
image = db.images.find_one({})
|
|
|
print(image['url'], image['alt'])
|
|
|
+ And running the script...
|
|
|
|
|
|
python3 cgi-bin/serve_cat.py
|
|
|
|
|
|
---
|
|
|
|
|
|
## Task 5: Query the database for a random document
|
|
|
|
|
|
+ Modify serve_cat.py so that it outputs a random cat image from the database.
|
|
|
|
|
|
You might recognise this bit from last week's lecture! A solution to that problem has now appeared in the lab-5 folder (splash2_fixed.py). The new serve_cat.py is virtually the same, except that the part of the code related to the database connection has been taken out and put elsewhere.
|
|
|
|
|
|
The trickiest bit of the exercise related to the retrieval of a random document from the images collection. Although you might have considered fetching the entire collection of image documents and using the random module to pull out a random element, you should also consider the downside to that solution: *what if the collection is very large?*
|
|
|
|
|
|
The `$sample` aggregation operator can be used in a mongoDB query to retrieve a number of random documents from the database. You can [read about it from the mongoDB documentation](https://docs.mongodb.com/v3.2/reference/operator/aggregation/sample/).
|
|
|
|
|
|
**Tip:** Look at the example that is given in the MongoDB documentation, and adapt it for the catflucks app.
|
|
|
|
|
|
You should also look at the page that explains [how aggregation operators work in mongoDB](https://docs.mongodb.com/v3.2/reference/operator/aggregation/). Essentially, it involves passing a collection of documents through a pipeline of operators, and returning the processed result set. There are various different types of operator that can be used in an aggregation pipeline, as you will see from the documentation.
|
|
|
|
|
|
---
|
|
|
|
|
|
## Task 6: Add a \`skip' link
|
|
|
|
|
|
This is a simple (and temporary) solution to the Task 2 exercise you encountered in Lecture 6.
|
|
|
|
|
|
For that task, you were asked to,
|
|
|
|
|
|
> *implement a `skip’ button that would simply present a new random cat image if pressed.*
|
|
|
|
|
|
One answer to that would be to output a hyperlink in the HTML which has its href attribute set to the current cgi script. When clicked, it simply forces a new `GET` request to be made to the same script on the server, causing it to be executed again.
|
|
|
|
|
|
+ Edit the HTML string in serve_cat.py to include a hyperlink which will temporarily serve as a skip button:
|
|
|
|
|
|
<a href="/cgi-bin/serve_cat.py" title="serve cat">Skip this cat</a>
|
|
|
|
|
|
+ *Can you think of any limitations to this approach?* (Think about what you might like to learn from the flucks data).
|
|
|
|
|
|
---
|
|
|
|
|
|
## Task 7: Add a \`fluckometer'
|
|
|
|
|
|
+ *Can you figure out a way of outputting the number of times the current image has been \`flucked'?*
|
|
|
|
|
|
**Tips:**
|
|
|
|
|
|
For this task, you'll need to learn how to reference a document object from a python script.
|
|
|
|
|
|
You may recall that, from the mongo shell, it was possible to do something like:
|
|
|
|
|
|
db.flucks.find( {"_id": ObjectId("5a1577a6bc93104b641721f8")} )
|
|
|
|
|
|
Unfortunately this won't work from your python script because the interpreter doesn't know what an `ObjectId` object is. Fortunately, there are [tools for working with mongoDB ObjectIds](http://api.mongodb.com/python/current/api/bson/objectid.html) from Python scripts.
|
|
|
|
|
|
The one you'll need today is the `ObjectId` class which is provided by the `bson.objectid` module.
|
|
|
|
|
|
You'll then need to explore the mongoDB documentation for something similar to MySQL's `COUNT` statement.
|
|
|
|
|
|
**Good luck, and try to resist jumping straight to the solution in the example cat_serve.py script! ;-)**
|
|
|
|
|
|
### Alternatively...
|
|
|
|
|
|
If you are deviating from the catflucks app, can you think of somewhere in your app where it would be useful to know *how many* of something there are? Adapt Task 7 accordingly!
|
|
|
|
|
|
---
|
|
|
|
|
|
## Task 8: Review and reflect
|
|
|
|
|
|
This task requires you to reflect on tasks 1-7, and to start thinking a bit more carefully about the way the app has been designed.
|
|
|
|
|
|
+ *Could you now identify and describe the 5 bugs that were in last week's splash2.py script?*
|
|
|
+ *What else is different about serve_cat.py compared with splash2_fixed.py?* (Consider how the code has been organised)
|
|
|
+ *What other improvements could be made to this script (and possibly others) to maximise the reusability of the code?*
|
|
|
|
|
|
---
|
|
|
|
|
|
## Extension: Add fluckability
|
|
|
|
|
|
You might recognise *this* task as Task 3 from the Week 6 lecture slides...
|
|
|
|
|
|
+ *How could you implement a button that would result in a fluck being inserted in the database?*
|
|
|
|
|
|
No tips this time. See what you can come up with by yourself... |