Skip to content

GitLab

  • Menu
Projects Groups Snippets
    • Loading...
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
    • Contribute to GitLab
  • Sign in
  • L lab-exercises
  • Project information
    • Project information
    • Activity
    • Labels
    • Members
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
  • Issues 2
    • Issues 2
    • List
    • Boards
    • Service Desk
    • Milestones
  • Merge requests 0
    • Merge requests 0
  • Deployments
    • Deployments
    • Releases
  • Monitor
    • Monitor
    • Incidents
  • Analytics
    • Analytics
    • Value stream
    • Repository
  • Wiki
    • Wiki
  • Activity
  • Graph
  • Create a new issue
  • Commits
  • Issue Boards
Collapse sidebar
  • data-networks-web
  • lab-exercises
  • Wiki
  • lab 6

Last edited by Sorrel Harriet Jun 08, 2018
Page history

lab 6

Lab 6: Database APIs

This lab exercise does not carry coursework marks directly, but you will be required to build on these techniques in future assessments (the next assessment will be in week 8).


Overview

Last week you began working on the application logic (middleware) of the catflucks application in Python. You also set up a web server that was able to execute these scripts and respond to a client with the resulting HTML.

This week you will go one step further by attaching a backend (database) to the application. This will involve using a database API. The API will provide an interface between your python scripts and a database.

At this point, you should switch to developing your scripts from your own working git repository, and regularly committing back to GitLab. This will also allow you the freedom to deviate from the class example - something that is strongly encouraged! The more experience you have applying the knowledge and skills taught on the module to different situations, the better placed you will be to sit the exam.

Please also take time to review the feedback on your Lab 5 Quiz Question 11, as this was the kind of question that would be asked in an exam, and might carry quite a few marks!


Learning objectives

  • Utilise a database API to query a MySQL or MongoDB database from a Python script
  • Output dynamic content in the browser
  • Utilise an aggregation pipeline operator in a mongoDB query
  • Reference mongoDB document objects within Python scripts

Task 1: Check Mongo is installed on your VS

If you were in lecture 6, you will have done this part already and can skip to task 2!

Pre-requisites:

  • Virtual Server
  • MongoDB 3.2 or higher OR MySQL 5.4 or later
  • pymongo API OR MySql connector API

To progress from this point, you need to have completed Lab 5. In other words, you need to have installed MongoDB 3.2 or later on your virtual server. If you have not done this, you will need to revisit Lab 5.

If you cannot remember, try checking the status of your mongo daemon:

	sudo systemctl status mongodb

If you don't see some green writing, try starting it:

	sudo systemctl start mongodb

If that doesn't work ,it probably isn't installed, so go back to Lab 5.


Task 2: Restore newer version of catflucks database

A newer version of the catflucks database has appeared in the lab-6 folder.

  • Connect to your virtual server and pull from lab-exercises (remember there are several options for connection depending what network you're on. See Lab 5 Task 2 for a reminder what these are)

      ssh sharr003@myserver.doc.gold.ac.uk -p 2<ID>
      cd dnw/lab-exercises
      git pull origin master
  • If you already restored a catflucks database last week, you first need to delete the old one from the mongo shell:

      mongo
      use catflucks
      db.dropDatabase()
  • <Ctrl+c> to exit the shell

  • cd inside the lab-6 folder (the new dump has been added there...be careful not to confuse it with the lab-5 version)

  • From the lab-6 folder, call mongorestore

      cd lab-6
      mongorestore

Assuming MongoDB is set up and running, you should now see some output suggesting that the database has been restored. To check, go inside the mongo interactive shell and explore your databases:

	mongo
	show dbs
	use catflucks
	show collections
	db.images.findOne({})
  • Try calling the findOne collection method on all the collections, and make a note of how the data has been modelled.
  • What do you notice has changed since last week, and why might this be significant?

Note: The MongoDB interactive shell utilises a JavaScript API, which means it `looks' like JavaScript (stylistically). The Python version of the API (pymongo) that you will use from your Python scripts provides the same set of functions, but in a Python style. For example, findOne() becomes find_one(). It is worth knowing this, as it means you can probably guess what some of the functions are, having already used them from the MongoDB interactive shell in Lab 4!

MySQL users

If you plan to use MySQL, you will need to repeat part of Lab 3 (Restore a MySQL database) on your virtual server. Technically, you could just use your existing setup on Igor, but you will then be unable to serve the app and access it from a browser...hence why it is recommended to at least test your app from a virtual server.


Task 3: Establish a connection with your database

The next steps will be to establish a connection with your database from a Python script, and to test that connection. You may find it helpful to refer to the example scripts in the lab-6/cgi-bin folder.

In the lab-6 example, a mongoDB database is being referenced from the db_connect() function, but you could adapt this function for a MySQL database. Support for MySQL is available in this MySQL Connector Notebook. Bear in mind you will need the MySQL connector API installed too.

  • Switch to working from your GitLab project clone on the Virtual Server. If you haven't already done so, clone your project directory there:

      cd ~/dnw
      git clone git@gitlab.doc.gold.ac.uk:USERNAME/YOUR-PROJECT-NAME
      cd YOUR-PROJECT-NAME
  • Inside your project working directory, make a cgi-bin directory to house your CGI scripts:

      mkdir cgi-bin
  • Open a new .py script from a text editor of your choice (although the choice is obviously limited unless you decide to mount your file system over SSH. In the example, we will call this script `utils', because it is going to contain some useful `utility' functions we can reuse in this app, and possibly others too!

      vim utils.py
  • Inside the new file, import the MongoClient class of the pymongo API and instantiate an instance of a client. A MongoClient represents a connection with one or more Mongo instances.

      from pymongo import MongoClient
    
      client = MongoClient('mongodb://localhost:27017/')
  • Make a db handle to the catflucks database:

      db = client.catflucks
  • Execute a practice query and output the result:

      result = db.images.find({}).limit(5)
      for image in result:
      	print(image['url'], image['alt'])

In the above, result is a cursor object which requires you to iterate over it in order to pull out individual records from the result set.

If you want only a single document, you can call find_one(), which returns only a single document (hence no need for iteration):

    image = db.images.find_one({})
    print(image['url'], image['alt'])

At this stage in development, you don't actually need the web server to run your scripts; you can run them from bash (terminal) and check the output corresponds with what you were expecting. For example,

    python3 cgi-bin/utils.py 

You might also notice that, in the example version of this script, the connection functionality has been wrapped up in a function called, db_connect.

  • What do you think the advantages of this are?

The db_connect function utilises a try, except statement, which is a technique used in Python to handle specific errors more gracefully (i.e. to prevent a program from crashing, and to decide what to do in the event of specific types of error). You can read more about try/except statements here (they can be very useful!)

  • Define a function around the part of the code that handles the database connection. It should return a handle to your catflucks database.

Task 4: Import your connection function from another script

The database connection function you defined in your utils.py script can now be reused elsewhere in the app (assuming - for now - that they are in the same location in the file system).

  • Make a script called serve_cat.py (or something else if you aren't making a catflucks app!) This sole purpose of this script will be to serve a random cat image (at least, that's the idea...)

  • At the top of this script, import your connection handling function from the utils script (note that you don't include the .py extension):

      from utils import db_connect
  • Call the db_connect() function:

      db_connect()
  • Test it by cut and pasting your test query there:

      image = db.images.find_one({})
      print(image['url'], image['alt'])
  • And running the script...

              python3 cgi-bin/serve_cat.py

Task 5: Query the database for a random document

  • Modify serve_cat.py so that it outputs a random cat image from the database.

You might recognise this bit from last week's lecture! A solution to that problem has now appeared in the lab-5 folder (splash2_fixed.py). The new serve_cat.py is virtually the same, except that the part of the code related to the database connection has been taken out and put elsewhere.

The trickiest bit of the exercise related to the retrieval of a random document from the images collection. Although you might have considered fetching the entire collection of image documents and using the random module to pull out a random element, you should also consider the downside to that solution: what if the collection is very large?

The $sample aggregation operator can be used in a mongoDB query to retrieve a number of random documents from the database. You can read about it from the mongoDB documentation.

Tip: Look at the example that is given in the MongoDB documentation, and adapt it for the catflucks app.

You should also look at the page that explains how aggregation operators work in mongoDB. Essentially, it involves passing a collection of documents through a pipeline of operators, and returning the processed result set. There are various different types of operator that can be used in an aggregation pipeline, as you will see from the documentation.


Task 6: Add a `skip' link

This is a simple (and temporary) solution to the Task 2 exercise you encountered in Lecture 6.

For that task, you were asked to,

implement a `skip’ button that would simply present a new random cat image if pressed.

One answer to that would be to output a hyperlink in the HTML which has its href attribute set to the current cgi script. When clicked, it simply forces a new GET request to be made to the same script on the server, causing it to be executed again.

  • Edit the HTML string in serve_cat.py to include a hyperlink which will temporarily serve as a skip button:

      <a href="/cgi-bin/serve_cat.py" title="serve cat">Skip this cat</a>
  • Can you think of any limitations to this approach? (Think about what you might like to learn from the flucks data).


Task 7: Add a `fluckometer'

  • Can you figure out a way of outputting the number of times the current image has been `flucked'?

Tips:

For this task, you'll need to learn how to reference a document object from a python script.

You may recall that, from the mongo shell, it was possible to do something like:

	db.flucks.find( {"_id": ObjectId("5a1577a6bc93104b641721f8")} )

Unfortunately this won't work from your python script because the interpreter doesn't know what an ObjectId object is. Fortunately, there are tools for working with mongoDB ObjectIds from Python scripts.

The one you'll need today is the ObjectId class which is provided by the bson.objectid module.

You'll then need to explore the mongoDB documentation for something similar to MySQL's COUNT statement.

Good luck, and try to resist jumping straight to the solution in the example cat_serve.py script! ;-)

Alternatively...

If you are deviating from the catflucks app, can you think of somewhere in your app where it would be useful to know how many of something there are? Adapt Task 7 accordingly!


Task 8: Review and reflect

This task requires you to reflect on tasks 1-7, and to start thinking a bit more carefully about the way the app has been designed.

  • Could you now identify and describe the 5 bugs that were in last week's splash2.py script?
  • What else is different about serve_cat.py compared with splash2_fixed.py? (Consider how the code has been organised)
  • What other improvements could be made to this script (and possibly others) to maximise the reusability of the code?

Extension: Add fluckability

You might recognise this task as Task 3 from the Week 6 lecture slides...

  • How could you implement a button that would result in a fluck being inserted in the database?

No tips this time. See what you can come up with by yourself...

Clone repository
  • Home
  • lab 1
  • lab 12
  • lab 13
  • lab 14
  • lab 15
  • lab 17
  • lab 18
  • lab 2
  • lab 20
  • lab 3
  • lab 4
  • lab 5
  • lab 6
  • lab 7
View All Pages