Lab 4: Security and Integrity!
In this lab you are being asked to consider carefully how certain design decisions can impact on the security and integrity of data in a relational database. We can use the concept of 'normalisation' from relational database theory to help us evaluate and improve the design of a database, but the solutions are not always clearly defined.
Your aim will therefore be to understand and apply some basic design principles to your schema, in an effort to prevent inconsistencies and reduce data redundancy.
You will also practice backing up and restoring a mysql database - an important skill for anyone with a desire to protect data in their database!
Part 1: Data Normalisation
In Lecture 5 (24/10) you were introduced to the concept of 'data normalisation'. In relational database theory, data normalisation is essentially the process of designing/organising a relational database in such a way as to minimise data redundancy and inconsistent dependencies. Guidelines exist to help us do this, although much of it comes down to general common sense!
Group Task
In this task, you should work in a group of 3/4.
The task involves evaluating some changes I am proposing to make to my example musicstore schema, and coming up with some alternative solutions.
You should spend approximately 20 minutes discussing solutions to the problems presented. You do not need to implement your solutions!
The changes I am proposing are implemented in the lab-4 version of schema.sql. They can be summarised as follows:
- A lot of my customers are situated outside the UK. I would therefore like a user of my app to be able to see album prices in Euros as well as Pounds. I have therefore made some changes to my Album table.
- I would like to introduce a 'wish list' feature, which will enable general users of my app to store a list of albums they may like to purchase in the future. I have therefore made some changes to my Customer table.
- I would like to allow users to rate an album on a scale of 1-5. The average user rating will be displayed with each album. To enable this feature, I have modified the Album table, and created a new table, 'Rating'.
Look at the lab-4 schema.sql, and come up with alternative (and better!) solutions. We will discuss your suggestions together after the 20 minutes (and look at the example solution!)
Individual Task
In this task, you will review the schema of a colleague and advise them where you think their schema fails to satisfy the requirements for 1NF-3NF.
To help you, you are reminded of the requirements in the helper notes below.
Once you've given and received feedback, make any necessary changes to improve the integrity of your schema.
Helper Notes
The guidelines for data normalisation are represented by a set of 'normal forms'. These can be thought of as levels or degrees of normalisation. They range from 1 (the lowest form of normalisation), through to 5. Certain rules or criteria must be met to achieve each normalisation form.
The following examples illustrate the first 3 normal forms. It is unlikely you will need to go beyond that, but you can find out about higher forms in Connolly & Begg's Database Systems: A Practical Approach to Design, Implementation and Management, or through your own research.
First Normal Form (1NF)
This form deals with the 'shape' of a relation (table). It requires that:
- Each relation should be organised into records (rows)
- All occurrences of a record (row) must have the same number of fields
- Each record should have a primary key which distinguishes it as unique
For example, this is a bad design for an Employee/Manager relationship, as there may not always be the same number of employees associated with a given manager. Since, by definition, tables must have the same number of fields in each row, designing a table in this way results in wasted space.
Manager | Employee_1 | Employee_2 | Employee_3 |
---|---|---|---|
Bob | Dimitri | Alma | Jim |
Dawn | Sacha | NULL | NULL |
Zoe | Harold | NULL | NULL |
This would be better:
Employee | Manager |
---|---|
Dimitri | Bob |
Alma | Bob |
Jim | Bob |
Sacha | Dawn |
Harold | Zoe |
However, it still relies on each employee having a unique name. For the sake of readability, we'll not worry about that in these examples.
Second Normal Form (2NF)
Asks: Does this column serve to describe what the primary key identifies?
To achieve 2nd normal form, in addition to meeting the criteria for 1st normal form, we should also:
- Remove subsets of data that apply to multiple rows of a table and place them in separate tables
- Create relationships between these new tables and their predecessors through the use of foreign keys
For example, suppose we wanted to store the office number and location of each employee. We might add Office and Location fields like so:
Employee | Manager | Office | Location |
---|---|---|---|
Dimitri | Bob | 1 | London |
Alma | Bob | 1 | London |
Jim | Bob | 1 | London |
Sacha | Dawn | 2 | London |
Harold | Zoe | 3 | Maidstone |
But this is inefficient as we have lots of repeated data. It would be better to split the table into 2 smaller tables, and define a foreign key to link them:
Office_ID | Location |
---|---|
1 | London |
2 | London |
3 | Maidstone |
Employee | Manager | Office_ID |
---|---|---|
Dimitri | Bob | 1 |
Alma | Bob | 1 |
Jim | Bob | 1 |
Sacha | Dawn | 2 |
Harold | Zoe | 3 |
Even better, if we know that each office only has one manager, we could also do:
Office_ID | Location | Manager |
---|---|---|
1 | London | Bob |
2 | London | Dawn |
3 | Maidstone | Zoe |
Employee | Office_ID |
---|---|
Dimitri | 1 |
Alma | 1 |
Jim | 1 |
Sacha | 2 |
Harold | 3 |
Now we have reduced our storage requirements and achieved second normal form.
Third Normal Form (3NF)
Asks: Are the values of all columns in this table independent of one another?
To meet third normal form, we must meet an additional criterion:
- Remove columns that are not fully dependent on the primary key
This means that all the fields in the table should be independent of one another.
For example, let's say that we want to know how much each employee is paid each week based on their contracted hours and hourly rate. We might organise our table like so:
Employee | Office_ID | Hourly_Wage | Contracted_Hrs | Weekly_Pay |
---|---|---|---|---|
Dimitri | 1 | 9.50 | 20.5 | 194.75 |
Alma | 1 | 10.25 | 16 | 164 |
Jim | 1 | 9.00 | 37.5 | 337.5 |
Sacha | 2 | 9.00 | 37.5 | 337.5 |
Harold | 3 | 10.25 | 37.5 | 384.38 |
Here, the fields Office_ID, Hourly_Wage and Contracted_Hours are independent of one another. Although we could potentially have a situation whereby every employee in a particular office was paid the same hourly wage, this is not the case here. Therefore we can say that Hourly_Wage is fully dependent on the primary key.
Conversely, Weekly_Pay may be derived from Contracted_Hrs and Hourly_Wage. It can therefore be said that the data in this column is not fully dependent on the primary key.
To reduce data redundancy and achieve 3NF, we should eliminate Weekly_Pay field, like so:
Employee | Office_ID | Hourly_Wage | Contracted_Hrs |
---|---|---|---|
Dimitri | 1 | 9.50 | 20.5 |
Alma | 1 | 10.25 | 16 |
Jim | 1 | 9.00 | 37.5 |
Sacha | 2 | 9.00 | 37.5 |
Harold | 3 | 10.25 | 37.5 |
Part 2: Backing Up and Restoring a MySQL Database
Knowing how to back up a database is an essential skill for any database administrator. Of course, backing up may not only be for the purposes of securing data; you can follow the same process should you wish to move or clone the database for any reason.
Although you may not have much cause to do this right now, it is still recommended that you follow the tutorial to familiarise yourself with these simple back-up/restore techniques. 'Doing' is the best way to commit to memory what could potentially pop up in the exam!
mysqldump
Create a back-up using A common method of backing up a MySQL database is to use the mysqldump
back-up program. This program is installed alongside MySQL, and can be invoked from the command prompt. This exercise assumes that your database is hosted on IGOR.
- SSH into the IGOR via terminal:
ssh USERNAME@doc.gold.ac.uk
-
cd
into your application'ssql
directory (e.g.cd public_html/dnw/USERNAME_recordstore/sql
) - Run the
mysqldump
program to create a back-up: mysqldump --databases DB_NAME > NAME_OF_OUTPUT_FILE.sql -p The--databases
option tells the program we want it to specify the database(s) to include in the dump. Alternatively, if you want to dump all your databases, you could replace this and the database name with,--all-databases
. Using either of these options will include theCREATE DATABASE
andUSE
queries in the dump, meaning that a fresh copy of this database, with the exact same name, will be created whenever this dump file is 'restored'. - Open up the resulting dump file in an editor and see what it contains!
mysqldump
Restore a database using - Swap dump files with the person next to you (you could do this by email, as your files will be small). If you haven't got someone to share with, you can use my example (under week-9 in lab exercises).
- Make sure their dump file is saved into your
sql
folder on IGOR - Open the file and locate the
CREATE DATABASE
andUSE
statements (near the start of the file) - In both statements, edit the database name to be something else, something like:
USERNAME_cloned_db
- Restore their database in your own MySQL namespace: mysql -u USERNAME < THEIR_DUMP_FILE.sql -p
- Check to see if a copy of their database has been cloned in your namespace, either via the phpMyAdmin application, or by accessing the mysql shell: mysql -u USERNAME -p From the mysql shell: SHOW DATABASES; -- To check the database was created USE USERNAME_cloned_db; -- To switch to the cloned database SHOW TABLES; -- To list the tables in the schema SELECT * FROM tbl_name; -- To browse the contents of a particular table
Back-up data in a text file
Another way to back-up or share data from your database is to output the result of a SELECT
query into a text file using the OUTFILE
clause. This might be useful if you want to open your data in another software, such as Excel.
Because OUTFILE
attempts to write the output file to the mysql server (which you don't have the 'privilege' to do), you will not be able to execute your query from the mysql shell directly. Instead you must execute your query from the IGOR command prompt.
- SSH into IGOR
- Decide which of your tables you would like to output the data from, and run the following (replacing
tbl_name
with your chosen table name): mysql DATABASE_NAME -e "SELECT * FROM tbl_name" > text_dump.csv -p You could also include a table join in your SQL query, if you want to make it more interesting! - Open the resulting file in Excel. You may be prompted at the start to specify the delimiter that is used in the file (in this case, tabs).
Extension
If you managed all of that without much difficulty, you may want to consider installing a local copy of MySQL DBMS, in order for you to explore its Access Control System. Please note, this is only a suggestion - not a requirement!