Lab 3: Introduction to MySQL
This lab is assessed. The assessment is in the form of a quiz, which is open on the VLE until 14:00 PM on Friday 27th October. Extensions will only be granted if you have extenuating circumstances, and must be applied for via the proper channels.
Learning Aims
In this lab you will learn to:
- access the MySQL interactive shell from Igor.
- create a new database on your MySQL account on Igor.
- run batches of SQL statements from .sql files which will set up the tables in your database, and insert some dummy data. (The schema is the same one we started working on together in Lecture 3.)
- execute some SQL statements to retrieve details about the schema and the data it contains.
- query the database to retrieve a specific subset of the data.
You will do all this in order to answer the quiz questions which form this week's assessment.
Context
The database you are working on for this lab closely resembles one that was designed by students in Lab 2. It is arguably a very creative, but not very useful form of web application.
The application is called `Cat Flucks', and its basic functionality is as follows:
- A user may register an account
- A user of the site is randomly served a cat image
- If they find they image compelling in some way, they may optionally press a button which records their reaction (known as a `Fluck') in the database
- Some statistics are displayed along with the image
In future labs, you will see the Cat Flucks application developed further, but you will then be free to adapt the exercises to a scenario that better serves your interests.
Task 1: Access the MySQL interactive shell on Igor
-
Activate your MySQL account on Igor by going to doc.gold.ac.uk/dept/intranet and clicking on `MySQL Account Status' (under `Applications and Utilities'). You'll be required to set a password which you'll use to connect to your MySQL databases from terminal, or any Python scripts you may write. Do not use a password you would use to any other system holding your personal data, as it may at some point be hard-coded into Python scripts.
-
To log into Igor:
ssh USERNAME@doc.gold.ac.uk
-
To start the MySQL shell:
mysql -u USERNAME -p
Task 2: Create a new database
-
First see what databases you already have linked to your account:
SHOW DATABASES;
You will already have a couple, but these just contain information about your mysql account (i.e. ignore them).
-
Any databases you create on Igor must be given a name in the format:
USERNAME_mydatabase
(i.e. prefixed with,USERNAME_
). To create a new database, run:CREATE DATABASE USERNAME_flucks;
In the above, USERNAME_flucks
is the name of the database. You should change the username to your own. You can call your database what you like, but flucks is suggested here because it relates to the application name.
- Log out of shell by typing any of the following:
ctrl+d
(linux only),\q
,quit
,exit
Task 3: Clone the lab-exercises repository
The SQL statements and associated data required for this exercise are in the lab-exercises repository. To access it, you should clone it on Igor, into your dnw folder.
cd public_html/dnw
git clone git@gitlab.doc.gold.ac.uk:data-networks-web/lab-exercises.git
Task 4: Restore a database
-
Change directory into the lab-3/sql folder:
cd lab-exercises/lab-3/sql
-
Do
ls
to see what is in the folder. You should see 4 files:-
Flucks_Schema.sql contains the schema definition (i.e. it contains a set of
TABLE CREATE
statements which define the tables in the database). - The other 4 files contain the
INSERT
statements which will populate the tables with dummy data.
-
Flucks_Schema.sql contains the schema definition (i.e. it contains a set of
-
Run the statements in the Flucks_Schema.sql file in one of 2 ways:
-
From terminal:
mysql DATABASE_NAME -p < Flucks_Schema.sql
-
From the MySQL shell:
USE DATABASE_NAME; SOURCE Flucks_Schema.sql;
-
The advantage of the first option is that you don't have to enter/exit the MySQL shell, but it also means that nothing is reported (printed to terminal) unless an error occurs.
Notice in Option 2 that each of the SQL statements is terminated by a semicolon. If you forget the semicolon, MySQL will continue to prompt you for input, until it receives a semicolon. You can abort a statement with ctrl+c
if you make a mistake.
Task 5: Explore the schema
-
Log back into mysql shell (if you haven't already)
-
Switch to the new database you created:
USE DATABASE_NAME;
-
See what tables were created:
SHOW TABLES;
-
See what fields have been defined within a given table:
DESCRIBE TableName;
-
See what data is in a table:
SELECT * FROM TableName LIMIT 5;
Here, the asterisk (*
) is a wildcard, which means `all'. Including the LIMIT
clause restricts how many records (rows) are returned in the result-set.
- You can now answer questions 1-3 in the quiz!
Task 6: Input the dummy data
To insert some dummy data into the database, you need to run the batches of INSERT
statements that are in the remaining .sql files.
The order in which you do this matters. Why do you think this might be? Answer questions 4 and 5 of the quiz!
-
Run each batch of INSERT statements, one at a time, from the MySQL shell:
SOURCE TableName_Inserts.sql;
In the above, replace `TableName' with the name of the table you are inserting data into. Repeat the process for the other 3 files, but in the right order!
Task 7: Query the data
Now that you have inserted the data, you are able to perform SQL queries on it. To read data from a database with SQL, we use the SELECT
statement. The basic format for a SELECT
statement looks like this:
SELECT fieldname1, fieldname2 FROM TableName;
You may also use the wildcard (*
) to return all fields in a table:
SELECT * FROM TableName;
SELECT
can be combined with various clauses to restrict/filter the records returned in the result-set. For example, the WHERE
clause can be used to return only records matching a specific criteria. For example, try running the following:
SELECT firstname, lastname FROM Account WHERE id=56;
- Write a similar query to find out usernames of all Accounts where the lastname is `Tyson'. Answer question 6! (without cheating...)
Often we want to return result-sets which include data from more than 1 table. For example, I may wish to know the urls of any `Flucked' media. To combine data from multiple tables, we use something called a JOIN
. There are different kinds of join, but we will just consider the simplest of these today, which is an INNER JOIN
.
When you use a join, you need to specify how the 2 tables are related (i.e. ON
which field are they connected?)
The basic format for a table join looks like this, where it is assumed that a foreign key (fk) in Table2 references a primary key (pk) in Table1:
SELECT Table1.fieldname, Table2.fieldname FROM Table1
INNER JOIN Table2
ON Table1.pkfieldname=Table2.fkfieldname;
-
Try out the following join:
SELECT Media.img_url, MediaServe.id FROM Media INNER JOIN MediaServe ON Media.id=MediaServe.media_id;
You can join more than 2 tables in a result-set simply by including more joins!
-
Can you replace the `BLANK's in the following query to fetch 10 urls of images that have been `Flucked', along with the username of whoever `Flucked' them? Answer question 7!
SELECT BLANK.img_url, Account.username FROM Media INNER JOIN MediaServe ON Media.id=MediaServe.media_id INNER JOIN BLANK ON FluckGiven.serve_id=MediaServe.id INNER JOIN Account ON Account.id=MediaServe.BLANK LIMIT 10;
A technique we can use to make SQL statements slightly shorter and more readable, is by utilising column and/or table aliases. Column aliases can also make working with the results of our queries easier, as they give us control over the fieldnames (column headers) in the result-set. This is particularly helpful when combining data from 2 tables in which both tables had one or more of the same fieldnames.
-
Run the following query to return the id's of MediaServes and the id's of their associated Media Answer question 8!
SELECT m.id AS media_id, ms.id AS serve_id FROM Media m INNER JOIN MediaServe ms ON m.id=ms.media_id LIMIT 10;
Task 8: Extension!
To answer questions 9 and 10 (in an honest way), you need to perform a more advanced SQL query which utilises something called an `Aggregate function'. These are a special set of functions which perform operations on columns in a table, returning the result of the operation. For example, the COUNT
function can be used to return the number of records matching a query, while the AVERAGE
function will retrieve the average value in columns containing numerical data.
We would like to know which user has given the most Flucks. We would like to see a result-set containing the names of all users who have given Flucks, and the number of Flucks they have given.
-
Execute the following query after replacing the BLANK's! Who gave the most Flucks? Answer question 9!
SELECT firstname, lastname, COUNT(*) AS num_flucks FROM BLANK fg INNER JOIN MediaServe ms ON fg.BLANK=ms.id INNER JOIN Account ac ON ms.account_id=BLANK.BLANK GROUP BY account_id;
You can change the order in which the records appear in the result-set by including the ORDER BY
clause.
- How could you modify the query to order the result-set, such that the user who gave most Flucks appears first? Answer question 10! (Tip: try out the question options)