Lab 3: Introduction to MySQL
This lab is assessed. The assessment is in the form of a quiz, which is open on the VLE until 14:00 PM on Friday 27th October. Extensions will only be granted if you have extenuating circumstances, and must be applied for via the proper channels.
In this lab you will learn to:
- access the MySQL interactive shell from Igor.
- create a new database on your MySQL account on Igor.
- run batches of SQL statements from .sql files which will set up the tables in your database, and insert some dummy data. (The schema is the same one we started working on together in Lecture 3.)
- execute some SQL statements to retrieve details about the schema and the data it contains.
- query the database to retrieve a specific subset of the data.
You will do all this in order to answer the quiz questions which form this week's assessment.
The database you are working on for this lab closely resembles one that was designed by students in Lab 2. It is arguably a very creative, but not very useful form of web application.
The application is called `Cat Flucks', and its basic functionality is as follows:
- A user may register an account
- A user of the site is randomly served a cat image
- If they find they image compelling in some way, they may optionally press a button which records their reaction (known as a `Fluck') in the database
- Some statistics are displayed along with the image
In future labs, you will see the Cat Flucks application developed further, but you will then be free to adapt the exercises to a scenario that better serves your interests.
Task 1: Access the MySQL interactive shell on Igor
Activate your MySQL account on Igor by going to doc.gold.ac.uk/dept/intranet and clicking on `MySQL Account Status' (under `Applications and Utilities'). You'll be required to set a password which you'll use to connect to your MySQL databases from terminal, or any Python scripts you may write. Do not use a password you would use to any other system holding your personal data, as it may at some point be hard-coded into Python scripts.
To log into Igor:
To start the MySQL shell:
mysql -u USERNAME -p
Task 2: Create a new database
First see what databases you already have linked to your account:
You will already have a couple, but these just contain information about your mysql account (i.e. ignore them).
Any databases you create on Igor must be given a name in the format:
USERNAME_mydatabase(i.e. prefixed with,
USERNAME_). To create a new database, run:
CREATE DATABASE USERNAME_flucks;
In the above,
USERNAME_flucks is the name of the database. You should change the username to your own. You can call your database what you like, but flucks is suggested here because it relates to the application name.
- Log out of shell by typing any of the following:
Task 3: Clone the lab-exercises repository
The SQL statements and associated data required for this exercise are in the lab-exercises repository. To access it, you should clone it on Igor, into your dnw folder.
cd public_html/dnw git clone email@example.com:data-networks-web/lab-exercises.git
Task 4: Restore a database
Change directory into the lab-3/sql folder:
lsto see what is in the folder. You should see 4 files:
Flucks_Schema.sql contains the schema definition (i.e. it contains a set of
TABLE CREATEstatements which define the tables in the database).
- The other 4 files contain the
INSERTstatements which will populate the tables with dummy data.
- Flucks_Schema.sql contains the schema definition (i.e. it contains a set of
Run the statements in the Flucks_Schema.sql file in one of 2 ways:
mysql DATABASE_NAME -p < Flucks_Schema.sql
From the MySQL shell:
USE DATABASE_NAME; SOURCE Flucks_Schema.sql;
The advantage of the first option is that you don't have to enter/exit the MySQL shell, but it also means that nothing is reported (printed to terminal) unless an error occurs.
Notice in Option 2 that each of the SQL statements is terminated by a semicolon. If you forget the semicolon, MySQL will continue to prompt you for input, until it receives a semicolon. You can abort a statement with
ctrl+c if you make a mistake.
Task 5: Explore the schema
Log back into mysql shell (if you haven't already)
Switch to the new database you created:
See what tables were created:
See what fields have been defined within a given table:
See what data is in a table:
SELECT * FROM TableName LIMIT 5;
Here, the asterisk (
*) is a wildcard, which means `all'. Including the
LIMIT clause restricts how many records (rows) are returned in the result-set.
- You can now answer questions 1-3 in the quiz!
Task 6: Input the dummy data
To insert some dummy data into the database, you need to run the batches of
INSERT statements that are in the remaining .sql files.
The order in which you do this matters. Why do you think this might be? Answer questions 4 and 5 of the quiz!
Run each batch of INSERT statements, one at a time, from the MySQL shell:
In the above, replace `TableName' with the name of the table you are inserting data into. Repeat the process for the other 3 files, but in the right order!
Task 7: Query the data
Now that you have inserted the data, you are able to perform SQL queries on it. To read data from a database with SQL, we use the
SELECT statement. The basic format for a
SELECT statement looks like this:
SELECT fieldname1, fieldname2 FROM TableName;
You may also use the wildcard (
*) to return all fields in a table:
SELECT * FROM TableName;
SELECT can be combined with various clauses to restrict/filter the records returned in the result-set. For example, the
WHERE clause can be used to return only records matching a specific criteria. For example, try running the following:
SELECT firstname, lastname FROM Account WHERE id=56;
- Write a similar query to find out usernames of all Accounts where the lastname is `Tyson'. Answer question 6! (without cheating...)
Often we want to return result-sets which include data from more than 1 table. For example, I may wish to know the urls of any `Flucked' media. To combine data from multiple tables, we use something called a
JOIN. There are different kinds of join, but we will just consider the simplest of these today, which is an
When you use a join, you need to specify how the 2 tables are related (i.e.
ON which field are they connected?)
The basic format for a table join looks like this, where it is assumed that a foreign key (fk) in Table2 references a primary key (pk) in Table1:
SELECT Table1.fieldname, Table2.fieldname FROM Table1 INNER JOIN Table2 ON Table1.pkfieldname=Table2.fkfieldname;
Try out the following join:
SELECT Media.img_url, MediaServe.id FROM Media INNER JOIN MediaServe ON Media.id=MediaServe.media_id;
You can join more than 2 tables in a result-set simply by including more joins!
Can you replace the `BLANK's in the following query to fetch 10 urls of images that have been `Flucked', along with the username of whoever `Flucked' them? Answer question 7!
SELECT BLANK.img_url, Account.username FROM Media INNER JOIN MediaServe ON Media.id=MediaServe.media_id INNER JOIN BLANK ON FluckGiven.serve_id=MediaServe.id INNER JOIN Account ON Account.id=MediaServe.BLANK LIMIT 10;
A technique we can use to make SQL statements slightly shorter and more readable, is by utilising column and/or table aliases. Column aliases can also make working with the results of our queries easier, as they give us control over the fieldnames (column headers) in the result-set. This is particularly helpful when combining data from 2 tables in which both tables had one or more of the same fieldnames.
Run the following query to return the id's of MediaServes and the id's of their associated Media Answer question 8!
SELECT m.id AS media_id, ms.id AS serve_id FROM Media m INNER JOIN MediaServe ms ON m.id=ms.media_id LIMIT 10;
Task 8: Extension!
To answer questions 9 and 10 (in an honest way), you need to perform a more advanced SQL query which utilises something called an `Aggregate function'. These are a special set of functions which perform operations on columns in a table, returning the result of the operation. For example, the
COUNT function can be used to return the number of records matching a query, while the
AVERAGE function will retrieve the average value in columns containing numerical data.
We would like to know which user has given the most Flucks. We would like to see a result-set containing the names of all users who have given Flucks, and the number of Flucks they have given.
Execute the following query after replacing the BLANK's! Who gave the most Flucks? Answer question 9!
SELECT firstname, lastname, COUNT(*) AS num_flucks FROM BLANK fg INNER JOIN MediaServe ms ON fg.BLANK=ms.id INNER JOIN Account ac ON ms.account_id=BLANK.BLANK GROUP BY account_id;
You can change the order in which the records appear in the result-set by including the
ORDER BY clause.
- How could you modify the query to order the result-set, such that the user who gave most Flucks appears first? Answer question 10! (Tip: try out the question options)