Lab 8: Data Security
Data security is a big topic that could take up a whole module by itself. The aim of this lab is therefore to encourage you to consider the security of sensitive data when designing and developing data-driven applications, and when communicating sensitive data over a network.
The lab has two parts: Part 1 is more of a revision activity and relates to the security of data in transit, whereas Part 2 relates to the security of data held in a database.
Part 1: The networked systems game!
The purpose of this game-based revision task is to help you to better understand what you are doing when you use various protocols and authentication methods to communicate data between different systems on a network.
Don't worry, it is more fun than it sounds!
Learning Objectives
- To be confident describing the process of connecting to a remote machine over SSH
- To identify situations when it would be appropriate to use SSH, HTTP and HTTPS
- To explain when and/or why you might use SSH with key authentication rather than password authentication
How to play
- 4-6 players are needed for this game
- Each group of players will receive a bunch of cards. The cards are categorised as follows:
  - systems
  - protocols
  - authentication methods
  - unix commands
  - tasks
- With the exception of the task cards, all cards are shuffled at the start of the game and placed face down on the table.
- Each player starts the game with 1 task card and 7 cards from the deck. At any point in the game, they may have only 1 task card, and a minimum of 7 other cards in their hand.
- The objective is to obtain the necessary systems, protocols, authentication methods and unix commands to be able to complete the task in hand.
- Players take turns to take a card from the deck. They may look at the card and choose whether to keep or discard it. If they choose to keep it, they must discard another card from their hand.
- Discarded cards are placed in a separate pile face up on the table.
- When a card is discarded, another player may bid for it. The bidding process is as follows:
  - The bidder must say something factually accurate about the thing on the card. It must be something that has not been said before.
  - If nobody wishes to challenge them, they get to take the card and any cards beneath it.
  - If another player challenges them, the other players act as mediators.
  - Whoever is deemed to be most well-informed gets the option to take the cards.
- When a player completes a task, they should arrange the pieces in front of them and allow other players to inspect the 'diagram' for inaccuracies.
- If nobody challenges them for inaccuracies, they pick up another task card.
- If another player wishes to challenge them, they must state what was wrong with the diagram. If the remaining players agree that the challenge is justified, the challenger may take the cards that were placed on the table, but ONLY if they have the necessary cards to complete the task correctly. If not, the challenged player gets to keep the cards, but must put them back in their hand.
- The game will run for 45 minutes. When the time is up, the player with the most completed tasks wins the game.
Twists
- The tutor and lab assistants will act as mobile challengers, looking and listening out for inaccuracies in completed tasks or bid statements.
- You can use Google, your notes and other learning resources as you wish. Remember, you cannot repeat something that has been said before, so do some digging!
- After a round or two, feel free to introduce new rules to the game (assuming they are agreed through democratic processes)!
Part 2: Working with hashed data
In the context of data-driven web application development, we have another kind of security concern to worry about: protecting data held in a database. In particular, we should worry about protecting sensitive data. A large part of this falls to the database and/or system administrator(s), but it is also a concern for middleware and frontend developers.
Learning Objectives
- Identify and describe sensitive data in a database
- Identify a Python module we can use to compare a hashed password value (from a database) with an unhashed string (from a form)
- Describe the difference between encryption and hashing
Task 1: Security discussion
Working with a partner, discuss and research answers to the following:
- What kinds of data should be considered sensitive?
- What can be done by a middleware developer (i.e. person who writes the application code) to protect sensitive data?
- What could be done by a backend developer or database administrator?
Task 2: Test a simple login script
You may recall from Lab 7 that, while the Catflucks application successfully inserts new flucks in the database, the new flucks have no associated account data. That's because we don't yet have any form of login functionality, and therefore no way of knowing who is doing the flucking.
Clearly, to prevent all future flucks data from being anonymous, we need to implement a login feature of some kind.
In the lab-8 implementation of Catflucks, you will witness the beginnings of a very basic login system. You will be able to improve on this system using the knowledge you have yet to gain in Term 2!
Nevertheless, by testing out this simple version, you may start to get an idea of how hash functions can help to keep sensitive data more secure.
- Pull the latest version of lab-exercises
- cd into the lab-8 folder
- Try serving the app as it is with ./simpleServer.py, then navigate to cgi-bin/splash.py
- Try logging in! Oh dear, why can't you log in?!
- Try to access an account document from the mongo shell. Will this bring you any closer to logging in?
- To facilitate testing of the login facility, you need to make a dummy account which you know both the username and unhashed password for.
In case you haven't guessed by now, the passwords stored in the database are hashed. Hashing is a one-way operation (i.e. not easily reversible). It ensures that the data, once inserted in the database, cannot easily be revealed. Why do you think this is important?
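To make the one-way idea concrete, here is a minimal sketch using SHA-256 from Python's standard hashlib module. This is a swapped-in general-purpose hash chosen only because it needs no installation; it is not the algorithm used for the Catflucks passwords, and a fast unsalted hash like this should not be used on its own to store passwords. The helper name sha256_hex is made up for this illustration.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


digest_a = sha256_hex("password")
digest_b = sha256_hex("password")
digest_c = sha256_hex("Password")  # only the first letter differs

print(digest_a == digest_b)  # True: the same input always gives the same digest
print(digest_a == digest_c)  # False: a tiny change gives a different digest
# The digest has a fixed length (64 hex characters) whatever the input length
print(len(digest_a), len(sha256_hex("a much longer input string")))
```

Notice that hashlib offers no function for turning a digest back into the original text. In practice, the only way to find the input is to guess candidates and hash each one.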
To test the login feature, you need to add an account to the database for which you know both the unhashed and hashed password. The login script then uses a function from the bcrypt module, checkpw, to check the unhashed version you enter in the form against the hashed version it retrieves from the database.
- Open the mongo shell and switch to the catflucks database:
mongo
use catflucks
- Insert a new account document. Use the password value that's in the following example:
db.accounts.insertOne({
  "joined": new Date(),
  "username": "tester",
  "name": {"last": "Harriet", "first": "Sorrel"},
  "password": "$2b$12$DXvBoOurVX7GnCr35P.iZ.DG2cH.qQarJL7m9xhhRvrqkDsE1/aFC",
  "is_admin": 1,
  "email": "sharr003@gold.ac.uk"
})
You can change the details as you see fit, but the bit you shouldn't change is the password value, which corresponds to the string password. If you're interested, you could generate your own hashed values using the bcrypt module (which was used here). You'd need to install it first, as it is not part of the Python Standard Library (run pip3 install bcrypt).
- Once you have it working, have a look inside the login.py script to see what code is behind this functionality. What module is being used here to compare the passwords?
Task 3: Do some research!
Ok, so what does hashing actually mean?
- Do a bit of research around hashing and encryption (perhaps look up that module's documentation too?)
- Contribute to the discussion on this week's forum!
We will talk about this more in tomorrow's lecture.