CITS3007 lab 10 (week 11) – Cryptography – solutions

1. Cryptography libraries

We will investigate how to perform basic encryption tasks using a cryptography library called Sodium, which is written in C. It is well-documented (you can find the documentation here), well-tested, highly portable, and used by many other projects. It allows us to perform tasks like encryption, decryption, signature checking, and password hashing.

Although Sodium is a C library, we will use it from the Python language, as that requires much less boilerplate code. In the CITS3007 SDE, we need to install the Python library PyNaCl, which “wraps” the C Sodium library, and provides a “Pythonic” interface to it (the documentation for PyNaCl is available here).[1] Run the following commands in your development VM:

$ sudo apt-get update
$ sudo apt-get install python3-pip
$ pip install pynacl

This ensures we have the pip command available for managing Python libraries, then uses it to install PyNaCl. We’ll show how to use the PyNaCl library to create a public–private key pair (like those used by GitHub to allow repositories to be cloned or pushed without using a password). The lecture slides contain more information about public key cryptosystems like this, as does the PyNaCl documentation, here.

Exercise

Suppose Alice and Bob are both using a public-key cryptosystem, and both make their public keys available on the Web for anyone to access. Explain how they could use their keys so that Alice can securely send an encrypted message or file which can only be read by Bob.

1.1. Generating a key pair

In this section and the following ones, we will generate public–private key pairs, and use them to transfer encrypted content in exactly the way Alice and Bob could, in the previous exercise.

Save the following as keygen.py:

import nacl.utils
from nacl.public import PrivateKey
from nacl.encoding import HexEncoder

def write(name, hex_key, suffix):
    # hex_key is the hex-encoded key as bytes, so the file is opened in binary mode
    filename = 'key_' + name + suffix
    with open(filename, 'wb') as ofp:
        ofp.write(hex_key)

def make_keys(name):
    secretKey = PrivateKey.generate()
    write(name, secretKey.encode(encoder=HexEncoder), '.sk')
    publicKey = secretKey.public_key
    write(name, publicKey.encode(encoder=HexEncoder), '.pk')

key_name = input("Enter a name for the key pair to generate: ")

make_keys(key_name)

Run it by executing python3 keygen.py, and entering a name (this could be a particular purpose you’re generating the key pair for – for instance, secret-hushmoney-communications-with-my-accountant – or just your own name).

This will generate two files, key_[NAME].sk and key_[NAME].pk, which hold our private and public keys, respectively. If you inspect those files (e.g. by using less) you will see that they simply contain a long sequence of hexadecimal digits.

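In detail, here’s how the code works: PrivateKey.generate() creates a fresh random secret key, and its public_key attribute holds the matching public key. Each key is hex-encoded and written to its own file.

The secret key (in the “.sk” file) must be kept private by the user, you. The public key (in the “.pk” file) can be published to others. With the Box construction used in the following sections, a message is encrypted using the sender’s secret key together with the recipient’s public key, and decrypted using the recipient’s secret key together with the sender’s public key. So other people can use your public key, combined with their own secret key, to encrypt messages written to you, or to decrypt messages written by you to them.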
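As a rough illustration of the file format described above, the following sketch hex-encodes 32 random bytes using only the Python standard library. The 32-byte length matches the size of the keys PyNaCl generates, but the random bytes here are only a stand-in (an assumption made for illustration) and are not a real Sodium key; the variable names are likewise hypothetical.

```python
import secrets

# Stand-in for a 32-byte key. This is NOT produced by Sodium; it only
# illustrates how raw key bytes map onto the hex text stored in a key file.
raw_key = secrets.token_bytes(32)

# Hex-encoding turns each byte into two hexadecimal characters.
file_contents = raw_key.hex().encode("ascii")

# Decoding the hex text recovers the original bytes.
decoded = bytes.fromhex(file_contents.decode("ascii"))

print(len(raw_key), "raw bytes ->", len(file_contents), "hex characters")
```

This is why each key file holds 64 hexadecimal digits: two digits per byte of key material.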

1.2. Using the key pair to encrypt

If possible, get another person in the lab to generate a key pair, and exchange public keys. Alternatively, create a second key pair with a different name (e.g. “other”), and treat it as belonging to the “other person”.

Encrypt a message using the recipient’s public key and your private key. Save the following script as encrypt.py:

import nacl.utils
from nacl.public import PrivateKey, PublicKey, Box
from nacl.encoding import HexEncoder

class EncryptFile:
    def __init__(self, sender, receiver):
        self.sender = sender
        self.receiver = receiver
        # Sender's secret key and recipient's public key, both stored hex-encoded
        self.sk = PrivateKey(self.get_key(sender, '.sk'), encoder=HexEncoder)
        self.pk = PublicKey(self.get_key(receiver, '.pk'), encoder=HexEncoder)

    def get_key(self, name, suffix):
        filename = 'key_' + name + suffix
        with open(filename, 'rb') as file:
            return file.read()

    def encrypt(self, textfile, encfile):
        box = Box(self.sk, self.pk)
        with open(textfile, 'rb') as tfile:
            text = tfile.read()
        # box.encrypt picks a random nonce and prepends it to the ciphertext
        etext = box.encrypt(text)
        with open(encfile, 'wb') as efile:
            efile.write(etext)

sender = input("Enter the name for your key pair: ")
recip = input("Enter the name for the recipient's key pair: ")
encrypter = EncryptFile(sender, recip)
target_file = input("Enter a file to encrypt: ")
encrypter.encrypt(target_file, f'{target_file}.enc')
print('Done!')

Run it with the command python3 encrypt.py. You will need to provide the name of your key pair (from the previous exercise), the recipient’s key pair, and a file to encrypt (you can just choose the encrypt.py script if you have no other text file handy).

The script should create a binary file ORIG_FILE.enc (where ORIG_FILE is whatever the name of the original file was) – this is the encrypted file.

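In more detail, here is what the script does: it loads your secret key and the recipient’s public key from their hex-encoded key files, and combines them in a Box. It then reads the target file as bytes and calls box.encrypt, which chooses a random nonce and returns the nonce followed by the authenticated ciphertext. That result is written to the .enc file.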

1.3. Using the key pair to decrypt

Save the following as decrypt.py:

import nacl.utils
from nacl.public import PrivateKey, PublicKey, Box
from nacl.encoding import HexEncoder
import sys

class DecryptFile:
    def __init__(self, sender, receiver):
        self.sender = sender
        self.receiver = receiver
        self.sk = PrivateKey(self.get_key(receiver, '.sk'), encoder=HexEncoder)
        self.pk = PublicKey(self.get_key(sender, '.pk'), encoder=HexEncoder)

    def get_key(self, name, suffix):
        filename = 'key_' + name + suffix
        try:
            with open(filename, 'rb') as file:
                data = file.read()
            return data
        except FileNotFoundError:
            print(f"Key file '{filename}' not found.")
            sys.exit(1)

    def decrypt(self, encfile, textfile):
        box = Box(self.sk, self.pk)
        try:
            with open(encfile, 'rb') as efile:
                etext = efile.read()
            dtext = box.decrypt(etext)
            with open(textfile, 'wb') as tfile:
                tfile.write(dtext)
            print(f"Decrypted file saved as '{textfile}'")
        except FileNotFoundError:
            print(f"Encrypted file '{encfile}' not found.")
            sys.exit(1)

sender = input("Enter the name for the sender's key pair: ")
recip = input("Enter the name for your key pair: ")
decrypter = DecryptFile(sender, recip)
enc_file = input("Enter the name of the encrypted file to decrypt: ")
target_file = input("Enter the name for the decrypted output file: ")
decrypter.decrypt(enc_file, target_file)

To use the script, you need to have an encrypted file (with a .enc extension) generated by the “encrypt.py” script in the same directory – ideally, swap with another person and attempt to decrypt their .enc file – together with your private key (with an .sk extension), and the other person’s public key (with a .pk extension).

Run python3 decrypt.py and follow the prompts: enter the sender’s key pair name, your key pair name, the name of the encrypted file to decrypt, and the name for the decrypted output file.

The script will decrypt the file using the private key associated with your name and the sender’s public key, and save the decrypted content to the specified output file.

1.4. Challenge task

As a challenge task, you might like to research how to use libsodium to sign a message with your key pair so that other users can verify a (plaintext) message came from you.

2. Cryptography questions and exercises

See if you can answer the following questions, after reviewing the material on cryptography in the lectures.

Question 2(a)

Suppose in the CITS3007 SDE you create the MD5 hash of some password, using a command like:

$ printf mypassword | md5sum

In what format is the hash displayed? How large is the hash, in bytes? How would you write it in C syntax?

Sample solution

If we run the command on a sample password (here, qwerty), we get output like the following:

$ printf qwerty | md5sum
d8578edf8458ce06fbc5bb76a58c5ca4  -

The first “word” of output is the actual hash; the “-” represents the name of the file being hashed (in this case, “-” represents standard input).

The hash is displayed as a sequence of 32 hexadecimal digits, and represents 16 bytes (or 128 bits).

Each pair of characters in the original hash represents one byte, so if stored in C as an array of bytes, we could write it as follows:

  // fragment 1
  unsigned char somehash[] = {0xd8, 0x57, 0x8e, 0xdf, 0x84, 0x58, 0xce, 0x06,
                              0xfb, 0xc5, 0xbb, 0x76, 0xa5, 0x8c, 0x5c, 0xa4};

Strings in C also allow us to use hexadecimal escape sequences, so we could also write the following:

  // fragment 2
  unsigned char somehash[] = "\xd8\x57\x8e\xdf\x84\x58\xce\x06"
                             "\xfb\xc5\xbb\x76\xa5\x8c\x5c\xa4";

The difference is that in fragment 1, somehash is a “plain” array or buffer, of size 16 elements, but in fragment 2, somehash is a null-terminated C string, so the array will be of size 17.
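The relationship between the hexadecimal text, the raw bytes, and the two C spellings can be checked with Python’s standard hashlib module. This sketch hashes the string qwerty, whose MD5 digest is the d8578e… value shown above; the variable names are chosen just for this example.

```python
import hashlib

digest = hashlib.md5(b"qwerty").digest()  # the raw hash: 16 bytes
hex_text = digest.hex()                   # what md5sum displays: 32 hex digits

# Build the two C spellings discussed above from the raw bytes.
c_array = ", ".join(f"0x{b:02x}" for b in digest)
c_string = "".join(f"\\x{b:02x}" for b in digest)

print(len(digest), "bytes,", len(hex_text), "hex digits")
print("{" + c_array + "}")
print('"' + c_string + '"')
```

Generating the C initialisers from the digest, rather than typing them by hand, avoids transcription slips in the individual bytes.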

Question 2(b)

What is the purpose of salting passwords, when creating a password hash?

Sample solution

Salting passwords defeats attacks that rely on precomputed hashes, and stops identical passwords from producing identical hashes.

If a password is used unsalted, directly as is, then every user in the world who happens to use the password “qwerty”, for instance, will have exactly the same hash for their password (assuming the same algorithm is used). If we used the MD5 hashing algorithm,[2] then every user who uses the password “qwerty” will get the hash d8578edf8458ce06fbc5bb76a58c5ca4:

$ printf qwerty | md5sum
d8578edf8458ce06fbc5bb76a58c5ca4  -

That means if an attacker happens to get hold of the list of hashed passwords, it’s extremely easy for them to find out what the user’s password is – despite the fact that hashes are “difficult to reverse”.

The attacker knows that many people choose very common passwords and that “qwerty” is one of these, and that the MD5 hash of “qwerty” is d8578edf8458ce06fbc5bb76a58c5ca4. So if the attacker has a list of the hashes of common passwords, they’ll easily recognize them whenever they appear. (A rainbow table is used by hackers when attacking lists of hashes, and is simply a data structure designed to efficiently store many precomputed password/hash pairs.)

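Adding a random salt to the password destroys this straightforward correspondence between password and hash. The salt is a random value, different for each user, which is combined with the password before hashing and stored alongside the resulting hash. Two users with the same password then get different hashes, and a table of precomputed hashes is of no use, since the attacker would need a separate table for every possible salt.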
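The effect of a salt can be seen in a short experiment using Python’s standard hashlib module. This is a sketch only: the helper names hash_password and check_password are hypothetical, and the scrypt cost parameters (n, r, p) are illustrative values rather than a recommendation.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); a fresh random salt is generated if none is given."""
    if salt is None:
        salt = secrets.token_bytes(16)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt,
                            n=2**14, r=8, p=1, dklen=32)
    return salt, digest


def check_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the candidate password with the stored salt and compare."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


# Unsalted: two users with the same password get the same hash.
unsalted_alice = hashlib.md5(b"qwerty").hexdigest()
unsalted_bob = hashlib.md5(b"qwerty").hexdigest()

# Salted: the same password yields different hashes for different users.
alice_salt, alice_hash = hash_password("qwerty")
bob_salt, bob_hash = hash_password("qwerty")

print(unsalted_alice == unsalted_bob)  # same unsalted hash
print(alice_hash == bob_hash)          # different salted hashes
```

The salt is not secret: it is stored next to the hash so that check_password can repeat the same computation when the user logs in.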

Question 2(c)

Look up Wikipedia to refresh your memory of what a hash collision is. Explain why hash collisions necessarily occur. That is, why must there always be two different plaintexts that have the same hash value?

Sample solution

Every hash function outputs results which are of some exact, fixed size; the exact size will depend on the function. (For instance, we’ve seen above that the MD5 algorithm always outputs hashes of 16 bytes.)

The input to a hash function, however, is a sequence (usually of bytes) of arbitrary length. The input domain is therefore infinite, but the output range of the function is finite. By the pigeonhole principle, two different plaintexts must therefore share a hash value; indeed, at least one hash value must be produced by an infinite number of plaintexts.

We can see this more straightforwardly if we imagine a hash function that produces outputs of only one byte in length. Such a function would not be very useful for cryptographic purposes (can you explain why?), but we could use it for example to distribute items across a hash table of size up to 256.

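The output of the function is one byte (256 different values), but it will operate on any arbitrary sequence of input bytes. It therefore follows that any 257 distinct inputs must include two with the same output, and that at least one of the 256 output values must be produced by an infinite number of inputs. (For a well-designed hash function, which spreads its inputs evenly, we would expect this to be true of every output value.)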
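The one-byte case can be checked directly. The sketch below defines a toy one-byte hash (the first byte of the MD5 digest, a choice made here purely for illustration) and feeds it the strings "0", "1", "2" and so on. Since there are only 256 possible outputs, a collision is guaranteed by the time 257 distinct inputs have been tried.

```python
import hashlib


def one_byte_hash(data: bytes) -> int:
    """Toy hash: keep only the first byte of the MD5 digest (256 possible values)."""
    return hashlib.md5(data).digest()[0]


def find_collision(max_inputs: int = 257):
    """Return (first_input, second_input, shared_hash) for the first collision found."""
    seen = {}  # hash value -> first input that produced it
    for i in range(max_inputs):
        candidate = str(i).encode("ascii")
        h = one_byte_hash(candidate)
        if h in seen:
            return seen[h], candidate, h
        seen[h] = candidate
    return None  # cannot happen when max_inputs is at least 257


first, second, value = find_collision()
print(first, "and", second, "both hash to", value)
```

In practice a collision turns up long before the 257th input, which hints at why such a short hash would be of little use for cryptographic purposes.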

3. CITS3007 project

You can use your lab time to work on the CITS3007 project. You may wish to discuss your project tests and code design with other students or the lab facilitators (although the actual code you submit must be your own, individual work).


  1. There are actually multiple Python libraries which provide access to the C Sodium library, which can be confusing, but they have quite different purposes. PyNaCl, which we use, provides a fairly high-level interface to Sodium, and allows Python programmers to use Python types (such as classes and lists) which they are familiar with.
     Two other Python libraries are pysodium and libnacl. These are not high-level: they wrap the C functions exposed by the C Sodium library quite directly, and allow them to be called from Python.

  2. Note that as per the lectures, MD5 should not be used in practice as a password hashing function; a dedicated function like scrypt should be used instead.