We will investigate how to perform basic encryption tasks using a cryptography library called Sodium, which is written in C. It is well-documented (you can find the documentation here), well-tested, highly portable, and used by many other projects. It allows us to perform tasks like encryption, decryption, signature checking, and password hashing.
Although Sodium is a C library, we will use it from the Python language, as that requires much less boilerplate code. In the CITS3007 SDE, we need to install the Python library PyNaCl, which “wraps” the C Sodium library, and provides a “Pythonic” interface to it (the documentation for PyNaCl is available here).[1] Run the following commands in your development VM:
$ sudo apt-get update
$ sudo apt-get install python3-pip
$ pip install pynacl
This ensures we have the pip command available for managing Python libraries, then uses it to install PyNaCl. We’ll show how to use the PyNaCl library to create a public–private key pair (like those used by GitHub to allow repositories to be cloned or pushed without using a password). The lecture slides contain more information about public key cryptosystems like this, as does the PyNaCl documentation, here.
Suppose Alice and Bob are both using a public-key cryptosystem, and both make their public keys available on the Web for anyone to access. Explain how they could use their keys so that Alice can securely send an encrypted message or file which can only be read by Bob.
In this section and the following ones, we will generate public–private key pairs, and use them to transfer encrypted content in exactly the way Alice and Bob could, in the previous exercise.
Save the following as keygen.py:
import nacl.utils
from nacl.public import PrivateKey
from nacl.encoding import HexEncoder

def write(name, hex, suffix):
    filename = 'key_' + name + suffix
    with open(filename, 'wb') as ofp:
        ofp.write(hex)


def make_keys(name):
    secretKey = PrivateKey.generate()
    write(name, secretKey.encode(encoder=HexEncoder), '.sk')
    publicKey = secretKey.public_key
    write(name, publicKey.encode(encoder=HexEncoder), '.pk')


key_name = input("Enter a name for the key pair to generate: ")
make_keys(key_name)
Run it by executing python3 keygen.py, and entering a name (this could be a particular purpose you’re generating the key pair for – for instance, secret-hushmoney-communications-with-my-accountant – or just your own name).
This will generate two files, key_[NAME].sk and key_[NAME].pk, which hold our private and public keys, respectively. If you inspect those files (e.g. by using less) you will see that they simply contain a long sequence of hexadecimal digits.
In detail, here’s how the code works:

Lines 1-4 import several modules:
nacl.utils: This module provides general utility functions for working with libsodium.
nacl.public.PrivateKey: This is a class from the nacl.public module used to generate a pair of public and private keys for encryption.
nacl.encoding.HexEncoder: This class from the nacl.encoding module is used to encode binary data as hexadecimal strings.

The code defines two functions:
write(name, hex, suffix): This function is responsible for writing the hexadecimal representation of a key (either a secret key or a public key) to a file with a specific name and suffix.
make_keys(name): This function generates a pair of public and private keys and writes them to separate files, using a user-provided name for the key pair.

Lines 11-15:
In the make_keys(name) function, a secret key is generated using PrivateKey.generate(). This secret key will be used for encryption and decryption by you (the user creating the key pair).
The secret key is encoded in hexadecimal using HexEncoder and written to a file with the .sk suffix.
The matching public key is obtained from the secret key, encoded in the same way, and written to a file with the .pk suffix.

Lines 17–19:
The code uses input() to prompt the user to enter a name for the key pair they want to generate, then calls make_keys to do the generation.
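To see why the key files contain exactly 64 hexadecimal digits, here is a minimal sketch that uses only the Python standard library. It does not call PyNaCl: 32 random bytes stand in for a real key (32 bytes is the size of the keys PyNaCl generates here), and binascii.hexlify stands in for HexEncoder. The names raw_key and hex_key are our own.

```python
import binascii
import os

# Stand-in for a real key: 32 random bytes (the size of a PyNaCl PrivateKey).
raw_key = os.urandom(32)

# Hex-encode the bytes, much as HexEncoder does before the key is written out.
hex_key = binascii.hexlify(raw_key)

# Each byte becomes two hexadecimal digits, so 32 bytes give 64 characters.
print(len(raw_key), len(hex_key))
print(hex_key.decode('ascii'))

# Decoding the hex text recovers the original bytes exactly.
assert binascii.unhexlify(hex_key) == raw_key
```

Hexadecimal text is twice the size of the raw bytes, but it is easy to inspect, copy and paste, which is convenient when exchanging public keys in the lab.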
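The secret key (in the “.sk” file) must be kept private. You use it, together with another person’s public key, to encrypt messages to that person and to decrypt messages they send to you. The public key (in the “.pk” file) can be published to others; another person combines it with their own secret key to encrypt messages written to you, or to decrypt messages written by you. (Digital signatures in PyNaCl use a separate kind of key, from the nacl.signing module.)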
If possible, get another person in the lab to generate a key pair, and exchange public keys. Alternatively, create a second key pair with a different name (e.g. “other”), and choose this to be the “other person”.
Encrypt a message using the recipient’s public key and your private key. Save the following script as encrypt.py:
import nacl.utils
from nacl.public import PrivateKey, PublicKey, Box
from nacl.encoding import HexEncoder

class EncryptFile:
    def __init__(self, sender, receiver):
        self.sender = sender
        self.receiver = receiver
        self.sk = PrivateKey(self.get_key(sender, '.sk'), encoder=HexEncoder)
        self.pk = PublicKey(self.get_key(receiver, '.pk'), encoder=HexEncoder)

    def get_key(self, name, suffix):
        filename = 'key_' + name + suffix
        file = open(filename, 'rb')
        data = file.read()
        file.close()
        return data

    def encrypt(self, textfile, encfile):
        box = Box(self.sk, self.pk)
        tfile = open(textfile, 'rb')
        text = tfile.read()
        tfile.close()
        etext = box.encrypt(text)
        efile = open(encfile, 'wb')
        efile.write(etext)
        efile.close()

sender = input("Enter the name for your key pair: ")
recip = input("Enter the name for the recipient's key pair: ")
encrypter = EncryptFile(sender, recip)
target_file = input("Enter a file to encrypt: ")
encrypter.encrypt(target_file, f'{target_file}.enc')
print('Done!')
Run it with the command python3 encrypt.py. You will need to provide the name of your key pair (from the previous exercise), the name of the recipient’s key pair, and a file to encrypt (you can just choose the encrypt.py script if you have no other text file handy).
The script should create a binary file ORIG_FILE.enc (where ORIG_FILE is whatever the name of the original file was) – this is the encrypted file.
In more detail, here is what the script does:

Define a class EncryptFile:
The script defines a class named EncryptFile. This class is designed to handle file encryption operations.
The constructor (__init__) of this class initializes the sender and receiver names and loads the sender’s private key (with extension .sk) and the receiver’s public key (with extension .pk) from files.
The get_key method is used to read the contents of a key file (either a .sk or .pk file) and return it as binary data.
The encrypt method is used to encrypt a file. It loads the contents of a text file specified by textfile, encrypts it using the sender’s private key (sk) and the recipient’s public key (pk), and then writes the encrypted data to a new file specified by encfile.

Lines 29–34:
The user is prompted for the name of their own key pair (sender), the name of the recipient’s key pair (recip), and a file to encrypt.
An instance of the EncryptFile class is created with the sender’s name and the recipient’s name.
The encrypt method of the EncryptFile instance is called with the target file and the name of the encrypted output file (the encrypted file will have a .enc extension).

Save the following as decrypt.py:
import nacl.utils
from nacl.public import PrivateKey, PublicKey, Box
from nacl.encoding import HexEncoder
import sys

class DecryptFile:
    def __init__(self, sender, receiver):
        self.sender = sender
        self.receiver = receiver
        self.sk = PrivateKey(self.get_key(receiver, '.sk'), encoder=HexEncoder)
        self.pk = PublicKey(self.get_key(sender, '.pk'), encoder=HexEncoder)

    def get_key(self, name, suffix):
        filename = 'key_' + name + suffix
        try:
            with open(filename, 'rb') as file:
                data = file.read()
            return data
        except FileNotFoundError:
            print(f"Key file '{filename}' not found.")
            sys.exit(1)

    def decrypt(self, encfile, textfile):
        box = Box(self.sk, self.pk)
        try:
            with open(encfile, 'rb') as efile:
                etext = efile.read()
            dtext = box.decrypt(etext)
            with open(textfile, 'wb') as tfile:
                tfile.write(dtext)
            print(f"Decrypted file saved as '{textfile}'")
        except FileNotFoundError:
            print(f"Encrypted file '{encfile}' not found.")
            sys.exit(1)

sender = input("Enter the name for the sender's key pair: ")
recip = input("Enter your name for your key pair: ")
decrypter = DecryptFile(sender, recip)
enc_file = input("Enter the name of the encrypted file to decrypt: ")
target_file = input("Enter the name for the decrypted output file: ")
decrypter.decrypt(enc_file, target_file)
To use the script, you need to have an encrypted file (with a .enc extension) generated by the “encrypt.py” script in the same directory – ideally, swap with another person and attempt to decrypt their .enc file – together with your private key (with an .sk extension), and the other person’s public key (with a .pk extension).
Run python3 decrypt.py and follow the prompts: enter the sender’s key pair name, your key pair name, the name of the encrypted file to decrypt, and the name for the decrypted output file.
The script will decrypt the file using the private key associated with your name and the sender’s public key and save the decrypted content to the specified output file.
As a challenge task, you might like to research how to use libsodium to sign a message with your key pair so that other users can verify a (plaintext) message came from you.
See if you can answer the following questions, after reviewing the material on cryptography in the lectures.
Suppose in the CITS3007 SDE you create the MD5 hash of some password, using a command like:
$ printf mypassword | md5sum
In what format is the hash displayed? How large is the hash, in bytes? How would you write it in C syntax?
If we run a command like this (here with the password “qwerty”), we get output like the following:
$ printf qwerty | md5sum
d8578edf8458ce06fbc5bb76a58c5ca4 -
The first “word” of output is the actual hash; the “-” represents the name of the file being hashed (in this case, “-” represents standard input).
The hash is a sequence of hexadecimal digits, and represents 16 bytes (or 128 bits).
Each pair of characters in the original hash represents one byte, so if stored in C as an array of bytes, we could write it as follows:
// fragment 1
// unsigned char, since values such as 0xd8 do not fit in a signed char
unsigned char somehash[] = {0xd8, 0x57, 0x8e, 0xdf, 0x84, 0x58, 0xce, 0x06,
                            0xfb, 0xc5, 0xbb, 0x76, 0xa5, 0x8c, 0x5c, 0xa4};
Strings in C also allow us to use hexadecimal escape sequences, so we could also write the following:
// fragment 2
char somehash[] = "\xd8\x57\x8e\xdf\x84\x58\xce\x06"
                  "\xfb\xc5\xbb\x76\xa5\x8c\x5c\xa4";
The difference is that in fragment 1, somehash is a “plain” array or buffer, of size 16 elements, but in fragment 2, somehash is a null-terminated C string, so the array will be of size 17.
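The relationship between the 16 raw bytes and the 32 hexadecimal digits can also be checked from Python. This is a minimal sketch using the standard hashlib module, hashing the password “qwerty”; the variable names are our own.

```python
import hashlib

digest = hashlib.md5(b"qwerty").digest()         # raw bytes
hex_digest = hashlib.md5(b"qwerty").hexdigest()  # text form, as md5sum prints it

print(len(digest))      # number of raw bytes
print(len(hex_digest))  # number of hexadecimal digits
print(hex_digest)

# The first byte corresponds to the first two hexadecimal digits.
print(hex(digest[0]), hex_digest[:2])
```

The digest() form corresponds to the byte array in fragment 1, while hexdigest() is the human-readable form displayed by md5sum.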
What is the purpose of salting passwords, when creating a password hash?
Salting passwords prevents several common attacks on passwords.
If a password is used unsalted, directly as is, then every user in the world who happens to use the password “qwerty”, for instance, will have exactly the same hash for their password (assuming the same algorithm is used). If we used the MD5 hashing algorithm,[2] then every user who uses the password “qwerty” will get the hash d8578edf8458ce06fbc5bb76a58c5ca4:
$ printf qwerty | md5sum
d8578edf8458ce06fbc5bb76a58c5ca4 -
That means if an attacker happens to get hold of the list of hashed passwords, it’s extremely easy for them to find out what the user’s password is – despite the fact that hashes are “difficult to reverse”.
The attacker knows that many people choose very common passwords and that “qwerty” is one of these, and that the MD5 hash of “qwerty” is d8578edf8458ce06fbc5bb76a58c5ca4. So if the attacker has a list of the hashes of common passwords, they’ll easily recognize them whenever they appear. (A rainbow table is used by hackers when attacking lists of hashes, and is simply a data structure designed to efficiently store many precomputed password/hash pairs.)
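Adding a random salt to the password destroys this straightforward correspondence between password and hash: the salt is hashed together with the password and stored alongside the result, so two users with the same password get different hashes, and a table of precomputed hashes is of no use unless it was built for that particular salt.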
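The effect of a salt can be sketched with the scrypt function from Python’s standard hashlib module (this assumes a Python build in which hashlib.scrypt is available). The helper names hash_password and verify_password, the 16-byte salt length, and the work factors n, r and p are illustrative choices of ours, not tuned recommendations.

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None):
    """Return (salt, hash) for a password, generating a fresh salt if needed."""
    if salt is None:
        salt = os.urandom(16)  # a new random salt for each stored password
    # The work factors n, r and p are illustrative, not a tuned recommendation.
    digest = hashlib.scrypt(password.encode('utf-8'), salt=salt,
                            n=2**14, r=8, p=1, dklen=32)
    return salt, digest

def verify_password(password, salt, expected):
    """Recompute the hash with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)

# Two users pick the same password but get different salts and hashes.
salt_a, hash_a = hash_password("qwerty")
salt_b, hash_b = hash_password("qwerty")
print(hash_a != hash_b)

# Verification still works, because the salt is stored alongside the hash.
print(verify_password("qwerty", salt_a, hash_a))
print(verify_password("letmein", salt_a, hash_a))
```

The salt does not need to be secret; its job is only to make each stored hash unique, so that one precomputed table cannot be reused against every user.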
Look up Wikipedia to refresh your memory of what a hash collision is. Explain why hash collisions necessarily occur. That is, why must there always be two different plaintexts that have the same hash value?
Every hash function outputs results which are of some exact, fixed size; the exact size will depend on the function. (For instance, we’ve seen above that the MD5 algorithm always outputs hashes of 16 bytes.)
The input to a hash function, however, is a sequence (usually of bytes) of arbitrary length. The input domain is therefore infinite, but the output range of the function is finite: hence, by the pigeonhole principle, at least one hash value must be produced by an infinite number of plaintexts (and for a well-designed hash function, we expect this to be true of every hash value).
We can see this more straightforwardly if we imagine a hash function that produces outputs of only one byte in length. Such a function would not be very useful for cryptographic purposes (can you explain why?), but we could use it for example to distribute items across a hash table of size less than 256.
The output of the function is one byte (256 different values), but it will operate on any arbitrary sequence of input bytes. It therefore follows that at least one of the 256 output results must be produced by an infinite number of inputs; and any 257 distinct inputs are guaranteed to include at least two with the same hash.
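The one-byte case is small enough to try out directly. In the sketch below, tiny_hash is a toy function of our own (the sum of the input bytes, modulo 256), not a real hash algorithm; find_collision hashes 257 distinct inputs and stops as soon as two of them share a hash value, which must happen because only 256 values are available.

```python
def tiny_hash(data):
    """A toy one-byte hash: the sum of the input bytes, modulo 256."""
    return sum(data) % 256

def find_collision():
    """Hash 257 distinct inputs; two of them must share a hash value."""
    seen = {}
    for i in range(257):
        data = i.to_bytes(2, 'big')  # 257 distinct two-byte inputs
        h = tiny_hash(data)
        if h in seen:
            return seen[h], data, h
        seen[h] = data
    raise AssertionError("unreachable: 257 inputs, only 256 hash values")

first, second, value = find_collision()
print(first, second, value)
```

A real cryptographic hash has far too many possible outputs for this brute-force search to be practical, which is why collisions, although they must exist, are hard to find.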
You can use your lab time to work on the CITS3007 project. You may wish to discuss your project tests and code design with other students or the lab facilitators (although the actual code you submit must be your own, individual work).
[1] There are actually multiple Python libraries which provide access to the C Sodium library, which can be confusing, but they have quite different purposes. PyNaCl, which we use, provides a fairly high-level interface to Sodium, and allows Python programmers to use Python types (such as classes and lists) which they are familiar with. Two other Python libraries are pysodium and libnacl. These are not high-level – they pretty directly wrap the exact C functions exposed by the C Sodium library, and allow them to be called from Python.
[2] Note that as per the lectures, MD5 should not be used in practice as a password hashing function; a dedicated function like scrypt should be used instead.