CS579A/CpE579A: Foundations of Cryptography

Lab 1: Personal Vault




Introduction

In this lab, you will build a simple utility to keep your files safe from the scrutiny of your system administrator ;-) The utility will allow you to generate a symmetric encryption/decryption key for an Adaptive Chosen-Ciphertext (CCA) secure Symmetric-Key Encryption scheme. You can then encrypt your files so that only you (and those to whom you give the symmetric key) can recover the original plaintext. By doing the lab, you will gain insight into how simple cryptographic tasks are implemented.

We provide you with a simple cryptographic library (libdcrypt), containing an implementation of all the cryptographic tools you will need for the lab. (See Lab0 for instructions on how to install this library on your machine.)

Software Setup

We have prepared skeleton files to help you get started with the lab. To set up the files on your account, download lab1.tar.gz, and type the following:

% tar xzf lab1.tar.gz
% cd lab1
% 

Lab Specification

Your Personal Vault utility will consist of three programs: pv_keygen, pv_encrypt and pv_decrypt. We provide you with a skeleton source directory (lab1.tar.gz), containing the following files:

% ls lab1/
Makefile        pv_decrypt.c   pv_keygen.c
pv.h            pv_encrypt.c   pv_misc.c
%

***(The provided Makefile assumes that you will be doing the assignment on the linux-lab cluster (either remotely via ssh, or in person at Burchard 126). If you are working on your machine, see the note below.)***

Once you have implemented the necessary functions (see below), you will build the three programs pv_keygen, pv_encrypt and pv_decrypt using make:

% make
gcc -g -O2 -ansi -Wall -Wsign-compare -Wchar-subscripts -Werror
-I. -I/usr/local/include/ -I/home/nicolosi/cs579/devel/include/ -c
pv_keygen.c pv_misc.c
gcc -g -O2 -ansi -Wall -Wsign-compare -Wchar-subscripts -Werror -o
pv_keygen pv_keygen.o pv_misc.o -L. -L/usr/local/lib/
-L/home/nicolosi/cs579/devel/lib/ -ldcrypt  -lgmp
gcc -g -O2 -ansi -Wall -Wsign-compare -Wchar-subscripts -Werror
-I. -I/usr/local/include/ -I/home/nicolosi/cs579/devel/include/ -c
pv_encrypt.c pv_misc.c
gcc -g -O2 -ansi -Wall -Wsign-compare -Wchar-subscripts -Werror -o
pv_encrypt pv_encrypt.o pv_misc.o -L. -L/usr/local/lib/
-L/home/nicolosi/cs579/devel/lib/ -ldcrypt  -lgmp
gcc -g -O2 -ansi -Wall -Wsign-compare -Wchar-subscripts -Werror
-I. -I/usr/local/include/ -I/home/nicolosi/cs579/devel/include/ -c
pv_decrypt.c pv_misc.c
gcc -g -O2 -ansi -Wall -Wsign-compare -Wchar-subscripts -Werror -o
pv_decrypt pv_decrypt.o pv_misc.o -L. -L/usr/local/lib/
-L/home/nicolosi/cs579/devel/lib/ -ldcrypt  -lgmp
%

You should now be able to create your own symmetric key, and use it to encrypt/decrypt your files as follows:

% ./pv_keygen my_key.b64 
% yes "test" | head -1000 > a_file
% ./pv_encrypt my_key.b64 a_file an_encrypted_file
% ./pv_decrypt my_key.b64 an_encrypted_file a_decrypted_file 
% diff a_file a_decrypted_file
%

Note on setting up lab1 if you are *not* working on the linux-lab cluster
If you have installed the libraries on your own machine, you may need to edit the Makefile that was provided in lab1.tar.gz.
Locate the following lines (toward the beginning of Makefile):

INCLUDES = /usr/include/                                                        
LIBS = /usr/lib/                                                                
DCRYPTINCLUDE = /home/nicolosi/cs579/devel/include/          
DCRYPTLIB = /home/nicolosi/cs579/devel/lib/

Next, edit these lines so that:

Design Guidelines

Now that you know how your personal vault utility is supposed to work, let's get to work.

We provide you with an incomplete implementation of pv_keygen.c, pv_encrypt.c, and pv_decrypt.c. (Some miscellaneous auxiliary tasks are implemented for you in pv.h and pv_misc.c.) Your job is to fill in the code where you see the comment line /* YOUR CODE HERE */. (But of course, feel free to change any part of the code as you see fit.) Mainly, you'll be working on three places:

Dealing with the symmetric key file

The task of write_skfile is to encode the raw binary symmetric key using the data serialization functions described below. Conversely, import_sk_from_file should allocate and fill in a raw binary buffer, big enough to contain the deserialization of the ASCII data in the key file.

Encrypting the content

The task of encrypt_file is to read the content from the file descriptor fin, encrypt it using raw_sk, and place the resulting ciphertext in a file named ctxt_fname.

The encryption should be CCA-secure, which is the level of cryptographic protection that you should always expect of any implementation of an encryption algorithm.

Here are some guidelines, but you are welcome to make variations, as long as you can argue that your code still attains CCA security.

One approach is to use AES in CBC mode, and then append an HMAC SHA-1 (HSHA-1) mac of the resulting ciphertext. (Always mac after encrypting!) The dcrypt library contains implementations of AES and of HMAC SHA-1 (cf. Useful Functions). However, you will have to implement CBC mode yourself, as the library only gives access to the basic AES block cipher functionality, which by itself is not CCA-secure.

Notice that the key used to compute the HMAC SHA-1 mac must be different from the one used by AES. Never use the same cryptographic key for two different purposes: bad interference could occur.
For this reason, the key raw_sk actually consists of two pieces, one for AES and one for HMAC SHA-1. The length of each piece (and hence the cryptographic strength of the encryption) is specified by the constant CCA_STRENGTH in pv.h; the default is 128 bits, or 16 bytes.
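
As a minimal sketch of this two-piece layout, the fragment below shows one way to address the two halves of raw_sk. The ordering (AES piece first, HMAC piece second) and the helper names sk_aes_part and sk_mac_part are assumptions made for illustration only. CCA_STRENGTH is redefined here with its default value so that the fragment stands alone; in the lab it comes from pv.h.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the constant defined in pv.h (default: 16 bytes = 128 bits). */
#define CCA_STRENGTH 16

/* Hypothetical helper: the piece of raw_sk meant for AES.
 * Assumes raw_sk holds exactly 2 * CCA_STRENGTH bytes, AES piece first. */
static const unsigned char *sk_aes_part(const unsigned char *raw_sk,
                                        size_t raw_len)
{
  assert(raw_len == 2 * CCA_STRENGTH);
  return raw_sk;
}

/* Hypothetical helper: the piece of raw_sk meant for HMAC SHA-1. */
static const unsigned char *sk_mac_part(const unsigned char *raw_sk,
                                        size_t raw_len)
{
  assert(raw_len == 2 * CCA_STRENGTH);
  return raw_sk + CCA_STRENGTH;
}
```

Each piece is CCA_STRENGTH bytes long, so the two never overlap.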

Recall that AES can only encrypt blocks of 128 bits, so you should use some padding in the case that the length (in bytes) of the plaintext is not a multiple of 16. This should be done in a way that allows proper decoding after decryption: in particular, the recipient must have a way to know where the padding begins so that it can be chopped off. (Using a different mode of operation, like CFB, would not require padding.)

One possible design is to add enough 0 bytes to the plaintext so as to make its length a multiple of 16, and then append a byte at the end specifying how many zero-bytes were appended.
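
The zero-padding design just described can be sketched as follows. The names AES_BLOCK_LEN, pad_len and pad_last_block are hypothetical and not part of pv.h or the dcrypt library; the sketch also assumes that the last block handed to pad_last_block holds at least one plaintext byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define AES_BLOCK_LEN 16  /* AES block size in bytes */

/* Number of zero bytes (0..15) needed to bring len up to a multiple of 16. */
static size_t pad_len(size_t len)
{
  return (AES_BLOCK_LEN - len % AES_BLOCK_LEN) % AES_BLOCK_LEN;
}

/* Zero-pad the final, possibly partial, block in place.
 * block must have room for AES_BLOCK_LEN bytes; used is the number of
 * plaintext bytes it already holds (0 < used <= 16).
 * Returns the pad length, to be stored in the trailing byte of the file. */
static unsigned char pad_last_block(unsigned char *block, size_t used)
{
  size_t n;

  assert(used > 0 && used <= AES_BLOCK_LEN);
  n = AES_BLOCK_LEN - used;
  memset(block + used, 0, n);
  return (unsigned char) n;
}
```

For instance, the two test files used in the hand-in script are 5000 and 5075 bytes long (1000 and 1015 lines of "test" plus a newline), so pad_len gives 8 and 13 respectively.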

Thus, the overall layout of an encrypted file will be:

         +--------------------------+---+--------+
         |             Y            | W | padlen |
         +--------------------------+---+--------+

where Y = CBC-AES (K_AES, {plaintext, 0^padlen})
      W = HMAC-SHA-1 (K_HSHA-1, Y)
      padlen = no. of zero-bytes added to the plaintext to make its
               length a multiple of 16

As for the sizes of the various components of a ciphertext file, notice that:

Decrypting the content

The task of decrypt_file is to read the ciphertext from the file descriptor fin, decrypt it using sk, and place the resulting plaintext in a file named ptxt_fname.

This procedure basically should just "undo" the operations performed by encrypt_file; for this reason, decrypt_file expects a ciphertext featuring the structure described above. Reading Y (and then the mac and the pad length) is a bit tricky: below we sketch one possible approach, but you are free to implement this as you wish.

The idea is based on the fact that the ciphertext file ends with 21 bytes (i.e., the size of a hash + 1) used up by the HSHA-1 mac and by the pad length. Thus, we will repeatedly attempt to perform "long reads" of (aes_blocklen + sha1_hashsize + 2) bytes: once we get to the end of the ciphertext and only the last chunk of Y has to be read, such "long reads" will encounter the end-of-file, at which point we will know where Y ends, and how to finish reading the last bytes of the ciphertext.
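
One possible reading of the "long read" idea is sketched below. It keeps a buffer of aes_blocklen + sha1_hashsize + 2 = 38 bytes topped up: while the buffer can be filled completely, its first 16 bytes must belong to Y, so they can be consumed; once a refill falls short, the end-of-file has been reached and the last 21 bytes in the buffer are the mac and the pad length. The names read_full and split_ciphertext and the refill strategy are assumptions for illustration; only POSIX read() is used, and the spots where a real decrypt_file would decrypt and mac each block are marked with comments.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#define AES_BLOCK_LEN 16
#define SHA1_HASH_LEN 20
#define TRAILER_LEN   (SHA1_HASH_LEN + 1)              /* mac + pad length */
#define LONG_READ_LEN (AES_BLOCK_LEN + SHA1_HASH_LEN + 2)

/* Call read() until n bytes arrive, end-of-file is hit, or an error occurs.
 * Returns the number of bytes stored in buf, or -1 on error. */
static long read_full(int fd, unsigned char *buf, size_t n)
{
  size_t got = 0;

  while (got < n) {
    ssize_t r = read(fd, buf + got, n - got);
    if (r < 0) {
      if (errno == EINTR)
        continue;
      return -1;
    }
    if (r == 0)
      break;                                  /* end of file */
    got += (size_t) r;
  }
  return (long) got;
}

/* Walk a ciphertext stream and find where Y ends.
 * On success returns 0, stores the length of Y in *y_len, and copies the
 * final TRAILER_LEN bytes (mac, then pad length) into trailer. */
static int split_ciphertext(int fd, size_t *y_len, unsigned char *trailer)
{
  unsigned char buf[LONG_READ_LEN];
  size_t have = 0;
  long r;

  assert(y_len != NULL && trailer != NULL);
  *y_len = 0;
  for (;;) {
    r = read_full(fd, buf + have, LONG_READ_LEN - have);
    if (r < 0)
      return -1;
    have += (size_t) r;
    if (have < LONG_READ_LEN)
      break;                                  /* the long read hit EOF */
    /* buf[0..15] is certainly a block of Y: decrypt/mac it here. */
    *y_len += AES_BLOCK_LEN;
    memmove(buf, buf + AES_BLOCK_LEN, have - AES_BLOCK_LEN);
    have -= AES_BLOCK_LEN;
  }
  if (have < TRAILER_LEN || (have - TRAILER_LEN) % AES_BLOCK_LEN != 0)
    return -1;                                /* malformed ciphertext */
  /* Whatever precedes the trailer is the last chunk of Y. */
  *y_len += have - TRAILER_LEN;
  memcpy(trailer, buf + have - TRAILER_LEN, TRAILER_LEN);
  return 0;
}
```

The extra byte in the long read is what makes the end-of-file detectable: a file whose remaining content is exactly one block plus the 21 trailing bytes yields only 37 of the 38 requested bytes.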

Cipher-Block Chaining (CBC) Mode

For encrypting a stream of bytes that does not require random access, people often employ a technique known as Cipher-Block Chaining (CBC). To encrypt in CBC mode, one thinks of the stream of bytes as a sequence of blocks, each of the size of the block cipher being used (AES in your case); then, one XORs each plaintext block with the encryption of the previous block before encrypting, as shown here:

[Figure: Cipher-Block Chaining]

If the plaintext blocks are m_1, m_2, ..., and the ciphertext blocks c_1, c_2, ..., then encryption and decryption in CBC mode are performed as follows:

c_i = E(m_i XOR c_{i-1})
m_i = D(c_i) XOR c_{i-1}

The first plaintext block is XORed with an initialization vector, or IV (which you can think of as c_0, since there is no m_0). The IV can be publicly known, but should be chosen afresh at random each time the same key will be used to encrypt, so that each ciphertext uses a different IV.
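
The chaining rule can be exercised with the sketch below. To keep it self-contained, the block cipher is a toy stand-in (toy_encrypt / toy_decrypt) that is NOT AES and offers no security whatsoever; it merely needs to be invertible. In the lab, these two calls would be replaced by the AES functions of the dcrypt library. All names here are hypothetical, and the caller is assumed to supply a fresh random IV and an input whose length is a multiple of 16.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_LEN 16

/* Toy stand-in for a block cipher: NOT AES and NOT secure.
 * out and in must not overlap. */
static void toy_encrypt(const unsigned char *key, unsigned char *out,
                        const unsigned char *in)
{
  int i;
  for (i = 0; i < BLOCK_LEN; i++)
    out[(i + 1) % BLOCK_LEN] = (unsigned char) (in[i] ^ key[i]);
}

/* Inverse of toy_encrypt. out and in must not overlap. */
static void toy_decrypt(const unsigned char *key, unsigned char *out,
                        const unsigned char *in)
{
  int i;
  for (i = 0; i < BLOCK_LEN; i++)
    out[i] = (unsigned char) (in[(i + 1) % BLOCK_LEN] ^ key[i]);
}

/* CBC encryption: each plaintext block is XORed with the previous
 * ciphertext block (the IV for the first block) before being encrypted. */
static void cbc_encrypt(const unsigned char *key, const unsigned char *iv,
                        const unsigned char *ptxt, unsigned char *ctxt,
                        size_t len)
{
  unsigned char prev[BLOCK_LEN], x[BLOCK_LEN];
  size_t off;
  int i;

  assert(len % BLOCK_LEN == 0);
  memcpy(prev, iv, BLOCK_LEN);
  for (off = 0; off < len; off += BLOCK_LEN) {
    for (i = 0; i < BLOCK_LEN; i++)
      x[i] = (unsigned char) (ptxt[off + i] ^ prev[i]);
    toy_encrypt(key, prev, x);                /* prev now holds this block's ciphertext */
    memcpy(ctxt + off, prev, BLOCK_LEN);
  }
}

/* CBC decryption: decrypt each ciphertext block, then XOR the result
 * with the previous ciphertext block (the IV for the first block). */
static void cbc_decrypt(const unsigned char *key, const unsigned char *iv,
                        const unsigned char *ctxt, unsigned char *ptxt,
                        size_t len)
{
  unsigned char prev[BLOCK_LEN], cur[BLOCK_LEN], x[BLOCK_LEN];
  size_t off;
  int i;

  assert(len % BLOCK_LEN == 0);
  memcpy(prev, iv, BLOCK_LEN);
  for (off = 0; off < len; off += BLOCK_LEN) {
    memcpy(cur, ctxt + off, BLOCK_LEN);
    toy_decrypt(key, x, cur);
    for (i = 0; i < BLOCK_LEN; i++)
      ptxt[off + i] = (unsigned char) (x[i] ^ prev[i]);
    memcpy(prev, cur, BLOCK_LEN);
  }
}
```

Because every block depends on the one before it, two identical plaintext blocks generally encrypt to different ciphertext blocks, which would not be the case if each block were encrypted independently.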

Collaboration Policy

You must write all the code you hand in for the programming assignments, except for code that we give you as part of the assignment. You are not allowed to look at anyone else's solution. You may discuss the assignments with other students, but you may not look at or copy each others' code. You may not copy code that might be available online.

Hand-In Procedure

You must submit two files: a software distribution containing your sources (pv.tar.gz), and a script file (typescript).

To build a software distribution, run the following commands (from the directory where your source files are located):

% cd ..
% tar cf pv.tar lab1/
% gzip pv.tar
%

(If the name of the directory containing your sources is not lab1, then substitute the appropriate name in the second command above.)

To create a script file, use the script command. When you run script, everything you type gets saved in a file called typescript. Press CTRL-D to finish the script. For example:

% script
Script started, output file is typescript
% ./pv_keygen my_key.b64 
% yes "test" | head -1000 > a_file
% ./pv_encrypt my_key.b64 a_file an_encrypted_file
% ./pv_decrypt my_key.b64 an_encrypted_file a_decrypted_file
% diff a_file a_decrypted_file
% 
% ...
% yes "test" | head -1015 > a_file
% ./pv_encrypt my_key.b64 a_file an_encrypted_file
% ./pv_decrypt my_key.b64 an_encrypted_file a_decrypted_file
% diff a_file a_decrypted_file
% 
% ^D
Script done, output file is typescript
% 

To turn in your distribution and script file, e-mail the files pv.tar.gz and typescript to me at "nicolosi AT cs DOT stevens DOT edu" by April 9, 11:55pm.

This completes the lab.


Useful Functions

Below is a description of some of the functions implemented in the dcrypt library that you may find useful in completing the assignment. You will need to include the dcrypt.h header file to access these functions. You may also want to take a look at these sample programs (tst.c, tst_sha1.c) to see some of these functions in action.

Data serialization

Pseudo-Random Number Generation Functions

The libraries you are using contain a cryptographic pseudo-random number generator, whose state is kept in a global 16-byte array called prng_state. Before using the random number generator, you must initialize it.

Symmetric-Key Encryption Functions

For actually encrypting and decrypting file data, you will use the Rijndael [FIPS-197] block cipher (also called AES, the Advanced Encryption Standard). Rijndael is a 128-bit block cipher. It supports two operations: encryption and decryption. Encryption transforms 16 bytes (128 bits) of plaintext data into 16 bytes of ciphertext data using a secret key. Someone who does not know the secret key cannot recover the plaintext from the ciphertext. The decryption algorithm, given knowledge of the secret key, transforms ciphertext into plaintext.

The libraries you are using define a struct called aes_ctx that you should use to hold secret keys for AES. You should manipulate your AES secret keys with the following functions:

Cryptographic Hash Functions

The SHA-1 [FIPS-180-1] hash function hashes an arbitrary-length input (of fewer than 2^64 bits) to a 20-byte output. SHA-1 is known as a cryptographic hash function. While nothing has been formally proven about the function, it is generally assumed that SHA-1 is one-way and collision-resistant. These properties are defined as follows:

The libraries you are using contain an implementation of SHA-1.

Sometimes the input that you want to hash is so long that it is inconvenient to store it entirely in memory before being able to hash it. This is the case for example when hashing the entire content of a file into a short digest.
For this reason, the libraries you are using allow you to process a long input "one chunk at a time." To do that, you should use a struct called sha1_ctx, which will store the "partial digest" as you keep providing new input to be hashed. You should manipulate sha1_ctx structs with the following functions:

Message Authentication Codes (MACs)

Message Authentication Codes (MACs) are a symmetric-key primitive allowing you to check the integrity of the information to which the MAC is applied. Recall that encryption does not guarantee integrity! The fact that you were able to decrypt a ciphertext is not enough to be sure that nobody tampered with its content. For integrity, you should always append a MAC to the content.

You use MACs as follows. Let's say that you want to store a file on your file-server, but you are afraid that its content will be changed behind your back. Then, you use a secret key to "mac" the file, and store the resulting MAC along with the file. Now, when you check back with your file-server and retrieve your file, you will also retrieve the MAC that you appended. Then, you will use the secret key to compute the MAC again, and if the MAC you just computed is the same as the value that you retrieved from the file-server, then you are sure that nobody touched your file. This is because, if somebody had changed the file, they would also have had to compute the corresponding MAC in order to fool you. However, secure MACs are designed so that, without knowing the secret key, it is computationally intractable to compute the right MAC, even after having seen a lot of valid (message, MAC) pairs.
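
The final comparison between the recomputed MAC and the retrieved one can be sketched as below. The helper mac_equal is hypothetical and not part of the dcrypt library. It inspects every byte instead of stopping at the first mismatch, a common precaution so that the running time does not reveal how many leading bytes matched; the lab text itself does not require this, and a plain memcmp would give the same yes/no answer.

```c
#include <assert.h>
#include <stddef.h>

/* Compare two MAC tags of len bytes without an early exit.
 * Returns 1 if the tags are identical, 0 otherwise. */
static int mac_equal(const unsigned char *a, const unsigned char *b,
                     size_t len)
{
  unsigned char diff = 0;
  size_t i;

  assert(a != NULL && b != NULL);
  for (i = 0; i < len; i++)
    diff |= (unsigned char) (a[i] ^ b[i]);    /* accumulate any difference */
  return diff == 0;
}
```

For HMAC SHA-1, len would be 20, the size of a SHA-1 digest.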

The Keyed-Hash Message Authentication Code (HMAC) [FIPS-198a] is a secure Message Authentication Code based on the use of any cryptographic hash function, like SHA-1. The libraries you are using contain an implementation of HMAC, instantiated with the SHA-1 cryptographic hash function.

As in the case of SHA-1, the libraries you are using allow you to process a long input "one chunk at a time."

References

[FIPS-180-1] FIPS-180-1, Secure Hash Standard.
U.S. Department of Commerce/N.I.S.T., 1994
[FIPS-197] FIPS-197, Announcing the Advanced Encryption Standard.
U.S. Department of Commerce/N.I.S.T., 2001
[FIPS-198a] FIPS-198a, The Keyed-Hash Message Authentication Code (HMAC).
U.S. Department of Commerce/N.I.S.T., 2002

Credits: David Mazières, Antonio Nicolosi, and Nelly Fazio
Versions of this Lab also assigned in classes taught at NYU and CUNY/City College.