In this lab, you will build a simple utility to keep your files safe from the scrutiny of your system administrator ;-) The utility will allow you to generate a symmetric encryption/decryption key for an Adaptive Chosen-Ciphertext (CCA) secure Symmetric-Key Encryption scheme. You can then encrypt your files so that only you (and those to whom you give the symmetric key) can recover the original plaintext. By doing the lab, you will gain insight into how simple cryptographic tasks are implemented.
We provide you with a simple cryptographic library (libdcrypt), containing an implementation of all the cryptographic tools you will need for the lab. (See Lab0 for instructions on how to install this library on your machine.)
We have prepared skeleton files to help you get started with the lab. To set up the files on your account, download lab1.tar.gz, and type the following:
% tar xzf lab1.tar.gz
% cd lab1
%
Your Personal Vault utility will consist of three programs:
pv_keygen, pv_encrypt and
pv_decrypt. We provide you with a skeleton source
directory (lab1.tar.gz), containing
the following files:
% ls lab1/
Makefile pv_decrypt.c pv_keygen.c
pv.h pv_encrypt.c pv_misc.c
%
***(The provided Makefile assumes that you will be doing the assignment on the linux-lab cluster (either remotely via ssh, or in person at Burchard 126). If you are working on your machine, see the note below.)***
Once you have implemented the necessary functions (see below), you will build the three programs
pv_keygen, pv_encrypt and
pv_decrypt using make:
% make
gcc -g -O2 -ansi -Wall -Wsign-compare -Wchar-subscripts -Werror
-I. -I/usr/local/include/ -I/home/nicolosi/cs579/devel/include/ -c
pv_keygen.c pv_misc.c
gcc -g -O2 -ansi -Wall -Wsign-compare -Wchar-subscripts -Werror -o
pv_keygen pv_keygen.o pv_misc.o -L. -L/usr/local/lib/
-L/home/nicolosi/cs579/devel/lib/ -ldcrypt -lgmp
gcc -g -O2 -ansi -Wall -Wsign-compare -Wchar-subscripts -Werror
-I. -I/usr/local/include/ -I/home/nicolosi/cs579/devel/include/ -c
pv_encrypt.c pv_misc.c
gcc -g -O2 -ansi -Wall -Wsign-compare -Wchar-subscripts -Werror -o
pv_encrypt pv_encrypt.o pv_misc.o -L. -L/usr/local/lib/
-L/home/nicolosi/cs579/devel/lib/ -ldcrypt -lgmp
gcc -g -O2 -ansi -Wall -Wsign-compare -Wchar-subscripts -Werror
-I. -I/usr/local/include/ -I/home/nicolosi/cs579/devel/include/ -c
pv_decrypt.c pv_misc.c
gcc -g -O2 -ansi -Wall -Wsign-compare -Wchar-subscripts -Werror -o
pv_decrypt pv_decrypt.o pv_misc.o -L. -L/usr/local/lib/
-L/home/nicolosi/cs579/devel/lib/ -ldcrypt -lgmp
%
You should now be able to create your own symmetric key, and use it to encrypt/decrypt your files as follows:
% ./pv_keygen my_key.b64
% yes "test" | head -1000 > a_file
% ./pv_encrypt my_key.b64 a_file an_encrypted_file
% ./pv_decrypt my_key.b64 an_encrypted_file a_decrypted_file
% diff a_file a_decrypted_file
%
Note on setting up lab1 if you are
*not* working on the linux-lab cluster
If you have installed the
libraries on your own machine, you may need to edit the
Makefile that was provided in
lab1.tar.gz.
Locate the following lines (toward the beginning of
Makefile):
INCLUDES = /usr/include/
LIBS = /usr/lib/
DCRYPTINCLUDE = /home/nicolosi/cs579/devel/include/
DCRYPTLIB = /home/nicolosi/cs579/devel/lib/
Next, edit these lines so that:
INCLUDES and LIBS point to the
directories containing respectively the headers and lib files for
gmp and dmalloc (this will likely be one of
/usr/lib/, /usr/local/lib/,
/usr/shared/lib/, or similar);
DCRYPTINCLUDE and DCRYPTLIB point to the
directories containing respectively the headers and lib files for
libdcrypt.
Now that you know how your personal vault utility is supposed to work, let's get into doing something.
We provide you with an incomplete implementation of
pv_keygen.c,
pv_encrypt.c,
and pv_decrypt.c.
(Some miscellaneous auxiliary tasks are implemented for you
in pv.h and
pv_misc.c.)
Your job is to fill in the code where you see the comment line
/* YOUR CODE HERE */. (But of course, feel free to change
any part of the code as you see fit.)
Mainly, you'll be working on three places:
void write_skfile (const char *skfname, void *raw_sk, size_t
raw_sklen) (in pv_keygen.c)
void encrypt_file (const char *ctxt_fname, const char *sk,
int fin) (in pv_encrypt.c)
void decrypt_file (const char *ptxt_fname, const char *sk,
int fin) (in pv_decrypt.c)
The task of write_skfile is to encode the raw binary
symmetric key using the data serialization functions described below.
Conversely, import_sk_from_file should allocate and
fill in a raw binary buffer, big enough to contain the
deserialization of the ASCII data in the key file.
The task of encrypt_file is to read the content from the
file descriptor fin, encrypt it using raw_sk,
and place the resulting ciphertext in a file named ctxt_fname.
The encryption should be CCA-secure, which is the level of cryptographic protection that you should always expect of any implementation of an encryption algorithm.
Here are some guidelines, but you are welcome to make variations, as long as you can argue that your code still attains CCA security.
One approach is to use AES in CBC-mode, and then append an HSHA-1
mac of the resulting ciphertext. (Always mac after encrypting!)
The dcrypt library contains implementations of AES
and of HMAC SHA-1 (cf. Useful functions).
However, you will have to implement CBC-mode yourself,
as the library only gives access to the basic AES block
cipher functionality, which is not CCA-secure.
Notice that the key used to compute the HMAC SHA-1 mac must be
different from the one used by AES. Never use
the same cryptographic key for two different purposes: bad
interference could occur.
For this reason, the key
raw_sk actually consists of two pieces, one for AES and
one for HMAC SHA-1. The length of each piece (and hence the
cryptographic strength of the encryption) is specified by the
constant CCA_STRENGTH in
pv.h; the default is 128 bits, or 16 bytes.
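As a minimal sketch of how the two pieces could be pulled out of raw_sk: the handout does not say which piece comes first, so the order below (AES piece, then HMAC piece) is an assumption, CCA_STRENGTH is redefined locally only to keep the example self-contained, and the names sk_parts and split_sk are hypothetical.

```c
#include <stddef.h>

/* Assumption: mirrors the default from pv.h (16 bytes = 128 bits). */
#define CCA_STRENGTH 16

/* Hypothetical helper type: read-only views into the two halves of raw_sk. */
struct sk_parts {
  const unsigned char *k_aes;   /* first CCA_STRENGTH bytes (assumed order) */
  const unsigned char *k_hmac;  /* next CCA_STRENGTH bytes (assumed order)  */
};

/* Returns 0 on success, -1 if raw_sk does not hold exactly two pieces. */
int split_sk (const void *raw_sk, size_t raw_sklen, struct sk_parts *out)
{
  const unsigned char *p = raw_sk;

  if (p == NULL || out == NULL || raw_sklen != 2 * CCA_STRENGTH)
    return -1;
  out->k_aes = p;
  out->k_hmac = p + CCA_STRENGTH;
  return 0;
}
```

Whatever order you pick, pv_encrypt and pv_decrypt must agree on it.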
Recall that AES can only encrypt blocks of 128 bits, so you should use some padding in the case that the length (in bytes) of the plaintext is not a multiple of 16. This should be done in a way that allows proper decoding after decryption: in particular, the recipient must have a way to know where the padding begins so that it can be chopped off. (Using a different mode of operation, like CFB, would not require padding.)
One possible design is to add enough 0 bytes to the plaintext so as to make its length a multiple of 16, and then append a byte at the end specifying how many zero-bytes were appended.
Thus, the overall layout of an encrypted file will be:
+--------------------------+---+--------+
| Y | W | padlen |
+--------------------------+---+--------+
where Y = CBC-AES (K_AES, {plaintext, 0^padlen})
W = HMAC-SHA-1 (K_HSHA-1, Y)
padlen = no. of zero-bytes added to the plaintext to make its
length a multiple of 16
As for the sizes of the various components of a ciphertext file, notice that:
Y (in bytes) is a multiple of 16,
HSHA-1 (K_HSHA-1, Y) is 20 bytes long, and
padlen is a single byte.
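The padding rule and the sizes listed above can be sketched as follows. This is only an illustration of the suggested design: the function names are hypothetical, and the size computation counts exactly the three components shown in the layout (if your design also stores an IV in the file, add its 16 bytes).

```c
#include <stddef.h>
#include <string.h>

#define AES_BLOCK_LEN 16   /* AES block size in bytes */
#define SHA1_HASH_LEN 20   /* size of an HMAC SHA-1 mac in bytes */

/* Number of zero bytes needed to reach a multiple of 16 (between 0 and 15). */
size_t pad_len (size_t ptxt_len)
{
  return (AES_BLOCK_LEN - ptxt_len % AES_BLOCK_LEN) % AES_BLOCK_LEN;
}

/* Appends the zero padding in place and returns the padded length.
   buf must have room for up to 15 bytes beyond ptxt_len. */
size_t zero_pad (unsigned char *buf, size_t ptxt_len)
{
  size_t padlen = pad_len (ptxt_len);

  memset (buf + ptxt_len, 0, padlen);
  return ptxt_len + padlen;
}

/* Size of the encrypted file: Y, then W (20 bytes), then padlen (1 byte).
   Assumption: no IV is counted here. */
size_t ctxt_file_len (size_t ptxt_len)
{
  return ptxt_len + pad_len (ptxt_len) + SHA1_HASH_LEN + 1;
}
```

For instance, the 5000-byte a_file used in the examples needs 8 zero bytes of padding, so under this layout the encrypted file would be 5000 + 8 + 20 + 1 = 5029 bytes long.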
The task of decrypt_file is to read the ciphertext from
the file descriptor fin, decrypt it using
sk, and place the resulting plaintext in a file named
ptxt_fname.
This procedure basically should just "undo" the operations performed
by encrypt_file; for this reason,
decrypt_file expects a ciphertext featuring the structure
described above.
Reading Y (and then the mac and the pad length) is a
bit tricky: below we sketch one possible approach, but you are free
to implement this as you wish.
The idea is based on the fact that the ciphertext file ends with 21
bytes (i.e., the size of a hash + 1) used up by the HSHA-1 mac and by
the pad length. Thus, we will repeatedly attempt to perform "long
reads" of (aes_blocklen + sha1_hashsize + 2)
bytes: once we get to the end of the ciphertext and only the last
chunk of Y has to be read, such "long reads" will
encounter the end-of-file, at which point we will know where
Y ends, and how to finish reading the last bytes of the
ciphertext.
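One way to realize this idea is sketched below. It is a variation on the approach just described, not the reference solution: it keeps a sliding buffer of one AES block plus the 21 trailing bytes, and releases a block of Y only once 21 more bytes are known to follow it. The names scan_ciphertext, block_fn and count_block are hypothetical, and error handling (e.g. interrupted reads) is kept to a minimum.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define AES_BLOCK_LEN 16
#define TRAILER_LEN   21   /* 20-byte HSHA-1 mac + 1-byte pad length */

/* Hypothetical callback, invoked once per 16-byte block of Y, in order. */
typedef void (*block_fn) (const unsigned char *block, void *arg);

/* Example callback: counts the blocks of Y seen so far. */
void count_block (const unsigned char *block, void *arg)
{
  (void) block;
  ++*(size_t *) arg;
}

/* Reads fin to the end-of-file.  Each block of Y is passed to on_block;
   the final 21 bytes are copied to trailer.  Returns 0 on success, -1 on
   a read error or if the length is not a multiple of 16 plus 21. */
int scan_ciphertext (int fin, block_fn on_block, void *arg,
                     unsigned char trailer[TRAILER_LEN])
{
  unsigned char buf[AES_BLOCK_LEN + TRAILER_LEN];
  size_t have = 0;

  for (;;) {
    ssize_t n = read (fin, buf + have, sizeof (buf) - have);

    if (n < 0)
      return -1;
    if (n == 0)
      break;                      /* end-of-file */
    have += (size_t) n;
    if (have == sizeof (buf)) {
      /* 21 bytes follow the first block, so it cannot be the trailer. */
      if (on_block != NULL)
        on_block (buf, arg);
      memmove (buf, buf + AES_BLOCK_LEN, TRAILER_LEN);
      have = TRAILER_LEN;
    }
  }
  if (have != TRAILER_LEN)
    return -1;
  memcpy (trailer, buf, TRAILER_LEN);
  return 0;
}
```

In decrypt_file, the callback would be the place to feed each block to the mac computation and to the CBC decryption.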
For encrypting a stream of bytes that does not require random access, people often employ a technique known as Cipher-Block Chaining (CBC). To encrypt in CBC mode, one thinks of the stream of bytes as a sequence of blocks, each of the size of the block cipher being used (AES in your case); then, one XORs each plaintext block with the encryption of the previous block before encrypting it.
If the plaintext blocks are m_1, m_2, ..., and the ciphertext blocks are c_1, c_2, ..., then encryption and decryption in CBC mode are performed as follows:
c_i = E(m_i XOR c_{i-1})
m_i = D(c_i) XOR c_{i-1}
The first plaintext block is XORed with an initialization vector, or IV (which you can think of as c_0, since there is no m_0). The IV can be publicly known, but should be chosen afresh at random each time the same key is used to encrypt, so that each ciphertext uses a different IV.
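The chaining rule can be sketched in C as below. To keep the example self-contained it is written against a hypothetical block-cipher function pointer (blk_fn) and exercised with a toy, completely insecure stand-in cipher; in the lab you would plug in thin wrappers around aes_encrypt and aes_decrypt instead. All names here are illustrative.

```c
#include <stddef.h>

#define BLK 16  /* block size in bytes (128 bits, as for AES) */

/* Hypothetical block-cipher interface: transforms one 16-byte block. */
typedef void (*blk_fn) (const void *key, void *out, const void *in);

/* c_i = E(m_i XOR c_{i-1}), with the IV playing the role of c_0.
   len must be a multiple of BLK; in and out must not overlap. */
void cbc_encrypt (blk_fn E, const void *key, const unsigned char iv[BLK],
                  unsigned char *out, const unsigned char *in, size_t len)
{
  unsigned char x[BLK];
  const unsigned char *prev = iv;
  size_t off, j;

  for (off = 0; off < len; off += BLK) {
    for (j = 0; j < BLK; j++)
      x[j] = (unsigned char) (in[off + j] ^ prev[j]);
    E (key, out + off, x);
    prev = out + off;             /* this ciphertext block chains forward */
  }
}

/* m_i = D(c_i) XOR c_{i-1}, with the IV playing the role of c_0. */
void cbc_decrypt (blk_fn D, const void *key, const unsigned char iv[BLK],
                  unsigned char *out, const unsigned char *in, size_t len)
{
  unsigned char x[BLK];
  const unsigned char *prev = iv;
  size_t off, j;

  for (off = 0; off < len; off += BLK) {
    D (key, x, in + off);
    for (j = 0; j < BLK; j++)
      out[off + j] = (unsigned char) (x[j] ^ prev[j]);
    prev = in + off;
  }
}

/* TOY stand-in for testing the chaining only: NOT AES and NOT secure. */
void toy_encrypt (const void *key, void *out, const void *in)
{
  const unsigned char *k = key, *i = in;
  unsigned char *o = out;
  size_t j;

  for (j = 0; j < BLK; j++)
    o[j] = (unsigned char) (i[(j + 1) % BLK] ^ k[j]);
}

/* Inverse of toy_encrypt. */
void toy_decrypt (const void *key, void *out, const void *in)
{
  const unsigned char *k = key, *i = in;
  unsigned char *o = out;
  size_t j;

  for (j = 0; j < BLK; j++)
    o[(j + 1) % BLK] = (unsigned char) (i[j] ^ k[j]);
}
```

Note how two identical plaintext blocks yield different ciphertext blocks, because each one is XORed with a different previous block before encryption.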
You must write all the code you hand in for the programming assignments, except for code that we give you as part of the assignment. You are not allowed to look at anyone else's solution. You may discuss the assignments with other students, but you may not look at or copy each other's code. You may not copy code that might be available online.
You must submit two files: a software distribution (pv.tar.gz) and a script file (typescript).
To build a software distribution, run the following commands (from the directory where your source files are located):
% cd ..
% tar cf pv.tar lab1/
% gzip pv.tar
%
(If the name of the directory containing your sources is not
lab1, then substitute the appropriate name in the
second command above.)
To create a script file, use the script command. When you run script, everything you type gets saved in a file called typescript. Press CTRL-D to finish the script. For example:
% script
Script started, output file is typescript
% ./pv_keygen my_key.b64
% yes "test" | head -1000 > a_file
% ./pv_encrypt my_key.b64 a_file an_encrypted_file
% ./pv_decrypt my_key.b64 an_encrypted_file a_decrypted_file
% diff a_file a_decrypted_file
%
% ...
% yes "test" | head -1015 > a_file
% ./pv_encrypt my_key.b64 a_file an_encrypted_file
% ./pv_decrypt my_key.b64 an_encrypted_file a_decrypted_file
% diff a_file a_decrypted_file
%
% ^D
% ^D
Script done, output file is typescript
%
To turn in your distribution and script file, e-mail the files
pv.tar.gz and typescript to me at
"nicolosi AT cs DOT stevens DOT edu"
by April 9, 11:55pm.
This completes the lab.
Below is a description of some of the functions implemented in the dcrypt library that you may find useful in completing the assignment. You will need to include the dcrypt.h header file to access these functions. You may also want to take a look at these sample programs (tst.c, tst_sha1.c) to see some of these functions in action.
void putint (void *dp, u_int32_t val);
void puthyper (void *dp, u_int64_t val);
The putint function puts the 32-bit integer value of
val into memory in big-endian order at location
dp. dp does not need to be aligned. The
bytes stored at dp will be the same on big- and
little-endian machines. puthyper is like
putint but puts a 64-bit value into 8 bytes of memory.
u_int32_t getint (const void *dp);
u_int64_t gethyper (const void *dp);
The getint and gethyper routines retrieve
values stored by putint and puthyper
respectively.
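To make "big-endian order" concrete, here is an illustrative re-implementation of the 32-bit pair in portable C. The names put_be32 and get_be32 are hypothetical, and libdcrypt's own putint and getint may well be written differently; the point is only the byte layout they are described as producing.

```c
#include <stdint.h>

/* Stores val at dp, most significant byte first.  Works byte by byte,
   so dp needs no particular alignment. */
void put_be32 (void *dp, uint32_t val)
{
  unsigned char *p = dp;

  p[0] = (unsigned char) (val >> 24);
  p[1] = (unsigned char) (val >> 16);
  p[2] = (unsigned char) (val >> 8);
  p[3] = (unsigned char) val;
}

/* Reads back a value stored by put_be32. */
uint32_t get_be32 (const void *dp)
{
  const unsigned char *p = dp;

  return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
       | ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}
```

Storing 0x01020304 this way always yields the bytes 01 02 03 04, regardless of the byte order of the host machine.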
char *armor64 (const void *dp, size_t len);
Converts len bytes from the binary string pointed to by
dp to a longer, base-64, printable ASCII string. You
will need to use this to transform random session keys (which could
contain zero-bytes) into a NULL-terminated ANSI C string.
ssize_t dearmor64 (void *out, const char *s);
Inverts the armor64 function, and returns the number of
bytes that were placed at out. The return value is
negative if the NULL-terminated ANSI C string
s is not the output of armor64.
ssize_t armor64len (const char *s);
Examines the string s. If some prefix of s represents a valid
armor64 string, then the length of such prefix is returned. Otherwise,
-1 is returned, indicating that s is not the output of armor64.
ssize_t dearmor64len (const char *s);
Examines the string s. If some prefix of s represents a valid
armor64 string, then the length of the decoded data that would result
from "dearmoring" s is returned. Otherwise, -1 is
returned, indicating that s is not the output of armor64.
The libraries you are using contain a cryptographic pseudo-random
number generator, whose state is kept in a global 16-byte array called
prng_state. Before using the random number generator,
you must initialize it.
void prng_seed (void *buf, size_t len);
Seeds the pseudo-random number generator using len bytes from buf as seed. Providing
a good seed may be a difficult task; some Operating Systems
(including FreeBSD, OpenBSD and most Linux distributions) provide
you with a source of randomness under /dev/random (or one
of its variants: /dev/srandom, /dev/urandom,
etc.). If a random
device is available, you should read (at least) 128 bits from it and
use them as a seed; otherwise, as a very rough
approximation, you could supply some information about your local
machine (e.g., time of the day, PID/GID value) that is
difficult to predict.
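Reading the seed bytes from a random device could look like the sketch below. The helper name read_seed is hypothetical, and the choice of device path is left to the caller since the available variant differs between systems; the resulting buffer is what you would then hand to prng_seed.

```c
#include <stddef.h>
#include <stdio.h>

/* Fills seed with len bytes read from the device at path
   (e.g. "/dev/urandom").  Returns 0 on success, -1 on failure. */
int read_seed (const char *path, unsigned char *seed, size_t len)
{
  FILE *f = fopen (path, "rb");
  size_t got;

  if (f == NULL)
    return -1;
  /* Unbuffered, so that we do not drain more randomness than we use. */
  setvbuf (f, NULL, _IONBF, 0);
  got = fread (seed, 1, len, f);
  fclose (f);
  return got == len ? 0 : -1;
}
```

With a 16-byte buffer this reads the suggested 128 bits; a failure return is the cue to fall back to the rougher sources of unpredictability mentioned above.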
void prng_getbytes (void *buf, size_t len);
Writes len pseudo-random bytes to memory at location
buf.
u_int32_t prng_getword ();
u_int64_t prng_gethyper ();
Return a pseudo-random 32-bit and 64-bit value, respectively.
For actually encrypting and decrypting file data, you will use the Rijndael [FIPS-197] block cipher (also called AES, the Advanced Encryption Standard). Rijndael is a 128-bit block cipher. It supports two operations: encryption and decryption. Encryption transforms 16 bytes (128 bits) of plaintext data into 16 bytes of ciphertext data using a secret key. Someone who does not know the secret key cannot recover the plaintext from the ciphertext. The decryption algorithm, given knowledge of the secret key, transforms ciphertext into plaintext.
The libraries you are using define a struct called
aes_ctx that you should use to hold secret keys for AES.
You should manipulate your AES secret keys with the following functions:
void aes_setkey (aes_ctx *aes, const void *key, u_int
len);
Initializes aes with a secret key consisting of len bytes from
the buffer key. The key must be 16, 24, or
32 bytes long.
void aes_encrypt (const aes_ctx *aes, void *buf, const void
*ibuf);
aes_encrypt transforms 16 bytes of plaintext data at
ibuf into 16 bytes of ciphertext data which it writes to
buf. It uses the secret key previously stored within
aes using the aes_setkey function.
void aes_decrypt (const aes_ctx *aes, void *buf, const void
*ibuf);
aes_decrypt decrypts 16 bytes, inverting the
aes_encrypt function.
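void aes_clrkey (aes_ctx *aes);
Clears the content of aes, thus wiping out the secret
encryption key that was previously stored there.
The SHA-1 [FIPS-180-1] hash function hashes an arbitrary-length input (of fewer than 2^64 bits) to a 20-byte output. SHA-1 is known as a cryptographic hash function. While nothing has been formally proven about the function, it is generally assumed that SHA-1 is one-way and collision-resistant. These properties are defined as follows:
One-way: given an output value, it is computationally infeasible to find an input that hashes to it.
Collision-resistant: it is computationally infeasible to find two distinct inputs that hash to the same output.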
One-way functions are commonly used to store hashes of passwords rather than the passwords themselves. For someone who steals the file of password hashes, there is no known way of recovering passwords more efficient than guessing passwords and verifying the guesses. (Of course, the fact that users often choose easily-guessed passwords is a problem.)
Collision-resistant functions have many uses, stemming from the fact that the short output value effectively uniquely specifies an arbitrary-length input. One cannot recover the input from the output, but given the input, one can verify that it does, indeed, match the output. One might, for instance, implement a web cache in which content is indexed by a SHA-1 hash of the URL. Having fixed-length names for stored content would simplify the implementation.
The libraries you are using contain an implementation of SHA-1.
void sha1_hash (void *digest, const void *buf, size_t
len);
Hashes len bytes of data at buf, and places
the resulting 20 bytes at digest.
Sometimes the input that you want to hash is so long that it is
inconvenient to store it entirely in memory before being able to
hash it. This is the case for example when hashing the entire content
of a file into a short digest.
For this reason, the libraries you are using allow you to process a
long input "one chunk at a time." To do that, you should use a
struct called sha1_ctx, which will store the
"partial digest" as you keep providing new input to be hashed. You
should manipulate sha1_ctx structs with the
following functions:
void sha1_init (sha1_ctx *sc);
Initializes the sha1_ctx struct that will contain the
partial hash.
void sha1_update (sha1_ctx *sc, const void *data, size_t
len);
Appends len bytes at data to the input being
hashed, but does not produce a result. Thus, one can hash a large
amount of data without having it all in memory, by calling
sha1_update on one chunk at a time.
void sha1_final (sha1_ctx *sc, void *digest);
Produces the final 20-byte hash and places it at digest.
Message Authentication Codes (MACs) are a symmetric-key primitive allowing you to check the integrity of the information to which the MAC is applied. Recall that encryption does not guarantee integrity! The fact that you were able to decrypt a ciphertext is not enough to be sure that nobody tampered with its content. For integrity, you should always append a MAC to the content.
You use MACs as follows. Let's say that you want to store a file on your file-server, but you are afraid that its content will be changed behind your back. Then, you use a secret key to "mac" the file, and store the resulting MAC along with the file. Now, when you check back with your file-server and retrieve your file, you will also retrieve the MAC that you appended. Then, you will use the secret key to compute the MAC again, and if the MAC you just computed is the same as the value that you retrieved from the file-server, then you are sure that nobody touched your file. This is because, if somebody had changed the file, then they would also have had to compute the corresponding MAC in order to fool you. However, secure MACs are designed such that, without knowing the secret key, it is computationally intractable to compute the right MAC, even after having seen a lot of valid (message, MAC) pairs.
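The final "is the recomputed MAC the same as the stored one?" step can be sketched as follows. The handout does not prescribe how to compare; examining every byte rather than stopping at the first mismatch is a common precaution (so that the time taken does not reveal where the two values diverge), and the name mac_equal is hypothetical.

```c
#include <stddef.h>

/* Returns 1 if the two len-byte MACs are identical, 0 otherwise.
   Every byte is examined, whether or not a mismatch was already found. */
int mac_equal (const unsigned char *a, const unsigned char *b, size_t len)
{
  unsigned char diff = 0;
  size_t i;

  for (i = 0; i < len; i++)
    diff |= (unsigned char) (a[i] ^ b[i]);
  return diff == 0;
}
```

For HMAC SHA-1, len would be 20. A decryption routine should refuse to output any plaintext when this check fails.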
The Keyed-Hash Message Authentication Code (HMAC) [FIPS-198a] is a secure Message Authentication Code based on the use of any cryptographic hash function, like SHA-1. The libraries you are using contain an implementation of HMAC, instantiated with the SHA-1 cryptographic hash function.
void hmac_sha1 (const char *key, size_t keylen, void *out, const void
*data, size_t dlen);
Computes the HMAC SHA-1 of dlen bytes of data at
data using the keylen-byte key key, and places
the resulting 20 bytes at out.
As discussed above for SHA-1, the libraries you are using allow you to process a long input "one chunk at a time."
void hmac_sha1_init (const char *key, size_t keylen,
sha1_ctx *sc);
Initializes the sha1_ctx struct that will contain the
partial HMAC under the key key.
void hmac_sha1_update (sha1_ctx *sc, const void *data, size_t
len);
Appends len bytes from data to the input being
hmac'ed, but does not produce a result. Thus, one can hmac a large
amount of data without having it all in memory, by calling
hmac_sha1_update on one chunk at a time.
void hmac_sha1_final (const char *key, size_t keylen,
sha1_ctx *sc, void *out);
Produces the final 20-byte HMAC and places it at out.
The result will not be a valid HMAC if the key used in
hmac_sha1_final is different from the one initially used in
hmac_sha1_init.
| Reference | Citation |
|---|---|
| [FIPS-180-1] | FIPS-180-1, Secure Hash Standard. U.S. Department of Commerce/N.I.S.T., 1994 |
| [FIPS-197] | FIPS-197, Announcing the Advanced Encryption Standard. U.S. Department of Commerce/N.I.S.T., 2001 |
| [FIPS-198a] | FIPS-198a, The Keyed-Hash Message Authentication Code (HMAC). U.S. Department of Commerce/N.I.S.T., 2002 |