CS578---Privacy in a Networked World
Instructor: Antonio R. Nicolosi
Notes for blackboard presentation, Week 4 (11 February 2008)


Motivation
==========

Why do we need to look at Crypto:

- Data Secrecy is among the many goals of Privacy.  Some people even
  identify Privacy with Data Secrecy, and use the term Data Privacy
  for that.  In this class, we saw that Privacy is actually more than
  just "keeping stuff secret"; nevertheless, the ability to store data
  securely is instrumental to maintaining informational privacy.

  Sample settings where we want data secrecy: when we transmit data to
  an information collector (we do not want others to listen); when we
  store our personal data on untrusted storage; when organizations
  store information about us on any storage (to prevent unauthorized
  access to personal info in case of loss or theft of the storage
  device, e.g. in the case of misplaced laptops).

- Data Integrity is also important for privacy.  It allows you to
  detect whether somebody has tampered with your data (file storage
  example).  It is also a separate requirement for secure data
  transmission (second example): data secrecy only guarantees that
  others won't understand the data while in flight, but it does not
  ensure that they won't be able to maul it.  You need to worry about
  that separately.

- (Entity) Authentication: Crypto can give protocols to allow people
  to prove they are who they claim to be, without having to rely too
  much on identifying credentials.

- Data authentication/origin is also useful.


Taxonomy of basic crypto techniques
===================================

Keyless, symmetric-key, public-key.

* Clarification about keys in the sense of random bits, vs. database
  keys as unique identifiers.


Keyless techniques
==================

* Cryptographic hash functions.

  - Family of functions; not just one function

    Clarification using MD5/SHA1 example: just prepend an 80-bit
    string to all inputs: you get a different function for each choice
    of the 80 bits.
    (NB: this 80-bit string is sometimes called a key, but we won't
    need it to be secret here)

  - Three flavors: Universal, Universal one-way, collision-resistant

  - Common theme: make it hard for the following equation to arise:

      f_k (x_0) = f_k (x_1)        (called a "collision")

    where k is the index; f_k (.) is one of the functions from the
    family; and x_0, x_1 are two distinct inputs (e.g. two files if f
    is SHA1).

  - Why prevent this?

    We want f_k (x) to serve as a shorthand for x.  So we don't want
    different things (x_0 and x_1) to have the same shorthand form.

  - Who picks k, x_0, x_1?

    k is chosen at random.  x_0 and x_1 are arbitrarily chosen by an
    (efficient) attacker.  (Efficient =~= no computation requiring 100
    years is allowed.)  But the order of choosing makes the
    difference:

      [ Time elapses as you move right:    ----+-------+-------+----> ]
                                               |       |       |
      + Universal hash functions:             x_0     x_1      k
      + Universal one-way hash functions:     x_0      k      x_1
      + Collision resistance:                  k      x_0     x_1

  - Which property is stronger/more difficult to achieve?

    Collision resistance is the stronger property.  If you look at the
    collision resistance scenario above, you'll see that the attacker
    knows which function from the family will be used __before__
    having to choose the inputs for the function.  So the attacker can
    play around and see whether she can find a pair of inputs that
    "collide."  If you compare this to the first scenario, you'll
    notice that there the attacker cannot even tell whether she will
    be successful or not until she has "tied her own hands," since k
    is chosen after both x_0 and x_1.

    Notice that collision attacks against MD5 are known; SHA1 is not
    nearly as broken but should not be used for collision resistance
    anymore; rather, SHA-256 or SHA-512 should be used.  On the other
    hand, the universal one-wayness of both MD5 and SHA1 is still more
    or less okay (to my knowledge, to date).

  - Why do people say that "SHA-1 is a CRHF," rather than talk about a
    family of functions, as we did in class?
    The point is that you could imagine the key k for the SHA-1
    function as having been fixed, once and for all, within the code
    for SHA-1.  Since in the CRHF game the attacker gets to see the
    key k before having to choose the inputs x_0 and x_1, that's okay.

    (Counter-question: But can we say that the pre-fixed key in the
    SHA-1 source code was chosen at random?  Answer: That's way more
    technical than what we are doing here ;)

  - How do CRHFs help for data integrity?

    Say you have an account on the CS cluster of machines
    (lab.cs.stevens.edu), and you want to store a file in your home
    directory.  Say you are afraid the sys-admin will change it
    "behind your back."  Then, you store the file, compute the SHA1
    hash of the file, and write this hash on a piece of paper (40
    characters).  You keep the piece of paper with you, and later come
    back to check your file.  You hash the file again, and compare the
    hash value you get with the characters on your piece of paper.

    Denoting the file with F and the hash with h, the goal of the
    attacker is to come up with F' different from F such that

      SHA-1 (F) = SHA-1 (F')

    which is a collision of the above kind.  Notice that the attacker
    here does not get to choose both files (F is yours): still, if
    SHA-1 behaves like a CRHF, this only makes the task of the
    attacker harder.

* One-way functions and one-way permutations

  - How do OWFs differ from the above Cryptographic hash functions?

    The point of OWFs is to prevent the attacker from solving a
    different kind of equation:

      f_k (x) = y

    where the attacker is given k and y, and needs to find x.  A
    common special case is when all f_k's are permutations (one-way
    permutations): in this case, there is only one x that "hits" any
    given y, so that really we are asking the attacker to "invert" the
    function f_k.  When she cannot do so, we say the function is
    one-way: you can go in one direction:

                   f_k
      x |-----------------> y

    but can never go in the opposite direction:

                 f_k^{-1}
      x <------- X -------| y

  - Why do we care about one-wayness?
    Many reasons: for our purposes, they are useful in user
    authentication resistant to "snooping": S/key login.

* Pseudo-random generators

* Keyless techniques: Key Derivation functions


Symmetric-key techniques
========================

* Symmetric-key encryption

  - One-time pad: its glory ("in XOR we trust") and its problems
    (one-time usage; length requirements).

  - Overcoming the XOR problems: Pseudo-random generators and stream
    ciphers (Problem: stateful---need to remember which portion of the
    pseudo-random stream was used already)

  - Block ciphers (AES, Blowfish, DES, 3-DES) and modes of operation.

  - Remark: Symmetric-key encryption only addresses data secrecy!!
    An attacker could maul the ciphertext, and the recipient would
    never notice.

  - Need data integrity on top of encryption

  - Can we use CRHFs?  Not quite.  Try to see what goes wrong with the
    following:

      Alice ---------- c, h ------------> Bob

      where c = Enc_k (m), h = SHA-1 (c)

  - Solution: Message Authentication Codes (MACs)

* Message Authentication Codes (MACs)

* Challenge-Response protocols

  - (used in password-based authentication; suffers from offline
    dictionary attacks)
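
Worked sketch for the "c, h" exercise in the symmetric-key encryption
notes above.  This is only an illustration, under several assumptions
that are not part of the class material: the ciphertext c is stood in
for by random bytes (only integrity matters here), HMAC-SHA256 is
picked as one possible MAC, and the names mac_key, unkeyed_tag,
mac_tag and flip_first_bit are hypothetical.  The point it tries to
show: anybody can recompute h = SHA-1 (c) after mauling c, whereas a
MAC tag cannot be recomputed without the shared key.

```python
import hashlib
import hmac
import os


def unkeyed_tag(c: bytes) -> bytes:
    """h = SHA-1(c): no secret involved, so the attacker can compute it too."""
    return hashlib.sha1(c).digest()


def mac_tag(mac_key: bytes, c: bytes) -> bytes:
    """HMAC-SHA256 tag over c; computing it requires the secret mac_key."""
    return hmac.new(mac_key, c, hashlib.sha256).digest()


def flip_first_bit(c: bytes) -> bytes:
    """Toy 'mauling': flip the lowest bit of the first ciphertext byte."""
    return bytes([c[0] ^ 0x01]) + c[1:]


# Stand-in for c = Enc_k(m): random bytes (assumption, see lead-in).
c = os.urandom(32)
# Hypothetical MAC key shared by Alice and Bob, separate from the encryption key.
mac_key = os.urandom(32)

# Scheme from the notes: Alice sends (c, h) with h = SHA-1(c).
# The attacker mauls c in flight and simply recomputes the hash.
c_mauled = flip_first_bit(c)
h_forged = unkeyed_tag(c_mauled)
bob_accepts_hash = hmac.compare_digest(unkeyed_tag(c_mauled), h_forged)

# With a MAC: Alice sends (c, t).  Without mac_key the attacker cannot
# recompute t for the mauled ciphertext, so she forwards the old tag.
t = mac_tag(mac_key, c)
bob_accepts_mauled = hmac.compare_digest(mac_tag(mac_key, c_mauled), t)
bob_accepts_honest = hmac.compare_digest(mac_tag(mac_key, c), t)

print("SHA-1 check accepts mauled ciphertext:", bob_accepts_hash)
print("MAC check accepts mauled ciphertext:  ", bob_accepts_mauled)
print("MAC check accepts honest ciphertext:  ", bob_accepts_honest)
```

The comparison uses hmac.compare_digest rather than == so that the
check takes the same time wherever the first mismatch occurs; this is
a design choice of the sketch, not something covered in the notes.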