ISE Cryptography — Lecture 04

Message Integrity

Announcements

Midterm is taking place during class next week!
- Be at your desk and ready to start at 09:00
- Exam will be on Brightspace, but can’t be taken remotely
Questions will be a mix of MCQs and written responses
- Based on the content of lectures, exercises and assigned reading
- Covering Weeks 1 through 4. Weaker Qs from weekly quizzes may be repeated!
What kind of questions? A mixed approach:
- Remember, Understand, Apply, Analyze, Evaluate
- I don’t ask trick questions, but I do ask challenging questions
- If the exam is really easy and everyone gets an A, it’s not a useful assessment
- If the exam is really difficult and everyone gets an F, it’s also not useful
- Expect it to be appropriately challenging!
Learning off the slides is usually good for a C. You need understanding to get an A!

The Malleability Problem

Confidentiality is not enough!

Confidentiality vs Integrity

We’ve built up strong defenses against eavesdropping adversaries
- Passive adversaries
- Can read transmitted messages, but can’t alter them or send their own
But we’ve neglected active adversaries so far
- Malicious adversaries can modify messages in transit!
If Bob receives a message from Alice…
- …can he convince himself that it wasn’t modified in transit?
- This is the question of message integrity
- Can any of the schemes we’ve looked at guarantee this?
Integrity is orthogonal to message secrecy
- You don’t have to understand a message to alter it
Many cases where integrity without secrecy is all that’s required
- Ethernet uses a CRC32, TCP uses a 16-bit checksum… any other examples?

CTR Mode is Malleable

Consider a CTR-mode encrypted payment: Alice sends Bob a transfer
- The plaintext has a structured format with an amount, sender and recipient
An adversary intercepts the ciphertext \(c = m \oplus F(k, \text{nonce})\)
- They don’t know the key. They can’t decrypt. But they don’t need to!
CTR mode uses XOR: flipping a bit in \(c\) flips the corresponding bit in the decrypted \(m\)
- The adversary knows the structure of the message
- They XOR the appropriate ciphertext bytes to change the amount
Bob decrypts successfully and sees a different amount
- The cipher did exactly what it was supposed to do
- It delivered confidentiality. Nobody read the message in transit
But the adversary changed the message without detection
- CTR mode, CBC mode, and all CPA-secure ciphers that we’ve seen so far are vulnerable to this
- We asked the wrong question: we need integrity, not just secrecy

Types of Integrity

The adversary can intercept a message, alter it and forward it
- Checksum-style ideas can help with this
- But anyone can compute a valid checksum! We need to include a secret
The adversary can also attempt a replay attack by resending old messages
- Messages can also be reordered
- Some messages may be dropped entirely
Embedding some kind of sequence number in the message can help
- …but only if we ensure that the sequence number is valid!
Keep this idea of tagging a message with associated data in mind for later
- Message metadata should be tamper-proof, but not necessarily secret

Authenticity

Integrity asks if the message Bob receives has been altered in transit
- Some integrity checks guard against problems with the channel, e.g. Ethernet
- We’re interested in guarding against malicious third-party interference!
Authenticity asks a related, but distinct, question
- Is the message actually from the supposed sender?
- Did it really come from Alice, or from an adversary?
- Can we use a shared secret to guarantee that the message is authentic?
We’re going to tackle both these questions today!

Keyed and Keyless

Keyless integrity checks are designed to detect transmission errors
- Random errors due to interference in the transmission medium
- Anyone can compute a valid CRC32 tag for an Ethernet message
- No secret involved, analogous to encoding rather than encryption
We’ll discuss keyed integrity checks for our purposes
- An adversary shouldn’t be able to make valid tags for arbitrary messages
- Only those who know the secret key should be able to do so!
- More like encryption than encoding
Tags are generally short, fixed-length bitstrings
- Only a small transmission overhead

MAC Definition

Tagging messages with a shared secret.

Message Authentication Code (MAC)

A Message Authentication Code (MAC) ensures that a message has not been tampered with during transmission
- Uses a shared secret key between two parties to verify message integrity
- Why? So the adversary can’t produce valid MACs!
A MAC system \(\mathcal{I} = (S, V)\) consists of two efficient algorithms:
- \(S\) is the signing algorithm used to generate tags
- \(V\) is the verification algorithm used to validate tags
MACs are defined over the key, message and tag spaces \((\mathcal{K}, \mathcal{M}, \mathcal{T})\)
\(S\) is a probabilistic algorithm invoked as \(t \leftarrow S(k, m)\)
- \(t\) is the tag, \(k\) is the key, and \(m\) is the message
- \(S\) can be probabilistic, but doesn’t have to be
\(V\) is a deterministic algorithm invoked as \(r \leftarrow V(k, m, t)\)
- The output \(r\) is either accept or reject
MACs must satisfy their correctness property for all keys \(k\) and messages \(m\)
- Pr[\(V(k, m, S(k, m))\) = accept] = 1

Randomised vs Deterministic

Our definitions for computational ciphers also allowed for encryption to be probabilistic
- And we did! CBC with a random IV is probabilistic encryption
- With MACs, probabilistic versions are also common in practice
Deterministic MAC system
- Same input \((k, m)\) always produces the same tag t
- Verification is simple: accept if \(S(k, m) = t\)
- Guarantees unique tag per message
Randomized MAC system
- Same \((k, m)\) might produce different valid tags on different runs
- May provide better trade-offs in terms of security and efficiency
Both types can be secure!

The UF-CMA Game

Existential unforgeability under chosen message attack.

Defining MAC Security

Let’s take a conservative approach to security. If a MAC can hold up in a hostile environment against powerful adversaries, then it’s good for day-to-day use too!
As usual, let’s build an attack game to demonstrate!
Let’s hold the key \(k\) constant, like we did for cipher attack games
The adversary can request tags for millions of arbitrary messages of their choice
- They can request \(t = S(k, m)\) for arbitrary \(m\)
- The result is a signed pair \((m, t)\)
- This is called a chosen message attack
We challenge the adversary to forge a new valid message-tag pair \((m, t)\)
- New? Has to be different from all of the signed pairs generated so far
- This is called an existential MAC forgery

Why Chosen Messages? Why Existential Forgery?

Chosen message attack seems too generous to the adversary. Why allow it?
- In practice, adversaries often can trigger MACs on messages of their choice
- A web server might MAC cookie values an attacker controls
- A protocol might authenticate attacker-influenced metadata
- If the MAC is secure even when the adversary picks the messages, it’s secure everywhere
Existential forgery seems too weak. Who cares about forging a random message?
- If the adversary can’t forge a tag for any message of their choosing…
- …they certainly can’t forge one for a specific target message!
- Conservative definitions make for stronger guarantees

MAC Security: Formal Definition

The challenger picks a random \(k \in \mathcal{K}\)
\(\mathcal{A}\) issues \(Q\) signing queries to the challenger
- The \(i\)th signing query is a message \(m_i \in \mathcal{M}\)
- The challenger computes \(t_i \leftarrow S(k, m_i)\) and returns \(t_i\) to \(\mathcal{A}\)
- \(\mathcal{A}\) stores the signed pair \((m_i, t_i)\)
\(\mathcal{A}\) outputs a candidate forgery pair \((m, t) \in \mathcal{M} \times \mathcal{T}\)
- Where \((m, t)\) is different from all of the signed pairs queried by \(\mathcal{A}\)
\(\mathcal{A}\) wins if \((m, t)\) is a valid pair under \(k\), i.e. if \(V(k, m, t)\) = accept
- \(\text{MAC}_\text{adv}[\mathcal{A}, \mathcal{I}] = \Pr[V(k, m, t) = \texttt{accept}]\)
\(\mathcal{A}\) is a \(Q\)-query MAC adversary
A MAC system \(\mathcal{I}\) is secure if for all efficient adversaries \(\mathcal{A}\)
- \(\text{MAC}_\text{adv}[\mathcal{A}, \mathcal{I}]\) is negligible
Secure MAC systems are existentially unforgeable under chosen message attack

MAC Security: Attack Game

Deterministic vs Randomised Security

A MAC system can be deterministic or randomized
For deterministic MAC systems, security implications are simpler
- The adversary must produce a valid tag for a new message
- Similar to block cipher unpredictability
For randomized MAC systems, this is a stronger security property
- May be many valid tags for a given message
- A new, valid tag for an already-queried message is still an existential forgery!
In general, having a valid tag for a message gives no useful information…
- To generate a valid tag for a different message
- To generate a new, valid tag for the same message
We could also allow the adversary to perform verification queries in addition to signing queries. This seems like an even more powerful adversary…
- But it turns out that the two attack games are equivalent (with an error term)
- Intuition: the challenger can just reply reject to all verification queries. If that answer is ever wrong, the adversary already submitted a valid forgery!

Why Freshness? Replay is Trivial

Note that the UF-CMA game requires the forgery \((m, t)\) to be fresh
- The adversary can’t just replay a pair they already received
This isn’t because replay attacks don’t matter. They absolutely do!
- Replaying a valid “transfer funds” message is a real attack
But replay protection is a protocol-level concern, not a MAC-level concern
- Sequence numbers, timestamps, and nonces handle replay at the protocol layer
The MAC’s job is to prevent the adversary from creating new valid tags
- If they can’t forge, they can only replay (which the protocol blocks)

One-Time MACs

Information-theoretic security for a single message.

The Polynomial MAC

Before building multi-message MACs from PRFs, let’s see that perfect one-time security is achievable
- No computational assumptions needed! Secure against unbounded adversaries
- But only for a single message
This mirrors the OTP/stream cipher pattern from Lecture 01
- The OTP is perfectly secret for one message; PRG-based ciphers extend to many
- The one-time MAC is perfectly unforgeable for one message; PRF-based MACs extend to many
- The limitation motivates why we need computational assumptions for multi-message security
Construction: work in \(\mathbb{Z}_p\) for a large prime \(p\)
- Key is a pair \((k_1, k_2) \xleftarrow{R} \mathbb{Z}_p^2\)
- For a message \(m = (a_1, \ldots, a_v)\) with each \(a_i \in \mathbb{Z}_p\):
- \(S((k_1, k_2), m) = a_1 k_1^v + a_2 k_1^{v-1} + \cdots + a_v k_1 + k_2 \mod p\)
- Evaluate the message as a polynomial at \(k_1\), then mask with \(k_2\)
- Example: if \(m = (a_1, a_2, a_3)\), the tag is \(a_1 k_1^3 + a_2 k_1^2 + a_3 k_1 + k_2\)
- The message defines the polynomial; the secret \(k_1\) is where we evaluate it

The core idea: each message block becomes a coefficient of a polynomial. Different messages produce different polynomials. Evaluating at the secret point \(k_1\) maps the message to a tag.

Why polynomial evaluation? Two different polynomials of degree \(v\) can agree on at most \(v\) points (out of \(p\) total). So knowing the value at one secret point tells you almost nothing about the value at any other point. This is what makes forgery hard.

Why \(k_2\)? It acts as a one-time pad on the output. Without it, the adversary could exploit algebraic relationships between tags (e.g., linearity). With \(k_2\), the tag is uniformly random and independent of the polynomial structure.

Connection to practice: this is exactly how Poly1305 works in ChaCha20-Poly1305, but evaluated over GF(\(2^{130}-5\)) instead of \(\mathbb{Z}_p\).

One-Time MAC Security

With a single signing query, the adversary learns one point on the polynomial
- The polynomial has degree \(v\), so one point reveals almost nothing about \(k_1\)
- And \(k_2\) acts as a one-time pad on the output
For any message \(m' \neq m\), the probability of guessing the correct tag is at most \((v+1)/p\)
- With \(p \approx 2^{128}\), this is negligible
But with two signing queries, the adversary learns two points
- Two points on a degree-\(v\) polynomial with a linear mask: \(k_1\) can be recovered
- \(k_2\) is then trivially computed
- The MAC is completely broken after reuse!
This is the fundamental limitation: one-time security requires fresh keys
- For multi-message security, we need computational assumptions (PRFs)
- But one-time MACs are not just a curiosity: Poly1305 (used in ChaCha20-Poly1305) is exactly this kind of polynomial MAC

Walk through the security argument step by step:

The adversary sees one (message, tag) pair: \((m, t)\) where \(t = P_m(k_1) + k_2\). This is one equation in two unknowns (\(k_1\) and \(k_2\)). For every possible value of \(k_1\), there is exactly one \(k_2\) that fits. So the single observation gives zero information about \(k_1\).

To forge a tag \(t'\) for a new message \(m'\), the adversary needs \(t' = P_{m'}(k_1) + k_2\). Since \(k_2 = t - P_m(k_1)\), this simplifies to \(t' = P_{m'}(k_1) - P_m(k_1) + t\). The difference \(P_{m'}(x) - P_m(x)\) is a non-zero polynomial (because \(m' \neq m\)) of degree at most \(v\). For any fixed guess of \(t'\), this equation has at most \(v\) solutions for \(k_1\) out of \(p\) total values. So the forgery probability is at most \(v/p\).

Why does a second query break it? Two observations give two equations in two unknowns. With high probability, \(k_1\) is uniquely determined (the system has a unique solution), and then \(k_2\) follows immediately. The entire key is recovered.

Building MACs from PRFs

The PRF-to-MAC reduction.

When You Have a Hammer…

Let’s say we already have a secure PRF. Can we turn it into a MAC?
- Remember what a PRF is from Lecture 03?
- Block ciphers like AES (under certain conditions) are secure PRFs
- No point inventing something brand-new, right?
It’s surprisingly easy! Here’s how we can do it:
- Define \(S(k, m) = F(k, m)\), i.e. just apply the PRF to the message.
- To verify, check if \(F(k, m) = t\). If so, accept; if not, reject.
This gives us a deterministic MAC
- A single valid tag \(t\) exists for a given message \(m\) under a fixed key \(k\)
The security comes from the PRF’s unpredictability
- If \(F\) is a secure PRF, attackers can’t predict tags for new messages
Well, that was suspiciously easy, wasn’t it?
- How big is the tag if we use AES? 128 bits, so forgery probability is \(1/2^{128}\). That’s fine!
- How big is the message if we use AES? Also 128 bits. That’s a problem!

The Reduction: Intuition

Let’s backtrack to our definition of MAC security. Does a secure PRF pass the test?
To make this easier, let’s pretend that our PRF \(F\) is a truly random function
The adversary can make as many queries as they want to, but it won’t help
When they want to find \(F(m)\) for a new \(m\) they haven’t queried yet…
- They’re completely in the dark!
- \(F(m)\) is random and independent of previous signed pairs
Their best strategy is to guess randomly, so the output space must be big enough
- Chance of guessing correctly is \(1/|\mathcal{Y}|\)
- \(|\mathcal{Y}|\) should be super-poly to make \(1/|\mathcal{Y}|\) negligible
All very nice, but how does this help with a secure PRF?
- Secure PRFs are indistinguishable from truly random functions…
- And guessing or brute-forcing random outputs isn’t feasible…
The same reduction pattern from Lecture 02 applies
- We’ll build an adversary \(\mathcal{B}\) that turns any MAC forger into a PRF distinguisher
- If the PRF is secure, the MAC forger can’t exist

Remind students of L02’s reduction framework: “real world vs ideal world.” In L02, the real world was the PRG output and the ideal world was a truly random string. Here, the real world is the PRF and the ideal world is a truly random function.

The logical structure: we want to show “secure PRF implies secure MAC.” The contrapositive is “insecure MAC implies insecure PRF.” So if someone can break the MAC, we use them to break the PRF. This is proof by contradiction with a constructive twist: we actually build a specific adversary B that uses the forger A as a subroutine.

Why start with the random function case? Because it’s easy to analyze: if the signing oracle is truly random, outputs are uniform and independent, so forging is just guessing. This gives us the “ideal world baseline.” The gap between the real world (PRF) and the ideal world (random function) is bounded by the PRF advantage.

The Reduction: Formal Argument

Suppose adversary \(\mathcal{A}\) can forge a MAC with non-negligible advantage
- Build a PRF distinguisher \(\mathcal{B}\) that uses \(\mathcal{A}\) as a subroutine
- \(\mathcal{B}\) receives oracle access to either \(F(k, \cdot)\) or a random function \(f\)
- \(\mathcal{B}\) forwards \(\mathcal{A}\)’s signing queries to its own oracle
- When \(\mathcal{A}\) outputs a forgery \((m, t)\), \(\mathcal{B}\) checks if \(t\) equals the oracle’s output on \(m\)
If the oracle is random, \(\mathcal{A}\) succeeds with probability at most \(1/|\mathcal{Y}|\)
- If the oracle is \(F(k, \cdot)\) and \(\mathcal{A}\) forges, \(\mathcal{B}\) distinguishes with the same advantage
So \(\text{MAC}_\text{adv}[\mathcal{A}, \mathcal{I}] \leq \text{PRF}_\text{adv}[\mathcal{B}, F] + 1/|\mathcal{Y}|\)
- Secure PRF implies secure MAC
- This is the course’s first full proof by reduction!
We proved MAC security without knowing anything about how \(F\) works internally
- Only that \(F\) is a secure PRF
- The same reduction template recurs for digital signatures, CCA security, and key exchange

Walk through each step of the reduction carefully:

Step 1 (Setup): Suppose, for contradiction, that adversary A can forge MACs with non-negligible probability. We’ll use A as a black box to build B, a PRF distinguisher.

Step 2 (B’s construction): B plays the PRF game: it receives oracle access to some function, which is either F(k, .) for a random key k, or a truly random function. B doesn’t know which. B’s goal is to figure out which one it got. B’s strategy: run A internally. Every time A asks for a tag on some message \(m_i\), B forwards the query to its own oracle and returns the oracle’s answer. B is pretending to be a MAC challenger, but using its PRF-game oracle as the signing function.

Step 3 (Forgery handling): Eventually A outputs (m, t) as its forgery attempt. B queries its oracle on m and checks whether the oracle’s answer equals t. If it matches, B outputs “this was the PRF.” If not, B outputs “this was random.”

Step 4 (Case analysis): If the oracle is a truly random function, A is trying to guess f(m) for a new message m that it never queried. Since f(m) is uniform in Y and independent of everything A has seen, A succeeds with probability exactly 1/|Y|. If the oracle is F(k, .), A is playing the real MAC game. By assumption, A forges with non-negligible probability \(\varepsilon\).

Step 5 (The bound): B’s PRF advantage is the gap between these two cases: \(\varepsilon - 1/|\mathcal{Y}|\). Rearranging: \(\varepsilon \leq \text{PRF}_\text{adv}[\mathcal{B}, F] + 1/|\mathcal{Y}|\). If F is a secure PRF, the first term is negligible. If |Y| is large (e.g., \(2^{128}\) for AES), the second term is also negligible. So \(\varepsilon\) must be negligible, contradicting our assumption.

Long Messages

As we’ve pointed out, there’s a minor problem with building MACs directly from secure PRFs like AES
- Messages tend to be a lot longer than the block size
- A MAC that can only verify integrity for 128-bit strings is pretty useless
But wait! We’ve solved this problem for block ciphers already, haven’t we?
Given a secure PRF on short inputs, let’s try to build a secure PRF on long inputs
- This is an analogous idea to block cipher modes of operation
- In fact, some modes of operation come with built-in authentication and integrity checks!

CBC-MAC and the Length Extension Problem

When prefix-freedom matters.

CBC-MAC

Let’s start by setting some conditions for the secure PRF we’ll use as a building block
\(F\) is a secure PRF that maps n-bit inputs to n-bit outputs
- \(F\) is defined over \((\mathcal{K}, \mathcal{X}, \mathcal{X})\) where \(\mathcal{X} = \{0, 1\}^n\)
Our new PRF \(F_\text{CBC}\) maps messages in \(\mathcal{X}^{\leq\ell}\) to tags in \(\mathcal{X}\)
- \(\ell\) is poly-bounded
CBC-MAC is based on CBC mode: chain message blocks through a PRF to get one final output to use as the tag. Changing any message block changes the tag!
- Split the message into blocks as usual, so \(m = (a_1, a_2, \ldots, a_v)\)
- CBC-MAC uses a fixed, all-zero IV, so start with \(t = 0^n\)
- For each block, let \(t \leftarrow F(k, a_i \oplus t)\), and the final value of \(t\) is the tag!
Just like CBC encryption, but…
- The IV is fixed, not unpredictable
- No intermediate outputs, just the final tag
If adversaries are restricted to prefix-free queries, CBC-MAC behaves like a random function
- Prefix-free: no query \(m_i\) is a prefix of any other query \(m_j\)
- Example: \(\{(a_1), (a_1, a_2)\}\) is NOT prefix-free because the first message is a prefix of the second
- That makes CBC-MAC a prefix-free secure PRF

CBC Construction: \(F_\text{CBC}(k, m)\)

XORing the all-zero IV is a no-op, so it’s excluded from the diagram.

The Length Extension Attack on CBC-MAC

CBC-MAC on its own is not a secure MAC for variable-length messages!
- Pick an arbitrary \(a_1 \in \mathcal{X}\)
- Request the tag \(t\) for the message \(m = (a_1)\)
- Let \(a_2 = a_1 \oplus t\)
- Return \(t\) as a forgery for the message \((a_1, a_2)\)
How does this work? Let’s see…
- We defined \(a_2 = a_1 \oplus t\) where \(t = F(k, a_1)\), so \(F(k, a_1) \oplus a_2 = t \oplus (a_1 \oplus t) = a_1\)
- \(F_\text{CBC}(k, (a_1, a_2)) = F(k, F(k, a_1) \oplus a_2)\)
- \(= F(k, a_1)\)
- \(= t\)
\(((a_1, a_2), t)\) is an existential forgery for CBC-MAC, so \(F_\text{CBC}\) must be insecure!
- The root cause: the extension property of CBC-MAC lets an adversary continue the chain
- CBC-MAC is only secure when all messages have the same fixed length

Extension Attack on CBC-MAC

Fixing CBC-MAC: Three Approaches

Given a prefix-free secure PRF like CBC-MAC, how do we make it fully secure?
Approach 1: Encrypted CBC-MAC (ECBC)
- Encrypt the final output with a second key: \(\text{ECBC}((k_1, k_2), m) = F(k_2, F_\text{CBC}(k_1, m))\)
- Hides the internal state; the extension property is broken
- Two independent keys are essential!
Approach 2: Prepend message length
- \(pf(m) = (\langle v \rangle, a_1, \ldots, a_v)\) where \(\langle v \rangle\) is a fixed-width encoding of the block count \(v\)
- No message can be a prefix of another if lengths differ
- Drawback: message length must be known before processing starts (no streaming)
Approach 3: CMAC
- XOR the last block with a derived subkey before the final PRF evaluation
- Different subkeys for full and partial last blocks (avoids padding collisions)
- Standardized by NIST in 2005; used in AES-CMAC
- Streaming, efficient, no extra PRF invocations, no dummy blocks

ECBC Construction: \(\text{ECBC}((k_1, k_2), m)\)

HMAC

Failure cascade and construction.

Building MACs from Hash Functions

CBC-MAC and CMAC are built from block ciphers (PRFs)
- What about building MACs from cryptographic hash functions instead?
Cryptographic hash functions like SHA-256 are keyless: \(H : \mathcal{M} \to \mathcal{T}\)
- No secret key involved, so \(H(m)\) alone is not a MAC
- Anyone could compute \(H(m)\)!
We need to incorporate a secret key somehow
- The obvious approaches all fail (in instructive ways!)
- Let’s walk through the failure cascade

Attempt 1: \(H(k \| m)\)

Prepend the key to the message and hash
This fails due to the length extension property of Merkle-Damgard hashes (a chained compression-function design used by SHA-256, which we’ll formalise shortly)
- SHA-256 processes input in blocks through a compression function
- The final internal state is the hash output
- Given \(H(k \| m)\), an attacker can continue hashing from that state
- They can compute \(H(k \| m \| \text{padding} \| m')\) without knowing \(k\)!
This is exactly the cascade extension attack from CBC-MAC, in a different setting
- The hash function’s internal chaining is exploitable

Attempt 2: \(H(m \| k)\)

Append the key to the message instead
This blocks the length extension attack: the key comes last, so an adversary can’t extend
But it’s vulnerable to collision attacks on \(H\)
- If the adversary finds \(m_0 \neq m_1\) with \(H(m_0) = H(m_1)\) (two distinct inputs with the same hash, called a collision)…
- …then \(H(m_0 \| k) = H(m_1 \| k)\) for any key \(k\)
- The adversary can forge without knowing the key!
Collision resistance of the hash function directly limits MAC security
- For broken hashes (MD5, SHA-1), this is a real problem

Attempt 3: \(H(k \| m \| k)\)

Sandwich the message between two copies of the key
This blocks both extension attacks and simple collision attacks
But the security analysis is fragile and depends on hash internals
- No clean reduction to standard hash function properties
- Difficult to prove secure in any standard model

HMAC: The Solution

HMAC uses a nested construction with two derived keys
Let \(B\) be the block size of the hash function in bytes
- If key \(k\) is longer than \(B\) bytes, hash it: \(k' \leftarrow H(k)\)
- Otherwise, pad \(k\) with zero bytes until it’s \(B\) bytes long
Derive two keys:
- Let ipad = the byte 0x36 repeated \(B\) times, opad = the byte 0x5C repeated \(B\) times
- \(k'_1 \leftarrow k' \oplus \text{ipad}\), \(k'_2 \leftarrow k' \oplus \text{opad}\)
\(\mathrm{HMAC}(k, m) = H(k'_2 \| H(k'_1 \| m))\)
- Inner hash: \(H(k'_1 \| m)\) compresses the keyed message to a fixed-size digest
- Outer hash: \(H(k'_2 \| \text{inner})\) applies a second keyed hash to the result
Why does this work?
- Length extension on the inner hash gives \(H(k'_1 \| m \| \text{ext})\), but that value gets hashed again with a different key in the outer hash
- Collision attacks on the inner hash don’t help because the outer hash re-keys

Walk through how HMAC defeats each attack from the failure cascade:

vs. length extension (Attempt 1): The inner hash \(H(k'_1 \| m)\) is technically vulnerable to extension: an attacker who knows this value can compute \(H(k'_1 \| m \| \text{padding} \| m')\). But that extended value gets fed into the outer hash with a different key \(k'_2\). The attacker can’t extend the outer hash because they don’t know \(k'_2\), and the inner extension changes the input to the outer hash in a way the attacker can’t predict or control.

vs. collision (Attempt 2): If an attacker finds \(m_0 \neq m_1\) with \(H(m_0) = H(m_1)\), they could forge \(H(m_0 \| k) = H(m_1 \| k)\). But in HMAC, the collision would need to be on \(H(k'_1 \| m_0) = H(k'_1 \| m_1)\), which is a keyed collision. The adversary can’t generate collisions on the keyed inner hash because they don’t know the key \(k'_1\).

The two layers serve different purposes: the inner hash compresses the message with a key (blocking extension from the outside), the outer hash provides a second independent keying (blocking collision transfer from the inner hash).

HMAC Structure

HMAC in Practice

HMAC is provably secure when the compression function of \(H\) is a PRF
The chosen hash is appended to the name of the MAC
- E.g. HMAC-MD5, HMAC-SHA1 and HMAC-SHA256/384 are all found in TLS 1.2
- Dropped from TLS 1.3 in favour of AEAD
HMAC’s security depends on that of the hash function selected
- Don’t throw a generic hash function into HMAC and expect good results!
HMAC works with almost any strong cryptographic hash function
- And even some broken ones!
- HMAC-MD5 has no practical attacks (but shouldn’t be used)
Tends to be much faster than CBC-MAC and other alternatives

Hash Function Properties

Collision resistance, preimage resistance, and the Random Oracle Model.

Cryptographic Hash Functions

What makes a hash function a cryptographic hash function (CHF)?
- SHA-256 is an example of a CHF
- MD5 is an example of an extremely broken CHF
Collision resistance: it should be hard to find any two distinct messages \(m_0 \neq m_1\) such that \(H(m_0) = H(m_1)\)
- Can’t find two messages that produce the same digest!
- The birthday attack finds collisions in about \(2^{n/2}\) evaluations for an \(n\)-bit hash
- This is why SHA-256 (256-bit output) provides only 128-bit collision resistance
Preimage resistance: given a digest \(t \in \mathcal{T}\), it should be hard to find any \(m \in \mathcal{M}\) such that \(H(m) = t\)
- The hash function should be irreversible
Second preimage resistance: given some \(m_0\), it should be hard to find \(m_1 \neq m_0\) such that \(H(m_0) = H(m_1)\)
- Can’t change the message without changing the hash
- Avalanche effect: even a small change to the input should drastically change the output

These Properties are Not Equivalent

For compressing hash functions, the implications form a chain:
- Collision resistance \(\Rightarrow\) 2nd-preimage resistance \(\Rightarrow\) preimage resistance
- Finding \(m_1\) given \(m_0\) with \(H(m_0) = H(m_1)\) is already a collision (CR \(\Rightarrow\) SPR)
- Inverting a hash gives a second preimage for most inputs (SPR \(\Rightarrow\) OW)
The converses do NOT hold:
- Preimage resistance does NOT imply collision resistance
- A hash can be hard to invert but easy to find collisions for
Precision matters when evaluating hash function security
- MD5 and SHA-1 have broken collision resistance
- Their preimage resistance is still considered intact (no practical attacks)
- But you still shouldn’t use them for new systems!
SHA-256 and SHA-3 have no known breaks for any of these properties
- SHA-256 is Merkle-Damgard based (vulnerable to length extension, but not broken)
- SHA-3 uses the sponge construction (no length extension vulnerability)

Walk through each implication carefully:

CR implies SPR: If you can find \(m_1\) given \(m_0\) such that \(H(m_0) = H(m_1)\) with \(m_1 \neq m_0\), then \((m_0, m_1)\) is a collision. So if finding collisions is hard, finding second preimages must also be hard. The second preimage finder is a special case of a collision finder (one that starts from a fixed \(m_0\)).

SPR implies OW: Given a target \(t\), suppose you can find any \(m\) with \(H(m) = t\). Use that ability to find second preimages: given \(m_0\), compute \(t = H(m_0)\), then find \(m\) with \(H(m) = t\). If \(m \neq m_0\), you’ve found a second preimage. This works “for most inputs” because the hash is compressing: many inputs map to each output, so a random preimage is very unlikely to be \(m_0\) itself. (This is why the implication requires a compressing hash.)

Why the converses fail: A hash could be hard to invert but easy to find collisions for. The function might have a structural pattern that makes collision pairs easy to spot, while the exponentially many preimages for each output make inverting any specific target hard. MD5 is a real-world example: collision resistance was broken in 2004 (you can find collisions in seconds), but no practical preimage attack exists.

The Random Oracle Model

Sometimes we need a stronger assumption than collision resistance
The Random Oracle Model (ROM) treats the hash function as a truly random function
- For every input, the output is uniformly random and independent
- The only way to learn \(H(m)\) is to query the oracle
This is a heuristic, not a theorem: real hash functions are deterministic algorithms
- But the ROM gives us a framework for proving security of constructions like HMAC
You’ll encounter “secure in the ROM” in Lectures 05 and 06
- RSA-FDH signatures (Lecture 06) rely on the ROM for their security proof
- It’s worth understanding what this means and what its limitations are

Authenticated Encryption

Combining secrecy and integrity.

The Composition Problem

We now have two independent tools:
- CPA-secure encryption (confidentiality, from Lectures 02 & 03)
- Secure MACs (integrity, from this lecture)
Real-world adversaries are active: they can tamper with ciphertexts in transit
- WiFi, Ethernet and Bluetooth can all be interfered with relatively easily
- CPA-secure schemes allow undetected tampering with ciphertext
- But breaking integrity can also break secrecy!
We should always combine encryption with authentication
- But the order of composition matters!

Three Compositions

Encrypt-then-MAC: encrypt \(m\), then MAC the ciphertext
- \(c \leftarrow E(k_{\text{encrypt}}, m)\), \(t \leftarrow S(k_{\text{mac}}, c)\), output \((c, t)\)
- To decrypt: check MAC first, reject if invalid, then decrypt
- Secure in general! If the MAC is secure, the adversary can’t submit valid modified ciphertexts
MAC-then-Encrypt: MAC the message, then encrypt message and tag together
- \(t \leftarrow S(k_{\text{mac}}, m)\), \(c \leftarrow E(k_{\text{encrypt}}, (m, t))\), output \(c\)
- Not generally secure! Padding attacks are devastating
- The POODLE attack against SSL 3.0 exploits exactly this weakness
- All TLS 1.0-1.2 CBC cipher suites used MAC-then-Encrypt; TLS 1.3 dropped them entirely
Encrypt-and-MAC: encrypt \(m\) and MAC \(m\) independently
- \(c \leftarrow E(k_{\text{encrypt}}, m)\), \(t \leftarrow S(k_{\text{mac}}, m)\), output \((c, t)\)
- Not generally secure! The MAC tag may leak information about \(m\)
Both keys must be independent in all cases: \(k_{\text{encrypt}} \neq k_{\text{mac}}\)

Encrypt-then-MAC

The CCA Theorem

Encrypt-then-MAC with a CPA-secure cipher and a secure MAC achieves IND-CCA2 security
- CCA2 is the strongest standard notion of encryption security
- We’ll define IND-CCA2 formally in Lecture 06 and prove this theorem in Lecture 07
The intuition: the MAC prevents the adversary from submitting modified ciphertexts
- Any modification invalidates the tag
- So the CCA2 decryption oracle is effectively neutered
- The adversary is left with only chosen-plaintext capability, which CPA handles
This is why modern protocols exclusively use authenticated encryption
- There is no good reason to use encryption without authentication

One-Time Security

Sometimes we can make do with a weaker property instead of AE security
In some contexts, like encrypted emails, we might use ephemeral keys
- And each ephemeral key might be used for only a single message
- In that case, being AE-secure is more than sufficient
We could weaken the definition to one-time AE security
- The adversary will only get a single attempt at each key in practice
- So why not make the definition of security reflect that?
Typically, we use the same CPA-secure symmetric cipher as usual
- Remember the polynomial one-time MAC? Combined with a CPA-secure cipher, it gives one-time AE
- Unforgeable if the adversary only gets a single chosen-message query

CCA Security

The good news is that an AE-secure symmetric cipher is also CCA-secure
- So long as we’re doing authenticated encryption, we’re secure against chosen-ciphertext attacks
There’s little-to-no reason to use any symmetric cipher that isn’t AE-secure
TLS 1.3 includes some gold-standard AE constructs
- AES-GCM
- AES-CCM
- ChaCha20-Poly1305
These are AEAD schemes: authenticated encryption with associated data

AEAD in Practice

Associated data and nonce discipline.

AEAD: Authenticated Encryption with Associated Data

Real-world protocols often need to authenticate metadata alongside the message
- Packet headers must be readable for routing but protected from tampering
- Metadata in secure file systems should be visible but not malleable
A nonce-based AEAD cipher \(\mathcal{E} = (E, D)\) takes four inputs:
- Key \(k \in \mathcal{K}\), message \(m \in \mathcal{M}\), associated data \(d \in \mathcal{D}\), nonce \(\mathcal{n} \in \mathcal{N}\)
Encryption: \(c = E(k, m, d, \mathcal{n})\)
Decryption: \(m = D(k, c, d, \mathcal{n})\), or reject if integrity check fails
The authentication tag covers both the ciphertext and the associated data
- Tampering with either one is detected on decryption
If \(d\) is empty, behaves like ordinary authenticated encryption
If \(m\) is empty, behaves like a MAC over the associated data

AES-GCM

Galois Counter Mode (GCM) is the most widely deployed AEAD cipher
- Standardized by NIST in 2007; used in TLS 1.3, IPsec, and many other protocols
GCM follows encrypt-then-MAC internally:
- Encryption: AES in counter mode (CTR)
- Authentication: a polynomial MAC (GHASH) applied to the ciphertext and associated data
Hardware support makes AES-GCM extremely fast on modern processors
- AES-NI for encryption, PCLMULQDQ for Galois field multiplication

ChaCha20-Poly1305

The alternative when AES hardware support is unavailable
- ChaCha20 is a stream cipher (from Lecture 02); Poly1305 is a polynomial one-time MAC (recall the construction from earlier)
- Combined as an AEAD cipher in RFC 7539
Both AES-GCM and ChaCha20-Poly1305 are available in TLS 1.3
- Either is a good choice for the EPIC project’s secure messaging component

Nonce Discipline: The Operational Cliff Edge

AEAD schemes require a unique nonce for every encryption under the same key
- The nonce doesn’t have to be secret, just unique
- Can be transmitted in the clear alongside the ciphertext
- TLS 1.3 derives a per-connection IV and XORs it with the record sequence number, guaranteeing uniqueness without randomness
Nonce reuse in AES-GCM is catastrophic
- Two messages encrypted with the same key and nonce: plaintexts are leaked via XOR
- The authentication key \(k_m\) is exposed, breaking ciphertext integrity entirely
- Not just the two affected messages: all messages under that key are compromised
AES-GCM-SIV provides some nonce misuse resistance at the cost of performance
- Still leaks whether two plaintexts are equal under nonce reuse
- But doesn’t catastrophically break authentication
GCM uses a 32-bit internal block counter, limiting each key to \(2^{32}\) encrypted blocks
- Beyond this, the authentication guarantee degrades
- Once the limit is approached, keys must be rotated

Conclusion

What did we learn?

Where do we go from here?

We’ve solved the integrity problem that CTR mode and CPA-secure ciphers left open
- Message authentication via MAC systems (UF-CMA security)
- Building MACs from PRFs (with a full security reduction)
- Building MACs from hash functions (HMAC and the failure cascade)
- Combining encryption and authentication (authenticated encryption)
But every symmetric primitive assumes a pre-shared key
- How did Alice and Bob get the key in the first place?
- Kerckhoffs’s principle means the key is the only secret
- Key establishment is the existential problem of symmetric cryptography
Next lecture: public key cryptography and key exchange

For next time…

“Knowledge cutoff” for the midterm is now!
- If you can explain/teach the material, then you understand it
- Attend, review and revise the material to get maximum benefit
Bloom’s taxonomy is a good guide for learning and exam prep
- Remember, Understand, Apply, Analyze, Evaluate, and Create
No formally assigned reading for Week 5, focus on the midterm!
But if you’d like more detail, then look at the relevant sections of Crypto 101
Chapters 6, 7 & 8 of A Graduate Course in Applied Cryptography
- Obviously much more detailed and proof-heavy than today’s lecture
- About 120 pages, so skim for value-add sections
- But reading proofs can help intuitive understanding. Up to you!

Questions?

Ask now, catch me after class, or email eoin@eoin.ai