Condensed Unpredictability

We consider the task of deriving a key with high HILL entropy from an unpredictable source. Prior to this work, the only known way to transform unpredictability into a key that is $\epsilon$-indistinguishable from having min-entropy was via pseudorandomness, for example via Goldreich-Levin (GL) hardcore bits. This approach has the inherent limitation that from a source with $k$ bits of unpredictability entropy one can derive a key of length (and thus HILL entropy) at most $k-2\log(1/\epsilon)$ bits. In many settings, e.g. when dealing with biometric data, such a $2\log(1/\epsilon)$ bit entropy loss is not an option. Our main technical contribution is a theorem stating that in the high entropy regime, unpredictability implies HILL entropy. The loss in circuit size in this argument is exponential in the entropy gap $d$. To overcome the above restriction, we investigate whether it is possible to first "condense" unpredictability entropy so as to make the entropy gap small. We show that any source with $k$ bits of unpredictability can be condensed into a source of length $k$ with $k-3$ bits of unpredictability entropy. Our condenser simply "abuses" the GL construction and derives a $k$ bit key from a source with $k$ bits of unpredictability. The original GL theorem implies nothing when extracting that many bits, but we show that in this regime, GL still behaves like a "condenser" for unpredictability. This result comes with two caveats: (1) the loss in circuit size is exponential in $k$, and (2) we require that the source we start with has \emph{no} HILL entropy (equivalently, one can efficiently check whether a guess is correct). We leave it as an intriguing open problem to overcome these restrictions or to prove they are inherent.

The goal is to derive a "good" key $K = h(X, S)$ from $X$ by means of some efficient key-derivation function $h$, possibly using public randomness $S$.
In practice, one often uses a cryptographic hash function like SHA3 as the key derivation function $h(\cdot)$ [Kra10, DGH+04], and then simply assumes that $h(\cdot)$ behaves like a random oracle [BR93].
In this paper we continue the investigation of key-derivation with provable security guarantees, where we don't make any computational assumptions about $h(\cdot)$. This problem is fairly well understood for sources $X|Z$ that have high min-entropy (we'll formally define all the entropy notions used in Section 2 below), or are computationally indistinguishable from having high min-entropy (in this case, we say $X|Z$ has high HILL entropy). In the case where $X|Z$ has $k$ bits of min-entropy, we can either use a strong extractor to derive a $k - 2\log\epsilon^{-1}$ bit key that is $\epsilon$-close to uniform, or a condenser to get a $k$ bit key which is $\epsilon$-close to a variable with $k - \log\log\epsilon^{-1}$ bits of min-entropy. Using extractors/condensers like this also works for HILL entropy, except that now we only get computational guarantees (pseudorandomness/high HILL entropy) on the derived key.
Often one has to derive a key from a source $X|Z$ which has no HILL entropy at all. The weakest assumption we can make on $X|Z$ for any kind of key-derivation to be possible is that $X$ is hard to predict given $Z$. This has been formalized in [HLR07a] by saying that $X|Z$ has $k$ bits of unpredictability entropy, denoted $H^{\mathrm{unp}}_{s}(X|Z) \ge k$, if no circuit of size $s$ can predict $X$ given $Z$ with advantage $\ge 2^{-k}$ (to be more general, we allow an additional parameter $\delta \ge 0$, and $H^{\mathrm{unp}}_{\delta,s}(X|Z) \ge k$ holds if $(X,Z)$ is $\delta$-close to some distribution $(Y,Z)$ with $H^{\mathrm{unp}}_{s}(Y|Z) \ge k$). We will also consider a more restricted notion, where we say that $X|Z$ has $k$ bits of list-unpredictability entropy, denoted $H^{*\mathrm{unp}}_{s}(X|Z) \ge k$, if it has $k$ bits of unpredictability entropy relative to an oracle $\mathsf{Eq}$ which can be used to verify the correct guess ($\mathsf{Eq}$ outputs 1 on input $X$, and 0 otherwise).1 We'll discuss this notion in more detail below. For now, let us just mention that for the important special case where it's easy to verify whether a guess for $X$ is correct (say, because we condition on $Z = f(X)$ for some one-way function2 $f$), the oracle $\mathsf{Eq}$ does not help, and thus unpredictability and list-unpredictability coincide. The results proven in this paper imply that from a source $X|Z$ with $k$ bits of list-unpredictability entropy, it's possible to extract a $k$ bit key with $k-3$ bits of HILL entropy: let $S \in \{0,1\}^{n\times k}$ be uniformly random and $K = X^T S \in \{0,1\}^k$; then the unpredictability entropy of $K$ is
$$H^{\mathrm{unp}}_{s/2^{2k}\mathrm{poly}(m,n),\,\gamma}(K|Z,S) \ge k-3 \qquad (2)$$
and the HILL entropy of $K$ is $\ge k-3$, with3 $t = s\cdot\epsilon^{7}/2^{2k}\mathrm{poly}(m,n)$.
1 We chose this name as having access to $\mathsf{Eq}$ is equivalent to being allowed to output a list of guesses. This is very similar to the well known concept of list-decoding.
2 To be precise, this only holds for injective one-way functions. One can generalise list-unpredictability and let $\mathsf{Eq}$ output 1 on some set $\mathcal{X}$, where the adversary wins if she outputs any $X' \in \mathcal{X}$. Our results (in particular Theorem 1) also hold for this more general notion, which captures general one-way functions by letting $\mathcal{X} = f^{-1}(f(X))$ be the set of all preimages of $Z = f(X)$.
3 We denote with $\mathrm{poly}(m,n)$ some fixed polynomial in $(m,n)$, but it can denote different polynomials throughout the paper. In particular, the poly here is not the same as in (2), as it hides several extra terms.
Proposition 1 follows from two results we prove in this paper. First, in Section 4 we prove Theorem 1, which shows how to "abuse" Goldreich-Levin hardcore bits by generating a $k$ bit key $K = X^T S$ from a source $X|Z$ with $k$ bits of list-unpredictability entropy. The Goldreich-Levin theorem [GL89] implies nothing about the pseudorandomness of $K|(Z,S)$ when extracting that many bits. Instead, we prove that GL is a good "condenser" for unpredictability entropy: if $X|Z$ has $k$ bits of list-unpredictability entropy, then $K|(Z,S)$ has $k-3$ bits of unpredictability entropy (note that we start with list-unpredictability, but only end up with "normal" unpredictability entropy). This result is used in the first step of Proposition 1, showing that (1) implies (2).
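For concreteness, the GL-based key derivation is just a random GF(2) linear map applied to the source. The sketch below is our own illustration (the function name and the use of Python's `secrets` module are not from the paper); it computes $K = X^T S$ bit by bit as inner products modulo 2:

```python
import secrets

def gl_condense(x_bits, k):
    """Derive a k-bit key from the n-bit source x_bits via the
    Goldreich-Levin construction: each key bit is the inner product
    (mod 2) of x with a fresh public random n-bit vector."""
    n = len(x_bits)
    # Public seed S: k random n-bit vectors (the columns of the matrix S).
    S = [[secrets.randbelow(2) for _ in range(n)] for _ in range(k)]
    key = [sum(si & xi for si, xi in zip(col, x_bits)) % 2 for col in S]
    return key, S
```

Theorem 1 applies this with $k$ equal to the full list-unpredictability entropy of the source, a regime where the original GL theorem gives no pseudorandomness guarantee.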
Second, in Section 5 we prove our main result, Theorem 2, which states that any source $X|Z$ which has $|X|-d$ bits of unpredictability entropy has the same amount of HILL entropy (technically, we show that it implies the same amount of metric entropy against deterministic real-valued distinguishers; this notion implies the same amount of HILL entropy, as shown by Barak et al. [BSW03]). The security loss in this argument is exponential in the entropy gap $d$. Thus, if $d$ is very large this argument is useless, but if we first condense unpredictability as just explained, we have a gap of only $d = 3$. This result is used in the second step of Proposition 1, showing that (2) implies (3). In the two sections below we discuss two shortcomings of Theorem 1 which we hope can be overcome in future work.4

1.0.1 On the dependency on $2^k$ in Theorem 1.
As outlined above, our first result is Theorem 1, which shows how to condense a source with $k$ bits of list-unpredictability into a $k$ bit key having $k-3$ bits of unpredictability entropy. The loss in circuit size is $2^{2k}\mathrm{poly}(m,n)$, and it's not clear if the dependency on $2^k$ is necessary here, or if one can replace it with a dependency on $\mathrm{poly}(\epsilon^{-1})$ at the price of an extra term in the distinguishing advantage. In many settings $\log(\epsilon^{-1})$ is in the order of $k$, in which case the above difference is not too important. This is for example the case when considering a $k$ bit key for a symmetric primitive like a block-cipher, where one typically assumes the hardness of the cipher to be exponential in the key-length (and thus, if we want the two to be of the same order, we have $\log(\epsilon^{-1}) = \Theta(k)$). In other settings, $k$ can be superlinear in $\log(\epsilon^{-1})$, e.g., if the high entropy string is used to generate an RSA key.
Our Theorem 1 shows how to condense a source where $X|Z$ has $k$ bits of list-unpredictability entropy into a $k$ bit string with $k-3$ bits of unpredictability entropy. It's an open question to what extent it's necessary to assume list-unpredictability here; maybe "normal" unpredictability is already sufficient? Note that list-unpredictability is a lower bound for unpredictability, as one can always ignore the $\mathsf{Eq}$ oracle, i.e., $H^{\mathrm{unp}}_{\epsilon,s}(X|Z) \ge H^{*\mathrm{unp}}_{\epsilon,s}(X|Z)$, and in general, list-unpredictability can be much smaller than unpredictability entropy.5

4 After announcing this result at a workshop, we learned that Colin Jia Zheng proved a weaker version of this result. Theorem 4.18 in his PhD thesis, which is available via http://dash.harvard.edu/handle/1/11745716, also states that $k$ bits of unpredictability imply $k$ bits of HILL entropy. As in our case, the loss in circuit size in his proof is polynomial in $\epsilon^{-1}$, but it's also exponential in $n$ (the length of $X$), whereas our loss is only exponential in the entropy gap $\Delta = n - k$.
5 E.g., let $X$ be uniform over $\{0,1\}^n$ and $Z$ arbitrary, but independent of $X$; then for $s = \exp(n)$ we have $H^{\mathrm{unp}}_{s}(X|Z) = n$ but $H^{*\mathrm{unp}}_{s}(X|Z) = 0$, as we can simply invoke $\mathsf{Eq}$ on all of $\{0,1\}^n$ until $X$ is found.
Interestingly, we can derive a $k$ bit key with almost $k$ bits of HILL entropy from a source $X|Z$ with $k$ bits of unpredictability entropy $H^{\mathrm{unp}}_{\epsilon,s}(X|Z) \ge k$ in two extreme cases, namely if either

1. $X|Z$ has basically no HILL entropy (even against small circuits), or
2. $X|Z$ has (almost) $k$ bits of (high quality) HILL entropy.
In case 1, we observe that if $H^{\mathrm{HILL}}_{\epsilon,t}(X|Z) \approx 0$ for some $t \ll s$, or equivalently, given $Z$ we can efficiently distinguish $X$ from any $X' \neq X$, then the $\mathsf{Eq}$ oracle used in the definition of list-unpredictability can be efficiently emulated, which means it's redundant, and thus $X|Z$ has the same amount of list-unpredictability and unpredictability entropy, $H^{\mathrm{unp}}_{\epsilon,s}(X|Z) \approx H^{*\mathrm{unp}}_{\epsilon',s'}(X|Z)$ for $(\epsilon',s') \approx (\epsilon,s)$. Thus, we can use Theorem 1 to derive a $k$ bit key with $k - O(1)$ bits of HILL entropy in this case. In case 2, we can simply use any condenser for min-entropy to get a key with HILL entropy $k - \log\log\epsilon^{-1}$ (cf. Figure 2). As condensing almost all the unpredictability entropy into HILL entropy is possible in the two extreme cases where $X|Z$ has either no or a lot of HILL entropy, it seems conceivable that it's also possible in all the in-between cases (i.e., without making any additional assumptions about $X|Z$ at all).

GL vs. Condensing.
Let us stress at this point that, because of the two issues discussed above, our result does not always allow one to generate more bits with high HILL entropy than just using the Goldreich-Levin theorem. Assuming $k$ bits of unpredictability we get $k-3$ bits of HILL entropy, whereas GL only gives $k - 2\log(1/\epsilon)$. But as currently our reduction has a quantitatively larger loss in circuit size than the GL theorem, in order to get HILL entropy of the same quality (i.e., secure against $(s,\delta)$ adversaries for some fixed $(s,\delta)$) we must consider the unpredictability entropy of the source $X|Z$ against more powerful adversaries than if we were to use GL. And in general, the amount of unpredictability (or any other computational) entropy of $X|Z$ can decrease as we consider more powerful adversaries.

Entropy Notions
In this section we formally define the different entropy notions considered in this paper. We use $X \sim_{\epsilon,s} Y$ to denote computational indistinguishability of variables $X$ and $Y$, formally meaning that no circuit of size $s$ distinguishes them with advantage greater than $\epsilon$.6 $X \sim_{\epsilon} Y$ denotes that $X$ and $Y$ have statistical distance $\epsilon$, i.e., $X \sim_{\epsilon,\infty} Y$, and with $X \sim Y$ we denote that they're identically distributed. With $U_n$ we denote the uniform distribution over $\{0,1\}^n$. For a pair $(X,Z)$ of random variables, the average min-entropy of $X$ conditioned on $Z$ is $H_\infty(X|Z) = -\log E_{z\leftarrow Z}\max_x \Pr[X = x|Z = z]$. HILL entropy is a computational variant of min-entropy: $X$ (conditioned on $Z$) has $k$ bits of HILL entropy if it cannot be distinguished from some $Y$ that (conditioned on $Z$) has $k$ bits of min-entropy. Barak, Shaltiel and Wigderson [BSW03] define the notion of metric entropy, which is defined like HILL, but with the quantifiers exchanged: instead of asking for a single distribution $(Y,Z)$ that fools all distinguishers, we only ask that for every distinguisher $D$ there exists such a distribution. For reasons discussed in Section 2.0.4, in the definition below we make the class of distinguishers considered explicit. Like HILL entropy, unpredictability entropy, which we'll define next, can be seen as a computational variant of min-entropy. Here we don't require indistinguishability as for HILL entropy, but only that the variable is hard to predict.

Definition 4 ([HLR07a]). $X$ has unpredictability entropy $k$ conditioned on $Z$, denoted by $H^{\mathrm{unp}}_{\epsilon,s}(X|Z) \ge k$, if $(X,Z)$ is $\epsilon$-close to some distribution $(Y,Z)$ where no probabilistic circuit of size $s$ can predict $Y$ given $Z$ with probability better than $2^{-k}$, i.e.,
$$\Pr_{(y,z)\leftarrow(Y,Z)}[C(z) = y] \le 2^{-k}. \qquad (5)$$
We also define a notion called "list-unpredictability", denoted $H^{*\mathrm{unp}}_{\epsilon,s}(X|Z) \ge k$, which holds if $H^{\mathrm{unp}}_{\epsilon,s}(X|Z) \ge k$ as in (5), but where $C$ additionally gets oracle access to a function $\mathsf{Eq}(\cdot)$ which outputs 1 on input $y$ and 0 otherwise. So, $C$ can efficiently test whether some candidate guess for $y$ is correct.7

7 We name this notion "list-unpredictability" as we get the same notion when, instead of giving $C$ oracle access to $\mathsf{Eq}(\cdot)$, we allow $C(z)$ to output a list of guesses for $y$ (not just one value) and require that $\Pr_{(y,z)\leftarrow(Y,Z)}[y \in C(z)] \le 2^{-k}$. This notion is inspired by the well known notion of list-decoding.
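To make the difference between the two games concrete, here is a toy sketch (the function names are ours, purely illustrative): the list version shows why oracle access to $\mathsf{Eq}$ with a bounded query budget is the same as outputting a bounded list of guesses.

```python
def wins_unpredictability(predict, x, z):
    """Plain unpredictability game: the adversary outputs one guess for x."""
    return predict(z) == x

def wins_list_unpredictability(predict_list, x, z, eq_budget=None):
    """List-unpredictability game: the adversary outputs a list of guesses;
    querying Eq on each candidate is equivalent to checking list membership.
    An Eq-query budget simply truncates the list."""
    guesses = list(predict_list(z))
    if eq_budget is not None:
        guesses = guesses[:eq_budget]
    return x in guesses
```

With an unbounded budget the list adversary is strictly stronger, matching the inequality $H^{\mathrm{unp}} \ge H^{*\mathrm{unp}}$ noted above.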
Remark 1 (The parameter $\epsilon$). The parameter $\epsilon$ in the definition above is not really necessary; following [HLR07b], we added it so we can have a "smooth" notion, which is easier to compare to HILL or smooth min-entropy. If $\epsilon = 0$ we'll simply omit it; the definition then simplifies to $\Pr_{(x,z)\leftarrow(X,Z)}[C(z) = x] \le 2^{-k}$. Let us also mention that unpredictability entropy is only interesting if the conditional part $Z$ is not empty, as (already for $s$ linear in the length of $X$) we have $H^{\mathrm{unp}}_{s}(X) = H_\infty(X)$, which can be seen by considering the circuit $C$ (that gets no input, as $Z$ is empty) which simply outputs the constant $x$ maximizing $\Pr[X = x]$.

Metric vs. HILL.
We will use a lemma which states that deterministic real-valued metric entropy implies the same amount of HILL entropy (albeit with some loss in quality). This lemma has been proven by [BSW03] for the unconditional case, i.e., when $Z$ in the lemma below is empty; it has been observed by [FR12, CKLR11] that the proof also holds in the conditional case as stated below. For deterministic \emph{boolean} distinguishers, in contrast, $k$ bits of metric entropy only imply HILL entropy $k - \log(\delta^{-1})$, i.e., we must allow for a $\delta > 0$ loss in distinguishing advantage, and this will at the same time result in a loss of $\log(\delta^{-1})$ in the amount of entropy. For this reason, it is crucial that in Theorem 2 we show that unpredictability entropy implies deterministic \emph{real-valued} metric entropy, so we can then apply Lemma 1 to get the same amount of HILL entropy. Dealing with real-valued distinguishers is the main source of technical difficulty in the proof of Theorem 2; proving the analogous statement for deterministic boolean distinguishers is much simpler.

Known Results on Provably Secure Key-Derivation
We say that a cryptographic scheme has security $\alpha$ if no adversary (from some class of adversaries, like all polynomial size circuits) can win some security game with advantage $\ge \alpha$ when the scheme is instantiated with a uniformly random string.9 Below we will distinguish between unpredictability applications, where the advantage bounds the probability of winning some security game (a typical example are digital signature schemes, where the game captures existential unforgeability under chosen message attacks), and indistinguishability applications, where the advantage bounds the distinguishing advantage from some ideal object (a typical example is the security definition of pseudorandom generators or functions).

Key-Derivation from Min-Entropy
Strong Extractors. Let $(X,Z)$ be a source where $H_\infty(X|Z) \ge k$, or equivalently, where no adversary can guess $X$ given $Z$ with probability better than $2^{-k}$ (cf. Def. 1). Consider the case where we want to derive a key $K = h(X,S)$ that is statistically close to uniform given $(Z,S)$. For example, $X$ could be some physical source (like statistics from keystrokes) from which we want to generate almost uniform randomness. Here $Z$ models potential side-information the adversary might have on $X$. This setting is very well understood, and such a key can be derived using a strong extractor as defined below.
From any source with $H_\infty(X|Z) \ge k$ we can extract a key $K = \mathsf{Ext}(X,S)$ of length $k - 2\log(1/\epsilon)$ that is $\epsilon$-close to uniform [HILL99]. The entropy gap $2\log(1/\epsilon)$ is optimal by the so-called "RT-bound" [RTS00], even if we assume the source is efficiently samplable [DPW14].
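As a toy sketch of such an extractor (our simplification: a uniformly random GF(2) matrix as the universal hash, which needs a longer seed than the $2n$-bit leftover hash mentioned in Figure 1):

```python
import secrets

def extract(x_bits, k, eps_log):
    """Strong extractor sketch: hash the n-bit source down to
    k - 2*eps_log bits with a random GF(2) matrix (a universal hash).
    By the leftover hash lemma the output is 2^-eps_log-close to
    uniform when x has k bits of min-entropy given Z."""
    out_len = k - 2 * eps_log
    n = len(x_bits)
    seed = [[secrets.randbelow(2) for _ in range(n)] for _ in range(out_len)]
    key = [sum(a & b for a, b in zip(row, x_bits)) % 2 for row in seed]
    return key, seed
```

The seed is public: the extracted key stays close to uniform even given $(Z, S)$, which is what "strong" refers to.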
If instead of using a uniform key for an $\alpha$ secure scheme we use a key that is $\epsilon$-close to uniform, the scheme will still be at least $\beta = \alpha + \epsilon$ secure. In order to get security $\beta$ of the same order as $\alpha$, we thus must set $\epsilon \approx \alpha$. When the available amount $k$ of min-entropy is small, for example when dealing with biometric data [DORS08, BDK+05], a loss of $2\log(1/\epsilon)$ bits (that's 160 bits for a typical security level $\epsilon = 2^{-80}$) is often unacceptable.
Condensers. The above bound is basically tight for many indistinguishability applications like pseudorandom generators or pseudorandom functions.10 Fortunately, for many applications a close to uniform key is not necessary, and a key $K$ with min-entropy $|K| - \Delta$ for some small $\Delta$ is basically as good as a uniform one. This is the case for all unpredictability applications, which includes OWFs, digital-signatures and MACs.11 It's not hard to show that if the scheme is $\alpha$ secure with a uniform key, it remains at least $\beta = \alpha 2^{\Delta}$ secure (against the same class of attackers) if instantiated with any key $K$ that has $|K| - \Delta$ bits of min-entropy.12 Thus, for unpredictability applications we don't have to extract an almost uniform key; "condensing" $X$ into a key with $|K| - \Delta$ bits of min-entropy for some small $\Delta$ is enough.
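The quantitative content of this bound is easy to check numerically. The sketch below (illustrative values, not from the paper) compares the $\beta = \alpha 2^{\Delta}$ degradation for a condenser gap of $\Delta = \log\log(1/\epsilon)$ bits against the $2\log(1/\epsilon)$ bits of entropy that extraction would cost:

```python
import math

def security_with_entropy_gap(alpha, gap_bits):
    """An alpha-secure unpredictability application instantiated with a key
    whose min-entropy is gap_bits short of uniform stays alpha * 2^gap_bits
    secure."""
    return alpha * 2 ** gap_bits

alpha = 2.0 ** -80                   # assumed security with a uniform key
eps = alpha                          # condenser error, set to alpha
gap = math.log2(math.log2(1 / eps))  # condenser gap log log(1/eps): ~6.3 bits
beta = security_with_entropy_gap(alpha, gap)
# beta = alpha * log2(1/alpha) = 80 * 2^-80: only a logarithmic degradation,
# while extraction would lose 2*log2(1/eps) = 160 bits of entropy.
```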
[DPW14] show that a $(\log\epsilon^{-1}+1)$-wise independent hash function $\mathsf{Cond}: \{0,1\}^n \to \{0,1\}^{\ell}$ is a condenser with the following parameters: for any $(X,Z)$ where $H_\infty(X|Z) \ge \ell$, for a random seed $S$ (used to sample a $(\log\epsilon^{-1}+1)$-wise independent hash function), the distribution $(\mathsf{Cond}(X,S), Z, S)$ is $\epsilon$-close to some $(Y,Z,S)$ with $H_\infty(Y|Z,S) \ge \ell - \log\log(1/\epsilon)$. Using such an $\ell$ bit key (condensed from a source with $\ell$ bits of min-entropy) for an unpredictability application that is $\alpha$ secure (when using a uniform $\ell$ bit key), we get security $\beta \le \alpha 2^{\log\log(1/\epsilon)} + \epsilon$, which setting $\epsilon = \alpha$ gives $\beta \le \alpha(1 + \log(1/\alpha))$ security; thus security degrades only by a logarithmic factor.

Key-Derivation from Computational Entropy
The bounds discussed in this section are summarised in Figures 1 and 2 in Appendix A. The last row of Figure 2 is the new result proven in this paper.
HILL Entropy. As already discussed in the introduction, often we want to derive a key from a distribution $(X,Z)$ where there's no "real" min-entropy at all, i.e., $H_\infty(X|Z) = 0$. This is for example the case when $Z$ is the transcript (observable by an adversary) of a key-exchange protocol like Diffie-Hellman, where the agreed value $X = g^{ab}$ is determined by the transcript $Z = (g^a, g^b)$ [Kra10, GKR04]. Another setting where this can be the case is in the context of side-channel attacks, where the leakage $Z$ from a device can completely determine its internal state $X$.
If $X|Z$ has $k$ bits of HILL entropy, i.e., is computationally indistinguishable from having min-entropy $k$ (cf. Def. 2), we can derive keys exactly as described above assuming $X|Z$ had $k$ bits of min-entropy. In particular, if $X|Z$ has $|K| + 2\log(1/\epsilon)$ bits of HILL entropy for some negligible $\epsilon$, we can derive a key $K$ that is pseudorandom, and if $X|Z$ has $|K| + \log\log(1/\epsilon)$ bits of HILL entropy, we can derive a key that is almost as good as a uniform one for any unpredictability application.
Unpredictability Entropy. Clearly, the minimal assumption we must make on a distribution $(X,Z) \in \{0,1\}^n \times \{0,1\}^m$ for any key derivation to be possible at all is that $X$ is hard to compute given $Z$, that is, $X|Z$ must have some unpredictability entropy as in Definition 4. Goldreich and Levin [GL89] show how to generate pseudorandom bits from such a source. In particular, the Goldreich-Levin theorem implies that if $X|Z$ has at least $2\log\epsilon^{-1}$ bits of list-unpredictability, then the inner product $R^T X$ of $X$ with a random vector $R$ is $\epsilon$-indistinguishable from uniformly random (the loss in circuit size is $\mathrm{poly}(n,m)/\epsilon^4$). Using the chain rule for unpredictability entropy,13 we can generate an $\ell = k - 2\log\epsilon^{-1}$ bit long pseudorandom string that is $\ell\epsilon$-indistinguishable from uniform (the extra $\ell$ factor comes from taking the union bound over all bits).
Thus, we can turn $k$ bits of list-unpredictability into $k - 2\log\epsilon^{-1}$ pseudorandom bits (and thus that much HILL entropy) of quality roughly $\epsilon$. The question of whether it's possible to generate significantly more than $k - 2\log\epsilon^{-1}$ bits of HILL entropy from a source with $k$ bits of (list-)unpredictability seems to have never been addressed in the literature before. The reason might be that one is usually interested in generating pseudorandom bits (not just HILL entropy), and for this, the $2\log\epsilon^{-1}$ entropy loss is inherent. The observation that for many applications high HILL entropy is basically as good as pseudorandomness is more recent, and has gained attention through its usefulness in the context of leakage-resilient cryptography [DP08, DY13].
In this paper we prove that it's in fact possible to turn almost all list-unpredictability into HILL entropy.

Condensing Unpredictability
Below we state Theorem 1, whose proof is in Appendix B, but first let us give some intuition. Let $X|Z$ have $k$ bits of list-unpredictability, and assume we start extracting Goldreich-Levin hardcore bits $A_1, A_2, \ldots$ by taking inner products $A_i = R_i^T X$ for random $R_i$. The first extracted bits $A_1, A_2, \ldots$ will be pseudorandom (given the $R_i$ and $Z$), but with every extracted bit, the list-unpredictability can also decrease by one bit. As the GL theorem requires at least $2\log\epsilon^{-1}$ bits of list-unpredictability to extract an $\epsilon$ secure pseudorandom bit, we must stop after $k - 2\log\epsilon^{-1}$ bits. In particular, the more we extract, the worse the pseudorandomness of the extracted string becomes. Unlike the original GL theorem, in our Theorem 1 we only argue about the unpredictability of the extracted string, and unpredictability entropy has the nice property that it can never decrease: predicting $A_1, \ldots, A_{i+1}$ is always at least as hard as predicting $A_1, \ldots, A_i$. Thus, despite the fact that once $i$ approaches $k$ it becomes easier and easier to predict $A_i$ (given $A_1, \ldots, A_{i-1}$, $Z$ and the $R_i$'s),14 this hardness will still add up to $k - O(1)$ bits of unpredictability entropy.
The proof is by contradiction: we assume that $A_1, \ldots, A_k$ can be predicted with advantage $2^{-k+3}$ (i.e., the string does not have $k-3$ bits of unpredictability), and then use such a predictor to predict $X$ with advantage $> 2^{-k}$, contradicting the $k$ bit list-unpredictability of $X|Z$.
If $A_1, \ldots, A_k$ can be predicted as above, then there must be an index $j$ such that $A_j$ can be predicted with good probability conditioned on $A_1, \ldots, A_{j-1}$ being correctly predicted. We can then use the Goldreich-Levin theorem, which tells us how to find $X$ given such a predictor. Unfortunately, $j$ can be close to $k$, and to apply the GL theorem, we first need to find the right values for $A_1, \ldots, A_{j-1}$ on which we condition, and we can only use the predictor's guess for $A_j$ if it was correct on the first $j-1$ bits. We have no better strategy for this than trying all possible values, and this is the reason why the loss in circuit size in Theorem 1 depends on $2^k$.
In our proof, instead of using the Goldreich-Levin theorem, we will actually use a more fine-grained variant due to Hast which allows us to distinguish between errors and erasures (i.e., cases where we know that we don't have any good guess; as outlined above, this will be the case whenever the predictor's guess for the first $j-1$ inner products was wrong, so that we can't assume anything about the $j$th guess being correct). This gives a much better quantitative bound than what seems possible using GL.

High Unpredictability implies Metric Entropy
In this section we state our main result, showing that $k$ bits of unpredictability entropy imply the same amount of HILL entropy, with a loss exponential in the "entropy gap". The proof is in Appendix C.
Theorem 2 (Unpredictability Entropy Implies HILL Entropy). For any distribution $(X,Z)$ over $\{0,1\}^n \times \{0,1\}^m$ with unpredictability entropy $H^{\mathrm{unp}}_{\gamma,s}(X|Z) \ge k$, with $\Delta = n - k$ denoting the entropy gap, $X|Z$ has $k$ bits of (real-valued, deterministic) metric entropy, where the loss in circuit size is exponential in $\Delta$. By Lemma 1 this further implies that $X|Z$ has, for any $\delta > 0$, $k$ bits of HILL entropy with an additional $\delta$ loss in distinguishing advantage.

A Figures
Figure 1: Bounds on deriving a (pseudo)random key $K$ of length $|K| = k - 2\log\epsilon^{-1}$ from a source $(X,Z) \in \{0,1\}^n \times \{0,1\}^m$ where $X|Z$ has $k$ bits of min, HILL or list-unpredictability entropy. The table lists, for each entropy type, the entropy quantity and quality of the source, how the key of length $k - 2\log\epsilon^{-1}$ is derived, and the quality of the derived key; for the list-unpredictability row, $H^{*\mathrm{unp}}_{\delta,s}(X|Z) = k$, $K = \mathsf{GL}(X,S) = S^T X$, with circuit-size loss $s' = s \cdot \epsilon^4/\mathrm{poly}(m,n)$. Ext is a strong extractor (e.g. leftover hashing), and GL denotes the Goldreich-Levin construction, which for $X \in \{0,1\}^n$ and $S \in \{0,1\}^{n\times|K|}$ is simply defined as $\mathsf{GL}(X,S) = S^T X$. Leftover hashing requires a seed of length $|S| = 2n$ (extractors with a much shorter seed $|S| = O(\log n + \log\epsilon^{-1})$ that extract $k - 2\log\epsilon^{-1} - O(1)$ bits also exist), whereas Goldreich-Levin requires a longer $|S| = |K|n$ bit seed. The above bound for HILL entropy even holds if $X|Z$ only has $k$ bits of probabilistic boolean metric entropy (a notion implying the same amount of HILL entropy, albeit with a loss in circuit size), as shown in Theorem 2.

Figure 2: Bounds on deriving a key of length $k$ with min (or HILL) entropy $k - \Delta$ from a source $X|Z$ with $k$ bits of min, HILL or unpredictability entropy. Cond denotes a $(\log\epsilon^{-1}+1)$-wise independent hash function, which is shown to be a good condenser (as stated in the table) for min-entropy in [DPW14]; the bounds for HILL entropy follow directly from the bounds for min-entropy via the lemma of [FR12]. The last row, where $H^{*\mathrm{unp}}_{\delta,s}(X|Z) = k$, $K = \mathsf{GL}(X,S) = S^T X$, $s' = s \cdot \epsilon^7/2^{2k}\mathrm{poly}(m,n)$ and $\Delta = 3$, follows from the results in this paper as stated in Proposition 1.

B Proof of Theorem 1
We will use the following theorem due to Hast [Has03] on decoding the Hadamard code with errors and erasures.

Theorem 3 ( [Has03]).
There is an algorithm $\mathsf{LD}$ that, on input $l$ and $n$ and with oracle access to a binary Hadamard code of $x$ (where $|x| = n$) with an $e$-fraction of errors and an $s$-fraction of erasures, can output a list of $2^l$ elements in time $O(nl2^l)$, asking $n2^l$ oracle queries, such that the probability that $x$ is contained in the list is at least $0.8$ if $l \ge \log_2(20n(e+c)/(c-e)^2 + 1)$, where $c = 1 - s - e$ is the fraction of correct answers from the oracle.
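For intuition only, here is a brute-force stand-in for $\mathsf{LD}$ (our own sketch, exponential in $n$, whereas Hast's algorithm runs in time $O(nl2^l)$): the oracle answers a query $r$ with a guess for $\langle r, x\rangle \bmod 2$, possibly a wrong bit (error) or `None` (erasure), and candidates are ranked by agreement on the queried points.

```python
from itertools import product

def brute_force_list_decode(oracle, n, queries, list_size=4):
    """Score every x in {0,1}^n by agreement with the oracle's answers,
    skipping erasures (None), and return the best-scoring candidates."""
    answers = [(r, oracle(r)) for r in queries]

    def score(x):
        s = 0
        for r, a in answers:
            if a is None:              # erasure: carries no information
                continue
            ip = sum(ri & xi for ri, xi in zip(r, x)) % 2
            s += 1 if ip == a else -1  # reward agreement, penalize errors
        return s

    ranked = sorted(product((0, 1), repeat=n), key=score, reverse=True)
    return ranked[:list_size]
```

The reduction in Appendix B only needs the list to contain $x$ with constant probability; the $\mathsf{Eq}$ oracle then picks out $x$ from the list.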
We'll often consider sequences $v_1, v_2, \ldots$ of values and will use the notation $v_a^b$ to denote $(v_a, \ldots, v_b)$.

Proof (of Theorem 1). It's sufficient to prove the theorem for $\epsilon = 0$; the general case $\epsilon \ge 0$ then follows directly from the definition of unpredictability entropy. To prove the theorem we'll prove its contraposition, stated as (8). The left-hand side of (8) means there exists a circuit $A$ of size $|A| \le t$ with the stated prediction advantage. It will be convenient to assume that $A$ initially flips a coin $b$, and if $b = 0$ outputs a uniformly random guess. This loses at most a factor 2 in $A$'s advantage, but now we can assume that (11) holds for any $z$, $r$ and $w \in \{0,1\}^k$. Applying Markov's inequality to (10) gives us (12), which defines when a pair $(z,x)$ is good; note that by (12), $(z,x) \leftarrow (Z,X)$ is good with probability $\ge 2^{-k+\Delta-2}$. We will use $A$ to construct a new circuit $B$ of size $s' = O(t2^{2k}\mathrm{poly}(n))$ satisfying (14), which together with (12) contradicts the right-hand side of (8), and thus proves the theorem.

We'll now construct $B$ satisfying (14). Consider any good $(x,z)$. Let $R = R^k = (R_1, \ldots, R_k)$ be uniformly random. There exists an $i$ such that $\epsilon_i \ge 2^{(-k+\Delta-2)/k} = \frac{1}{2} + \delta$ with $\delta \approx \frac{\Delta-2}{k} \cdot \frac{\ln 2}{2}$. We fix this $i$ (we don't know which $i$ is good, and later will simply try all of them). We call $r^{i-1}$ good if (17) holds (note that by the previous equation, a random $r^{i-1}$ is good with probability $\ge \delta/2$).
From now on, we fix some good $r^{i-1}$ and assume we know $a^{i-1} = r^{i-1}.x$ (later we'll simply try all possible choices for $a^{i-1}$). We define a predictor $P_i(r_i)$ that tries to predict $r_i.x$ given a random $r_i$ (and that also knows $z$, $r^{i-1}$, $a^{i-1}$ as above): the input to $A$ consists of the fixed $r^{i-1}$, the input $r_i$ and the randomly sampled $r_{i+1}^k$.
Using (11), which implies $\Pr[\hat{A}^{i-1} = a^{i-1}] \ge 2^{-i}$, and (17), we can lower bound $P_i$'s rate and advantage as in (18). In terms of Theorem 3, we have a binary Hadamard code with $e + c = \Pr[\hat{A}^{i-1} = a^{i-1}]$ and $c - e = \delta \cdot \Pr[\hat{A}^{i-1} = a^{i-1}]$, which implies $(e+c)/(c-e)^2 \le 2^i\delta^{-2}$. Now Theorem 3 implies that given such a predictor $P_i$ we can output a list that contains $x$ with probability $> 0.8$ in time $O(2^i\mathrm{poly}(m,n)) = O(2^k\mathrm{poly}(m,n))$; as we assume access to an oracle $\mathsf{Eq}$ which outputs 1 on input $x$ and 0 otherwise, we can find $x$ in this list with the same probability.
Using this, we can now construct an algorithm as claimed in (14) as follows: $B$ samples $i \in \{1, \ldots, k\}$ and then $r^{i-1}$ at random. Then $B$ calls $P_i$ with all possible $a^{i-1} \in \{0,1\}^{i-1}$. We note that with probability $\ge \delta/2k$ (we lose a factor $k$ for the guess of $i$, and $\delta/2$ is the probability of sampling a good $r^{i-1}$) the predictor $P_i$ will satisfy (18).
If $x$ is not found, $B$ repeats the above process, but stops if $x$ is not found after $2k/\delta$ iterations. The success probability of $B$ is $\approx (1 - 1/e) \cdot 0.8 > 0.5$ as claimed; the overall running time is $O(2^{2k}\mathrm{poly}(m,n))$.

C Proof of Theorem 2
It's sufficient to prove the theorem for $\gamma = 0$; the case $\gamma > 0$ then follows directly from the definition of unpredictability entropy. Suppose for the sake of contradiction that (7) does not hold. We will show how to construct an efficient algorithm that, given $Z$, uses $D$ to predict $X$ with probability at least $2^{-k}$, contradicting (6). The core of the algorithm is the procedure Predictor described below.
The procedure Predictor$(z, D', \ell)$ repeatedly samples a uniform candidate $x'$ and accepts it with probability $D'(x',z)/2$, giving up after $\ell$ rounds. By (19), we know that $x$ being the correct guess for $X$ is positively correlated with the value $D(x,Z)$. The probability that Predictor$(Z, D, \ell)$ returns some particular value $x$ as its guess for $X$ will be linear in $D(x,Z)$. Predictor$(Z, D, \ell)$ may also output $\bot$, which means it failed to sample an $x$ according to this distribution. The probability of outputting $\bot$ goes to 0 exponentially fast as $\ell$ grows.
A toy example: predicting X when Z is empty and D is boolean.
Suppose $ED(X) > ED(Y)$ for all $Y$ such that $H_\infty(Y) \ge k$, and assume that $D(\cdot)$ is boolean (not real-valued as in our theorem). Then Predictor$(\emptyset, D, \ell)$ will output a guess for $X$ that (if it's not $\bot$) is a random value $x$ satisfying $D(x) = 1$. The probability that this guess for $X$ is correct equals $ED(X)/|D|$ where $|D| = \sum_x D(x)$. Consider now the distribution $Y$ of min-entropy $k$ that maximizes $ED(Y)$. We can assume that $Y$ is flat and supported on those $2^k$ elements $x$ for which the value $D(x)$ is the biggest possible. Observe that since $ED(X) - ED(Y) > 0$, we have $ED(Y) < 1$, and since $D$ is boolean, the support of $Y$ contains all the elements $x$ satisfying $D(x) = 1$. Therefore we obtain $ED(Y) = 2^{-k}|D|$. Now we can estimate the prediction probability from below: $ED(X)/|D| > ED(Y)/|D| = 2^{-k}$. The above probability holds for $\ell = \infty$, i.e., when the predictor never outputs $\bot$. For efficiency reasons, we must use a finite, not too big $\ell$. The predictor will then output $\bot$ with probability $(1 - 2^{-n}|D|)^{\ell}$, and the success probability drops by the corresponding factor. With a little bit of effort one can prove that setting $\ell = 1 + 2^{n-k}/\epsilon \approx 2^{\Delta}/\epsilon$ yields success probability $2^{-k}$ independently of $|D|$.
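The toy case is directly executable. The following rejection-sampling sketch (our naming) mirrors Predictor for empty $Z$, accepting a uniform candidate $x$ with probability $d(x)/2$ and giving up after $\ell$ rounds:

```python
import random

def predictor(d, n, ell, rng=random):
    """Toy Predictor with empty Z: repeatedly sample a uniform n-bit x and
    accept it with probability d(x)/2, where d maps bit-tuples to [0,1];
    output None (the failure symbol, i.e. bottom) if no sample is accepted
    within ell rounds."""
    for _ in range(ell):
        x = tuple(rng.randrange(2) for _ in range(n))
        if rng.random() < d(x) / 2:
            return x
    return None
```

With a boolean $d$, an accepted output is a uniform element of $\{x : d(x) = 1\}$, exactly the distribution analyzed above.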

Proof in the general case - important issues
Unfortunately, what we have proven above cannot be generalized easily to the case considered in the theorem; there are two obstacles. First, in the theorem we consider a conditional distribution $X|Z$ (i.e., the conditional part $Z$ is not empty as above). Unfortunately, we cannot simply make the above argument separately for all possible choices $Z = z$ of the conditional part, as we cannot guarantee that the conditional advantages $\epsilon(z) = ED(X|_{Z=z}, z) - ED(Y|_{Z=z}, z)$ are all positive; we only know that their average $\epsilon = E_{z\leftarrow Z}\,\epsilon(z)$ is positive. Second, so far we assumed that $D$ is boolean. This would only prove the theorem where the derived entropy in (7) is against deterministic boolean distinguishers, and this is not enough to conclude that we have the same amount of HILL entropy, as discussed in Section 2.0.4.
Actual proof - preliminaries. For real-valued distinguishers in the conditional case, invoking Predictor(Z, D, ℓ) on a D satisfying (19) will not, in general, give a predictor for X with advantage > 2^{−k}. Instead, we first transform D into a new distinguisher D′ that has the same distinguishing advantage and for which we can prove that the predictor works. The way in which we modify D depends on the distribution Y|Z that minimizes the left-hand side of (19). This distribution can be characterized as follows: the distribution Y|Z = Y*|Z satisfying H_∞(Y*|Z) = k is optimal for (20) if and only if there exist real numbers t(z) and a number λ ≥ 0 such that the conditions of Lemma 2 hold for every z. Proof. The proof is a straightforward application of the Kuhn-Tucker conditions given in the Appendix.
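For intuition, here is our own sketch (not the paper's statement) of where the numbers t(z) and λ come from: the optimization in (20) is a linear program in the probabilities P_{Y|Z=z}(x), so the Kuhn-Tucker conditions attach a multiplier to each per-z normalization constraint, giving the thresholds t(z), and a multiplier λ to the shared entropy budget:

```latex
\begin{align*}
\text{maximize}\quad & \sum_z \Pr[Z=z]\sum_x P_{Y|Z=z}(x)\,D(x,z)\\
\text{subject to}\quad & \sum_x P_{Y|Z=z}(x) = 1 \ \text{ for every } z,\qquad
P_{Y|Z=z}(x)\ge 0,\qquad \mathbf{H}_\infty(Y|Z)\ge k .
\end{align*}
```

At the optimum, mass is placed only on points with D(x, z) above the threshold t(z), and the slack Σ_x max(D(x, z) − t(z), 0) = λ is the same for every z, which is exactly the "equal areas" picture of Remark 2.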
Remark 2. The characterization can be illustrated in an easy and elegant way. It says that the area under the graph of D(x, z) and above the threshold t(z) is the same, no matter what z is (see Figure 3). Note that because of the "freedom" in defining the distribution on elements x satisfying D(x, z) = t(z) (Lemma 2, point (b)), there can be many distributions Y*|Z corresponding to fixed numbers λ and t(z) that satisfy the characterization above and are this way optimal for (20) with k = H_∞(Y*|Z). For the sake of completeness we characterize below all possible values of k that match given λ and t(z). We note that this fact might be used to turn our nonuniform guessing algorithm into a uniform one.
Predicting given the thresholds t(z). We use the numbers t(z) to modify D and then call the procedure Predictor on the modified distinguisher. Lemma 3 below shows that we can efficiently predict X from Z, assuming we know the numbers t(z) for all z in the support of Z (later we show how to efficiently approximate them).

Lemma 3. Let Y*|Z be the distribution satisfying H_∞(Y*|Z) = k and maximizing E D(Y, Z) over all Y with H_∞(Y|Z) ≥ k, where k < n and D satisfies (19). Let t(z) be as in Lemma 2, define D′(x, z) = max(D(x, z) − t(z), 0), and set ℓ = 2·2^{n−k}ε^{−1} in the algorithm Predictor. Then the bound (23) holds.

Proof. We start by calculating the probability on the left-hand side of (23).

Claim 1. For any^16 D′, the algorithm Predictor outputs X given Z = z with the probability stated in the claim, where U is uniform over {0, 1}^n and g is an auxiliary function of E D′(U, z).

Proof of Claim. In every round i = 1, …, ℓ of the execution, the probability that Predictor stops and outputs x′ equals Pr[U = x′]·D′(x′, z)/2 = 2^{−n−1}D′(x′, z); the probability that it outputs anything in a given round (and thus leaves the while loop) is therefore 2^{−n−1}Σ_{x′} D′(x′, z). Hence the probability of not leaving the while loop for ℓ rounds (in which case the output is ⊥) is (1 − 2^{−n−1}Σ_{x′} D′(x′, z))^ℓ.

^16 We will only use the claim for the distinguisher D′ as constructed above, but the claim holds in general.
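The rejection-sampling loop and its per-round accounting can be sketched as follows (our reconstruction, not the paper's pseudocode; the names `predictor` and `exact_output_probs` and the toy D′ are ours). In each of the ℓ rounds a uniform x′ is drawn and output with probability D′(x′, z)/2; the exact one-run distribution then follows the accounting above.

```python
import random

n = 4
N = 2 ** n

def predictor(z, d_prime, ell, rng):
    """One run of Predictor(z, D', ell): ell rounds of rejection sampling."""
    for _ in range(ell):
        x = rng.randrange(N)                  # u uniform over {0,1}^n
        if rng.random() < d_prime(x, z) / 2:  # accept with prob D'(x,z)/2
            return x
    return None                               # the symbol ⊥

def exact_output_probs(z, d_prime, ell):
    """Exact one-run output distribution: output x' in a round with
    prob 2^{-n-1} D'(x',z); leave the loop with prob q = their sum;
    output ⊥ with prob (1-q)^ell."""
    per_round = [d_prime(x, z) / 2 / N for x in range(N)]
    q = sum(per_round)
    geometric = sum((1 - q) ** i for i in range(ell))
    probs = {x: per_round[x] * geometric for x in range(N)}
    probs[None] = (1 - q) ** ell
    return probs

# An arbitrary [0,2]-valued D', used only for the sanity check below.
def d_prime(x, z):
    return min(2.0, (x % 5) / 2.0)

probs = exact_output_probs(0, d_prime, ell=8)
assert abs(sum(probs.values()) - 1.0) < 1e-9   # a probability distribution

out = predictor(0, d_prime, 8, random.Random(1))
assert out is None or 0 <= out < N
```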
Now we can see why we cannot apply the algorithm Predictor directly with a distinguisher D satisfying only (19). According to the last formula, the success probability would be an averaged sum of products g(E D(U, z)) · E D(X|Z = z, z) over z. We know the average of the second factors of these products, but in general we cannot compare the values of E D(U, z) for different z's. The crucial observation is that the distinguisher D′ we defined satisfies the same inequality (19) as D (though D′ has range [0, 2] rather than [0, 1]). Moreover, D′ has a special form which allows us to simplify expression (23). The details are given in the next two claims. The proof of (a) now follows by taking the average over z. The proof of (b) follows by observing that D′ satisfies the characterization in Lemma 2 with t(z) = 0 for all z.
Proof. Lemma 2 implies Σ_x D′(x, z) = λ for every z. We can define λ′ = 2^{−n}λ, and it then remains to show that 0 < λ < 2^n. Observe that the case t(z) < 0 in Lemma 2 is possible if and only if P_{Y*|Z=z}(x) = max_{x′} P_{Y*|Z=z}(x′) for all x, which means H_∞(Y*|Z = z) = n. Since k < n, we have t(z) ≥ 0 for at least one z, and then λ = Σ_x max(D(x, z) − t(z), 0) ≤ Σ_x D(x, z), which gives λ ≤ 2^n. Lemma 2 guarantees that λ ≥ 0, so we need to show that λ ∉ {0, 2^n}. If λ = 0, then the condition Σ_x D′(x, z) = λ implies D′(x, z) = 0 for all x and z, contradicting Claim 2 because ε > 0. In turn, if λ = 2^n, then Lemma 2 gives D(·, z) ≡ 1 and t(z) = 0 for all z such that t(z) ≥ 0. This is possible only if P_{Y*|Z=z}(x) = max_{x′} P_{Y*|Z=z}(x′) for all x, which means H_∞(Y*|Z = z) = n whenever t(z) ≥ 0. But then H_∞(Y*|Z = z) = n for all z, which contradicts k < n.
To calculate the success probability we need one more observation. The following claim shows that the support of D′ is contained in the support of Y*.
Claim 4. For every z we have supp(D′(·, z)) ⊆ supp(Y*|Z = z).

Claim 2 applied to Y = Y* now yields an estimate; together with Claim 4, Claim 3, and H_∞(Y*|Z) = k, plugging this into (31) gives the bound

Pr[Predictor(Z, D′, ℓ) = X] ≥ 2^{−n−1} g(λ′/2) · 2^{n−k} λ′ + ε.

To give a lower bound on the success probability it remains to minimize the last expression over λ′ ∈ (0, 1). This is answered below. Proof of Claim. The proof uses standard calculus and is given in the appendix.
Computing t(z) from λ. So far we have shown how to construct the predicting algorithm provided we are given the numbers t(z). Now we prove that one can compute them approximately and use the approximations in place of the original values. We start with a few useful facts about the auxiliary function g introduced in Claim 1 in the proof of Lemma 3; below we summarize its fundamental properties. The entire solution is based on the next two lemmas. The first lemma formalizes the intuition that replacing D by a distinguisher which approximates it closely enough should not affect the success probability of Predictor(Z, D, ℓ) very much; for technical reasons we state it for one-sided L_1-approximation. The second lemma describes an efficient algorithm which receives λ as a hint on its input and computes approximations of t(z) from below, for every z.
Proof of Lemma. The idea is simple: given t′ we approximate the value E max(D(U) − t′, 0) by sampling, and by comparing the result with λ we can find the right value of t using binary search. This corresponds to finding a blue line in Figure 4 such that the green area above it is sufficiently close to λ.
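The binary search can be sketched as follows (our own illustration; `find_threshold` and the toy D are hypothetical names, and for clarity we evaluate the expectation exactly rather than estimating it by sampling as in the proof). The map t ↦ E max(D(U) − t, 0) is continuous, non-increasing, and 1-Lipschitz, so bisection finds a t whose area is within any desired tolerance of the target.

```python
n = 4
N = 2 ** n

def D(x: int) -> float:
    return (x % 7) / 6.0              # an arbitrary [0,1]-valued distinguisher

def g_area(t: float) -> float:
    """E_{U uniform} max(D(U) - t, 0): the 'green area' above the line t."""
    return sum(max(D(x) - t, 0.0) for x in range(N)) / N

def find_threshold(target: float, eps: float = 1e-9) -> float:
    """Bisect for t with g_area(t) close to target; g_area is
    continuous, non-increasing, and 1-Lipschitz in t."""
    lo, hi = -1.0, 1.0                # g_area(-1) >= 1 > 0 = g_area(1)
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if g_area(mid) > target:      # area still too big: raise the line
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

t = find_threshold(0.2)
assert abs(g_area(t) - 0.2) < 1e-6
```

In the proof the same search is run with a sampling-based estimate of the area, which is why the approximation guarantee there is only from below.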

E Proof of Corollary 1
Proof of Corollary. Let y_max(z) = max_{x′} P_{Y|Z=z}(x′), and consider the function f^δ_z. In particular the conditions (44) are satisfied for δ = 0. Suppose now that there are z_i and x_i for i = 1, 2 such that 0 < P_{Y*|Z=z_i}(x_i) < max_{x′} P_{Y*|Z=z_i}(x′). Define δ by

δ = min( y_max(z_1) − 1/#{x : D′(x, z_1) ≥ t(z_1)}, 1/#{x : D′(x, z_2) > t(z_2)} − y_max(z_2) ).

By Lemma 2 we immediately obtain δ ≥ 0. It follows easily from the definition of δ that the number −δ satisfies (44) with z = z_1 and that δ satisfies (44) with z = z_2. We can now see that if we replace the distribution Y*|Z = z_1 by f^{−δ}_{z_1} and the distribution Y*|Z = z_2 by f^{δ}_{z_2}, then we obtain a distribution Y|Z satisfying the conditions in Lemma 2 with H_∞(Y|Z) = k. Finally, observe that δ = 1/#{x : D′(x, z_2) > t(z_2)} − y_max(z_2) means that the distribution Y|Z = z_2 is uniform on {x : D′(x, z_2) > t(z_2)}; in turn, if δ = y_max(z_1) − 1/#{x : D′(x, z_1) ≥ t(z_1)}, then the distribution Y|Z = z_1 is uniform on {x : D′(x, z_1) ≥ t(z_1)}.

F Proof of Claim 5, Lemma 3
Proof. We check that lim_{s→0} h(s) = a, and thus the function h is continuous on the interval [0, 1]. This means that h attains its minimum at some point s = s_0. There is nothing to prove if s_0 ∈ {0, 1}, so suppose that s_0 ∈ (0, 1). Then we must have ∂h/∂s |_{s=s_0} = 0. Computing the first derivative of h, note that the resulting expression is increasing with respect to ℓ and that by assumption ℓ > (1 + a)/(a + s_0). Using this we obtain

h(s_0) ≥ (a + s_0)(1 + a) / (a(1 − s_0) + (1 + a)s_0) = 1 + a,

which completes the proof.
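The last algebraic step is worth spelling out (our rewriting): the denominator collapses to a + s_0, which is why the bound equals 1 + a exactly:

```latex
a(1-s_0) + (1+a)s_0 \;=\; a - a s_0 + s_0 + a s_0 \;=\; a + s_0,
\qquad\text{hence}\qquad
\frac{(a+s_0)(1+a)}{a(1-s_0)+(1+a)s_0} \;=\; 1+a .
```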
The lemma now follows immediately by combining (33) with the last claim.