For those of you who don't know, timing attacks are an example of side-channel cryptanalysis: cryptanalysis using additional information about the inner workings of the cryptographic algorithm. I wrote about them here.
What's the big idea here?
There are two ways to look at a cryptographic primitive: block cipher, digital signature function, whatever. The first is as a chunk of math. The second is a physical (or software) implementation of that math.
Traditionally, cryptanalysis has been directed solely against the math. Differential and linear cryptanalysis are good examples of this: high-powered mathematical tools that can be used to break different block ciphers.
On the other hand, timing attacks, power analysis, and fault analysis all makes assumptions about implementation, and uses additional information garnered from attacking those implementations. Failure analysis assumes a one-bit feedback from the implementation -- was the message successfully decrypted -- in order to break the underlying cryptographic primitive. Timing attacks assumes that an attacker knows how long a particular encryption operation takes.
The author suggests that the algorithm is at fault for making it difficult for implementors to make fast-but-constant-time implementations, then they list a few algorithms that don't have this limitation. They mention Helix, but not Blowfish, so how good/bad is Blowfish in this area? (Blowfish was Mr. Schneier's entry into the AES competition wasn't it?)
One of the author's opening statements is "This attack should be blamed on the AES design, not on the particular AES library used by the server;..." This seems like a somewhat harsh statement: is AES a poor design because it is susceptible to this attack? As a previous poster asked, how do other ciphers such as the AES finalists (Twofish, Serpent, MARS, RC6) fare in this area?
In addition, the author makes comparisons between AES and Helix. Since Helix is a stream cipher, is this an 'apples & oranges' situation? The original Helix paper remarked that AES vs. Helix is a somewhat unfair comparison.
Twofish uses S-boxes; higher-performance implementations use larger S-boxes (which I suspect, though I haven't thought about it hard) are more vulnerable to the sort of attacks Bernstein describes. I don't think Twofish would have done any better than Rijndael against attacks like this.
The paper does mention that all 15 AES candidates could be vulnerable to this attack.
Maybe we need to have crypto co-processors as standard (as floating point co-pros are, though now built into the CPU). As part of the CPU would be ideal. Give the crypto co-pro dedicated cache, and let the hardware designers learn from crypto co-processors in the smart card world.
Crypto co-processors in the smart card world work in a potentially very hostile environment where the attacker can control power, timing signal ...
Can anyone explain why a table lookup T[i] leaks information about the index i ?! Isn't it constant? Ok, it may not be constant for values fetched from the main memory, but in that case the time should be one of either quick (cache) or slow (memory) ?!
Timing attacks are a just one of the TEMPEST tricks that have been known about for many years....
The first Timing attack that I know of was against a UK One Time Pad machine used to superencrypt trafic to the UK. The US where snooping on the line, and could by examining the pulse widths tell which way the relays in the unit used for the XOR function where going (open or closed)...
All physical realisations of encryption systems are subject to TEMPEST attacks and they are extreamly difficult to gaurd against. The reason for this is that it is an Energy/Bandwidth trade off. To ensure no inadvertant information escapes the communications bandwidth has to be so low that it is usually not practical for all but the most secure systems.
Oh one thing to note, TEMPEST is two way, ie even if you prevent all harmfull emmisions, you also have to protect against energy being sent into the unit which then can carry secure information out. Oh and by energy it is not just limited to electrical/Radio Frequency, but mechanical (sound/vibration) etc (I guess even gravity as well ;)
However from what has been said in the paper, this particular attack could have been easily predicted and dealt with in the first round of the AES selection process.
The author’s whole attack hinges on the fact that fetch time is affected by the caching done from prior memory fetches.
The caching is what induces the time variation for fetches. Or am I missing something here? It is the compilation steps (from C or assembler code) and more importantly the optimization algorithms (in the CPU or in the compiler) which create software implementations in native machine code where table look-ups become time-variable.
If caching and optimizations can be disabled (for the table-lookup) and the addressing calculations can be forced into a uniform base (e.g. all address calculation done in 32-bits or 64-bits), then the table-lookup performance would be slower, but time-constant.
The slower, but time-constant table look-up would still be orders of magnitude faster than performing the GF(2^8) mathematics ab initio for each use of an S-Box. The time-constant lookups would thwart this attack.
Would this implementation change (essentially software de-optimization) work?
If you have questions on this, please read the referenced document; it is very well written. Specifically, most of the questions above are answered in it.
This paper appears to describe not merely an attack on AES, but on a large group of alogrithms. This paper is not primarily written about the fact that timing attacks are an issue, but how to solve the problems.
This once again shows how hard getting security right actually is. This paper is basicly saying that all the experts who evaluated those NIST AES candidates missed this one and as the result complexity and performance problems have to be added to AES implementations to get around it (however, I don't agree that the CPU would the right place for any fixes and I think we generally have enough bloat in kernels already).
I believe that disabling the L1 cache, as Josh Washburn suggests, can only be done by the kernel, as it requires accessing the CPU configuration registers. (My reading is that the cache is the primary effect, not software table lookup optimizations.)
Taking an 11-cycle hit to the L2 cache, or an even larger hit to main memory, would start to make calculating the S-box look attractive, particularly if the values can be calculated in parallel.
Actually, disabling L1 or even L2 might not be enough, because DRAM paging might also reveal information...
I don't think its a crisis: you can actually use the performance counters to your advantage: Forget all the fancy bits about avoiding issues, just use the performance counter to timecheck the encryption cycle to ensure that it always takes the same "time", and just buisywait if you finish early. True, it makes it worst case, but you colud assume most-caching and most-hit, so you cut off at say, 4 sigma above average, and if it is >4 sigma above average, oh well, thats a minor leak which gets averaged out over encrypting a larger block of data anwyay.
Ari, what would you suggest? According to this paper, due to the problem of interrupts, you need place the alogrithm in the kernel to keep constant time lookups (ignoring the interrupt time itself), or you need to modify the CPU (see section 13). Are you suggesting, for example, using only CPUs that have constant lookup times regardless of the address? This seems less generally useful than putting the alogrithm in the kernel to me.
"This once again shows how hard getting security right actually is. This paper is basicly saying that all the experts who evaluated those NIST AES candidates missed this one and as the result complexity and performance problems have to be added to AES implementations to get around it (however, I don't agree that the CPU would the right place for any fixes and I think we generally have enough bloat in kernels already)."
Almost. We didn't miss side-channel cryptanalysis when we looked at AES candidates. We talked about them explicitly in our Twofish design, for example.
The problem is that side-channel attacks are practical against pretty much anything, so it didn't really enter into consideration.
Having had a little time to think about the implications (and read the paper a second time) The answer is an unpalatable,
"You cannot use a general purpose shared CPU for OnLine crypto activities"
If you read the paper in section 13 DJB mentions similar work caried out by Osvik and Tromer with regard to hyperthreading. The question is what other technologies could leak information about the plaintext or keytext via dependent timing.
If you think back to your old computing course you where taught about a couple of basic CPU architectures (that originated in the 1950s).
Modern CPUs bare little resemblance to these architectures now except in their very core. The need for speed and the limitations of physics have ment that all sorts of technologies have been developed (caching, hyperthreding, out of sequence execution, TTA...) to meet this need. I suspect that all of these technologies can be exploited in one way or another to perform timing attacks (either activly or pasivly).
I guess if you want to do OnLine crypto you realy should use an "atomic" process such as a crypto processor.
An atomic process might be possible with DJBs "constant time" implementations providing they can lock the CPU but I suspect that there will always be an un accounted for technology getting in the way...
So the choice boils down to,
1, Use a Crypto processor 2, Do your crypto off line.
Oh and by OffLine I mean with no other users process running on the machine that kind of takes you back to option 1 any way...
If you remember back a little while Bruce bloged about work that enabled a laptop or PC to be tracked by the CPU clock drift.
This relied on the little known TCP timing information used for flow control.
DJBs example code used a time stamp function to make things easier to demonstrate, in real life you would not normally get a time stamp of that from the high level code (which might account for the comment from the Rijndael designers the DJB quoted).
However it may be possible to combine the information from the TCP time stamp to get the same effect...
The way I read the paper is that DJB tells us that:
(1) Fast implementations are hard to make constant-time. This is due to the fact that implementing the S-boxes in AES is efficient using a table lookup.
(2) Avoiding memory latency side-effects on a modern processor with multi-level memory systems, SDRAMs with pages etc are very hard.
This means that to even try to get to (1) we will have to sacrifice performance, and due to (2) we might not succeed anyway.
One thought I had is to do basically the reverse of what DJB does. That is just prior to performing a cipher operation (for example - something using table-lookups) we record the timestamp. After the operation completes we check the time again and keep track of the max time recorded. Operations that completes faster than the max time are delayed before returning to the user/caller with the result.
Yes, this will also eat performance, but it would at least mitigate the problem. We should be able to measure and do this time adjustment on a similar resolution as DJBs attack.
Another thing: I first thought that having the contents of the S-boxes being constant (as they are in AES and DES/3DES) was an important factor, but it seems that the index is the important factor. Blowfish and Twofish have been mentioned, but how about poor RC4. Here the S-box is permuted based on the key, but we still have a 256 Bye array. Also RC4 basically consists of 3 mem accesses so one could at least get the idea that the timing attack would be easier to observe. Mabye...
No, what I did try to suggest was a scheme where the cipher SW continously monitors the time it takes for itself to perform the operation. (This means storage of the time).
The first MAX value is the time it takes to perform the first operation (with key k1 and data d1).
If the next operation (for example with key k2 and data d2) takes shorter time to complete than MAX1, the operation does not return to the user until MAX time have expired. If the operation takes longer time than MAX1, we now have a new MAX time, MAX2.
This implies that after n operations, the MAX time to complete an operation would be the worst case possible for the operation on the given HW-platform.
Yes, keeping the value between key (and data) changes was what I had i mind. And you should probably have a good (i.e. conservative) guesstimate for the initial value of MAX too.
What we are talking about is a control loop for the MAX value. If it turns out that all values are lower than MAX you may reduce the limit, but any value above pushes MAX up.
To conclude: I see this as a way to mask execution time variance (which the side-effect utilizes). But this method will have an impact on the performance.
And the way I read DJBs paper: Unless you have some sort of HW-support, S-box/table based cipher primitives, any high speed/high performance implementation in SW running on a modern CPU will be hard pressed not to have variation in execution time. Agreed?
And all methods of fixing the current implementations will have some impact on the performande. Agreed?
Now, the question is: How big a problem is this in real life? Even though DJB built a lab setup with server and performed the attack remote, what is the likelihood of doing this on a real server? Or for that matter as a normal user on a server?
Not that DJBs paper is extremo-kewl and impressivs. And he rightly points out that an algorithm, however sound it seems in the paper needs to be implemented in a way that doesn't make it weak in terms of security. And next-gen cipher primitives (including hash functions) should be evaluated also in terms of resistance to side-channel attacks such as the one described in DJBs paper.
So... should we be worried or not? From an admitted novice, the paper looks interesting, but doesn't really seem like it presents a very practical remote attack. Even if it were, given the large number of samples needed to recover a single key, wouldn't it be easy to defeat by using some form of periodic key change?
» crypto timing attacks from Michael Silk's Blog Great paper: [cr.yp.to] Targetted at the AES (Rijndael) it provides code for an actual attack that reveals the key based only through interactions with the server. [Read More]
Why not simply add a small amount of random time instead of trying to find the max? Yes, it also has an impact on performance (hopefully less than the max), but would it not make timing based attacks quite a bit more difficult?
Since this attack is a function of the machine and not the algorithm itself, it seems to me that a relatively straightforward fix is to randomize the machine. Specifically the S box lookup need only be randomized so that the timing for each S box is unknown to the observer. Then instead of looking up S[j], one instead looks up S[R[j]] where R is an array of random numbers. Prior to the calculation S[j] is mapped into S[R[j]] so that the same algorithm is executed, but executed against a random array of S-boxes. One would reinitialize this array before every z set of blocks where z might be as small as 1. To me, that solution is easier than trying to add random time, but instead just randomizes the state of the machine without effecting the resulting algorithm. Of course, in order for this to work, the random number generator has to be random or at worst a PRNG seeded randomly.
And of course, if this attack is really of concern, then the NIST/AES committee would have to standardize the method so that everyone could continue to have confidence in "AES"
In order to look up S[R[j]], you need to look up R[j] first, so the timing attack would work exactly as before. (Of course, the S lookups would then add some random noise, so you'd need to analyze a larger sample.) On the other hand, I can't see any obvious reason why your idea couldn't work if R were computed on the fly every time.
I know I am coming into this conversation late and many people have long since moved on. I just wanted to comment that Serpent doesn't seem to me to suffer from this attack (at least in it's most efficient implementation). Dag Arne Osvik had a very novel approach that was meant to minimize the number of registers needed by the CPU to implement the sboxes. His code converts them to a constant order set of operations such as xor, and, or, etc. These do not look up data in lookup tables that could suffer from caching. Also.. I think it should be pointed out the practical side of this attack. In SSH for example.. if memory serves.. the key is changed 1/hr or 1/1gb.. so it is unlikely that the volume of data needed for this attack would use the same key.. thus.. it would not provide enough data to crack the key. I think that many applications of AES would not use the same key for such a long period of time.
The non-linear timing of array lookups is certainly a very clever and insightful observation. However, from a brief read of DJB's paper, it appears to me that it is necessary to have an extremely controlled "Lab" situation for this attack to work.
When dealing with computers in the wild, running hundreds of independent tasks at the same time with constant interruptions, what is cached in memory at any given time would be altered by uncontrolled interactions with other programs competing for the same resources. In addition to which there is the consideration of the variability of the time required to obtain the data that is being encrypted (e.g. disk IO), which his method ignores. For this attack to work, data generation time would also have to be very consistent. (His demo assumes essentially zero time for data generation).
DJB says the answer to uncontrolled program interactions is to average the timings over a greater number of samples. But under ideal conditions his method already requires a very large number of samples; as we diverge from the "ideal" towards the "real", at some point the number of samples required approaches infinity. And I strongly suspect that point occurs pretty quickly.
Consider a busy web server with hundreds of non-synchronized events (encrypted page views and database lookups) occurring. How would you ever get useful predictability from that type of chaotic real-world environment? The minute variations in memory cache timing are swamped by the huge variations in data generation time.
Paradoxically if you have a computer that is dedicated to the task of secure encryption it's level of vulnerability goes up.
His method requires that you measure the position of an ant's antenna while it walks around on the back of an elephant; yes, it can be done, as long as the elephant holds reasonably still, but is it feasible when the elephant goes for a walk? :-)
What DJB clearly demonstrates is that there is a compelling need to change the encryption key frequently.