Intel Branch Predictors reverse engineered

An interesting article - apparently UCSD researchers have managed to completely reconstruct the workings of the branch predictors in a number of Intel CPUs, and come up with a way to generate code that makes leaking data impossible with minimal (<5%) performance overhead. The underlying paper is here https://cseweb.ucsd.edu/~tullsen/halfandhalf.pdf

Modern processors include a branch predictor in the front end of the pipeline to predict the outcome of these conditional branches, allowing the processor to continue to execute at full speed without stalling until the outcome of the branch is known, much later in the pipeline.

In all modern processors, though, the branch predictor is shared between all executing threads and processes, leading to critical security vulnerabilities. These can be used by an attacker to observe the outcome of any branch in a targeted program—and learn about the data that controls the branch. Even worse, certain Spectre attacks start with the attacker inserting data into the branch predictor so that they can learn private information that is stored in their memory.

Despite the fact that no prior work had fully deciphered these predictors, even those introduced more than 12 years ago, the researchers were able to fully reverse engineer the structure, sizes, and lookup functions of these predictors. The Intel branch predictor is made up of four tables, and the hash functions used to look up the predictions are composed from an enormous amount of data, using information gleaned from the last (up to) 194 taken branches. As a result, these lookup functions are extremely complex, making the most surprising result of this analysis the discovery that the branch predictor can be split into two parts simply by varying a single bit of the branch address.

“With a small change in how we generate code we can now run two threads together on the same processor core, and it is impossible to leak data through the branch predictor, or to induce mispredicts to launch a Spectre attack,” said Hosein Yavarzadeh, a UC San Diego Computer Science and Engineering (CSE) PhD student and lead author of the paper.

“This result was actually quite a surprise, in part because the indexing functions for these tables are so complex, and it seemed astronomically unlikely that a single bit would be isolated in such a way that it is not combined with any other bits” said CSE Professor Dean Tullsen. “We would have been excited to find out that Intel planned to do something like this intentionally for future processors. It was shocking to find out that this feature is already in every major processor from Intel going back 10-plus years, with no-one knowing about it, or for the very few that did, not realizing the implications for security.”

Purdue University professor and CSE alumnus Kazem Taram said that the conditional branch predictor is arguably the most challenging component of the branch predictor to reverse-engineer, and so the details of it have been a mystery for both security and performance researchers. “With Half&Half, now we have a clearer picture of the functionality and detailed structure of branch predictors that were kept secret for decades,” he said.

Previously, the best-known software techniques for providing conditional branch isolation between threads incurred 50 to 100 percent performance overhead. With the Half&Half approach, the same protection against security leakage through the branch predictor can be achieved with a simple change to compiler code generation that costs 2 to 5 percent performance.

2 Likes

Sigh! That’s an hour that I’ll never get back. This is not a criticism of the work, which is brilliant. It’s just not the way I do it. The way to guarantee that the other SMT thread cannot play with your code. Is to always use cmovXX. In general, you can replace all conditional moves with cmovcc sometimes followed by an unconditional jump. It looks like you are adding “unnecessary” instructions, but I find that using cmov allows the CPU to put more instructions in one clock cycle.

1 Like

Why not publish a paper? Think of all the silicon groupies you’d get!

3 Likes