New Spectre attacks and retpoline

Wccftech has an article "…" I can’t tell whether it was the author who got this wrong or just the headline writer. They also have an article on the new issues with Intel processors. Similar issues: … What gives?

First, this should be entirely a “dusty deck” software issue. Most compilers switched to automatically generating a “retpoline” (short for “return trampoline”) where necessary: when the called procedure or function has a variable-length parameter, or puts local variables of non-constant size on the stack. (Cases where a function returns a string or other variable-length object are best dealt with either by a separate return stack or by treating the value as allocated by the caller.) You almost certainly didn’t want to know any of this.

This was dealt with in Ada about thirty years ago. C (not C++) at that time returned all variable-length parameters by reference. But multi-language compiler back ends like GCC would use retpolines in some C code anyway. A retpoline was never wrong; it could just be slower if the trampoline section wasn’t necessary.

Are there still old compilers or (very) old code around? Sure. Should various tools (such as lint) find these cases? Sure. I had one tool that I ran against all of the ACVC* tests on the Honeywell DPS6 line to look for cases we might have missed. Today you want your run-time (actually load-time) security tools to find these vulnerabilities. They are just as bad as a virus in code you just copied.

* I didn’t run it against tests that were required to be rejected for other reasons.

AMD CPUs See Less Than 10% Performance Drop From Revised Spectre-v2 Mitigations…

Much closer to getting the story right. Forcing JMP/LFENCE works in many cases, but there are others that require retpolines anyway. (C string returns don’t; Ada string and PL/I CHAR(*) VARYING returns do.)

What if the compiler always generates a retpoline (return and trampoline), even in cases where JMP/LFENCE is faster? Yes, code that has laddered function returns (one function returns to another return, and so on) can be sped up a lot by just doing the JMP/LFENCE. However, what compiler writers tend to do, and what is much faster still, is to fold the call and return out of existence.

I normally don’t do examples, but one here will help. Say you have:

   function Foo (X, Y : Integer) return Integer is
   begin
      if X > Y then
         return X;
      else
         return Y;
      end if;
   end Foo;

A good compiler will fold that function into every place it is called, and will only provide code for the actual function if you turn off optimization for debugging reasons. Obviously, if the function is part of some privileged code, it should only get folded into other privileged code. But compiling privileged (i.e., operating system) code and non-privileged code as part of the same compilation is hairy juju only done by networking experts. (I started to say operating system experts. But they tend to eliminate such code. The networking gurus are the ones who need to unpack things, then make the result owned by some user.)
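For readers more used to C, the same folding can be sketched roughly as follows. The names foo_max and clamp_low are hypothetical, chosen only for illustration, and whether a particular compiler actually folds the call away depends on its optimization settings. Treat this as a sketch of the idea, not a statement about any specific toolchain.

```c
/* Hypothetical C counterpart of the Ada function Foo above:
   returns the larger of its two arguments. */
static inline int foo_max(int x, int y)
{
    return (x > y) ? x : y;
}

/* Hypothetical caller. With optimization enabled, a compiler will
   typically expand foo_max in place here, so the generated code for
   clamp_low contains a compare and a select but no call to, and no
   return from, foo_max. */
int clamp_low(int value, int floor)
{
    return foo_max(value, floor);
}
```

One way to check this is to compile with something like `gcc -O2 -S` and read the assembly for clamp_low. If the fold happened, there is no call instruction for foo_max, and so no call or return sequence left to protect at that site.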