Aliasing Explained (Part 1)
March 26, 2017Since my job involves a lot of HPC stuff and performance-obsessed code, one of the first topics I had to dive into was aliasing in the C standard. I found out that it is a subtle topic, quite obscure and very often overlooked even by the most experienced programmers. This post is an attempt to clarify the concept of aliasing, why should we care, how it impacts the code generated by the compiler and how can we master it with no hassle.
So what is an alias? Let’s try to clarify this a bit:
One pointer (lvalue expression) is said to alias another pointer when both refer to the same location or object
In practice, an alias involves referring to the same physical entity (an object stored in memory) through multiple handlers (pointers):
int i;
int *ptr1 = &i;
int *ptr2 = &i;
In this trivial example, ptr1
and ptr2
are both aliases of each other since
they are pointing to the same object (the integer i
).
Just a quick note: even if we are talking about pointers and C-ish stuff, this concept is valid for all programming languages: a lot of recent languages use pointers (Go, for example) but in C this issue is a lot more critical since we can do arithmetics with them, moving pointers around in memory as we like.
Why should I care?
Got it, this is the question. Why should I care about aliasing at all?
Short answer: if you are writing code that isn’t supposed to be pushed to its limits in terms of performance, you can ignore aliasing at all and be happy.
Long answer: while working on code among whose targets had performance, I found myself several times blaming the compiler for not having recognized obvious optimizations thus generating what I thought was crappy assembly code, translated in a way that was too pedantic for a modern piece of software. After diving into the obscure topic of aliasing, it obviously turned out that the problem was me ignoring a quite large slice of the C language.
So, if you want to dive into this, let’s start with a trivial example:
void foo (float *out_vector_a, float *out_vector_b, int *in_vector, int n)
{
for(int i=0; i<n; ++i) {
out_vector_a[i] = in_vector[i];
out_vector_b[i] = in_vector[i];
}
}
In this snippet we are doing a simple thing: copying an input memory chunk
(starting at address in_vector
), one integer at a time, into two output memory
chunks (starting at addresses out_vector_a
and out_vector_b
). Pretty
straghtforward, uh? How an harmless snippet like this could be source of
hassles? Let’s dive into the x86 assembly code generated by gcc
(disassembled
through objdump
):
Don’t let this pile of assembly code scare you, just get the big picture and focus on the instructions that carry out the actual copies:
out_vector_a[i] = in_vector[i];
10: 66 0f ef c0 pxor xmm0,xmm0
14: f3 0f 2a 04 82 cvtsi2ss xmm0,DWORD PTR [rdx+rax*4]
19: f3 0f 11 04 87 movss DWORD PTR [rdi+rax*4],xmm0
out_vector_b[i] = in_vector[i];
1e: 66 0f ef c0 pxor xmm0,xmm0
22: f3 0f 2a 04 82 cvtsi2ss xmm0,DWORD PTR [rdx+rax*4]
27: f3 0f 11 04 86 movss DWORD PTR [rsi+rax*4],xmm0
In an attempt to render these instructions in a human friendly way, we can decode them like this:
out_vector_a[i] = in_vector[i];
10: cleanup vector register xmm0
14: cast int -> float and load memory location in_vector[i] into xmm0
19: store xmm0 into memory location out_vector_a[i]
out_vector_b[i] = in_vector[i];
1e: cleanup vector register xmm0
22: cast int -> float and load memory location in_vector[i] into xmm0
27: store xmm0 into memory location out_vector_b[i]
There are no doubts about the first three instructions (10
, 14
and 19
), we
are actually spending cycles to carry out useful work copying the value
contained in memory at in_vector[i]
in the corresponding output memory at
out_vector_a[i]
using the register xmm0
as a buffer. But what about the
following instructions? We are carrying out exactly the same work again. Take
a look at instruction 22
: it’s the same instruction as 14
. Exactly the
same. That cvtsi2ss
tells the CPU to load the content
of the same memory location as the previous one ([rdx+rax*4]
) in the same
vector register (xmm0
).
Why are we loading the same data twice? Why the compiler isn’t just recycling the value he loaded just two instructions earlier loading it from memory again instead?
That is because during the previous store instruction (14
) the CPU wrote in a
location inside the boundaries of array out_vector_a
and in order to
guarantee semantic correctness, the compiler must assume that the first store
could have ended up writing in the same area as in_vector
, in other words that
out_vector_a
could be an alias for in_vector
. In order to avoid issues,
it forces the CPU to reload the same data from memory again, so if the location
has been written during the previous instruction, its updated content would be
correctly retrieved in its updated state.
This conservative behavoiur guarantees correctness even in the situation where the two output memory chunks are actually the same but, of course, in the majority of situations where the output memory chunks are not overlapped, performing a load twice means that we are halving our own memory bandwidth and wasting a lot of clock cycles. How can we help the compiler in doing a better job?
Aliasing rules
In the ISO C standard document the committee precisely states how we can access an object or, in other words, what is the type of the pointer that can legally refer to an object:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of the object,
- a type that is the signed or unsigned type corresponding to the effective type of the object,
- a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
- an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
- a character type.
ISO/IEC 9899:201x, Section 6.5
Don’t worry about the definition of compatible types, it turns out to be
exactly what you are thinking of (except for struct
and union
, but we can
ignore that for now):
Two types have compatible type if their types are the same.
ISO/IEC 9899:201x, Section 6.2.7
So, following the rules, we can have pointers that are aliases in a lot of different legal ways. For example, these pointers could be valid aliases of each other:
int *a;
unsigned int *b;
char *c;
typedef union {
char c;
float f;
} Mask;
Mask *d;
…but these couldn’t:
long *a;
double *b;
Note that a char *
is always a potential alias since char
is the minimum
allocation unit in C: all type sizes are specified in char
units and char
pointers are how we swipe raw memory (copying chunks regardless of the
underlying object type, for example).
Just a note about terminology: in the ISO C world, strict aliasing, ANSI aliasing and type-based aliasing are exactly the same thing, they all refer to the same concept (actually they are aliases…sorry for the pun :).
Exploiting the law
So, what about our previous example? We were dealing with two pointers of
incompatible types (int *
and float *
), so our output arrays (out_vector_a
and out_vector_b
) and the input array (in_vector
) aren’t legal aliases.
Anyway the compiler treated them as they were, that is because gcc
(and almost
all the compilers I’ve dealt with) works in a conservative way and assumes that
the programmer doesn’t care about aliasing rules and it tries to not to insert
very subtle bugs. Let’s try to tell gcc
to abide by the standard with
a proper flag (-fstrict-aliasing
). This time I’ve left out the full assembly
file (find it here), give a look to the
interesting lines instead, the ones carrying out the actual copy:
out_vector_a[i] = in_vector[i];
10: 66 0f ef c0 pxor xmm0,xmm0
14: f3 0f 2a 04 82 cvtsi2ss xmm0,DWORD PTR [rdx+rax*4]
19: f3 0f 11 04 87 movss DWORD PTR [rdi+rax*4],xmm0
out_vector_b[i] = in_vector[i];
1e: f3 0f 11 04 86 movss DWORD PTR [rsi+rax*4],xmm0
23: 48 83 c0 01 add rax,0x1
This time the second load disappeared. Again, let’s try to decode the meaning of the assembly:
out_vector_a[i] = in_vector[i];
10: cleanup vector register xmm0
14: cast int -> float and load memory location in_vector[i] into xmm0
19: store xmm0 into memory location out_vector_a[i]
out_vector_b[i] = in_vector[i];
1e: store xmm0 into memory location out_vector_b[i]
23: increment loop counter
This is nice: under our own guarantee that we know what we are doing, the compiler loads the memory location once and for the second array it just retrieves the same value from the buffer register.
We’ve just doubled the memory bandwidth available for our copy function.
But what happens when the types of our output and input arrays are compatible? Let’s give it a try:
void foo (float *out_vector_a, float *out_vector_b, float *in_vector, int n)
{
for(int i=0; i<n; ++i) {
out_vector_a[i] = in_vector[i];
out_vector_b[i] = in_vector[i];
}
}
Again, just focus on the instructions carrying out the copy (full assembly here):
out_vector_a[i] = in_vector[i];
10: f3 0f 10 04 82 movss xmm0,DWORD PTR [rdx+rax*4]
15: f3 0f 11 04 87 movss DWORD PTR [rdi+rax*4],xmm0
out_vector_b[i] = in_vector[i];
1a: f3 0f 10 04 82 movss xmm0,DWORD PTR [rdx+rax*4]
1f: f3 0f 11 04 86 movss DWORD PTR [rsi+rax*4],xmm0
24: 48 83 c0 01 add rax,0x1
Apart from tiny differences due to the fact that we are dealing with float
only (no more casts, the instruction that loads from memory is now movss
),
we have fallen into the same double load situation:
out_vector_a[i] = in_vector[i];
10: load memory location in_vector[i] into vector register xmm0
15: store xmm0 into memory location out_vector_a[i]
out_vector_b[i] = in_vector[i];
1a: load memory location in_vector[i] into vector register xmm0
1f: store xmm0 into memory location out_vector_b[i]
24: increment loop counter
Even if we ask the compiler to follow the standard aliasing rules, the use of
compatible types leads to the come back of the extra instruction. What we are
seeing here is that for the standard rules, both output arrays and input array
are pointers of type float
(compatible by definition since it’s exactly the
same type) so the compiler must assume that they can be aliases to each other
referring to the same object in memory. How can we escape this ditch?
The restrict
qualifier
Luckily the standard provides us a tool to tell the compiler under our own responsibility that even if two pointers could be legal aliases, we are guaranteeing that they aren’t:
void foo (float * restrict out_vector_a, float * restrict out_vector_b,
float * restrict in_vector, int n)
{
for(int i=0; i<n; ++i) {
out_vector_a[i] = in_vector[i];
out_vector_b[i] = in_vector[i];
}
}
We’ve just took the same code as in the previous example and added the
restrict
qualifier to all pointers. Compiling it with the same gcc
options
as before, we end up with the following assembly (you can find the full listing
here):
out_vector_a[i] = in_vector[i];
10: f3 0f 10 04 82 movss xmm0,DWORD PTR [rdx+rax*4]
15: f3 0f 11 04 87 movss DWORD PTR [rdi+rax*4],xmm0
out_vector_b[i] = in_vector[i];
1a: f3 0f 11 04 86 movss DWORD PTR [rsi+rax*4],xmm0
2f: 48 83 c0 01 add rax,0x1
Even this time we are able to let the compiler drop the second load
instruction, the only load from memory we have here is at instruction 10
.
This time we used the restrict
qualifier to tell the
compiler that a restricted pointer has no aliases at all. Of course the
standard states precisely the meaning of the restrict
qualifier:
An object that is accessed through a
restrict
-qualified pointer has a special association with that pointer. This association […] requires that all accesses to that object use, directly or indirectly, the value of that particular pointer.ISO/IEC 9899:201x, Section 6.7.3
In other words, a restrict
pointer is the only legal way to access the object
it points to in memory. This kind of special association is the same that for
example links a newly allocated memory chunk to its pointer: since it is
brand new, no other already existing pointer can be an alias for it.
For example, a statement that assigns a value returned by
malloc
to a single pointer establishes this association between the allocated object and the pointer.ISO/IEC 9899:201x, Section 6.7.3
Note that restrict
hasn’t been designed as a mean to change the code’s
behaviour but to allow additional optimizations (actually, compilers are allowed
to completely ignore it):
The intended use of the
restrict
qualifier […] is to promote optimization, and deleting all instances of the qualifier from all preprocessing translation units composing a conforming program does not change its meaning (i.e., observable behavior).ISO/IEC 9899:201x, Section 6.7.3
Restricted ownership (there can be only one)
Let’s consider the following function definitions:
void bar(int * restrict p, int * restrict q, int n)
{
while (n-- > 0) {
*p++ = *q++;
}
}
void callbar(void)
{
extern int d[100];
bar(d + 50, d, 50); // Valid!
bar(d + 1 , d, 50); // Undefined behaviour!
}
In this case we are seeing two different situations:
- during the first call, even if the memory belongs to the same
d
array, the two pointers are swiping two completely disjoint areas,d[50] .. d[99]
throughp
andd[0] .. d[49]
throughq
. Due to the fact thatrestrict
ensures unique accesses to objects (the underlyingint
elements in this case), each array element is going to be accessed through one and only one pointer: this code is perfectly legal leading to defined behaviour; - during the second call,
p
accessesint
objects in the ranged[1] .. d[50]
whileq
in the ranged[0] .. d[49]
: all thed
array elements (except ford[0]
andd[50]
) are going to be accessed through both pointers. This call breaks therestrict
contract leading to undefined behaviour.
This example highlights the fact that restrict
compliance of accesses doesn’t
care about timing. For instance, bar
iterations during the latter call access
d
in the sequence:
p = d[1]
,q = d[0];
p = d[2]
,q = d[1];
p = d[3]
,q = d[2];
- …and so on…
The two restricted pointers never access the same element at the same time but, despite this, we end up with an undefined behaviour anyway: it’s like a first touch ownership claim, the first pointer that touches an element estabilish the kind of special association with the object the standard talks about.
If inside a block an object is going to be accessed at any point in time through a restricted pointer, then that pointer becomes the one and only allowed to access that particular object.
There is an exception to this rule though, and it’s about read-only accesses. Take a look to this example:
void foo(int* restrict p, int* restrict q, int* restrict r, int n)
{
for (int i = 0; i < n; ++i) {
p[i] = q[i] + r[i];
}
}
void callfoo(int n)
{
int a[n];
int b[n];
foo(a, b, b, n); // Valid!
}
In this case, we are accessing elements of array b
through two restricted
pointers (arguments q
and r
) at the same time but, unlike in the previous
example, now we are doing read-only accesses since we are writing elements
belonging to the (non overlapping) array a
only: in this situation, what we are
getting is a completely well defined behaviour. So, we can now reformulate the
restricted ownership rule we saw in the previous example taking into account
this aspect as well:
If inside a block an object is going to be written at any point in time through a restricted pointer, then that pointer becomes the one and only allowed to access (both reading and writing) that particular object.
Assignment between restrict
pointers
The standard is so harsh about restrict
that even producing a pointer
alias to a restricted one (inside the same block) is undefined behaviour:
int * restrict q;
int * restrict p = q; // UNDEFINED BEHAVIOUR!!!
Note that the rule limiting assignments between restricted pointers does not distinguish between a function call and an equivalent nested block. However, there is one exception to the rule: only outer-to-inner assignments between restricted pointers declared in nested blocks have defined behavior:
{
int * restrict foo;
int * restrict bar;
foo = bar; // undefined behaviour
{
int * restrict foo_inner = foo; // OK: outer-to-inner
int * restrict bar_inner = bar; // OK: outer-to-inner
foo = bar_inner; // undefined behaviour
bar_inner = foo_inner; // undefined behaviour
}
}
What happens in other languages
Since we are talking about references and objects in memory, aliasing is a concept common to all programming languages. In the modern ones, this issue is often ignored (preventing any optimization, obviously) or just carefully handled by the language definition itself (Rust for example does it in a brilliant way). But what happens in some classic languages?
Aliasing in C++
Of course C++
is a different language than C
, but it inherited a lot of
rules from the latter. Aliasing is no exception, the rules are the same with
some additions: the concept of compatible types is extended to compatible
dynamic types.
class A{ A operator +(const A& a) const; };
class B : A {};
void foo (A* ip1, B* ip2, int n) {
for (int i = 0; i < n; ++i) {
ip1[i+1] = ip2[i] + ip2[i+1];
}
}
Disclaimer: I know that this example is actually crap, remember that in modern
C++
, direct use of raw pointers is
considered a bad practice.
In this case, the function parameters ip1
and ip2
are legal aliases of each
other since class B
is a subclass of A
. Anyway, in C++
we have templates:
two types generated from the same template while using different template
parameters are considered non-compatible types.
Aliasing in FORTRAN
People doing numerical stuff do love FORTRAN
, and for good reasons: it is
carefully cut for their needs, designed from the ground up for number crunching.
And people doing numerical stuff do love saying that FORTRAN
is faster
than any other language they ever tried (or just read about, usually they think
is enough). Well, this is obviously not true, especially when comparing
FORTRAN
and C
. There is a caveat though: if you don’t know the C
language,
during casual use it’s likely that compiled FORTRAN
will produce faster
assembly code. There’s a reason of course: regarding aliasing, the designers
took the easy way forbidding it at all. Aliasing in FORTRAN
is strictly
forbidden, except for read-only references.
So, a function like the following:
SUBROUTINE f (A, B)
INTEGER, DIMENSION(*), INTENT(INOUT) :: A
INTEGER, DIMENSION(*), INTENT(OUT) :: B
INTEGER :: I
DO I=2,LEN(A):
A(I) = A(I-1) ** 2
B(I) = A(I) ** 2
ENDDO
END SUBROUTINE
…cannot be called in this way:
INTEGER, DIMENSION(100) :: Arr
CALL f(Arr, Arr) ! <-- UNDEFINED BEHAVIOUR
The fact that we are passing the same reference to both the parameters yields an undefined behaviour with its load of potentially mind crushing bugs.
To be continued…
We will dive deeper in more outer-to-inner rule corollaries, pointer hierarchies and memory windows in the next post (coming soon).
Resources: