ODR of C++

Recently in my project I bumped into a weird issue: we vend a framework that is used by another team’s daemon, but as soon as their daemon was linked against our framework, even without calling into it, the daemon started failing in a bizarre way. It would launch and run, but soon die mysteriously, leaving no trace to debug. We had multiple hypotheses, including extra memory usage getting the daemon killed and security violations caused by linking to a new framework. Several days were spent on this issue fruitlessly, until I finally realized that both our framework and the consumer daemon were built from a common set of C++ source files. After discussing it with a coworker, we suspected that this could make the daemon call unintended implementations, a consequence of violating C++’s one definition rule (ODR). I moved the common source code used by our framework into a different namespace, and the issue was immediately gone. This reminds me once again how tricky C++ can be in areas that are easily neglected.

To illustrate the issue we encountered with a simple example, consider a source file a.cpp:

#include <iostream>

void say() {
    std::cout << "hi, a is saying" << std::endl;
}

and main.cpp, which contains the main function and calls the function defined in a.cpp:

void say();

int main() {
    say();
    return 0;
}

We can build the binary with the following commands (the -dynamiclib flag is specific to macOS) and run it without issues:

% g++ -dynamiclib -fPIC a.cpp -o liba.so; g++ -L. -la main.cpp -o main
% ./main
hi, a is saying

However, if we now add a source file b.cpp that defines the same function say differently:

#include <iostream>

void say() {
    std::cout << "hi, b is saying" << std::endl;
}

and build it into a shared library:

% g++ -dynamiclib -fPIC b.cpp -o libb.so

Now rebuild main.cpp, this time also linking it against libb.so even though nothing from libb.so is used, and you will find that the runtime behavior has changed:

% g++ -L. -lb -la main.cpp -o main
% ./main
hi, b is saying
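To confirm which library actually supplied say at runtime, one option is to ask the dynamic loader. Below is a minimal sketch using dladdr, a non-standard extension available on macOS and glibc (with glibc older than 2.34 you may also need to link with -ldl). The helper name image_containing is hypothetical.

```cpp
#include <dlfcn.h>  // dladdr, Dl_info: non-standard, provided by macOS and glibc
#include <string>

// Hypothetical helper: returns the path of the loaded image (executable or
// shared library) that contains addr, or an empty string if the loader
// cannot attribute the address to any image.
std::string image_containing(const void* addr) {
    Dl_info info{};
    if (dladdr(addr, &info) == 0 || info.dli_fname == nullptr) {
        return "";
    }
    return info.dli_fname;
}
```

Adding a line such as `std::cout << image_containing(reinterpret_cast<const void*>(&say)) << std::endl;` to main (plus `#include <iostream>`) should, on macOS, report libb.so for the second build; the exact path string depends on how the library was loaded.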

Why is this? In hindsight, it is not hard to explain with C++’s ODR: both liba and libb expose the same (mangled) symbol __Z3sayv, compiled from their respective say functions. A program is supposed to contain exactly one definition of such a function, and the standard does not require the toolchain to diagnose a violation, so when main calls say, one implementation is silently chosen and the other is shadowed. Which one is chosen? On my machine it seems to be whichever library appears first in the link order. A more detailed explanation can be found at https://en.wikipedia.org/wiki/One_Definition_Rule
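The mangled name encodes only the function’s name, enclosing namespaces and parameter types, not which library it came from, which is why two libraries can export exactly the same symbol. A small sketch using abi::__cxa_demangle (from the Itanium C++ ABI runtime shipped with GCC and clang) makes this visible; the demangle helper and the framework_a namespace are illustrative assumptions. Mach-O symbol tables, which is what nm prints on macOS, add one extra leading underscore, so the helper strips it first.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (Itanium C++ ABI: GCC and clang)
#include <cstdlib>   // std::free
#include <string>

// Hypothetical helper: demangles an Itanium-ABI symbol such as "_Z3sayv".
// Names printed by nm on macOS carry an extra leading underscore ("__Z3sayv"),
// which is removed before demangling. Returns the input unchanged on failure.
std::string demangle(const std::string& symbol) {
    std::string name = symbol;
    if (name.rfind("__Z", 0) == 0) {
        name.erase(0, 1);  // drop the Mach-O prefix underscore
    }
    int status = 0;
    char* out = abi::__cxa_demangle(name.c_str(), nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return symbol;
    }
    std::string result(out);
    std::free(out);  // __cxa_demangle returns a malloc'd buffer
    return result;
}
```

For example, demangle("__Z3sayv") yields "say()" for both libraries, while a function moved into a namespace, such as "_ZN11framework_a3sayEv", yields "framework_a::say()" and therefore no longer collides.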

Listing the exported symbols of both libraries with nm shows that each one defines __Z3sayv:

% nm -gU liba.so libb.so

liba.so:
0000000000003028 T __Z3sayv
0000000000003bf0 T __ZNSt3__111char_traitsIcE11eq_int_typeEii
0000000000003c18 T __ZNSt3__111char_traitsIcE3eofEv
0000000000003318 T __ZNSt3__111char_traitsIcE6lengthEPKc
0000000000003124 T __ZNSt3__124__put_character_sequenceIcNS_11char_traitsIcEEEERNS_13basic_ostreamIT_T0_EES7_PKS4_m
00000000000030cc T __ZNSt3__14endlIcNS_11char_traitsIcEEEERNS_13basic_ostreamIT_T0_EES7_
0000000000003058 T __ZNSt3__1lsINS_11char_traitsIcEEEERNS_13basic_ostreamIcT_EES6_PKc

libb.so:
0000000000003014 T __Z3sayv
0000000000003110 T __Z7b_startv
0000000000003bf0 T __ZNSt3__111char_traitsIcE11eq_int_typeEii
0000000000003c18 T __ZNSt3__111char_traitsIcE3eofEv
0000000000003318 T __ZNSt3__111char_traitsIcE6lengthEPKc
0000000000003124 T __ZNSt3__124__put_character_sequenceIcNS_11char_traitsIcEEEERNS_13basic_ostreamIT_T0_EES7_PKS4_m
00000000000030b8 T __ZNSt3__14endlIcNS_11char_traitsIcEEEERNS_13basic_ostreamIT_T0_EES7_
0000000000003044 T __ZNSt3__1lsINS_11char_traitsIcEEEERNS_13basic_ostreamIcT_EES6_PKc
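Spotting such clashes by eye gets hard once libraries export thousands of symbols. As a rough sketch, assuming the nm output of each library has already been captured as lines of text, the following hypothetical helpers collect each library’s defined symbols and intersect them.

```cpp
#include <algorithm>  // std::set_intersection
#include <iterator>   // std::back_inserter
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: parses lines like "0000000000003028 T __Z3sayv"
// (address, type, name) and keeps every name whose type is not "U"
// (undefined). Header lines such as "liba.so:" fail to parse and are skipped.
std::set<std::string> defined_symbols(const std::vector<std::string>& nm_lines) {
    std::set<std::string> names;
    for (const std::string& line : nm_lines) {
        std::istringstream in(line);
        std::string address, type, name;
        if (in >> address >> type >> name && type != "U") {
            names.insert(name);
        }
    }
    return names;
}

// Returns the symbols defined by both libraries, in sorted order.
std::vector<std::string> duplicate_symbols(const std::vector<std::string>& lib1,
                                           const std::vector<std::string>& lib2) {
    const std::set<std::string> a = defined_symbols(lib1);
    const std::set<std::string> b = defined_symbols(lib2);
    std::vector<std::string> both;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(both));
    return both;
}
```

Run on the full listings above, this would also report the libc++ template instantiations (char_traits, endl, and so on); those are compiled from identical headers and are expected in both libraries, so the entries worth investigating are the ones from your own code, such as __Z3sayv.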

This issue probably happens more often than people realize, because in most cases a symbol defined in multiple libraries comes from identical copies of the same source files, so whichever implementation gets picked makes no difference at runtime. In our case, however, we had modified the shared implementation for the needs of each project, so the program could still run, just not in the expected way!
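The fix described at the beginning, moving our copy of the common code into its own namespace, can be sketched as follows. The namespace name framework and the string-returning say are assumptions made for illustration (the original say prints instead); the point is that the namespaced copy mangles to _ZN9framework3sayEv rather than _Z3sayv, so the two definitions no longer compete for the same symbol.

```cpp
#include <string>

// The consumer's copy stays in the global namespace: mangled as _Z3sayv.
std::string say() {
    return "hi, daemon copy";
}

// Hypothetical namespace for the framework's modified copy:
// mangled as _ZN9framework3sayEv, a distinct symbol.
namespace framework {
std::string say() {
    return "hi, framework copy";
}
}  // namespace framework
```

If such helpers are only used inside the library, an unnamed namespace (which gives them internal linkage) or building with -fvisibility=hidden on GCC and clang are other common ways to keep them out of the exported symbol table altogether.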