I am wondering what the correct memory order is for the classic "atomic counter dynamic scheduling" idiom. That is:
1. use fetch-add to get the index `i` of the next element to process
2. if `i` is past the end of the array, terminate
3. process element `i` thread-safely, since no other thread can have `i`
4. go to 1.
For example:
```cpp
#include <atomic>

std::atomic_int counter = 0;

void foo(int *data, int size) {
    // we could also write counter++
    for (int i; (i = counter.fetch_add(1, std::memory_order::seq_cst)) < size;) {
        data[i] *= 2;
    }
}
```
```cpp
// driver code
#include <thread>
#include <numeric>
#include <cassert>

int main() {
    int data[1'000'000];
    std::iota(std::begin(data), std::end(data), 0);
    std::thread other{foo, data, std::size(data)};
    foo(data, std::size(data));
    other.join();
    for (int i = 0; i < std::size(data); ++i) {
        assert(data[i] == i * 2);
    }
}
```
This code works, and it should be safe, because processing an element cannot be reordered before or after getting the next index, and all fetch-adds are observed in a consistent total order by all threads. These requirements seem overly strict to me, and I believe we could use a more relaxed ordering.
`std::memory_order_relaxed` and `std::memory_order::acquire` are too relaxed, I believe, because all threads initially observe `counter = 0`, and we have to ensure that `data[0] *= 2` is not moved before the first `fetch_add`. These two memory orders would allow that.
The answer has to be one of:

- `std::memory_order::seq_cst`
- `std::memory_order::acq_rel`
- `std::memory_order::release`
Which one is the correct order in this situation?