How to Build an Allocator-aware Smart Pointer

How and where memory is allocated can have a huge effect on an application’s performance.

C++17 introduced polymorphic allocators (std::pmr), which build on the existing notion of allocator-aware types and make it easier than ever to deploy custom allocation strategies.

While C++17 comes packaged with many useful allocator-aware types already, it’s missing an important component: an allocator-aware smart pointer.

Writing an allocator-aware smart pointer is tricky, but possible. This guide shows you how.

What Does it Mean for a Type to be Allocator-aware?

Before getting into the details of an allocator-aware smart pointer, let’s briefly discuss what it means for a type to be allocator-aware.

When a type, for example my_container, is allocator-aware, it provides

  1. A mechanism to query its allocator

my_container::allocator_type
  // names the container's allocator type;
  // for this guide, we'll assume this is always
  // std::pmr::polymorphic_allocator<>

const my_container& cont = /* a reference to an instance of my_container */;
cont.get_allocator();
  // returns the allocator associated with `cont`
  2. Overloaded constructors that allow you to specify the allocator to use

    If we can write

my_container cont{"abc", 123};

then we can also write

std::pmr::polymorphic_allocator<> alloc = /* a custom allocator */;
my_container cont{"abc", 123, alloc};

Allocator-aware types are composable. If my_container has allocator-aware data members, it forwards its allocator argument to them on construction. If we write code like this, for example,

std::pmr::polymorphic_allocator<> alloc = /* a custom allocator */;
std::pmr::vector<std::pmr::string> v{alloc};
v.emplace_back("abc123");

Then both the data for the vector v and all of the strings it owns will be allocated from alloc.

The move constructor and the move assignment operator for an allocator-aware type must also meet certain requirements.

Suppose we construct an instance of an allocator-aware type and then move-construct another instance with a different allocator:

std::pmr::polymorphic_allocator<> alloc1 = /* some allocator */;
std::pmr::polymorphic_allocator<> alloc2 = /* some other allocator */;
assert(alloc1 != alloc2);

my_container cont1{alloc1};
my_container cont2{std::move(cont1), alloc2};

my_container is not allowed to move memory-owning data structures from cont1 into cont2. It must allocate its own memory from alloc2 and then move or copy the data into it.

Why?

  1. The lifetimes of the memory resources behind different allocators may differ. If we were to move memory from cont1 into cont2, cont2 could be left with dangling pointers once the resource behind alloc1 is destroyed.

  2. One of the use cases for allocator-aware software is to maximize memory locality and prevent memory diffusion.

Suppose we have code like this

std::pmr::unsynchronized_pool_resource pool;
std::pmr::polymorphic_allocator<> alloc1{&pool};
std::pmr::vector<my_container> collection{alloc1};

// Our program runs for a while and collection is built up

my_container cont = /* some container that was produced from our program and uses
                       the global allocator */;
collection.emplace_back(std::move(cont));

// later... traverse and process collection

When we move-insert cont into collection, the vector constructs the new element in memory obtained from alloc1 and invokes my_container’s move constructor with cont and a copy of alloc1.

At first this might seem like extra work, but by forcing all of collection’s memory to come from alloc1, we ensure that collection has a localized layout in memory.

Later, if successive entries from collection are accessed and modified together, the better locality can be much more important to performance than the cost of the additional allocation.

See also

Bloomberg did extensive benchmarking measuring how important locality can be to performance.

See On Quantifying Memory-Allocation Strategies.

They benchmarked across a variety of scenarios and found that, in many cases, localized allocators can improve performance by as much as a factor of 4 to 8.

How Can We Make a Smart Pointer Allocator-aware?

Now that we’ve discussed the requirements for a type to be allocator-aware, let’s consider how we might create an allocator-aware smart pointer.

We’ll assume that, like std::unique_ptr, our smart pointer has move-only semantics, so we might sketch the class out as:

template <class T>
class managed_ptr {
  public:
    using pointer = T*;
    using allocator_type = std::pmr::polymorphic_allocator<>;

    managed_ptr() noexcept = default;

    template <class U>
      requires std::convertible_to<U*, T*> && ???
    managed_ptr(U* ptr, allocator_type alloc = {}) noexcept {
      // ???
    }

    managed_ptr(const managed_ptr&) = delete;

    template <class U>
      requires std::convertible_to<U*, pointer>
    managed_ptr(managed_ptr<U>&& other) noexcept {
      // ???
    }

    template <class U>
      requires std::convertible_to<U*, pointer>
    managed_ptr(managed_ptr<U>&& other, allocator_type alloc)
      : alloc_{alloc}
    {
      this->operator=(std::move(other));
    }

    ~managed_ptr() noexcept {
      this->reset();
    }

    managed_ptr& operator=(const managed_ptr&) = delete;

    template <class U>
      requires std::convertible_to<U*, pointer>
    managed_ptr& operator=(managed_ptr<U>&& other) {
      // ???
    }

    allocator_type get_allocator() const noexcept {
      return alloc_;
    }

    void reset() noexcept {
      // ???
    }

    // other standard pointer methods such as get, operator*, operator->, release, etc

  private:
    allocator_type alloc_;
    T* ptr_ = nullptr;
    // ???
};

managed_ptr lets us specify an allocator on construction and query it afterward, but how can we fill in the methods so that it meets the other requirements of allocator-aware types?

Recall that a smart pointer can point to a base class, so we might write code like the following

class A {
  public:
    virtual ~A() noexcept = default;

    // ....
};

class B : public A {
  //...
};

std::pmr::polymorphic_allocator<> alloc1 = /* some custom allocator */;

managed_ptr<A> ptr1{alloc1.new_object<B>(/* arguments */), alloc1};

std::pmr::polymorphic_allocator<> alloc2 = /* a different custom allocator */;
assert(alloc1 != alloc2);

managed_ptr<A> ptr2{std::move(ptr1), alloc2};

Because the allocators are unequal, ptr2 must allocate new memory from alloc2 for the original derived type B and move-construct an instance of B, so that its construction is equivalent to:

managed_ptr<A> ptr2{
          alloc2.new_object<B>(std::move(*static_cast<B*>(ptr1.get()))),
          alloc2};
ptr1.reset();

To do this, we need to type-erase the move constructor for the derived class.

But that’s not all. Suppose we destroy the object owned by a managed_ptr:

managed_ptr<A> ptr1{alloc1.new_object<B>(/* arguments */), alloc1};
ptr1.reset();

Unlike the global delete, which can find the object’s size through its virtual destructor, std::pmr::polymorphic_allocator<> needs to be passed the size of the allocation when it deallocates.

We’ll use a single function pointer to type-erase both pieces of information: how to move-construct the derived type and how to destroy and deallocate it. Let’s now fill in the constructors.

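template <class T>
class managed_ptr {
    // make every managed_ptr<U> a friend so converting moves can read its members
    template <class> friend class managed_ptr;

    using pointer_operator = void* (*)(void*, std::pmr::polymorphic_allocator<>, bool);
  public:
    using pointer = T*;
    using allocator_type = std::pmr::polymorphic_allocator<>;
    // ...
    template <class U>
      requires std::convertible_to<U*, T*> &&
               std::move_constructible<U>
    managed_ptr(U* ptr, allocator_type alloc = {}) noexcept
      : alloc_{alloc}, ptr_{ptr}
    {
      // A captureless lambda converts to a plain function pointer. It is
      // instantiated here, while the derived type U is still known.
      operator_ =
        [](void* ptr, allocator_type alloc, bool construct) -> void* {
          auto derived = static_cast<U*>(ptr);
          if (construct) {
            // move-construct a copy of the derived object using alloc
            return static_cast<void*>(alloc.new_object<U>(std::move(*derived)));
          } else {
            // destroy, then deallocate sizeof(U) bytes
            std::allocator_traits<allocator_type>::destroy(alloc, derived);
            alloc.deallocate_object(derived);
            return nullptr;
          }
        };
    }

    template <class U>
      requires std::convertible_to<U*, pointer>
    managed_ptr(managed_ptr<U>&& other) noexcept
      : alloc_{other.alloc_},
        ptr_{other.release()},
        operator_{other.operator_}
    {}

  private:
    allocator_type alloc_;
    T* ptr_ = nullptr;
    pointer_operator operator_ = nullptr;
};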

And now we can use the stored operator_ to fill in the move-assignment operator

template <class T>
class managed_ptr {
  // ...
    template <class U>
      requires std::convertible_to<U*, pointer>
    managed_ptr& operator=(managed_ptr<U>&& other) {
      if (static_cast<void*>(this) == static_cast<void*>(&other)) {
        return *this;  // self-move-assignment is a no-op
      }
      this->reset();  // destroy whatever we currently own first
      if (other.ptr_ == nullptr) {
        return *this;
      }
      operator_ = other.operator_;
      if (alloc_ == other.alloc_) {
        // the allocators are equal so there's no need to do an allocation
        ptr_ = other.release();
        return *this;
      }
      // use the operator to reallocate with the correct allocator and then
      // move construct an instance of the derived type.
      ptr_ = static_cast<T*>(
          operator_(static_cast<void*>(other.ptr_), alloc_, true));
      other.reset();  // destroy the original using its own allocator
      return *this;
    }
  // ...
};

Similarly, we can use operator_ to implement the reset method

template <class T>
class managed_ptr {
  // ...
  void reset() noexcept {
    if (ptr_ == nullptr) {
      return;
    }
    operator_(static_cast<void*>(ptr_), alloc_, false);
    ptr_ = nullptr;
  }
  // ...
};

This approach will work for most cases, but what about multiple inheritance?

Suppose we have

class A1 {
 public:
  virtual ~A1() noexcept = default;

  // ...
 private:
  int a1;
};

class A2 {
 public:
  virtual ~A2() noexcept = default;

  // ...
 private:
  int a2;
};

class B : public A1, public A2 {
  // ....
};

std::pmr::polymorphic_allocator<> alloc = /* an allocator */;
auto bptr = alloc.new_object<B>(/* arguments */);
managed_ptr<A2> ptr{bptr, alloc};

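In this case, ptr.get() and bptr will not point to the same address (on common ABIs the A2 subobject is laid out after A1 inside B), so the operator function, which casts the stored void* straight back to B*, won’t work.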

This can be fixed by adding a void* data member that tracks the original pointer used for construction, and by applying the same offset to the pointer produced by a new allocation.

I won’t go into all the details in this guide, but you can check out the full source code here.

An Example: Parsing JSON

Let’s see how managed_ptr might work in an example.

We’ll look at an application that parses a simplified subset of JSON. It will only handle integral numbers and arrays of integral numbers, for example:

[
  1, 2, 3,
  [4, 5, [6], 7]
]

We’ll represent parsed JSON with polymorphic types.

enum class json_value_type { number, array };

class json_value {
 public:
  virtual ~json_value() noexcept = default;

  virtual json_value_type type() const noexcept = 0;
};

class json_number final : public json_value {
 public:
  explicit json_number(int value) noexcept : value_{value} {}

  int value() const noexcept { return value_; }

  // json_value
  json_value_type type() const noexcept override { return json_value_type::number; }

 private:
  int value_;
};

class json_array final : public json_value {
  using vector_type =
      std::pmr::vector<managed_ptr<const json_value>>;

 public:
  using allocator_type = std::pmr::polymorphic_allocator<>;

  json_array() noexcept = default;

  explicit json_array(allocator_type alloc) noexcept : values_{alloc} {}

  explicit json_array(vector_type&& values, allocator_type alloc = {})
      : values_{std::move(values), alloc} {}

  allocator_type get_allocator() const noexcept {
    return values_.get_allocator();
  }

  json_array(json_array&& other) noexcept = default;

  // not noexcept: with unequal allocators the elements must be reallocated
  json_array(json_array&& other, allocator_type alloc)
      : values_{std::move(other.values_), alloc} {}

  json_array& operator=(json_array&& other) = default;

  // ...

  // json_value
  json_value_type type() const noexcept override {
    return json_value_type::array;
  }

 private:
  vector_type values_;
};

We provide a function that parses JSON into a managed_ptr:

void parse_json(managed_ptr<json_value>& json, std::string_view s);

Because the result is a managed_ptr, callers can choose a custom allocator for the parsed JSON.

To see how this can be beneficial, let’s write a small benchmark.

We’ll parse JSON and then recursively sum all the numbers in the arrays. We compare a baseline that uses standard global allocation with a std::unique_ptr against two versions that parse the JSON into extended stack space with managed_ptr, one of which also winks out the result (skips its destructor).

const std::string_view s1 = R"(
[
  1, 7, -2,
  [10, -12, 100, 15, -77],
  [[10], [9, 2], [[37]]],
  [[[[]]], [8, -9, [-8, [-1, [10, [5], 8]]]]],
  [1, [2, [3, [4, [5, [6, [7, [8, [9, [10, 11, 12]]]]]]]]]]
]
)";

static void BM_parse_sum_json_managed_stackext(benchmark::State& state) {
  for (auto _ : state) {
    stackext_resource resource;
    managed_ptr<json_value> json{&resource};
    parse_json(json, s1);
    auto sum = sum1(json.get());
    benchmark::DoNotOptimize(sum);
  }
}
BENCHMARK(BM_parse_sum_json_managed_stackext);

static void BM_parse_sum_json_managed_stackext_wink(benchmark::State& state) {
  for (auto _ : state) {
    stackext_resource resource;
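    std::pmr::polymorphic_allocator<> alloc{&resource};
    auto json = alloc.new_object<managed_ptr<json_value>>();
      // note: we "wink out" here: the managed_ptr is never destroyed, so
      // its destructor (and every nested deallocation) is skipped, and
      // resource reclaims all of the memory at once when it goes out of scope.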
    parse_json(*json, s1);
    auto sum = sum1(json->get());
    benchmark::DoNotOptimize(sum);
  }
}
BENCHMARK(BM_parse_sum_json_managed_stackext_wink);

The table below shows the results:

Variant                Time (nanoseconds)
global allocator       2917
stackext (w/o wink)    2690
stackext (w/ wink)     2468
