
Solution

Let's answer the Item questions one at a time.

What is the Pimpl Idiom's space overhead?

"What space overhead?" you ask? Well, we now need space for at least one extra pointer (and possibly two, if there's a back pointer in XImpl) for every X object. This typically adds at least 4 (or 8) bytes on many popular systems, and possibly as many as 14 bytes or more, depending on alignment requirements. For example, try the following program on your favorite compiler.
```
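#include <iostream>

struct X { char c; struct XImpl; XImpl* pimpl_; };
struct X::XImpl { char c; };

int main()
{
  // Print the size of the hidden part, then the size of the visible class.
  std::cout << sizeof(X::XImpl) << std::endl
            << sizeof(X) << std::endl;
}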
```
On many popular compilers that use 32-bit pointers, this prints:
```
1
8
```
On these compilers, the overhead of storing one extra pointer was actually 7 bytes, not 4. Why? Because the platform on which the compiler is running requires a pointer to be stored on a 4-byte boundary, or else it performs much more poorly if the pointer isn't stored on such a boundary. Knowing this, the compiler allocates 3 bytes of unused/empty space inside each X object, which means the cost of adding a pointer member was actually 7 bytes, not 4. If a back pointer is also needed, then the total storage overhead can be as high as 14 bytes on a 32-bit machine, as high as 30 bytes on a 64-bit machine, and so on.

How do we get around this space overhead? The short answer is: We can't eliminate it, but sometimes we can minimize it.

The longer answer is: There's a downright reckless way to eliminate it that you should never, ever use (and don't tell anyone that you heard it from me), and there's usually a nonportable, but correct, way to minimize it. The utterly reckless "space optimization" happens to be the same as the utterly reckless "performance optimization," so I've moved that discussion off to the side; see the upcoming sidebar "Reckless Fixes and Optimizations, and Why They're Evil."

If (and only if) the space difference is actually important in your program, then the nonportable, but correct, way to minimize the pointer overhead is to use compiler-specific #pragmas. Many compilers will let you override the default alignment/packing for a given class; see your vendor's documentation for details. If your platform only "prefers" (rather than "enforces") pointer alignment and your compiler offers this feature, then on a 32-bit platform you can eliminate as much as 6 bytes of overhead per X object, at the (possibly minuscule) cost of run-time performance, because actually using the pointer will be slightly less efficient. Before you even consider anything like this, though, always follow the age-old sage advice: First make it right, then make it fast. Never optimize—neither for speed, nor for size—until your profiler and other tools tell you that you should.
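As a rough illustration of the packing approach, here is a minimal sketch. It assumes a compiler that accepts `#pragma pack(push, 1)` and `#pragma pack(pop)` (GCC, Clang, and MSVC do; others may spell it differently or not offer it at all). The names `XDefault`, `XPacked`, and `padding_saved` are hypothetical, and a plain `void*` stands in for the `XImpl*` member.

```cpp
#include <cassert>
#include <cstddef>

// Default layout: the compiler is free to insert padding after 'c'
// so that the pointer member lands on an aligned address.
struct XDefault {
    char  c;
    void* pimpl_;   // stand-in for XImpl*
};

// Packed layout (compiler-specific and nonportable): no padding is inserted,
// so the pointer may sit on an unaligned address.
#pragma pack(push, 1)
struct XPacked {
    char  c;
    void* pimpl_;   // stand-in for XImpl*
};
#pragma pack(pop)

// Number of padding bytes that packing removed on this platform.
inline std::size_t padding_saved() {
    return sizeof(XDefault) - sizeof(XPacked);
}
```

On a platform that enforces pointer alignment, reading the packed pointer can be slow or can fault outright, so a sketch like this only makes sense where alignment is merely preferred.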

What is the Pimpl Idiom's performance overhead?

Using the Pimpl idiom can have a performance overhead for two main reasons. For one thing, each X construction/destruction must now allocate/deallocate memory for its XImpl object, which is typically a relatively expensive operation.^[6] For another, each access of a member in the Pimpl can require at least one extra indirection; if the hidden member being accessed itself uses a back pointer to call a function in the visible class, there will be multiple indirections.

^[6] Compared with most other common operations in C++, such as function calls. Note that here I'm specifically talking about the cost of using a general-purpose allocator, which is what you typically get with the builtin ::operator new() and malloc().

How do we get around this performance overhead? The short answer is: Use the Fast Pimpl idiom, which I'll cover next. (There's also a downright reckless way to eliminate it that you should never, ever use; see the sidebar "Reckless Fixes and Optimizations, and Why They're Evil" for more information.)

Discuss Attempt #3.

The short answer about attempt #3 is: Don't do this. Bottom line, C++ doesn't support opaque types directly, and this is a brittle attempt (some people, like me, would even say "hack") to work around that limitation.

What the programmer almost certainly wants is something else, namely the Fast Pimpl idiom.

The second part of the third question was: Can you think of a better way to get around the overhead?

The main performance issue here is that space for the Pimpl objects is being allocated from the free store. In general, the right way to address allocation performance for a specific class is to provide a class-specific operator new() for that class and use a fixed-size allocator, because fixed-size allocators can be made much more efficient than general-purpose allocators.

// file x.h 
class X
{
  /*...*/
  struct XImpl;
  XImpl* pimpl_;
};

// file x.cpp
#include "x.h"
struct X::XImpl
{
  /*...private stuff here...*/
  static void* operator new( size_t )   { /*...*/ }
  static void  operator delete( void* ) { /*...*/ }
};
X::X() : pimpl_( new XImpl ) {}
X::~X() { delete pimpl_; pimpl_ = 0; }
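The elided operator bodies could be filled in along the following lines. This is only a sketch of one common free-list technique, not necessarily what the author had in mind. It assumes single-threaded use and never hands recycled blocks back to the system. `Widget` is a hypothetical stand-in for `X::XImpl`, and `FreeNode`, `free_list_`, and `blocks_from_system` are names invented for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical stand-in for X::XImpl. Assumptions: single-threaded use, and
// freed blocks are recycled but never handed back to the system.
struct Widget {
    int    id;
    double value;

    static void* operator new(std::size_t size) {
        if (size != sizeof(Widget))            // e.g. a derived class: not our fixed size
            return ::operator new(size);
        if (free_list_) {                      // fast path: pop a recycled block
            void* block = free_list_;
            free_list_ = free_list_->next;
            return block;
        }
        ++blocks_from_system;                  // slow path: general-purpose allocator
        return ::operator new(sizeof(Widget));
    }

    static void operator delete(void* p, std::size_t size) {
        if (!p) return;
        if (size != sizeof(Widget)) { ::operator delete(p); return; }
        // Push the block onto the free list by building a link node in place.
        free_list_ = ::new (p) FreeNode{free_list_};
    }

    static int blocks_from_system;             // how often the slow path ran

private:
    struct FreeNode { FreeNode* next; };
    static FreeNode* free_list_;
};

int Widget::blocks_from_system = 0;
Widget::FreeNode* Widget::free_list_ = nullptr;

static_assert(sizeof(Widget) >= sizeof(void*),
              "a freed block must be large enough to hold the free-list link");
```

After the first allocation, a delete followed by a new reuses the same block without touching the general-purpose allocator, which is where the speedup would come from.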

"Aha!" you say. "We've found the holy grail—the Fast Pimpl!" you say. Well, yes, but hold on a minute and think about how this will work and what it will cost you.

Your favorite advanced C++ or general-purpose programming textbook has the details about how to write efficient fixed-size [de]allocation functions, so I won't cover that again here. I will talk about usability. One technique is to put the [de]allocation functions in a generic fixed-size allocator template, perhaps something like this:

template<size_t S> 
class FixedAllocator
{
public:
  void* Allocate( /*requested size is always S*/ );
  void  Deallocate( void* );
private:
  /*...implemented using statics?...*/
};

Because the private details are likely to use statics, however, there could be problems if Deallocate is ever called from a static object's destructor. Probably safer is a singleton that manages a separate free list for each request size (or, as an efficiency tradeoff, a separate free list for each request size "bucket"—for example, one list for blocks of size 0-8, another for blocks of size 9-16, and so forth).

class FixedAllocator 
{
public:
  static FixedAllocator& Instance();
  void* Allocate( size_t );
  void  Deallocate( void* );
private:
  /*...singleton implementation, typically
       with easier-to-manage statics than
       the templated alternative above...*/
};
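To make the singleton variant concrete, here is one possible sketch, under several assumptions that are not in the text above: buckets are 8 bytes wide, there are 16 of them, larger requests fall through to `malloc`, the code is single-threaded, and each block carries a small header so that `Deallocate` can find the right list without being told the size. `Header`, `BucketFor`, `kGranularity`, `kBuckets`, and `ReusedCount` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

class FixedAllocator {
public:
    static FixedAllocator& Instance() {
        // Deliberately never destroyed, so Deallocate stays usable from the
        // destructors of other static objects.
        static FixedAllocator* const instance = new FixedAllocator;
        return *instance;
    }

    void* Allocate(std::size_t size) {
        const std::size_t bucket = BucketFor(size);
        Header* h = nullptr;
        if (bucket < kBuckets && free_[bucket]) {
            h = free_[bucket];                  // reuse a block from this bucket's list
            free_[bucket] = h->next;
            ++reused_;
        } else {
            // Pooled requests are rounded up to the bucket size; oversized ones are not.
            const std::size_t payload =
                bucket < kBuckets ? (bucket + 1) * kGranularity : size;
            h = static_cast<Header*>(std::malloc(sizeof(Header) + payload));
            if (!h) throw std::bad_alloc();
        }
        h->bucket = bucket;                     // remember which list the block belongs to
        return h + 1;                           // user memory starts right after the header
    }

    void Deallocate(void* p) {
        if (!p) return;
        Header* h = static_cast<Header*>(p) - 1;
        const std::size_t bucket = h->bucket;
        if (bucket >= kBuckets) { std::free(h); return; }  // oversized: not pooled
        h->next = free_[bucket];                // push onto the bucket's free list
        free_[bucket] = h;
    }

    std::size_t ReusedCount() const { return reused_; }

private:
    static constexpr std::size_t kGranularity = 8;   // assumed bucket width in bytes
    static constexpr std::size_t kBuckets = 16;      // assumed number of buckets

    // Max-aligned so the memory after the header is suitably aligned for any type.
    union alignas(std::max_align_t) Header {
        std::size_t bucket;  // valid while the block is in use
        Header*     next;    // valid while the block is on a free list
    };

    static std::size_t BucketFor(std::size_t size) {
        return size == 0 ? 0 : (size - 1) / kGranularity;  // 0-8 -> 0, 9-16 -> 1, ...
    }

    FixedAllocator() = default;

    Header*     free_[kBuckets] = {};
    std::size_t reused_ = 0;
};
```

The header and the rounding up to a bucket boundary both cost space per block, which is the kind of tradeoff to weigh before adopting a design like this.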

Let's throw in a helper base class to encapsulate the calls. This works because derived classes "inherit" these overloaded base operators.

struct FastArenaObject 
{
  // Instance() returns a reference, so members are reached with '.', not '->'.
  static void* operator new( size_t s )
  {
    return FixedAllocator::Instance().Allocate(s);
  }
  static void operator delete( void* p )
  {
    FixedAllocator::Instance().Deallocate(p);
  }
};

Now, you can easily write as many Fast Pimpls as you like:

//  Want this one to be a Fast Pimpl? 
//  Easy, then just inherit...
struct X::XImpl : FastArenaObject
{
  /*...private stuff here...*/
};
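Put together, the pieces might look like the following self-contained sketch. To keep it short, the real fixed-size allocator is replaced by a hypothetical `CountingAllocator` that only counts calls and forwards to `malloc` and `free`; the point is to show that `new XImpl` and `delete pimpl_` are routed through the operators inherited from `FastArenaObject`. The `Value` member and the deleted copy operations are additions made for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for the fixed-size allocator: it only counts calls and
// forwards to malloc/free, so the routing of new/delete is easy to observe.
struct CountingAllocator {
    static CountingAllocator& Instance() {
        static CountingAllocator instance;
        return instance;
    }
    void* Allocate(std::size_t n) {
        void* p = std::malloc(n ? n : 1);
        if (!p) throw std::bad_alloc();
        ++allocations;
        return p;
    }
    void Deallocate(void* p) {
        if (p) { ++deallocations; std::free(p); }
    }
    int allocations = 0;
    int deallocations = 0;
};

// Helper base: derived classes pick up these operators by inheritance.
struct FastArenaObject {
    static void* operator new(std::size_t s) {
        return CountingAllocator::Instance().Allocate(s);
    }
    static void operator delete(void* p) {
        CountingAllocator::Instance().Deallocate(p);
    }
};

// Visible class (would normally live in x.h).
class X {
public:
    X();
    ~X();
    X(const X&) = delete;             // copying is out of scope for this sketch
    X& operator=(const X&) = delete;
    int Value() const;
private:
    struct XImpl;
    XImpl* pimpl_;
};

// Hidden implementation (would normally live in x.cpp).
struct X::XImpl : FastArenaObject {
    int value = 42;
};

X::X() : pimpl_(new XImpl) {}         // uses FastArenaObject::operator new
X::~X() { delete pimpl_; }            // uses FastArenaObject::operator delete
int X::Value() const { return pimpl_->value; }
```

Clients of `X` see no difference: the allocation strategy is entirely an implementation detail of the hidden class.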

Applying this technique to the original problem, we get a variant of Attempt #2:

// file y.h 

class X;
class Y
{
  /*...*/
  X* px_;
};

// file y.cpp

#include "x.h" // X now inherits from FastArenaObject
Y::Y() : px_( new X ) {}
Y::~Y() { delete px_; px_ = 0; }

But beware! This is nice, but don't use the Fast Pimpl willy-nilly. You're getting extra allocation speed, but as usual you should never forget the cost. Managing separate free lists for objects of specific sizes usually means incurring a space efficiency penalty, because any free space is fragmented (more than usual) across several lists.

A final reminder: As with any other optimization, use Pimpls in general and Fast Pimpls in particular only after profiling and experience prove that the extra performance boost is really needed in your situation.

Guideline

Avoid inlining or detailed tuning until performance profiles prove the need.

Reckless Fixes and Optimizations, and Why They're Evil

The main solution text shows why using the Pimpl Idiom can incur space and performance overheads, and it also shows the right way to minimize or eliminate those overheads. There is also a sometimes-recommended, but wrong, way to deal with them.

Here's the reckless, unsafe, might-work-if-you're-lucky, evil, fattening, and high-cholesterol way to eliminate the space and performance overheads, and you didn't hear it from me—the only reason I'm mentioning it at all is because I've seen people try to do this:

// evil dastardly header file x.h 
class X
{
  /* . . . */
  static const size_t sizeofximpl = /*some value*/;
  char pimpl_[sizeofximpl];
};

// pernicious depraved implementation file x.cpp
#include "x.h"
X::X()
{
  assert( sizeofximpl >= sizeof(XImpl) );
  new (&pimpl_[0]) XImpl;
}
X::~X()
{
  (reinterpret_cast<XImpl*>(&pimpl_[0]))->~XImpl();
}

DON'T DO THIS! Yes, it removes the space overhead—it doesn't use so much as a single pointer.^[7] Yes, it removes the memory allocation overhead—there's nary a malloc or new in sight. Yes, it might even happen to work on the current version of your current compiler.

It's also completely nonportable. Worse, it will completely break your system, even if it does appear to work at first. Here are several reasons.

1. Alignment. Any memory that's allocated dynamically via new or malloc is guaranteed to be properly aligned for objects of any type, but buffers that are not allocated dynamically have no such guarantee:
```
char* buf1 = (char*)malloc( sizeof(Y) ); 
char* buf2 = new char[ sizeof(Y) ];
char  buf3[ sizeof(Y) ];
new (buf1) Y;     // OK, buf1 allocated dynamically (A)
new (buf2) Y;     // OK, buf2 allocated dynamically (B)
new (&buf3[0]) Y; // error, buf3 may not be suitably aligned
(reinterpret_cast<Y*>(buf1))->~Y(); // OK
(reinterpret_cast<Y*>(buf2))->~Y(); // OK
(reinterpret_cast<Y*>(&buf3[0]))->~Y(); // error
```
Just to be clear: I'm not recommending that you do A or B. I'm just pointing out that they're legal, whereas the above attempt to have a Pimpl without dynamic allocation is not, even though it may (dangerously) appear to work correctly at first if you happen to get lucky.^[8]

2. Brittleness. The author of X has to be inordinately careful with otherwise ordinary X functions. For example, X must not use the default assignment operator, but must either suppress assignment or supply its own. (Writing a safe X::operator=() isn't too hard, but I'll leave it as an exercise for the reader. Remember to account for exception safety in that and in X::~X.^[9] Once you're finished, I think you'll agree that this is a lot more trouble than it's worth.)

^[9] See the Item 8 through 17 miniseries.

3. Maintenance cost. When sizeof(XImpl) grows beyond sizeofximpl, the programmer must bump up sizeofximpl. This can be an unattractive maintenance burden. Choosing a larger value for sizeofximpl mitigates this, but at the expense of trading off efficiency (see #4).

4. Inefficiency. Whenever sizeofximpl > sizeof(XImpl), space is being wasted. This can be minimized, but at the expense of maintenance effort (see #3).

5. Just plain wrongheadedness. In short, it's obvious that the programmer is trying to do "something unusual." Frankly, in my experience, "unusual" is just about always a synonym for "hack." Whenever you see this kind of subversion (whether it's allocating objects inside character arrays like this programmer is doing, or implementing an assignment using explicit destruction and placement as discussed in Item 41), you should Just Say No.

Bottom line, C++ doesn't support opaque types directly, and this is a brittle attempt to work around that limitation.

^[7] This completely hides the Pimpl class—but, of course, clients must still be recompiled if sizeofximpl changes.

^[8] All right, I'll 'fess up: There actually is a (not very portable, but pretty safe) way to put the Pimpl class right into the main class like this, thus avoiding all space and time overhead. It involves creating a "max_align" struct that guarantees maximal alignment, and defining the Pimpl member as union { max_align dummy; char pimpl_[sizeofximpl]; };—this will guarantee sufficient alignment. For all the gory details, do a search for "max_align" on the Web or on DejaNews. However, I still strongly urge you not to go down this sordid path, because using a max_align solves only this first issue and does not address the second through fifth issues. You Have Been Warned.
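For readers on C++11 or later, the alignment part of the max_align trick can be expressed with `alignas` and `std::max_align_t` instead of a hand-written union. The sketch below is an illustration of that one point only, not a recommendation: it addresses alignment and nothing else, so the brittleness, maintenance, and efficiency problems described above all remain. `Impl`, `Holder`, and `kSize` are hypothetical names, and `Impl` is defined in plain view here only to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

struct Impl {                 // hypothetical hidden type
    double d;
    int    i;
};

class Holder {
public:
    Holder()  { ::new (static_cast<void*>(buf_)) Impl{1.5, 7}; }
    ~Holder() { impl().~Impl(); }
    Holder(const Holder&) = delete;             // copying would need hand-written care
    Holder& operator=(const Holder&) = delete;

    // Strictly speaking, C++17 and later would also want std::launder here.
    Impl& impl() { return *reinterpret_cast<Impl*>(buf_); }

    // True if the buffer's address is a multiple of Impl's alignment.
    bool aligned() const {
        return reinterpret_cast<std::uintptr_t>(buf_) % alignof(Impl) == 0;
    }

private:
    static constexpr std::size_t kSize = 32;    // assumed upper bound on sizeof(Impl)
    static_assert(sizeof(Impl) <= kSize, "buffer too small for Impl");

    // Maximal fundamental alignment, playing the role of the max_align union.
    alignas(std::max_align_t) unsigned char buf_[kSize];
};
```

In a real Pimpl the hidden type would be incomplete in the header, so the size check would have to live in the implementation file, and clients would still need recompiling whenever the buffer size changes.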

