Solution
Let's answer the Item questions one at a
time.
-
What is the Pimpl Idiom's space overhead?
"What space overhead?" you ask? Well, we now
need space for at least one extra pointer (and possibly two, if
there's a back pointer in XImpl) for every X
object. This typically adds at least 4 (or 8) bytes on many popular
systems, and possibly as many as 14 bytes or more, depending on
alignment requirements. For example, try the following program on
your favorite compiler.
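#include <iostream>
struct X { char c; struct XImpl; XImpl* pimpl_; };
struct X::XImpl { char c; };
int main()
{
  // Print the size of the hidden part alone, then of X with its pointer.
  std::cout << sizeof(X::XImpl) << std::endl
            << sizeof(X) << std::endl;
}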
On many popular compilers that use 32-bit
pointers, this prints:
1
8
On these compilers, the overhead of storing one
extra pointer was actually 7 bytes, not 4. Why? Because the
target platform either requires a pointer to be stored on a
4-byte boundary or performs much more poorly if it isn't.
Knowing this, the compiler inserts 3 bytes of unused padding
inside each X object, which means the cost of adding a pointer
member was actually 7 bytes, not 4. If a back pointer is also
needed, then the total storage overhead can be as high as 14 bytes
on a 32-bit machine, as high as 30 bytes on a 64-bit machine, and
so on.
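To see where the bytes go on your own compiler, here is a minimal sketch that is not part of the original program. The names Plain, WithPimpl, PaddingBeforePointer, PimplSpaceOverhead, and CheckOverhead are hypothetical, and the values depend entirely on your platform's alignment rules.

```cpp
#include <cassert>
#include <cstddef>  // std::size_t, offsetof

// Hypothetical stand-ins for X: one without and one with a pimpl_ member.
struct Plain     { char c; };
struct WithPimpl { char c; struct Impl; Impl* pimpl_; };

// Unused bytes the compiler places between c and pimpl_ to align the pointer.
inline std::size_t PaddingBeforePointer()
{
    return offsetof(WithPimpl, pimpl_) - sizeof(char);
}

// Total per-object cost of adding the pointer member.
inline std::size_t PimplSpaceOverhead()
{
    return sizeof(WithPimpl) - sizeof(Plain);
}

// The cost is never less than the pointer itself plus the padding in front of it.
inline void CheckOverhead()
{
    assert(PimplSpaceOverhead() >= sizeof(WithPimpl::Impl*) + PaddingBeforePointer());
}
```

On a typical compiler with 4-byte pointers I would expect 3 bytes of padding and 7 bytes of overhead, matching the output above. With 8-byte pointers, 7 and 15 are likely, but check rather than assume.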
How do we get around this space overhead? The
short answer is: We can't eliminate it, but sometimes we can
minimize it.
The longer answer is: There's a downright
reckless way to eliminate it that you should never, ever use (and
don't tell anyone that you heard it from me), and there's usually a
nonportable, but correct, way to minimize it. The utterly reckless
"space optimization" happens to be the same as the utterly reckless
"performance optimization," so I've moved that discussion off to
the side; see the upcoming sidebar "Reckless Fixes and Optimizations, and Why They're
Evil."
If (and only if) the space difference is
actually important in your program, then the nonportable, but
correct, way to minimize the pointer overhead is to use
compiler-specific #pragmas. Many compilers will let you
override the default alignment/packing for a given class; see your
vendor's documentation for details. If your platform only "prefers"
(rather than "enforces") pointer alignment and your compiler offers
this feature, then on a 32-bit platform you can eliminate as much
as 6 bytes of overhead per X object, at the (possibly
minuscule) cost of run-time performance, because actually using the
pointer will be slightly less efficient. Before you even consider
anything like this, though, always follow the age-old sage advice:
First make it right, then make it
fast. Never optimize—neither for speed, nor for size—until
your profiler and other tools tell you that you should.
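As one illustration of such a compiler-specific feature, the sketch below uses #pragma pack(push, 1), a spelling that MSVC, GCC, and Clang accept. Other compilers may spell it differently or not offer it at all. DefaultX, PackedX, BytesSavedByPacking, and CheckPacking are hypothetical names, and the sketch assumes a platform that tolerates a misaligned pointer member.

```cpp
#include <cassert>
#include <cstddef>  // std::size_t

// Default layout: the compiler pads after c so that pimpl_ is aligned.
struct DefaultX { char c; struct Impl; Impl* pimpl_; };

// Nonportable: ask for 1-byte packing for this one class only.
#pragma pack(push, 1)
struct PackedX { char c; struct Impl; Impl* pimpl_; };
#pragma pack(pop)   // restore the default packing for everything that follows

// Bytes saved per object by suppressing the padding.
inline std::size_t BytesSavedByPacking()
{
    return sizeof(DefaultX) - sizeof(PackedX);
}

// With 1-byte packing, the object is exactly a char plus a pointer.
inline void CheckPacking()
{
    assert(sizeof(PackedX) == sizeof(char) + sizeof(PackedX::Impl*));
}
```

Packing trades space for speed: each use of the pointer in PackedX may become a slower unaligned access, and on a platform that enforces alignment the technique may not be available at all.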
-
What is the Pimpl Idiom's performance overhead?
Using the Pimpl idiom can have a performance
overhead for two main reasons. For one thing, each X
construction/destruction must now allocate/deallocate memory for
its XImpl object, which is typically a relatively
expensive operation. For another, each access of a member
in the Pimpl can require at least one extra indirection; if the
hidden member being accessed itself uses a back pointer to call a
function in the visible class, there will be multiple
indirections.
How do we get around this performance overhead?
The short answer is: Use the Fast Pimpl idiom, which I'll cover
next. (There's also a downright reckless way to eliminate it that
you should never, ever use; see the sidebar "Reckless Fixes and Optimizations, and Why They're
Evil" for more information.)
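The two costs described above can be seen in a minimal sketch. Visible, Call, Hidden, and self_ are hypothetical names I've made up for illustration; the shape of X and XImpl follows the text.

```cpp
#include <cassert>

class X
{
public:
    X();
    ~X();
    int Visible() const { return 1; }
    int Call() const;               // forwards to the hidden implementation
private:
    X( const X& );                  // copying suppressed; not needed for this sketch
    X& operator=( const X& );
    struct XImpl;
    XImpl* pimpl_;
};

struct X::XImpl
{
    explicit XImpl( const X* self ) : self_( self ) {}
    // Calling back into the visible class goes through the back pointer:
    // a second indirection on top of the one through pimpl_.
    int Hidden() const { return 41 + self_->Visible(); }
    const X* self_;
};

X::X() : pimpl_( new XImpl( this ) ) {}             // cost 1: one allocation per X
X::~X() { delete pimpl_; }                          //         and one deallocation
int X::Call() const { return pimpl_->Hidden(); }    // cost 2: indirection via pimpl_

// Usage: one call crosses pimpl_ and then the back pointer.
inline void CheckCall()
{
    X x;
    assert( x.Call() == 42 );
}
```

Whether either cost matters in practice is a question for your profiler, not for this sketch.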
-
Discuss Attempt #3.
The short answer about attempt #3 is: Don't do
this. Bottom line, C++ doesn't support opaque types directly, and
this is a brittle attempt (some people, like me, would even say
"hack") to work around that limitation.
What the programmer almost certainly wants is
something else, namely the Fast Pimpl idiom.
The second part of the third question was: Can
you think of a better way to get around the overhead?
The main performance issue here is that space
for the Pimpl objects is being allocated from the free store. In
general, the right way to address allocation performance for a
specific class is to provide a class-specific operator
new() for that class and use a fixed-size allocator, because
fixed-size allocators can be made much more efficient than
general-purpose allocators.
// file x.h
class X
{
  /*...*/
  struct XImpl;
  XImpl* pimpl_;
};
// file x.cpp
#include <cstddef>  // size_t
#include "x.h"
struct X::XImpl
{
  /*...private stuff here...*/
  // Class-specific allocation functions, backed by a fixed-size allocator.
  static void* operator new( size_t ) { /*...*/ }
  static void operator delete( void* ) { /*...*/ }
};
X::X() : pimpl_( new XImpl ) {}
X::~X() { delete pimpl_; pimpl_ = 0; }
"Aha!" you say. "We've found the holy grail—the
Fast Pimpl!" you say. Well, yes, but hold on a minute and think
about how this will work and what it will cost you.
Your favorite advanced C++ or general-purpose
programming textbook has the details about how to write efficient
fixed-size [de]allocation functions, so I won't cover that again
here. I will talk about usability. One technique is to put the
[de]allocation functions in a generic fixed-size allocator
template, perhaps something like this:
template<size_t S>
class FixedAllocator
{
public:
  void* Allocate( /*requested size is always S*/ );
  void Deallocate( void* );
private:
  /*...implemented using statics?...*/
};
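For readers who want something concrete, here is one minimal way the elided private details might look. It is only a sketch under my own assumptions: Node, BlockSize, Head, and CheckReuse are hypothetical names, the free list is intrusive, blocks are never handed back to the system, and nothing here is thread-safe.

```cpp
#include <cassert>
#include <cstddef>  // std::size_t
#include <new>      // ::operator new, placement new

template<std::size_t S>
class FixedAllocator
{
public:
    void* Allocate( /*requested size is always S*/ )
    {
        Node*& head = Head();
        if( head )                           // reuse the most recently freed block
        {
            Node* n = head;
            head = n->next;
            return n;
        }
        return ::operator new( BlockSize );  // list empty: go to the free store
    }
    void Deallocate( void* p )
    {
        if( !p ) return;
        Node* n = ::new (p) Node;            // the freed block itself stores the link
        n->next = Head();
        Head() = n;
    }
private:
    struct Node { Node* next; };
    // Every block must be big enough to hold a Node once it is freed.
    static const std::size_t BlockSize = S < sizeof(Node) ? sizeof(Node) : S;
    // Function-local static: one list head per S, initialized on first use.
    static Node*& Head() { static Node* head = 0; return head; }
};

// Usage: the block just freed is the next one handed out.
inline void CheckReuse()
{
    FixedAllocator<16> a;
    void* p = a.Allocate();
    a.Deallocate( p );
    void* q = a.Allocate();
    assert( q == p );
    a.Deallocate( q );
}
```

Note that the list head is a static shared by every FixedAllocator<S> object of the same S, which is exactly the kind of static state whose lifetime you have to think about.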
Because the private details are likely to use
statics, however, there could be problems if Deallocate is
ever called from a static object's destructor. Probably safer is a
singleton that manages a separate free list for each request size
(or, as an efficiency tradeoff, a separate free list for each
request size "bucket"—for example, one list for blocks of size 0-8,
another for blocks of size 9-16, and so forth).
class FixedAllocator
{
public:
  static FixedAllocator& Instance();
  void* Allocate( size_t );
  void Deallocate( void* );
private:
  /*...singleton implementation, typically
       with easier-to-manage statics than
       the templated alternative above...*/
};
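A rough sketch of such a bucketed singleton follows, assuming C++11 for std::max_align_t. The design choices are mine, not the text's: Bucket, Header, Granularity, NumBuckets, and CheckBucketReuse are hypothetical. Each block carries a small header recording its bucket so that Deallocate( void* ) can find the right list. Oversized requests bypass the lists, and nothing is thread-safe.

```cpp
#include <cassert>
#include <cstddef>  // std::size_t, std::max_align_t
#include <new>      // ::operator new, ::operator delete

class FixedAllocator
{
public:
    static FixedAllocator& Instance()
    {
        static FixedAllocator instance;      // constructed on first use
        return instance;
    }
    // Maps sizes 0-8 to bucket 0, 9-16 to bucket 1, and so forth.
    static std::size_t Bucket( std::size_t size )
    {
        return size == 0 ? 0 : ( size - 1 ) / Granularity;
    }
    void* Allocate( std::size_t size )
    {
        const std::size_t b = Bucket( size );
        Header* h = 0;
        if( b < NumBuckets && free_[b] )     // reuse a block from this bucket's list
        {
            h = free_[b];
            free_[b] = h->next;
        }
        else                                 // otherwise get fresh memory
        {
            const std::size_t bytes = b < NumBuckets ? ( b + 1 ) * Granularity : size;
            h = static_cast<Header*>( ::operator new( sizeof(Header) + bytes ) );
        }
        h->bucket = b;                       // remember which list the block belongs to
        return h + 1;                        // caller's memory starts after the header
    }
    void Deallocate( void* p )
    {
        if( !p ) return;
        Header* h = static_cast<Header*>( p ) - 1;
        const std::size_t b = h->bucket;
        if( b >= NumBuckets ) { ::operator delete( h ); return; }  // oversized: not pooled
        h->next = free_[b];
        free_[b] = h;
    }
private:
    FixedAllocator() : free_() {}            // all lists start empty
    FixedAllocator( const FixedAllocator& ); // not copyable (declared, never defined)
    FixedAllocator& operator=( const FixedAllocator& );
    static const std::size_t Granularity = 8;
    static const std::size_t NumBuckets  = 16;  // pools requests up to 128 bytes
    union Header
    {
        std::size_t      bucket;  // meaningful while the block is in use
        Header*          next;    // meaningful while the block is on a free list
        std::max_align_t pad;     // keeps the caller's memory maximally aligned
    };
    Header* free_[NumBuckets];
};

// Usage: sizes 5 and 7 share the 0-8 bucket, so the freed block comes back.
inline void CheckBucketReuse()
{
    FixedAllocator& a = FixedAllocator::Instance();
    void* p = a.Allocate( 5 );
    a.Deallocate( p );
    void* q = a.Allocate( 7 );
    assert( q == p );
    a.Deallocate( q );
}
```

The per-block header is itself a space cost, which is one more reason to measure before adopting anything like this.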
Let's throw in a helper base class to
encapsulate the calls. This works because derived classes "inherit"
these overloaded base operators.
struct FastArenaObject
{
  // Instance() returns a reference, so its members are reached with '.'.
  static void* operator new( size_t s )
  {
    return FixedAllocator::Instance().Allocate(s);
  }
  static void operator delete( void* p )
  {
    FixedAllocator::Instance().Deallocate(p);
  }
};
Now, you can easily write as many Fast Pimpls as
you like:
// Want this one to be a Fast Pimpl?
// Easy, then just inherit...
struct X::XImpl : FastArenaObject
{
  /*...private stuff here...*/
};
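To check that inheriting really does route new XImpl through the allocator, here is a small sketch. The FixedAllocator in it is a deliberate stand-in of my own: it only counts calls and forwards to the global free store, so it demonstrates the routing and nothing about speed. The allocations and deallocations counters, the hidden member, and AllocationsForOneX are hypothetical.

```cpp
#include <cassert>
#include <cstddef>  // std::size_t
#include <new>      // ::operator new, ::operator delete

// Stand-in allocator (an assumption for this sketch): counts calls only.
class FixedAllocator
{
public:
    static FixedAllocator& Instance() { static FixedAllocator a; return a; }
    void* Allocate( std::size_t s ) { ++allocations; return ::operator new( s ); }
    void  Deallocate( void* p )     { if( p ) ++deallocations; ::operator delete( p ); }
    int allocations;
    int deallocations;
private:
    FixedAllocator() : allocations( 0 ), deallocations( 0 ) {}
};

struct FastArenaObject
{
    static void* operator new( std::size_t s )
    {
        return FixedAllocator::Instance().Allocate( s );
    }
    static void operator delete( void* p )
    {
        FixedAllocator::Instance().Deallocate( p );
    }
};

class X
{
public:
    X();
    ~X();
private:
    X( const X& );              // copying suppressed; not needed for this sketch
    X& operator=( const X& );
    struct XImpl;
    XImpl* pimpl_;
};

// The only change needed to make this Pimpl "fast" is the base class.
struct X::XImpl : FastArenaObject
{
    int hidden;                 // hypothetical private state
};

X::X() : pimpl_( new XImpl ) {}           // uses FastArenaObject::operator new
X::~X() { delete pimpl_; pimpl_ = 0; }    // uses FastArenaObject::operator delete

// Usage: constructing and destroying one X costs exactly one trip each way.
inline int AllocationsForOneX()
{
    const int before = FixedAllocator::Instance().allocations;
    { X x; }
    assert( FixedAllocator::Instance().allocations ==
            FixedAllocator::Instance().deallocations );
    return FixedAllocator::Instance().allocations - before;
}
```

Swapping the stand-in for a real fixed-size allocator should not require touching X or XImpl at all, which is the point of the helper base class.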
Applying this technique to the original problem,
we get a variant of Attempt #2:
// file y.h
class X;
class Y
{
  /*...*/
  X* px_;
};
// file y.cpp
#include "x.h" // X now inherits from FastArenaObject
Y::Y() : px_( new X ) {}
Y::~Y() { delete px_; px_ = 0; }
But beware! This is nice, but don't use the Fast
Pimpl willy-nilly. You're getting extra allocation speed, but as
usual you should never forget the cost. Managing separate free
lists for objects of specific sizes usually means incurring a space
efficiency penalty, because any free space is fragmented (more than
usual) across several lists.
A final reminder: As with any other
optimization, use Pimpls in general and Fast Pimpls in particular
only after profiling and experience prove that the extra
performance boost is really needed in your situation.
Guideline: Avoid inlining or detailed tuning until performance
profiles prove the need.
Reckless Fixes and Optimizations, and Why They're Evil
The main solution text shows why using the Pimpl
Idiom can incur space and performance overheads, and it also shows
the right way to minimize or eliminate those overheads. There is
also a sometimes-recommended, but wrong, way to deal with them.
Here's the reckless, unsafe,
might-work-if-you're-lucky, evil, fattening, and high-cholesterol
way to eliminate the space and performance overheads, and you
didn't hear it from me—the only reason I'm mentioning it at all is
because I've seen people try to do this:
// evil dastardly header file x.h
#include <cstddef>  // size_t
class X
{
  /* . . . */
  static const size_t sizeofximpl = /*some value*/;
  char pimpl_[sizeofximpl];
};
// pernicious depraved implementation file x.cpp
#include <cassert>  // assert
#include <new>      // placement new
#include "x.h"
X::X()
{
  assert( sizeofximpl >= sizeof(XImpl) );
  new (&pimpl_[0]) XImpl;
}
X::~X()
{
  (reinterpret_cast<XImpl*>(&pimpl_[0]))->~XImpl();
}
DON'T DO THIS! Yes, it removes the space
overhead—it doesn't use so much as a single pointer.
Yes, it removes the memory allocation overhead—there's nary a
malloc or new in sight. Yes, it might even happen
to work on the current version of your current compiler.
It's also completely nonportable. Worse, it will
completely break your system, even if it does appear to work at
first. Here are several reasons.
-
Alignment. Any memory that's allocated
dynamically via new or malloc is guaranteed to be
properly aligned for objects of any type, but buffers that are
not allocated dynamically have no
such guarantee:
char* buf1 = (char*)malloc( sizeof(Y) );
char* buf2 = new char[ sizeof(Y) ];
char buf3[ sizeof(Y) ];
new (buf1) Y; // OK, buf1 allocated dynamically (A)
new (buf2) Y; // OK, buf2 allocated dynamically (B)
new (&buf3[0]) Y; // error, buf3 may not be suitably aligned
(reinterpret_cast<Y*>(buf1))->~Y(); // OK
(reinterpret_cast<Y*>(buf2))->~Y(); // OK
(reinterpret_cast<Y*>(&buf3[0]))->~Y(); // error
Just to be clear: I'm not recommending that you
do A or B. I'm just pointing out that they're legal, whereas the
above attempt to have a Pimpl without dynamic allocation is not,
even though it may (dangerously) appear to work correctly at first
if you happen to get lucky.
-
Brittleness. The author of X has to
be inordinately careful with otherwise ordinary X
functions. For example, X must not use the default
assignment operator, but must either suppress assignment or supply
its own. (Writing a safe X::operator=() isn't too hard,
but I'll leave it as an exercise for the reader. Remember to
account for exception safety in that and in
X::~X. Once you're finished, I think you'll
agree that this is a lot more trouble than it's worth.)
-
Maintenance cost. When sizeof(XImpl)
grows beyond sizeofximpl, the programmer must bump up
sizeofximpl. This can be an unattractive maintenance
burden. Choosing a larger value for sizeofximpl mitigates
this, but at the expense of efficiency (see "Inefficiency," next).
-
Inefficiency. Whenever sizeofximpl >
sizeof(XImpl), space is being wasted. This can be minimized,
but at the expense of maintenance effort (see "Maintenance cost," above).
-
Just plain wrongheadedness. In short, it's
obvious that the programmer is trying to do "something unusual."
Frankly, in my experience, "unusual" is just about always a synonym
for "hack." Whenever you see this kind of subversion (whether it's
allocating objects inside character arrays like this programmer is
doing, or implementing an assignment using explicit destruction and
placement new as discussed in Item 41), you should Just Say
No.
Bottom line, C++ doesn't support opaque types
directly, and this is a brittle attempt to work around that
limitation.
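The alignment reason in the list above can be made visible with a short sketch, assuming C++11 or later for alignof and static_assert. It only diagnoses the mismatch and is not a way to rescue the technique. Y here is a hypothetical type, IsAlignedFor, AdjacentAddressesBothAligned, and CheckAlignment are names I've invented, and the address arithmetic assumes a flat address space.

```cpp
#include <cassert>
#include <cstddef>  // std::size_t
#include <cstdint>  // std::uintptr_t

struct Y { double d; };  // hypothetical type whose alignment requirement exceeds 1

// A char array promises only the alignment of char, which is 1.
static_assert( alignof(char[sizeof(Y)]) == 1, "char buffers carry no extra alignment" );
static_assert( alignof(Y) > 1, "this sketch assumes Y needs stricter alignment" );

// True if the address p is a multiple of alignment.
inline bool IsAlignedFor( const void* p, std::size_t alignment )
{
    return reinterpret_cast<std::uintptr_t>( p ) % alignment == 0;
}

// Two adjacent addresses can never both be aligned for Y, so a char buffer
// that lands one byte later than you hoped is no longer a valid home for a Y.
inline bool AdjacentAddressesBothAligned()
{
    char raw[ sizeof(Y) + 1 ];
    return IsAlignedFor( raw, alignof(Y) ) && IsAlignedFor( raw + 1, alignof(Y) );
}

inline void CheckAlignment()
{
    assert( !AdjacentAddressesBothAligned() );
}
```

Whether a particular char member happens to land on a suitable boundary depends on what the compiler places in front of it, which is why the reckless version can appear to work until an unrelated member is added.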