ATL Memory Managers
A Review of Windows Memory Management
Applications use memory for
almost everything they do. In Windows, memory can be allocated from
three principal places: the thread's stack, memory-mapped files,
and heaps. Memory-mapped files are more specialized, so we don't
discuss them further here. The stack is used to allocate local
variables because their size is known at compile time and their
allocation must be as efficient as possible. Allocating and
deallocating storage from the stack involves merely incrementing
and decrementing the stack pointer by the appropriate amount.
Dynamic memory, however, is allocated and freed as the program
runs, based on changing characteristics within the application.
Instead of being allocated from the thread's stack, dynamic memory
comes from pools of storage known as heaps. A heap is an independently managed
block of memory that services dynamic memory allocation requests
and reclaims memory that an application no longer uses. Typically,
heaps expose APIs for creating the heap, destroying the heap,
allocating a block of memory within the heap, and returning a block
of memory to the heap. The precise algorithms employed in
coordinating these tasks constitute what is commonly termed the
heap manager. In general, heap
managers implement various schemes for managing resources for
specialized circumstances. The heap manager functions exposed for
applications often reflect some of the differences that make one
particular type of heap suitable over another for a particular
circumstance.
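To make the four typical heap operations concrete, here is a minimal sketch of a toy heap manager in portable C++. It is not how Windows or the CRT implement their heaps; the class name FixedBlockHeap and all of its members are hypothetical. It handles one specialized circumstance, blocks of a single fixed size, by recycling them through a free list, and it performs no overflow checks or thread synchronization.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical toy heap: serves fixed-size blocks carved from one region.
class FixedBlockHeap {
public:
    // "Create the heap": reserve one region and thread every block
    // onto a singly linked free list (lowest address ends up first).
    FixedBlockHeap(std::size_t blockSize, std::size_t blockCount)
        : m_blockSize(RoundUp(blockSize)),
          m_region(static_cast<unsigned char*>(
              std::malloc(m_blockSize * blockCount))),
          m_free(nullptr) {
        if (m_region == nullptr) { return; }  // heap simply stays empty
        for (std::size_t i = blockCount; i > 0; --i) {
            Push(m_region + (i - 1) * m_blockSize);
        }
    }

    // "Destroy the heap": every block disappears at once.
    ~FixedBlockHeap() { std::free(m_region); }

    FixedBlockHeap(const FixedBlockHeap&) = delete;
    FixedBlockHeap& operator=(const FixedBlockHeap&) = delete;

    // "Allocate a block": pop the head of the free list, or fail.
    void* Allocate() {
        if (m_free == nullptr) { return nullptr; }
        Node* p = m_free;
        m_free = p->next;
        return p;
    }

    // "Return a block": push it back so a later request can reuse it.
    void Free(void* p) {
        if (p != nullptr) { Push(p); }
    }

private:
    struct Node { Node* next; };

    // Round the block size up so every block can hold a Node and
    // stays suitably aligned for any object type.
    static std::size_t RoundUp(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        if (n < sizeof(Node)) { n = sizeof(Node); }
        return (n + a - 1) / a * a;
    }

    void Push(void* p) {
        assert(p != nullptr);
        m_free = new (p) Node{m_free};
    }

    std::size_t m_blockSize;
    unsigned char* m_region;
    Node* m_free;
};
```

Because every block has the same size, allocation and release are constant-time pointer swaps, which is the kind of trade-off that makes one heap scheme more suitable than another for a particular workload.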
In Windows, each process creates a default heap at initialization. Applications
use Win32 functions such as HeapAlloc and HeapFree, together with the
handle returned by GetProcessHeap, to manage blocks of data within the
default heap; HeapCreate creates additional private heaps. Because many Windows
functions that can be called from multiple threads use this
heap, the default heap is implemented to be thread safe. Access to
the default heap is serialized so that multiple simultaneous
threads accessing the heap will not corrupt it. Older versions of
Windows used functions such as LocalAlloc and
GlobalAlloc to manipulate the heap, but these functions
are now deprecated. They run slower and offer fewer features than
the HeapXXX suite.
Applications that link to the C-runtime library
(which ATL projects now do by default) have access to another heap
simply known as the CRT heap. The memory-management functions the
CRT heap manager exposes are likely the most recognizable to the
general C/C++ community because they are part of the C standard.
With the CRT heap, functions such as malloc are used to
obtain storage from the heap; free is used to return
storage to the heap.
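As a brief sketch of the CRT heap in use, the following helpers obtain, grow, and release a block with the standard functions malloc, realloc, and free. The helper names DuplicateText and AppendText are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: returns a CRT-heap copy of `text`.
// The caller owns the result and must release it with free().
char* DuplicateText(const char* text) {
    assert(text != nullptr);
    const std::size_t len = std::strlen(text);
    char* copy = static_cast<char*>(std::malloc(len + 1));
    if (copy != nullptr) {
        std::memcpy(copy, text, len + 1);  // include the terminator
    }
    return copy;
}

// Hypothetical helper: grows `buffer` (obtained from malloc or realloc)
// and appends `suffix`. On failure it returns nullptr and the original
// block is left untouched, so the caller can still use or free it.
char* AppendText(char* buffer, const char* suffix) {
    const std::size_t oldLen = std::strlen(buffer);
    const std::size_t addLen = std::strlen(suffix);
    char* grown = static_cast<char*>(
        std::realloc(buffer, oldLen + addLen + 1));
    if (grown == nullptr) { return nullptr; }
    std::memcpy(grown + oldLen, suffix, addLen + 1);
    return grown;
}
```

A block obtained from one heap must go back to the same heap: memory returned by these helpers is released with free, never with a Win32 or COM deallocation function.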
An application might need to use different heaps
for various reasons, such as with specialized management
requirements. For instance, COM introduces an additional set of
complexities to the general problem of memory management. Memory
addresses are allocated on a per-process basis, so a process cannot
directly access data stored in another process. Yet COM allows data
to be marshaled between processes. If a method call
is remoted so that a client expects an [out] parameter
from the object, the memory for that [out] parameter will
be allocated in one process (the object's), and used and freed in
another process (the client's). Clearly, a conventional heap has no
way to straddle process boundaries to associate allocations in one
process with free operations in another. The COM task allocator
lives to provide this very service. Part of the COM programming
conventions is that when allocating memory blocks that will be
shared across a COM interface, that memory must be allocated by
calling CoTaskMemAlloc and must be freed by the
corresponding CoTaskMemFree. By agreeing on these
standardized functions, the automatically generated proxy-stub code
can properly allocate and free memory across COM boundaries. COM's
remoting infrastructure does all the dirty work needed to create
the illusion of a single heap that spans processes.
Several other reasons exist for managing memory
with different heaps. Components that are allocated from separate
heaps are better isolated from one another, which could make the
heaps less susceptible to corruption. If objects will be accessed
close together in time (say, within the same function), it is desirable
for those objects to live close together in memory, which can
result in fewer page faults and a marked improvement in overall
performance. Some applications choose to implement custom,
specialized memory managers that are tuned to specific
requirements. Using separate heaps also could allow the application
to avoid the overhead associated with synchronizing access to a
single heap. As previously mentioned, a Win32 process's default
heap is thread safe because it expects to be accessed
simultaneously from multiple threads. This leads to thread
contention because each heap access must pass through thread-safe
interlocked operations. Applications can devote one heap to each
thread and eliminate the synchronization logic and thread
contention.
ATL simplifies the use of heaps through a series
of concrete heap implementations that wrap Windows API functions
and through an abstract interface that allows these implementations
to be used polymorphically in other ATL classes that use memory
resources.
The IAtlMemMgr Interface
The atlmem.h header file expresses the
generic memory-management pattern through the definition of the
IAtlMemMgr interface.
__interface IAtlMemMgr {
public:
void* Allocate( size_t nBytes ) ;
void Free( void* p ) ;
void* Reallocate( void* p, size_t nBytes ) ;
size_t GetSize( void* p ) ;
};
The four simple functions
defined on this interface provide most of the dynamic memory
functionality required in typical applications. Allocate
reserves a contiguous region of space nBytes in size
within the heap. Free takes a pointer to a memory block
retrieved from Allocate and returns it to the heap so that
it will be available for future allocation requests. The
Reallocate method is useful when an allocated block is not
large enough to accommodate additional data and it is more
practical and/or efficient to grow the existing block than to
allocate a new, larger one and copy the contents. Finally,
GetSize accepts a pointer to a block obtained from
Allocate and returns the current size of the block in
bytes.
Many ATL classes are designed to support
pluggable heap implementations by performing all their
memory-management functions through an IAtlMemMgr
reference. Developers can provide custom implementations of
IAtlMemMgr and use them with ATL. This provides a great
deal of flexibility in optimizing the performance of these classes
to suit specific application requirements. ATL Server makes heavy
use of IAtlMemMgr in processing SOAP requests and in
stencil processing. Additionally, we've already seen how
CStringT allows developers to supply an
IAtlMemMgr implementation to optimize string-handling
performance.
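To illustrate what a custom implementation might look like, here is a minimal sketch of a memory manager that counts live blocks. Because __interface and atlmem.h are specific to Visual C++, the sketch declares a portable stand-in, IMemMgr, with the same four methods; in an ATL project you would derive from IAtlMemMgr instead. CCountingHeap, LiveBlocks, and CopyBytes are hypothetical names, and the sketch omits overflow checks and thread synchronization.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Portable stand-in for IAtlMemMgr (assumption: same four methods).
struct IMemMgr {
    virtual void* Allocate(std::size_t nBytes) = 0;
    virtual void Free(void* p) = 0;
    virtual void* Reallocate(void* p, std::size_t nBytes) = 0;
    virtual std::size_t GetSize(void* p) = 0;
protected:
    ~IMemMgr() = default;  // never deleted through the interface
};

// Hypothetical manager: delegates to the CRT heap, remembers each
// block's size in a header, and counts outstanding blocks.
class CCountingHeap final : public IMemMgr {
public:
    void* Allocate(std::size_t nBytes) override {
        Header* h = static_cast<Header*>(
            std::malloc(sizeof(Header) + nBytes));
        if (h == nullptr) { return nullptr; }
        h->size = nBytes;
        ++m_liveBlocks;
        return h + 1;  // user data starts right after the header
    }
    void Free(void* p) override {
        if (p == nullptr) { return; }
        std::free(ToHeader(p));
        --m_liveBlocks;
    }
    void* Reallocate(void* p, std::size_t nBytes) override {
        if (p == nullptr) { return Allocate(nBytes); }
        Header* h = static_cast<Header*>(
            std::realloc(ToHeader(p), sizeof(Header) + nBytes));
        if (h == nullptr) { return nullptr; }  // old block still valid
        h->size = nBytes;
        return h + 1;
    }
    std::size_t GetSize(void* p) override {
        assert(p != nullptr);
        return ToHeader(p)->size;
    }
    std::size_t LiveBlocks() const { return m_liveBlocks; }

private:
    // Padded to the maximum alignment so user data stays aligned.
    union Header { std::size_t size; std::max_align_t pad; };
    static Header* ToHeader(void* p) {
        return static_cast<Header*>(p) - 1;
    }
    std::size_t m_liveBlocks = 0;
};

// Code written against the interface works with any manager.
char* CopyBytes(IMemMgr& mgr, const char* src, std::size_t n) {
    char* dst = static_cast<char*>(mgr.Allocate(n));
    if (dst != nullptr) { std::memcpy(dst, src, n); }
    return dst;
}
```

CopyBytes never names the concrete class, which is the property that lets ATL classes accept whichever manager the developer supplies.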
The Memory Manager Classes
Although it is useful to abstract memory
management behind an interface to facilitate custom heap
implementations, most applications don't need a high degree of
sophistication in these implementations to build efficient
components. Indeed, you can realize many of the benefits of
multiple heaps with simple heap implementations. To that end, ATL
provides five concrete implementations of IAtlMemMgr that
you can use as is in many circumstances.
CComHeap is defined in
atlcommem.h as follows:
class CComHeap :
public IAtlMemMgr {
// IAtlMemMgr
public:
virtual void* Allocate( size_t nBytes ) {
#ifdef _WIN64
if( nBytes > INT_MAX ) { return( NULL ); }
#endif
return( ::CoTaskMemAlloc( ULONG( nBytes ) ) );
}
virtual void Free( void* p ) {
::CoTaskMemFree( p );
}
virtual void* Reallocate( void* p, size_t nBytes ) {
#ifdef _WIN64
if( nBytes > INT_MAX ) { return( NULL ); }
#endif
return( ::CoTaskMemRealloc( p, ULONG( nBytes ) ) );
}
virtual size_t GetSize( void* p ) {
CComPtr< IMalloc > pMalloc;
::CoGetMalloc( 1, &pMalloc );
return( pMalloc->GetSize( p ) );
}
};
As you can see, this class is merely a very thin wrapper on top of the
COM task allocator API functions. Allocate simply
delegates to CoTaskMemAlloc, and Free delegates
to CoTaskMemFree. In fact, all five of the stock memory
managers implement IAtlMemMgr in a similar manner; the
prime difference is the underlying functions to which the managers
delegate. Table 3.2
summarizes which heap-management functions are used for each of the
ATL memory managers.
Table 3.2. Heap Functions Used in ATL Memory Managers
Memory Manager Class | Heap Functions Used
CComHeap             | CoTaskMemAlloc, CoTaskMemFree, CoTaskMemRealloc, IMalloc::GetSize
CCRTHeap             | malloc, free, realloc, _msize
CLocalHeap           | LocalAlloc, LocalFree, LocalReAlloc, LocalSize
CGlobalHeap          | GlobalAlloc, GlobalFree, GlobalReAlloc, GlobalSize
CWin32Heap           | HeapAlloc, HeapFree, HeapReAlloc, HeapSize
The CCRTHeap uses memory from the CRT
heap, whereas CLocalHeap and CGlobalHeap both
allocate memory from the process heap. The LocalXXX and
GlobalXXX functions in the Win32 API exist now mostly for
backward compatibility. You shouldn't really use them in new code
anymore, so we don't discuss them further.
The CWin32Heap class is a bit different
from the other heap classes in a couple of important respects. Whereas
the other memory managers allocate storage from a heap that already
exists in the process, CWin32Heap requires a valid HANDLE to a heap
before its IAtlMemMgr implementation can be used.
This gives the developer a bit more control over the details of the
underlying heap that will be used, albeit with a bit more
complexity. CWin32Heap supplies three constructors for
initializing an instance:
CWin32Heap() : m_hHeap( NULL ), m_bOwnHeap( false ) { }
CWin32Heap( HANDLE hHeap ) :
m_hHeap( hHeap ),
m_bOwnHeap( false ) {
ATLASSERT( hHeap != NULL );
}
CWin32Heap( DWORD dwFlags, size_t nInitialSize,
size_t nMaxSize = 0 ) :
m_hHeap( NULL ),
m_bOwnHeap( true ) {
ATLASSERT( !(dwFlags&HEAP_GENERATE_EXCEPTIONS) );
m_hHeap = ::HeapCreate( dwFlags, nInitialSize, nMaxSize );
if( m_hHeap == NULL ) { AtlThrowLastWin32(); }
}
The first constructor initializes a
CWin32Heap instance with no associated heap. The second
constructor initializes the instance with a handle to an existing
heap obtained from a previous call to the Win32 HeapCreate
API. Note that the m_bOwnHeap member is set to
false in this case. This member tracks whether the
CWin32Heap instance owns the underlying heap. Thus, when
the second constructor is used, the caller is still responsible for
ultimately calling HeapDestroy to get rid of the heap
later. The third constructor is arguably the simplest to use
because it directly accepts the parameters required to create a
heap and invokes HeapCreate automatically. The
dwFlags parameter is a bit field that allows two different
flags to be set. One of the two flags,
HEAP_GENERATE_EXCEPTIONS, can be given to the underlying
HeapCreate call to indicate that the system should raise
an exception upon function failure. However, the code asserts if
this flag is specified because the ATL code base isn't prepared for
system exceptions to be thrown if an allocation fails. The other
flag, HEAP_NO_SERIALIZE, relates to the synchronization
options with heaps, discussed a bit earlier in this section. If
this flag is specified, the heap is not thread safe. This can improve performance
considerably because interlocked operations are no longer used to
gain access to the heap. However, it is
the programmer's responsibility to ensure that multiple threads
will not access a heap created with this flag set. Otherwise, heap
corruption is likely to occur. The nInitialSize
parameter indicates how much storage should be reserved when the
heap is created. You can use the nMaxSize parameter to
specify how large the heap should be allowed to grow.
CWin32Heap also defines Attach
and Detach operations to associate an existing heap with a
CWin32Heap instance:
void Attach( HANDLE hHeap, bool bTakeOwnership ) {
ATLASSERT( hHeap != NULL );
ATLASSERT( m_hHeap == NULL );
m_hHeap = hHeap;
m_bOwnHeap = bTakeOwnership;
}
HANDLE Detach() {
HANDLE hHeap;
hHeap = m_hHeap;
m_hHeap = NULL;
m_bOwnHeap = false;
return( hHeap );
}
Attach accepts a handle to a heap and a
Boolean flag indicating whether the caller is transferring
ownership of the heap to the CWin32Heap instance. This
governs whether the destructor will destroy the heap.
Detach simply surrenders ownership of the encapsulated
heap by setting m_hHeap to NULL and m_bOwnHeap to false
and returning the handle to the caller. Note that in release builds,
where ATLASSERT compiles away, Attach
simply overwrites the existing heap handle stored in the
CWin32Heap. If the class already held a non-NULL
HANDLE, there would be no way to free that heap after the
Attach is performed. As a result, you have a memory
leak, and a really big one at that. If you thought leaking memory
from an object was bad, try leaking entire heaps at a time! You
might wonder at first why the Attach method doesn't simply
destroy the existing heap before overwriting the internal handle.
After all, CComVariant::Attach and
CComSafeArray::Attach were shown earlier clearing their
encapsulated data before attaching to a new instance. The
difference here is that even if the CWin32Heap instance
owns the heap (m_bOwnHeap is true), it has no
knowledge of what live objects out there have been allocated from
that heap. Blindly destroying the existing heap would yank memory
from any number of objects, which could be disastrous. You simply
have to be careful. Here's the kind of code you want to avoid:
// create an instance and allocate a heap
CWin32Heap heap(0, // no exceptions, use thread-safe access
4000, // initial size
0); // no max size => heap grows as needed
// manually create a second heap
HANDLE hHeap = ::HeapCreate(0, 5000, 0);
// this is gonna get you in a "heap" of trouble!
heap.Attach(hHeap, false /* same result if true */ );
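If a heap really must be swapped, the order of operations matters. The following sketch shows that order using portable stand-ins rather than the Win32 API: FakeHeap, CreateFakeHeap, DestroyFakeHeap, HeapHolder, and ReplaceHeap are all hypothetical names, and HeapHolder only mirrors the ownership-flag logic shown above for CWin32Heap. It assumes that the holder owned the old heap and that nothing allocated from it is still in use.

```cpp
#include <cassert>

// Hypothetical stand-ins for HeapCreate/HeapDestroy that merely
// count how many "heaps" currently exist.
struct FakeHeap {};
static int g_liveHeaps = 0;

FakeHeap* CreateFakeHeap() { ++g_liveHeaps; return new FakeHeap; }
void DestroyFakeHeap(FakeHeap* h) { --g_liveHeaps; delete h; }

// Mirrors the ownership logic described for CWin32Heap
// (an illustration, not the real class).
class HeapHolder {
public:
    HeapHolder() : m_hHeap(nullptr), m_bOwnHeap(false) {}
    ~HeapHolder() {
        if (m_bOwnHeap && m_hHeap != nullptr) {
            DestroyFakeHeap(m_hHeap);
        }
    }
    HeapHolder(const HeapHolder&) = delete;
    HeapHolder& operator=(const HeapHolder&) = delete;

    void Attach(FakeHeap* hHeap, bool bTakeOwnership) {
        assert(hHeap != nullptr);
        assert(m_hHeap == nullptr);  // attaching over a heap leaks it
        m_hHeap = hHeap;
        m_bOwnHeap = bTakeOwnership;
    }
    FakeHeap* Detach() {
        FakeHeap* hHeap = m_hHeap;
        m_hHeap = nullptr;
        m_bOwnHeap = false;
        return hHeap;
    }

private:
    FakeHeap* m_hHeap;
    bool m_bOwnHeap;
};

// Safe replacement: detach first, dispose of the old heap explicitly,
// and only then attach the new one.
void ReplaceHeap(HeapHolder& holder, FakeHeap* hNewHeap) {
    FakeHeap* hOld = holder.Detach();
    if (hOld != nullptr) {
        DestroyFakeHeap(hOld);  // caller guarantees no live blocks
    }
    holder.Attach(hNewHeap, true);
}
```

The explicit Detach makes the hand-off visible in the calling code, so the decision to destroy the old heap is made by the party that knows whether its allocations are still alive.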
Custom memory management commonly is used in
string processing. Applications that allocate, free, and resize
strings frequently can often tax memory managers and negatively
impact performance. Multithreaded applications that do a lot of
string processing can exhibit reduced performance because of thread
contention for heap allocation requests. Moreover, heaps that
service multithreaded applications can provide slower access
because synchronization locks of some sort must be employed to
ensure thread safety. One tactic to combat this is to provide a
per-thread heap so that no synchronization logic is needed and
thread contention does not occur.
We show an example of a specialized heap for
string allocations using CWin32Heap and ATL's new
CStencil class. This class is discussed in detail in later
chapters when we cover building web applications with ATL Server.
For now, recall from the discussion of web application development
in Chapter 1, "Hello,
ATL," that ATL produces web pages by processing stencil response
files and rendering HTML-based text responses. This involves a
great deal of string parsing and processing, and CStencil
bears a lot of this burden. Its constructor enables you to pass in
a custom memory manager to be used in all its string parsing. The
following code demonstrates how to create a per-thread heap manager
to be used for stencil processing:
DWORD g_dwTlsIndex; // holds thread-local storage slot index
// g_dwTlsIndex = ::TlsAlloc() performed in other
// initialization code
// Create a private heap for use on this thread
// only => no synchronized access
CWin32Heap* pHeap = new CWin32Heap(HEAP_NO_SERIALIZE, 50000);
// Store the heap pointer in this thread's TLS slot
::TlsSetValue(g_dwTlsIndex, reinterpret_cast<void*>(
static_cast<IAtlMemMgr*>(pHeap)));
// ...
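// Retrieve the memory manager interface pointer from TLS.
// pHeap keeps its concrete CWin32Heap* type for later deletion.
IAtlMemMgr* pMemMgr =
static_cast<IAtlMemMgr*>(::TlsGetValue(g_dwTlsIndex));
// Create a new CStencil instance that uses the private heap
CStencil* pStencil = new CStencil(pMemMgr);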
Notice the extra layer of casting when storing
the heap pointer in the TLS slot. You need to hold on to the
original CWin32Heap pointer with the concrete type because
IAtlMemMgr doesn't have a virtual destructor. If you just
had an IAtlMemMgr* to call delete on, the actual
CWin32Heap destructor would not get called. That extra
layer of casting is to make sure that you get the correct interface
pointer converted to void* before storing it in the TLS.
It's probably not strictly necessary in the current version of ATL,
but if the heap implementation has multiple base classes, the cast
to void* could cause some serious trouble.
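The pointer-adjustment hazard can be sketched in portable C++ without any ATL types. In the following illustration, IFirst, ISecond, CTwoBases, and both functions are hypothetical; ISecond plays the role that IAtlMemMgr would play if the heap class had another base class listed before it. On common compilers, the second base subobject does not start at the object's own address, so the conversion to the interface type has to happen before the type is erased to void*.

```cpp
#include <cassert>

// Hypothetical interfaces; ISecond stands in for IAtlMemMgr.
struct IFirst  { virtual int First() = 0; };
struct ISecond { virtual int Second() = 0; };

// Hypothetical class with two polymorphic bases.
class CTwoBases final : public IFirst, public ISecond {
public:
    int First() override { return 1; }
    int Second() override { return 2; }
};

// Correct round trip: convert to the interface type *before* erasing
// the type, and cast the void* back to that same interface type.
int RoundTripViaInterface(CTwoBases* pObj) {
    assert(pObj != nullptr);
    void* slot = static_cast<ISecond*>(pObj);  // address is adjusted
    ISecond* pItf = static_cast<ISecond*>(slot);
    return pItf->Second();
}

// Returns true when the ISecond subobject lives at a different
// address than the complete object. In that case, storing the
// CTwoBases* directly as void* and reading it back as ISecond*
// would produce a pointer to the wrong place.
bool InterfacePointerIsAdjusted(CTwoBases* pObj) {
    assert(pObj != nullptr);
    void* asObject = pObj;
    void* asInterface = static_cast<ISecond*>(pObj);
    return asObject != asInterface;
}
```

The rule of thumb is that a void* must be cast back to exactly the pointer type it was created from; the static_cast to the interface before storage is what makes the later cast from void* to the interface valid.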