ATL Memory Managers

A Review of Windows Memory Management

Applications use memory for almost everything they do. In Windows, memory can be allocated from three principal places: the thread's stack, memory-mapped files, and heaps. Memory-mapped files are more specialized, so we don't discuss them further here. The stack is used to allocate local variables because their size is known at compile time and their allocation must be as efficient as possible. Allocating and deallocating storage from the stack involves merely incrementing and decrementing the stack pointer by the appropriate amount.

Dynamic memory, however, is allocated and freed as the program runs, based on changing characteristics within the application. Instead of being allocated from the thread's stack, dynamic memory comes from pools of storage known as heaps. A heap is an independently managed block of memory that services dynamic memory allocation requests and reclaims memory that an application no longer uses. Typically, heaps expose APIs for creating the heap, destroying the heap, allocating a block of memory within the heap, and returning a block of memory to the heap. The precise algorithms employed in coordinating these tasks constitute what is commonly termed the heap manager. In general, heap managers implement various schemes for managing resources for specialized circumstances. The heap manager functions exposed for applications often reflect some of the differences that make one particular type of heap more suitable than another for a particular circumstance.

In Windows, each process creates a default heap at initialization. Applications obtain a handle to this heap with GetProcessHeap and use Win32 functions such as HeapAlloc and HeapFree to manage blocks of data within it; HeapCreate creates additional private heaps. Because many Windows functions that can be called from multiple threads use this heap, the default heap is implemented to be thread safe. Access to the default heap is serialized so that multiple simultaneous threads accessing the heap will not corrupt it. Older versions of Windows used functions such as LocalAlloc and GlobalAlloc to manipulate the heap, but these functions are now deprecated. They run slower and offer fewer features than the HeapXXX suite.

Applications that link to the C-runtime library (which ATL projects now do by default) have access to another heap simply known as the CRT heap. The memory-management functions the CRT heap manager exposes are likely the most recognizable to the general C/C++ community because they are part of the C standard. With the CRT heap, functions such as malloc are used to obtain storage from the heap; free is used to return storage to the heap.
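As a quick refresher on the CRT heap functions just mentioned, the following minimal sketch obtains a block with malloc, grows it with realloc, and leaves the caller to return it with free. The helper names DuplicateText and AppendText are ours, invented for illustration; they are not part of the CRT or ATL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: copies a string into storage from the CRT heap.
// The caller returns the block with std::free.
inline char* DuplicateText(const char* psz) {
    assert(psz != nullptr);
    std::size_t n = std::strlen(psz) + 1;          // include the terminator
    char* p = static_cast<char*>(std::malloc(n));
    if (p) std::memcpy(p, psz, n);
    return p;
}

// Hypothetical helper: grows an existing CRT-heap block to append more text.
// On failure it returns nullptr and pszOld remains valid and unchanged.
inline char* AppendText(char* pszOld, const char* pszMore) {
    assert(pszOld != nullptr && pszMore != nullptr);
    std::size_t nOld  = std::strlen(pszOld);
    std::size_t nMore = std::strlen(pszMore) + 1;  // include the terminator
    char* p = static_cast<char*>(std::realloc(pszOld, nOld + nMore));
    if (p) std::memcpy(p + nOld, pszMore, nMore);
    return p;
}
```

Note that realloc may move the block, so the caller must continue with the returned pointer rather than the one passed in.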

An application might need to use different heaps for various reasons, such as specialized management requirements. For instance, COM introduces an additional set of complexities to the general problem of memory management. Memory addresses are allocated on a per-process basis, so a process cannot directly access data stored in another process. Yet COM allows data to be marshaled between processes. If a method call is remoted so that a client expects an [out] parameter from the object, the memory for that [out] parameter will be allocated in one process (the object's), and used and freed in another process (the client's). Clearly, a conventional heap has no way to straddle process boundaries to associate allocations in one process with free operations in another. The COM task allocator exists to provide this very service. Part of the COM programming conventions is that memory blocks passed across a COM interface must be allocated by calling CoTaskMemAlloc and freed by the corresponding CoTaskMemFree. Because both sides agree on these standardized functions, the automatically generated proxy-stub code can properly allocate and free memory across COM boundaries. COM's remoting infrastructure does all the dirty work needed to create the illusion of a single heap that spans processes.
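The allocate-on-one-side, free-on-the-other convention can be sketched roughly as follows. GetGreeting and UseGreeting are hypothetical functions of our own, and a real COM method would return an HRESULT and be invoked through an interface pointer. So that the sketch also compiles off Windows, the non-Windows branch supplies malloc-based stand-ins for the two COM functions; those stand-ins are an assumption for illustration only and are not COM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

#ifdef _WIN32
#include <objbase.h>   // CoTaskMemAlloc, CoTaskMemFree (link with ole32.lib)
#else
// ASSUMPTION: stand-ins so the sketch compiles off Windows. They are NOT COM.
#include <cstdlib>
inline void* CoTaskMemAlloc(std::size_t cb) { return std::malloc(cb); }
inline void  CoTaskMemFree(void* pv)        { std::free(pv); }
#endif

// Hypothetical "object side": fills an [out] parameter with a block
// obtained from the task allocator. Returns 0 on success.
inline int GetGreeting(char** ppszText) {
    if (ppszText == nullptr) return 1;
    static const char kText[] = "hello";
    *ppszText = static_cast<char*>(::CoTaskMemAlloc(sizeof(kText)));
    if (*ppszText == nullptr) return 2;
    std::memcpy(*ppszText, kText, sizeof(kText));
    return 0;
}

// Hypothetical "client side": uses the block, then frees it with the
// matching task-allocator call rather than free or HeapFree.
inline std::size_t UseGreeting() {
    char* pszText = nullptr;
    if (GetGreeting(&pszText) != 0) return 0;
    assert(pszText != nullptr);            // success implies a valid block
    std::size_t nLength = std::strlen(pszText);
    ::CoTaskMemFree(pszText);
    return nLength;
}
```

The point of the sketch is only the pairing: whoever produces the [out] block uses CoTaskMemAlloc, and whoever consumes it uses CoTaskMemFree.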

Several other reasons exist for managing memory with different heaps. Components that are allocated from separate heaps are better isolated from one another, which could make the heaps less susceptible to corruption. If objects will be accessed close together in time (say, within the same function), it is desirable for those objects to live close together in memory, which can result in fewer page faults and a marked improvement in overall performance. Some applications choose to implement custom, specialized memory managers that are tuned to specific requirements. Using separate heaps also could allow the application to avoid the overhead associated with synchronizing access to a single heap. As previously mentioned, a Win32 process's default heap is thread safe because it expects to be accessed simultaneously from multiple threads. This leads to thread contention because each heap access must pass through thread-safe interlocked operations. Applications can devote one heap to each thread and eliminate the synchronization logic and thread contention.

ATL simplifies the use of heaps through a series of concrete heap implementations that wrap Windows API functions and through an abstract interface that allows these implementations to be used polymorphically in other ATL classes that use memory resources.

The IAtlMemMgr Interface

The atlmem.h header file expresses the generic memory-management pattern through the definition of the IAtlMemMgr interface.

__interface IAtlMemMgr {                        
public:                                         
    void* Allocate( size_t nBytes ) ;           
    void Free( void* p ) ;                      
    void* Reallocate( void* p, size_t nBytes ) ;
    size_t GetSize( void* p ) ;                 
};

The four simple functions defined on this interface provide most of the dynamic memory functionality required in typical applications. Allocate reserves a contiguous region of space nBytes in size within the heap. Free takes a pointer to a memory block retrieved from Allocate and returns it to the heap so that it will be available for future allocation requests. The Reallocate method is useful when an allocated block is not large enough to accommodate additional data and it is more practical and/or efficient to grow the existing block than to allocate a new, larger one and copy the contents. Finally, GetSize accepts a pointer to a block obtained from Allocate and returns the current size of the block in bytes.

Many ATL classes are designed to support pluggable heap implementations by performing all their memory-management functions through an IAtlMemMgr reference. Developers can provide custom implementations of IAtlMemMgr and use them with ATL. This provides a great deal of flexibility in optimizing the performance of these classes to suit specific application requirements. ATL Server makes heavy use of IAtlMemMgr in processing SOAP requests and in stencil processing. Additionally, we've already seen how CStringT allows developers to supply an IAtlMemMgr implementation to optimize string-handling performance.
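The pluggable pattern itself can be sketched in a few lines. Everything named here is hypothetical: IMiniMemMgr is a trimmed-down stand-in for IAtlMemMgr (Allocate and Free only, to keep the example short), CCountingHeap is a toy custom manager, and CTextCopy plays the role of an ATL class that is handed a memory manager at construction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical, trimmed-down stand-in for IAtlMemMgr.
struct IMiniMemMgr {
    virtual void* Allocate(std::size_t nBytes) = 0;
    virtual void  Free(void* p) = 0;
protected:
    ~IMiniMemMgr() = default;
};

// Custom manager: forwards to the CRT heap and counts outstanding blocks.
class CCountingHeap final : public IMiniMemMgr {
public:
    void* Allocate(std::size_t nBytes) override {
        void* p = std::malloc(nBytes);
        if (p) ++m_nLive;
        return p;
    }
    void Free(void* p) override {
        if (p) { --m_nLive; std::free(p); }
    }
    long LiveBlocks() const { return m_nLive; }
private:
    long m_nLive = 0;
};

// Consumer that never names a concrete heap; it sees only the interface.
class CTextCopy {
public:
    CTextCopy(const char* psz, IMiniMemMgr* pMemMgr)
        : m_pMemMgr(pMemMgr), m_psz(nullptr) {
        assert(psz != nullptr && pMemMgr != nullptr);
        std::size_t n = std::strlen(psz) + 1;
        m_psz = static_cast<char*>(m_pMemMgr->Allocate(n));
        if (m_psz) std::memcpy(m_psz, psz, n);
    }
    ~CTextCopy() { m_pMemMgr->Free(m_psz); }
    CTextCopy(const CTextCopy&) = delete;
    CTextCopy& operator=(const CTextCopy&) = delete;
    const char* c_str() const { return m_psz ? m_psz : ""; }
private:
    IMiniMemMgr* m_pMemMgr;   // not owned; must outlive this object
    char*        m_psz;
};
```

Swapping in a different manager requires no change to CTextCopy, which is the flexibility the interface is meant to buy. The one obligation it creates is lifetime: the manager must outlive every object that allocated from it.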

The Memory Manager Classes

Although it is useful to abstract memory management behind an interface to facilitate custom heap implementations, most applications don't need a high degree of sophistication in these implementations to build efficient components. Indeed, you can realize many of the benefits of multiple heaps with simple heap implementations. To that end, ATL provides five concrete implementations of IAtlMemMgr that you can use as is in many circumstances.

CComHeap is defined in atlcommem.h as follows:

class CComHeap :                                           
    public IAtlMemMgr {                                    
// IAtlMemMgr                                              
public:                                                    
    virtual void* Allocate( size_t nBytes ) {              
#ifdef _WIN64                                              
        if( nBytes > INT_MAX ) { return( NULL ); }         
#endif                                                     
        return( ::CoTaskMemAlloc( ULONG( nBytes ) ) );     
    }                                                      
    virtual void Free( void* p ) {                         
        ::CoTaskMemFree( p );                              
    }                                                      
    virtual void* Reallocate( void* p, size_t nBytes ) {   
#ifdef _WIN64                                              
        if( nBytes > INT_MAX ) { return( NULL ); }         
#endif                                                     
        return( ::CoTaskMemRealloc( p, ULONG( nBytes ) ) );
    }                                                      
    virtual size_t GetSize( void* p ) {                    
        CComPtr< IMalloc > pMalloc;                        
        ::CoGetMalloc( 1, &pMalloc );                      
        return( pMalloc->GetSize( p ) );                   
    }                                                      
};

As you can see, this class is merely a very thin wrapper on top of the COM task allocator API functions. Allocate simply delegates to CoTaskMemAlloc, and Free delegates to CoTaskMemFree. In fact, all five of the stock memory managers implement IAtlMemMgr in a similar manner; the prime difference is the underlying functions to which the managers delegate. Table 3.2 summarizes which heap-management functions are used for each of the ATL memory managers.

Table 3.2. Heap Functions Used in ATL Memory Managers

Memory Manager Class	Heap Functions Used
`CComHeap`	`CoTaskMemAlloc, CoTaskMemFree, CoTaskMemRealloc, IMalloc::GetSize`
`CCRTHeap`	`malloc, free, realloc, _msize`
`CLocalHeap`	`LocalAlloc, LocalFree, LocalReAlloc, LocalSize`
`CGlobalHeap`	`GlobalAlloc, GlobalFree, GlobalReAlloc, GlobalSize`
`CWin32Heap`	`HeapAlloc, HeapFree, HeapReAlloc, HeapSize`

The CCRTHeap uses memory from the CRT heap, whereas CLocalHeap and CGlobalHeap both allocate memory from the process heap. The LocalXXX and GlobalXXX functions in the Win32 API exist now mostly for backward compatibility. You shouldn't really use them in new code anymore, so we don't discuss them further.

The CWin32Heap class is a bit different from the other heap classes in a couple of important respects. Whereas the other memory managers allocate storage from a heap that already exists (the COM task allocator, the CRT heap, or the process heap), CWin32Heap requires that a valid HANDLE to a heap be created before using its IAtlMemMgr implementation. This gives the developer a bit more control over the details of the underlying heap that will be used, albeit with a bit more complexity. CWin32Heap supplies three constructors for initializing an instance:

CWin32Heap()  :    m_hHeap( NULL ), m_bOwnHeap( false ) { }   
CWin32Heap( HANDLE hHeap )  :                                 
    m_hHeap( hHeap ),                                         
    m_bOwnHeap( false ) {                                     
    ATLASSERT( hHeap != NULL );                               
}                                                             
CWin32Heap( DWORD dwFlags, size_t nInitialSize,               
    size_t nMaxSize = 0 ) :                                   
    m_hHeap( NULL ),                                          
    m_bOwnHeap( true ) {                                      
    ATLASSERT( !(dwFlags&HEAP_GENERATE_EXCEPTIONS) );     
    m_hHeap = ::HeapCreate( dwFlags, nInitialSize, nMaxSize );
    if( m_hHeap == NULL ) { AtlThrowLastWin32(); }            
}

The first constructor initializes a CWin32Heap instance with no associated heap. The second constructor initializes the instance with a handle to an existing heap obtained from a previous call to the Win32 HeapCreate API. Note that the m_bOwnHeap member is set to false in this case. This member tracks whether the CWin32Heap instance owns the underlying heap. Thus, when the second constructor is used, the caller is still responsible for ultimately calling HeapDestroy to get rid of the heap later. The third constructor is arguably the simplest to use because it directly accepts the parameters required to create a heap and invokes HeapCreate automatically. The dwFlags parameter is a bit field that allows two different flags to be set. One of the two flags, HEAP_GENERATE_EXCEPTIONS, can be given to the underlying HeapCreate call to indicate that the system should raise an exception upon function failure. However, the code asserts if this flag is specified because the ATL code base isn't prepared for system exceptions to be thrown if an allocation fails. The other flag, HEAP_NO_SERIALIZE, relates to the synchronization options with heaps, discussed a bit earlier in this section. If this flag is specified, the heap is not thread safe. This can improve performance considerably because interlocked operations are no longer used to gain access to the heap. However, it is the programmer's responsibility to ensure that multiple threads will not access a heap created with this flag set. Otherwise, heap corruption is likely to occur. The nInitialSize parameter indicates how much storage should be reserved when the heap is created. You can use the nMaxSize parameter to specify how large the heap should be allowed to grow.

CWin32Heap also defines Attach and Detach operations to associate an existing heap with a CWin32Heap instance:

void Attach( HANDLE hHeap, bool bTakeOwnership ) {
    ATLASSERT( hHeap != NULL );                   
    ATLASSERT( m_hHeap == NULL );                 
    m_hHeap = hHeap;                              
    m_bOwnHeap = bTakeOwnership;                  
}                                                 
HANDLE Detach() {                                 
    HANDLE hHeap;                                 
                                                  
    hHeap = m_hHeap;                              
    m_hHeap = NULL;                               
    m_bOwnHeap = false;                           
                                                  
    return( hHeap );                              
}

Attach accepts a handle to a heap and a Boolean flag indicating whether the caller is transferring ownership of the heap to the CWin32Heap instance. This governs whether the destructor will destroy the heap. Detach simply surrenders ownership of the encapsulated heap by setting the m_bOwnHeap member to false and returning the handle to the caller. Note that Attach simply overwrites the existing heap handle stored in the CWin32Heap; the ATLASSERT on m_hHeap catches this only in debug builds. If the class already held a non-NULL HANDLE, there would be no way to free that heap after the Attach is performed. As a result, you have a memory leak, and a really big one, at that. If you thought leaking memory from an object was bad, try leaking entire heaps at a time! You might wonder at first why the Attach method doesn't simply destroy the existing heap before overwriting the internal handle. After all, CComVariant::Attach and CComSafeArray::Attach were shown earlier clearing their encapsulated data before attaching to a new instance. The difference here is that even if the CWin32Heap instance owns the heap (m_bOwnHeap is true), it has no knowledge of what live objects out there have been allocated from that heap. Blindly destroying the existing heap would yank memory from any number of objects, which could be disastrous. You simply have to be careful. Here's the kind of code you want to avoid:

// create an instance and allocate a heap
CWin32Heap heap(0,        // no exceptions, use thread-safe access
                4000,     // initial size
                0);       // no max size => heap grows as needed

// manually create a second heap
HANDLE hHeap = ::HeapCreate(0, 5000, 0);

// this is gonna get you in a "heap" of trouble!
heap.Attach(hHeap, false /* same result if true */ );
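One way to stay out of trouble is to take the old handle back with Detach before attaching the new one, and to destroy the old heap yourself once you know nothing allocated from it is still in use. The following portable sketch models only the ownership bookkeeping. FakeHeapHandle, FakeHeapCreate, FakeHeapDestroy, and CFakeHeap are all hypothetical stand-ins that we invented so that the leak can be counted; they are not Win32 or ATL names.

```cpp
#include <cassert>

// Hypothetical stand-ins: a fake heap "handle" and a counter of live heaps.
using FakeHeapHandle = int;
inline int& LiveHeaps() { static int n = 0; return n; }
inline FakeHeapHandle FakeHeapCreate() {
    static int next = 1;
    ++LiveHeaps();
    return next++;
}
inline void FakeHeapDestroy(FakeHeapHandle) { --LiveHeaps(); }

// Mirrors only the ownership logic described for CWin32Heap.
class CFakeHeap {
public:
    CFakeHeap() : m_hHeap(0), m_bOwnHeap(false) {}
    ~CFakeHeap() {
        if (m_bOwnHeap && m_hHeap != 0) FakeHeapDestroy(m_hHeap);
    }
    void Attach(FakeHeapHandle hHeap, bool bTakeOwnership) {
        assert(hHeap != 0);
        m_hHeap = hHeap;              // overwrites any previous handle
        m_bOwnHeap = bTakeOwnership;
    }
    FakeHeapHandle Detach() {
        FakeHeapHandle h = m_hHeap;
        m_hHeap = 0;
        m_bOwnHeap = false;
        return h;
    }
private:
    FakeHeapHandle m_hHeap;
    bool m_bOwnHeap;
};

// Returns how many heaps were leaked by the sequence (0 means none).
inline int SwapHeapsSafely() {
    int nBefore = LiveHeaps();
    {
        CFakeHeap heap;
        heap.Attach(FakeHeapCreate(), true);
        FakeHeapHandle hNew = FakeHeapCreate();
        FakeHeapHandle hOld = heap.Detach();  // take the old heap back first
        heap.Attach(hNew, true);
        FakeHeapDestroy(hOld);  // safe only once no blocks from it remain
    }
    return LiveHeaps() - nBefore;
}
```

With the real class, the same sequence would pair Detach with a later call to HeapDestroy on the returned handle.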

Custom memory management commonly is used in string processing. Applications that allocate, free, and resize strings frequently can often tax memory managers and negatively impact performance. Multithreaded applications that do a lot of string processing can exhibit reduced performance because of thread contention for heap allocation requests. Moreover, heaps that service multithreaded applications can provide slower access because synchronization locks of some sort must be employed to ensure thread safety. One tactic to combat this is to provide a per-thread heap so that no synchronization logic is needed and thread contention does not occur.

We show an example of a specialized heap for string allocations using CWin32Heap and ATL's new CStencil class. This class is discussed in detail in later chapters when we cover building web applications with ATL Server. For now, recall from the discussion of web application development in Chapter 1, "Hello, ATL," that ATL produces web pages by processing stencil response files and rendering HTML-based text responses. This involves a great deal of string parsing and processing, and CStencil bears a lot of this burden. Its constructor enables you to pass in a custom memory manager to be used in all its string parsing. The following code demonstrates how to create a per-thread heap manager to be used for stencil processing:

DWORD g_dwTlsIndex;    // holds thread-local storage slot index
// g_dwTlsIndex = ::TlsAlloc() performed in other
// initialization code

// Create a private heap for use on this thread
// only => no synchronized access
CWin32Heap* pHeap = new CWin32Heap(HEAP_NO_SERIALIZE, 50000);

// Store the heap pointer in this thread's TLS slot
::TlsSetValue(g_dwTlsIndex, reinterpret_cast<void*>(
static_cast<IAtlMemMgr*>(pHeap)));

// ...

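// Retrieve the memory manager interface pointer from TLS.
// The slot holds an IAtlMemMgr*, so convert back to that same type.
IAtlMemMgr* pMemMgr =
    static_cast<IAtlMemMgr*>(::TlsGetValue(g_dwTlsIndex));

// Create a new CStencil instance that uses the private heap
CStencil* pStencil = new CStencil(pMemMgr);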

Notice the extra layer of casting when storing the heap pointer in the TLS slot. You need to hold on to the original CWin32Heap pointer with the concrete type because IAtlMemMgr doesn't have a virtual destructor. If you just had an IAtlMemMgr* to call delete on, the actual CWin32Heap destructor would not get called. That extra layer of casting is to make sure that you get the correct interface pointer converted to void* before storing it in the TLS. It's probably not strictly necessary in the current version of ATL, but if the heap implementation has multiple base classes, the cast to void* could cause some serious trouble.
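The multiple-base-class hazard is easy to demonstrate in isolation. In this sketch, ILogger, IMemMgrLike, and CTwoBaseHeap are hypothetical types of our own. On the mainstream compilers we are aware of, the second base subobject sits at a nonzero offset, so the address of the interface differs from the address of the object, and a void* must be converted back through the same type it was converted from.

```cpp
#include <cassert>

// Hypothetical interfaces; neither comes from ATL.
struct ILogger {
    virtual void Log() = 0;
protected:
    ~ILogger() = default;
};
struct IMemMgrLike {
    virtual int Id() = 0;
protected:
    ~IMemMgrLike() = default;
};

// Two bases: the IMemMgrLike subobject no longer sits at the object's start.
class CTwoBaseHeap final : public ILogger, public IMemMgrLike {
public:
    void Log() override {}
    int Id() override { return 42; }
};

// Store: convert to the interface type FIRST, then let it decay to void*.
inline void* ToSlot(CTwoBaseHeap* pHeap) {
    assert(pHeap != nullptr);
    return static_cast<IMemMgrLike*>(pHeap);
}

// Load: convert back through the very same interface type.
inline IMemMgrLike* FromSlot(void* pv) {
    return static_cast<IMemMgrLike*>(pv);
}
```

Had ToSlot stored the CTwoBaseHeap* directly, FromSlot would reinterpret the start of the object as an IMemMgrLike, and the subsequent virtual call would have undefined behavior.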