The CString Class

CString Overview

For years now, ATL programmers have gazed longingly over the shoulders of their MFC brethren slinging character data about in their programs with the grace and dexterity of Baryshnikov himself. MFC developers have long enjoyed the ubiquitous CString class provided with the library, so much so that when they ventured into previous versions of ATL, they often found themselves tempted to check that wizard option named Support MFC and suck in a 1MB library just to allow them to continue working with their bread-'n-butter string class. Sure, ATL programmers have CComBSTR, which is fine for code at the "edges" of a method's implementation, that is, either receiving a BSTR input parameter at the beginning of a method or returning some sort of BSTR output parameter at the end of a method. But compared to CString's extensive support for everything from sprintf-style formatting to search-and-replace, CComBSTR is woefully inadequate for any serious string processing. And, sure, ATL programmers have had STL's std::string class for years, but it also falls short of CString in functionality. In addition, because it is a standard, platform-independent class, it can't possibly provide such useful functionality as integrating with the Windows resource architecture.

Well, the long wait is over: CString is available as of ATL 7. In fact, CString is a shared class between MFC and ATL, along with a number of other classes. You'll note that there are no longer separate \MFC\Include and \ATL\Include directories within the Visual Studio file hierarchy. Instead, both libraries maintain code in \ATLMFC\Include. I think it's extraordinarily insightful to examine just how and where the shared CString class is defined. First, all the header files are under a directory named \ATLMFC, not \MFCATL. CString used to be defined in afx.h, the prefix that has identified MFC from its earliest beginnings. Now the definition appears in a file that simply defines CString as a typedef to a template class called CStringT that does all the work. This template class is actually in the ATL namespace. That's right: one of the last bastions of MFC supremacy is now found under the ATL moniker.

CString Anatomy

Now that CString is template-based, it follows the general ATL design pattern of supporting pluggable functionality through template parameters that specialize CString behavior. As the first sections of this chapter revealed, a number of different types of strings exist, with different mechanisms for manipulating them. Templates are very well suited to this kind of scenario, in which exposing flexibility is important. But usability is also important, so ATL uses a convenient combination of typedefs and default template parameters to simplify using CString.

Understanding what's under the covers of a CString instance is important in understanding not only how the methods and operators work, but also how CString can be extended and specialized to fit particular requirements or to facilitate certain optimizations. When you declare an instance of CString, you are actually instantiating a template class called CStringT. The file atlstr.h provides typedefs for CString, as well as for the ANSI and Unicode versions: CStringA and CStringW, respectively.

typedef CStringT< wchar_t, StrTraitATL<
    wchar_t, ChTraitsCRT< wchar_t > > >
    CAtlStringW;                       
typedef CStringT< char, StrTraitATL<   
    char, ChTraitsCRT< char > > >      
    CAtlStringA;                       
typedef CStringT< TCHAR, StrTraitATL<  
    TCHAR, ChTraitsCRT< TCHAR > > >    
    CAtlString;                        
                                       
typedef CAtlStringW CStringW;          
typedef CAtlStringA CStringA;          
typedef CAtlString CString;

Strictly speaking, these typedefs are generated only if the ATL project is linking to the CRT, which ATL projects now do by default. Otherwise, the ChTraitsCRT template class is not used as a parameter to CStringT because it relies upon CRT functions to manage character-level manipulation.

Because the CStringT template class is the underlying class doing all the work, the remainder of the discussion is in terms of CStringT. This class is defined in cstringt.h as follows:

template< typename BaseType, class StringTraits >
class CStringT :                                 
    public CSimpleStringT< BaseType > {          
     // ...                                      
}

The behavior of the CStringT class is governed largely by three things: 1) the CSimpleStringT base class, 2) the BaseType template parameter, and 3) the StringTraits template parameter. CSimpleStringT provides a lot of basic string functionality that CStringT inherits. The BaseType template parameter is used to establish the underlying character data type of the string. The only state CStringT holds is a pointer to a character string of the type BaseType. This data is held in the m_pszData private member defined in the CSimpleStringT base class. The StringTraits parameter is an interesting one. This parameter establishes three things: 1) the module from which resource strings will be loaded, 2) the string manager used to allocate string data, and 3) the class that will provide low-level character manipulation. The atlstr.h header file contains the definition of StrTraitATL, the template class that the CString typedefs supply for this parameter.

template< typename _BaseType = char, class StringIterator =      
                                        ChTraitsOS< _BaseType > >
class StrTraitATL : public StringIterator {                      
public:                                                          
    static HINSTANCE FindStringResourceInstance(UINT nID) {      
        return( AtlFindStringResourceInstance( nID ) );          
    }                                                            
                                                                 
    static IAtlStringMgr* GetDefaultManager() {                  
        return( &g_strmgr );                                     
    }                                                            
};

StrTraitATL derives from the StringIterator template parameter passed in. This parameter implements low-level character operations that CStringT ultimately will invoke when application code calls methods on instances of CString. Two choices of ATL-provided classes encapsulate the character traits: ChTraitsCRT and ChTraitsOS. The former uses functions that require you to link to the CRT in your project, so you would use it if you were already linking to the CRT. The latter does not require the CRT to implement its character-manipulation functions. Both expose a common set of functions that CStringT uses in its internal implementation.
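
To make the traits idea concrete, here is a minimal, portable sketch of the same shape: a string template whose character-level behavior comes from a traits class it receives as a template parameter. This is not ATL's code. The names TinyStr, TinyStrTraits, CrtCharTraits, and AsciiCharTraits are hypothetical and exist only for illustration, and std::string stands in for the real buffer management.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical traits class #1: leans on the C runtime (compare ChTraitsCRT).
struct CrtCharTraits {
    static char ToUpper(char ch) {
        return static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
    }
};

// Hypothetical traits class #2: no runtime calls, ASCII only (compare ChTraitsOS).
struct AsciiCharTraits {
    static char ToUpper(char ch) {
        return (ch >= 'a' && ch <= 'z') ? static_cast<char>(ch - 'a' + 'A') : ch;
    }
};

// Mirrors the shape of StrTraitATL: the string traits derive from the
// character traits passed in, so both sets of functions are reachable
// through a single template parameter.
template <class CharTraits>
struct TinyStrTraits : CharTraits {};

// The string class never names a concrete character routine; it always
// goes through StringTraits.
template <class StringTraits>
class TinyStr {
public:
    explicit TinyStr(const char* psz) : m_data(psz) {}

    TinyStr& MakeUpper() {
        for (char& ch : m_data) ch = StringTraits::ToUpper(ch);
        return *this;
    }

    const std::string& Get() const { return m_data; }

private:
    std::string m_data;
};

// Typedefs hide the template plumbing, just as CString does.
typedef TinyStr<TinyStrTraits<CrtCharTraits> >   TinyStrCrt;
typedef TinyStr<TinyStrTraits<AsciiCharTraits> > TinyStrNoCrt;
```

Client code written against TinyStrCrt compiles unchanged against TinyStrNoCrt, which is the property that lets ATL swap ChTraitsCRT for ChTraitsOS without touching CStringT itself.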

Note that in the definition of the StrTraitATL, we see the first evidence of the extensibility of CStringT. The GetDefaultManager method returns a pointer to a string manager via the IAtlStringMgr interface. This interface enforces a generic pattern for managing string memory. atlsimpstr.h provides the definition for this interface.

__interface IAtlStringMgr {                                  
public:                                                      
    CStringData* Allocate( int nAllocLength, int nCharSize );
    void Free( CStringData* pData );                         
    CStringData* Reallocate( CStringData* pData,             
        int nAllocLength, int nCharSize );                   
                                                             
    CStringData* GetNilString();                             
    IAtlStringMgr* Clone();                                  
};

ATL supplies a default string manager that is used if the user does not specify another. This default string manager is a concrete class called CAtlStringMgr that implements IAtlStringMgr. Abstracting string management into a separate class enables you to customize the behavior of the string-management functions to suit specific application requirements. Two mechanisms exist for customizing string management for CStringT. The first mechanism involves merely using CAtlStringMgr with a specific memory manager. Chapter 3, "ATL Smart Types," discusses the IAtlMemMgr interface, a generic interface that encapsulates heap memory management. Associating a memory manager with CAtlStringMgr is as simple as passing a pointer to the memory manager to the CAtlStringMgr constructor. CStringT must be instructed to use this CAtlStringMgr in its internal implementation by passing the string manager pointer to the CStringT constructor. ATL provides five built-in heap managers that implement IAtlMemMgr. We use CWin32Heap to demonstrate how to use an alternate memory manager with CStringT.

// create a thread-safe private heap with zero initial size
// and no max size
// constructor parameters are explained later in this chapter
CWin32Heap heap(0, 0, 0);

// create a string manager that uses this memory manager
CAtlStringMgr strMgr(&heap);

// create a CString instance that uses this string manager
CString str(&strMgr);

// ... perform some string operations as usual

If you want more control over the string-management functions, you can supply your own custom string manager that fully implements IAtlStringMgr. Instead of passing a pointer to CAtlStringMgr to the CString constructor, as in the previous code, you would simply pass a pointer to your custom IAtlStringMgr implementation. This custom string manager might use one of the existing memory managers or a custom implementation of IAtlMemMgr. Additionally, a custom string manager might want to enforce a different buffer-sharing policy than CAtlStringMgr's default copy-on-write policy. Copy-on-write allows multiple CStringT instances to read the same string memory, but a duplicate is created before any writes to the buffer are performed.
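
The following portable sketch shows the general pattern of a string that takes all of its storage from a manager object handed to its constructor. It is deliberately much smaller than the real interface: ITinyStringMgr, CountingStringMgr, and ManagedStr are hypothetical names, the interface has only two methods rather than the five in IAtlStringMgr, and malloc/free stand in for a real heap manager.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical, pared-down analogue of a string manager interface.
struct ITinyStringMgr {
    virtual ~ITinyStringMgr() {}
    virtual char* Allocate(std::size_t nChars) = 0;
    virtual void Free(char* p) = 0;
};

// A custom manager that counts live allocations on top of malloc/free.
class CountingStringMgr : public ITinyStringMgr {
public:
    CountingStringMgr() : m_live(0) {}

    char* Allocate(std::size_t nChars) {
        // +1 leaves room for the NUL terminator
        char* p = static_cast<char*>(std::malloc(nChars + 1));
        if (p) ++m_live;
        return p;
    }

    void Free(char* p) {
        if (p) { std::free(p); --m_live; }
    }

    int LiveBlocks() const { return m_live; }

private:
    int m_live;
};

// A string that obtains and releases its buffer through the manager it was given.
class ManagedStr {
public:
    ManagedStr(const char* psz, ITinyStringMgr* pMgr)
        : m_pMgr(pMgr), m_psz(pMgr->Allocate(std::strlen(psz))) {
        if (m_psz) std::strcpy(m_psz, psz);
    }

    ~ManagedStr() { m_pMgr->Free(m_psz); }

    const char* c_str() const { return m_psz ? m_psz : ""; }

private:
    ManagedStr(const ManagedStr&);            // copying omitted in this sketch
    ManagedStr& operator=(const ManagedStr&);

    ITinyStringMgr* m_pMgr;
    char* m_psz;
};
```

Because the string only ever sees the interface, the manager is free to add bookkeeping, pooling, or a different sharing policy without any change to the string class.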

Of course, the simplest thing to do is to use the defaults that ATL chooses when you use a simple CString declaration, as in the following:

// declare an empty CString instance
CString str;

With this declaration, ATL will use CAtlStringMgr to manage the string data. CAtlStringMgr will use the built-in CWin32Heap heap manager for supplying string data storage.

Constructors

CStringT provides 19 different constructors, although one of the constructors is compiled into the class definition only if you are building a managed C++ project for the .NET platform. These types of ATL specializations are not discussed in this book. In general, however, the large number of constructors present represents the various different sources of string data with which a CString instance can be initialized, along with the additional options for supplying alternate string managers. We examine these constructors in related groups.

Before going further into the various methods, let's look at some of the notational shortcuts that CStringT uses in its method signatures. To properly understand even the method declarations with CStringT, you must be comfortable with the typedefs used to represent the character types in CStringT. Because CStringT uses template parameters to represent the base character type, the syntax for expressing the various allowed character types can become cumbersome or unclear in places. For instance, when you declare a CStringW, you create an instance of CStringT that encapsulates a series of wchar_t characters. From the definition of the CStringT template class, you can easily see that the BaseType template parameter can be used in method signatures that need to specify a wchar_t type parameter, but how would you specify methods that need to accept a char type parameter? Certainly, I need to be able to append char strings to a wchar_t-based CString. Conversely, I must have the ability to append wchar_t strings to a char-based CString. Yet I have only one template class in which to accomplish all this. CStringT provides six type definitions to deal with this syntactic dichotomy. They might seem somewhat arbitrary at first, but you'll see as we look closer into CStringT that their use actually makes a lot of sense. Table 2.3 summarizes these typedefs.

Table 2.3. CStringT Character Traits Type Definitions

Typedef	BaseType is `char`	BaseType is `wchar_t`	Meaning
`XCHAR`	`char`	`wchar_t`	Single character of the same type as the `CStringT` instance
`PXSTR`	`LPSTR`	`LPWSTR`	Pointer to character string of the same type as `CStringT` instance
`PCXSTR`	`LPCSTR`	`LPCWSTR`	Pointer to constant character string of the same type as the `CStringT` instance
`YCHAR`	`wchar_t`	`char`	Single character of the opposite type as the `CStringT` instance
`PYSTR`	`LPWSTR`	`LPSTR`	Pointer to character string of the opposite type as `CStringT` instance
`PCYSTR`	`LPCWSTR`	`LPCSTR`	Pointer to constant character string of the opposite type as the `CStringT` instance
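
One way to produce a "same type / opposite type" mapping like the one in Table 2.3 is with a pair of template specializations. The sketch below is a portable illustration of that technique, not a copy of the ATL headers. CharTypes is a hypothetical name, and plain pointer types stand in for the Windows LPSTR and LPWSTR typedefs.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper: maps a base character type to its "same" (X)
// and "opposite" (Y) character and pointer types.
template <typename BaseType> struct CharTypes;

template <> struct CharTypes<char> {
    typedef char           XCHAR;
    typedef char*          PXSTR;
    typedef const char*    PCXSTR;
    typedef wchar_t        YCHAR;
    typedef wchar_t*       PYSTR;
    typedef const wchar_t* PCYSTR;
};

template <> struct CharTypes<wchar_t> {
    typedef wchar_t        XCHAR;
    typedef wchar_t*       PXSTR;
    typedef const wchar_t* PCXSTR;
    typedef char           YCHAR;
    typedef char*          PYSTR;
    typedef const char*    PCYSTR;
};

// A method declared once in terms of PCXSTR and PCYSTR accepts both
// flavors of string, whichever BaseType the class is instantiated with.
static_assert(std::is_same<CharTypes<char>::YCHAR, wchar_t>::value,
              "the opposite of char is wchar_t");
static_assert(std::is_same<CharTypes<wchar_t>::PCXSTR, const wchar_t*>::value,
              "PCXSTR follows the base type");
```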

Two constructors enable you to initialize a CString to an empty string:

CStringT();                                    
explicit CStringT( IAtlStringMgr* pStringMgr );

Recall that the data for the CString is kept in the m_pszData data member. These constructors simply initialize this member to refer to an empty string, that is, a lone NUL terminator (1 byte if the BaseType is char, 2 bytes if it is wchar_t). The second constructor accepts a pointer to a string manager to use with this CStringT instance. As stated previously, if the first constructor is used, the CStringT instance will use the default string manager CAtlStringMgr, which relies upon an underlying CWin32Heap heap manager to allocate storage from the process heap.

The next two constructors provide two different copy constructors that enable you to initialize a new instance from an existing CStringT or from an existing CSimpleStringT.

CStringT( const CStringT& strSrc );         
CStringT( const CThisSimpleString& strSrc );

The second constructor accepts a CThisSimpleString reference, but this is simply a typedef to CSimpleStringT<BaseType>. Exactly what these copy constructors do depends upon the policy established by the string manager that is associated with the CStringT instance. Recall that if no string manager is specified (for example, through the constructor shown previously that accepts an IAtlStringMgr pointer), CAtlStringMgr will be used to manage memory allocation for the instance's string data. This default string manager implements a copy-on-write policy that allows multiple CStringT instances to share a string buffer for reading, but automatically creates a copy of the buffer whenever one of those instances tries to perform a write operation. The following code demonstrates how these copy semantics work in practice:

// "Fred" memcpy'd into strOrig buffer
CString strOrig("Fred");
// str1 points to strOrig buffer (no memcpy)
CString str1(strOrig);
// str2 points to strOrig buffer (no memcpy)
CString str2(str1);
// str3 points to strOrig buffer (no memcpy)
CString str3(str2);
// new buffer allocated for str2
// "John" memcpy'd into str2 buffer
str2 = "John";

As the comments indicate, CAtlStringMgr creates no additional copies of the internal string buffer until a write operation is performed with the assignment statement of str2. The storage to hold the new data in str2 is obtained from CAtlStringMgr. If we had specified another custom string manager to use via a constructor, that implementation would have determined how and when data is allocated. Actually, CAtlStringMgr simply increments str2's buffer pointer to "allocate" memory within its internal heap. As long as there is room in the CAtlStringMgr's heap, no expansion of the heap is required and the string allocation is fast and efficient.
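
The copy-on-write behavior traced in the comments above can be sketched in portable C++. This is an illustration of the general technique, not ATL's implementation: CowStr is a hypothetical class, std::shared_ptr supplies the reference count that CStringData keeps in the real library, and the sketch makes no attempt at thread safety.

```cpp
#include <cassert>
#include <memory>
#include <string>

class CowStr {
public:
    CowStr(const char* psz) : m_buf(std::make_shared<std::string>(psz)) {}

    // The compiler-generated copy constructor copies only the shared_ptr,
    // so a copy shares the original buffer and no characters are copied.

    // Assigning new contents is a write: never disturb a shared buffer.
    CowStr& operator=(const char* psz) {
        if (m_buf.use_count() > 1)
            m_buf = std::make_shared<std::string>(psz);  // fresh private buffer
        else
            *m_buf = psz;                                // sole owner: reuse it
        return *this;
    }

    // Appending is also a write: take a private copy first if shared.
    void Append(const char* psz) {
        Detach();
        m_buf->append(psz);
    }

    const char* c_str() const { return m_buf->c_str(); }

    bool SharesBufferWith(const CowStr& other) const {
        return m_buf == other.m_buf;
    }

private:
    void Detach() {
        if (m_buf.use_count() > 1)
            m_buf = std::make_shared<std::string>(*m_buf);
    }

    std::shared_ptr<std::string> m_buf;
};
```

Running the Fred/John sequence against CowStr gives the same picture as the comments in the listing: the copies share one buffer until str2 is assigned, at which point only str2 moves to new storage.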

Several constructors accept a pointer to a character string of the same type as the CStringT instance, that is, a character string of type BaseType.

CStringT( const XCHAR* pszSrc );                                     
CStringT( const XCHAR* pch, int nLength );                           
CStringT( const XCHAR* pch, int nLength, IAtlStringMgr* pStringMgr );

The first constructor should be used when the character string provided is NUL terminated. CStringT determines the size of the buffer needed by simply looking for the terminating NUL. However, the second and third forms of the constructor can accept an array of characters that is not NUL terminated. In this case, the length of the character array (in characters, not bytes), not including the terminating NUL that will be added, must be provided. You can improperly initialize your CString if you don't feed these constructors the proper length or if you use the first form with a string that's not NUL terminated. For instance:

char rg[4] = { 'F', 'r', 'e', 'd' };

// Wrong! Wrong!  rg not NUL-terminated
// str1 contains junk
CString str1(rg);

// ok, length provided to invoke correct ctor
CString str2(rg, 4);

// string literals are const, so point to them with const char*
const char* sz = "Fred";
// ok, sz NUL-terminated => no length parameter needed
CString str3(sz);

You can also initialize a CStringT instance with a character string of the opposite type of BaseType.

CSTRING_EXPLICIT CStringT( const YCHAR* pszSrc );
CStringT( const YCHAR* pch, int nLength );       
CStringT( const YCHAR* pch, int nLength,         
    IAtlStringMgr* pStringMgr );

These constructors work in an analogous manner to the XCHAR-based constructors just shown. The difference is that these constructors convert the source string to the BaseType declared for the CStringT instance, if it is required. For example, if the BaseType is wchar_t, such as when you explicitly declare a CStringW instance, and you pass the constructor a char*, CStringT will use the Windows API function MultiByteToWideChar to convert the source string.

Two further constructors accept a NUL-terminated string of either character type along with the string manager to use:

CStringT( LPCSTR pszSrc, IAtlStringMgr* pStringMgr ); 
CStringT( LPCWSTR pszSrc, IAtlStringMgr* pStringMgr );

You can also initialize a CStringT instance with a repeated series of characters using the following constructors:

CSTRING_EXPLICIT CStringT( char ch, int nLength = 1 );   
CSTRING_EXPLICIT CStringT( wchar_t ch, int nLength = 1 );

Here, the nLength specifies the number of copies of the ch character to replicate in the CStringT instance, as in the following:

CString str('z', 5); // str contains "zzzzz"

CStringT also enables you to initialize a CStringT instance from an unsigned char string, which is how MBCS strings are represented.

CSTRING_EXPLICIT CStringT( const unsigned char* pszSrc );
CStringT( const unsigned char* pszSrc,                   
    IAtlStringMgr* pStringMgr );

Finally, CStringT provides two constructors that accept a VARIANT as the string source:

CStringT( const VARIANT& varSrc );                           
CStringT( const VARIANT& varSrc, IAtlStringMgr* pStringMgr );

Internally, CStringT uses the COM API function VariantChangeType to attempt to convert varSrc to a BSTR. VariantChangeType handles simple conversion between basic types, such as numeric-to-string conversions. However, the varSrc VARIANT cannot contain a complex type, such as an array of double. In addition, these two constructors truncate a BSTR that contains an embedded NUL.

// BSTR bstr contains "This is part one\0and here's part two"
VARIANT var;
var.vt = VT_BSTR;
var.bstrVal = bstr;
// var contains "This is part one\0and here's part two"
CString str(var);   // str contains "This is part one"
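
The truncation comes from treating counted data as if it were NUL-terminated. The portable sketch below reproduces the effect with std::wstring rather than with BSTR and CString. FromCounted and FromNulTerminated are hypothetical helper names, and the analogy is an assumption about why the truncation occurs, not a trace of ATL's code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Builds a string from an explicit character count: embedded NULs survive.
inline std::wstring FromCounted(const wchar_t* pch, std::size_t nLength) {
    return std::wstring(pch, nLength);
}

// Builds a string by scanning for the first NUL: everything after it is lost.
inline std::wstring FromNulTerminated(const wchar_t* psz) {
    return std::wstring(psz);
}
```

If the data after the embedded NUL matters, carry the length alongside the pointer and use a counted constructor instead of relying on a terminator scan.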

Assignment

CStringT defines eight assignment operators. The first two enable you to assign to an instance from an existing CStringT or CSimpleStringT:

CStringT& operator=( const CStringT& strSrc );         
CStringT& operator=( const CThisSimpleString& strSrc );

With both of these operators, the copy policy of the string manager in use dictates the behavior. By default, CStringT instances use the copy-on-write policy of the CAtlStringMgr class. See the previous discussion of the CStringT constructors for more information.

The next two assignment operators accept pointers to string literals of the same type as the CStringT instance or of the opposite type, as indicated by the PCXSTR and PCYSTR source string types:

CStringT& operator=( PCXSTR pszSrc );
CStringT& operator=( PCYSTR pszSrc );

Of course, no conversions are necessary with the first operator. However, CStringT invokes the appropriate Win32 conversion function when the second operator is used, as in the following code:

CStringA str;         // declare an empty ANSI CString
str = L"Hello World"; // operator=(PCYSTR) invoked
                      // characters converted via
                      // WideCharToMultiByte

CStringT also enables you to assign individual characters to instances. In these cases, CStringT actually creates a string of one character and appends either a 1- or 2-byte NUL terminator, depending on the type of character specified and the BaseType of the CStringT instance. These operators then delegate to either operator=(PCXSTR) or operator=(PCYSTR) so that any necessary conversions are performed.

CStringT& operator=( char ch );   
CStringT& operator=( wchar_t ch );

Yet another CStringT assignment operator accepts an unsigned char* as its argument to support MBCS strings. This operator simply casts pszSrc to a char* and invokes either operator=(PCXSTR) or operator=(PCYSTR):

CStringT& operator=( const unsigned char* pszSrc );

Finally, a VARIANT can be assigned to an instance of CStringT. The use and behavior here are identical to that described previously for the corresponding CStringT constructor:

CStringT& operator=( const VARIANT& var );

String Concatenation Using CString

CStringT defines eight operators used to append string data to the end of an existing string buffer. In all cases, storage for the new data appended is allocated using the underlying string manager and its encapsulated heap. By default, this means that CAtlStringMgr is employed; its underlying CWin32Heap instance will be used to invoke the Win32 HeapReAlloc API function as necessary to grow the CStringT buffer to accommodate the data appended by these operators.

CStringT& operator+=( const CThisSimpleString& str );
CStringT& operator+=( PCXSTR pszSrc );               
CStringT& operator+=( PCYSTR pszSrc );               
template< int t_nSize >                              
CStringT& operator+=( const CStaticString<           
    XCHAR, t_nSize >& strSrc );                      
CStringT& operator+=( char ch );                     
CStringT& operator+=( unsigned char ch );            
CStringT& operator+=( wchar_t ch );                  
CStringT& operator+=( const VARIANT& var );

The first operator accepts an existing CStringT instance, and two operators accept PCXSTR strings or PCYSTR strings. Three other operators enable you to append individual characters to an existing CStringT. You can append a char, wchar_t, or unsigned char. One operator enables you to append the string contained in an instance of CStaticString. You can use this template class to efficiently store immutable string data; it performs no copying of the data with which it is initialized and merely serves as a convenient container for a string constant. Finally, you can append a VARIANT to an existing CStringT instance. As with the VARIANT constructor and assignment operator discussed previously, this operator relies upon VariantChangeType to convert the underlying VARIANT data into a BSTR. To the compiler, a BSTR looks just like an OLECHAR*, so this operator will ultimately end up calling either operator+=(PCXSTR) or operator+=(PCYSTR), depending on the BaseType of the CStringT instance. The same issues with embedded NULs in the source BSTR that we discussed earlier in the "Constructors" section apply here.

Three overloads of operator+() enable you to concatenate multiple strings conveniently.

friend CSimpleStringT operator+( 
    const CSimpleStringT& str1,  
    const CSimpleStringT& str2 );
friend CSimpleStringT operator+( 
    const CSimpleStringT& str1,  
    PCXSTR psz2 );               
friend CSimpleStringT operator+( 
    PCXSTR psz1,                 
    const CSimpleStringT& str2 );

These operators are invoked when you write code such as the following:

CString str1("Every good "); // str1: "Every good "
CString str2("boy does ");   // str2: "boy does "
CString str3;                // str3: empty
str3 = str1 + str2 + "fine"; // str3: "Every good boy does fine"

String concatenation is also supported through several Append methods. Four of these methods are defined on the CSimpleStringT base class and actually do the real work for the operators just discussed. Indeed, the only additional functionality offered by these four Append methods over the operators appears in the overload that accepts an nLength parameter. This enables you to append only a portion of an existing string. If you specify an nLength greater than the length of the source string, space will be allocated to accommodate nLength characters. However, the resulting CStringT data will be NUL terminated in the same place as pszSrc.

void Append( PCXSTR pszSrc );               
void Append( PCXSTR pszSrc, int nLength );  
void AppendChar( XCHAR ch );                
void Append( const CSimpleStringT& strSrc );

Three additional methods defined on CStringT enable you to append formatted strings to existing CStringT instances. Formatted strings are discussed more later in this section when we cover CStringT's Format operation. In short, these types of operations enable you to employ sprintf-style formatting to CStringT instances. The three methods shown here differ from Format only in that the constructed string is appended to the CStringT instance instead of overwriting it.

void __cdecl AppendFormat( UINT nFormatID, ... );    
void __cdecl AppendFormat( PCXSTR pszFormat, ... );  
void AppendFormatV( PCXSTR pszFormat, va_list args );
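
For readers who want to see the append-rather-than-overwrite idea in isolation, here is a portable sketch built on std::string and vsnprintf. It is not how CStringT implements these methods: the free functions below merely borrow the AppendFormat and AppendFormatV names, they take the target string as an explicit first parameter, and the resource-ID overload is omitted.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// va_list flavor: measure, format into a temporary buffer, then append.
inline void AppendFormatV(std::string& str, const char* pszFormat, va_list args) {
    // vsnprintf consumes its va_list, so measure with a copy.
    va_list argsCopy;
    va_copy(argsCopy, args);
    int nLength = std::vsnprintf(NULL, 0, pszFormat, argsCopy);
    va_end(argsCopy);
    if (nLength <= 0) return;  // nothing to append, or a formatting error

    std::vector<char> buffer(static_cast<std::size_t>(nLength) + 1);
    std::vsnprintf(&buffer[0], buffer.size(), pszFormat, args);

    // Existing contents of str are preserved; the new text goes on the end.
    str.append(&buffer[0], static_cast<std::size_t>(nLength));
}

// Variadic flavor: packages its arguments and forwards to the va_list flavor.
inline void AppendFormat(std::string& str, const char* pszFormat, ...) {
    va_list args;
    va_start(args, pszFormat);
    AppendFormatV(str, pszFormat, args);
    va_end(args);
}
```

Splitting the work into a variadic wrapper and a va_list worker is the same layering the three signatures above suggest, and it lets other variadic functions reuse the worker.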

Character Case Conversion

Two CStringT methods support case conversion: MakeUpper and MakeLower.

CStringT& MakeUpper() {                        
    int nLength = GetLength();                 
    PXSTR pszBuffer = GetBuffer( nLength );    
    StringTraits::StringUppercase( pszBuffer );
    ReleaseBufferSetLength( nLength );         
                                               
    return( *this );                           
}                                              
                                        
CStringT& MakeLower() {                        
    int nLength = GetLength();                 
    PXSTR pszBuffer = GetBuffer( nLength );    
    StringTraits::StringLowercase( pszBuffer );
    ReleaseBufferSetLength( nLength );         
                                               
    return( *this );                           
}

Both of these methods delegate their work to the ChTraitsOS or ChTraitsCRT class, depending on which of these was specified as the template parameter when the CStringT instance was declared. Simply instantiating a variable of type CString uses the default character traits class supplied in the typedef for CString. If the preprocessor symbol _ATL_CSTRING_NO_CRT is defined, the ChTraitsOS class is used; and the Win32 functions CharLower and CharUpper are invoked to perform the conversion. If _ATL_CSTRING_NO_CRT is not defined, the ChTraitsCRT class is used by default, and it uses the appropriate CRT function: _mbslwr, _mbsupr, _wcslwr, or _wcsupr.

CString Comparison Operators

CString defines a whole slew of comparison operators (that's a metric slew, not an imperial slew). Seven versions of operator== enable you to compare CStringT instances with other instances, with ANSI and Unicode string literals, and with individual characters.

friend bool operator==( const CStringT& str1,               
    const CStringT& str2 );                                 
friend bool operator==( const CStringT& str1, PCXSTR psz2 );
friend bool operator==( PCXSTR psz1, const CStringT& str2 );
friend bool operator==( const CStringT& str1, PCYSTR psz2 );
friend bool operator==( PCYSTR psz1, const CStringT& str2 );
friend bool operator==( XCHAR ch1, const CStringT& str2 );  
friend bool operator==( const CStringT& str1, XCHAR ch2 );

As you might expect, a corresponding set of overloads for operator!= is also provided.

friend bool operator!=( const CStringT& str1,               
    const CStringT& str2 );                                 
friend bool operator!=( const CStringT& str1, PCXSTR psz2 );
friend bool operator!=( PCXSTR psz1, const CStringT& str2 );
friend bool operator!=( const CStringT& str1, PCYSTR psz2 );
friend bool operator!=( PCYSTR psz1, const CStringT& str2 );
friend bool operator!=( XCHAR ch1, const CStringT& str2 );  
friend bool operator!=( const CStringT& str1, XCHAR ch2 );

And, of course, a full battalion of relational comparison operators is available in CStringT.

friend bool operator<( const CStringT& str1,                
    const CStringT& str2 );                                 
friend bool operator<( const CStringT& str1, PCXSTR psz2 ); 
friend bool operator<( PCXSTR psz1, const CStringT& str2 ); 
friend bool operator>( const CStringT& str1,                
    const CStringT& str2 );                                 
friend bool operator>( const CStringT& str1, PCXSTR psz2 ); 
friend bool operator>( PCXSTR psz1, const CStringT& str2 ); 
friend bool operator<=( const CStringT& str1,               
    const CStringT& str2 );                                 
friend bool operator<=( const CStringT& str1, PCXSTR psz2 );
friend bool operator<=( PCXSTR psz1, const CStringT& str2 );
friend bool operator>=( const CStringT& str1,               
    const CStringT& str2 );                                 
friend bool operator>=( const CStringT& str1, PCXSTR psz2 );
friend bool operator>=( PCXSTR psz1, const CStringT& str2 );

All the operators use the same method to perform the actual comparison: CStringT::Compare. A brief inspection of the operator== overload that takes two CStringT instances reveals how this is accomplished:

friend bool operator==( const CStringT& str1,
    const CStringT& str2 ) {                 
    return( str1.Compare( str2 ) == 0 );     
}

Similarly, the same overload for operator!= is defined as follows:

friend bool operator!=( const CStringT& str1,
    const CStringT& str2 ) {                 
    return( str1.Compare( str2 ) != 0 );     
}

The relational operators use Compare like this:

friend bool operator<( const CStringT& str1,
    const CStringT& str2 ) {                
    return( str1.Compare( str2 ) < 0 );     
}

Compare returns a negative value if str1 is lexicographically (say that ten times fast while standing on your head) less than str2, and a positive value if str1 is lexicographically greater than str2. Strings are compared character by character until an inequality occurs or the end of one of the strings is reached. If no inequalities are detected and the strings are the same length, they are considered equal. Compare returns 0 in this case. If an inequality is found between two characters, the result of a lexical comparison between the two characters is returned as the result of the string comparison. If the characters in the strings are the same except that one string is longer, the shorter string is considered to be less than the longer string. It is important to note that all these comparisons are case-sensitive. If you want to perform case-insensitive comparisons, you must resort to using the CompareNoCase method directly, as discussed in a moment.
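
The comparison rules just described fit in a few lines of portable C++. The sketch below is an illustration only: CompareSketch and Name are hypothetical names, the function works on narrow strings, and it orders characters by their unsigned byte values rather than by whatever the underlying traits function uses.

```cpp
#include <cassert>

// Returns a negative value, 0, or a positive value, following the
// character-by-character rules described above.
inline int CompareSketch(const char* psz1, const char* psz2) {
    // Walk both strings while the characters match and psz1 has not ended.
    while (*psz1 != '\0' && *psz1 == *psz2) {
        ++psz1;
        ++psz2;
    }
    // Either a mismatch was found or psz1 ended. If psz1 ended first,
    // its NUL compares less than any remaining character of psz2, so the
    // shorter string sorts first.
    unsigned char ch1 = static_cast<unsigned char>(*psz1);
    unsigned char ch2 = static_cast<unsigned char>(*psz2);
    return (ch1 > ch2) - (ch1 < ch2);
}

// A tiny wrapper type showing how each operator reduces to one
// comparison against zero.
struct Name {
    const char* psz;
};

inline bool operator==(const Name& a, const Name& b) {
    return CompareSketch(a.psz, b.psz) == 0;
}
inline bool operator!=(const Name& a, const Name& b) {
    return CompareSketch(a.psz, b.psz) != 0;
}
inline bool operator<(const Name& a, const Name& b) {
    return CompareSketch(a.psz, b.psz) < 0;
}
```

Funneling every operator through a single three-way comparison keeps the ordering consistent: there is exactly one place where "less than" is defined.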

As with many of the character-level operations invoked by various CStringT methods and operators, the character traits class does the real heavy lifting. The CStringT::Compare method delegates to either ChTraitsOS or ChTraitsCRT, as discussed previously.

int Compare( PCXSTR psz ) const {                             
    ATLASSERT( AtlIsValidString( psz ) );                     
    return( StringTraits::StringCompare( GetString(), psz ) );
}                                                             
                                                              
int CompareNoCase( PCXSTR psz ) const {                       
    ATLASSERT( AtlIsValidString( psz ) );                     
    return( StringTraits::StringCompareIgnore(                
        GetString(), psz ) );                                 
}

Assuming that CString is used to declare the instance and the project defaults are in use (_ATL_CSTRING_NO_CRT is not defined), the Compare method delegates to ChTraitsCRT::StringCompare. This function uses one of the CRT functions _mbscmp or wcscmp, depending on the character type. Correspondingly, CompareNoCase invokes either _mbsicmp or _wcsicmp. (The Win32 functions lstrcmpA and lstrcmpiA come into play only when ChTraitsOS is in use.)

Two additional comparison methods provide the same functionality as Compare and CompareNoCase, except that they perform the comparison using language rules. The CRT functions underlying these methods are _mbscoll and _mbsicoll, or their Unicode equivalents, depending again on the underlying character type of the CStringT.

int Collate( PCXSTR psz ) const      
int CollateNoCase( PCXSTR psz ) const

One final operator that bears mentioning is operator[]. This operator enables you to use convenient array-like syntax to access individual characters in the CStringT string buffer. This operator is defined on the CSimpleStringT base class as follows:

XCHAR operator[]( int iChar ) const {
    ATLASSERT( (iChar >= 0) && (iChar <= GetLength()) );
    return( m_pszData[iChar] );
}

This function merely does some simple bounds checking (note that you can index the NUL terminator if you want) and then returns the character located at the specified index. This enables you to write code like the following:

CString str("ATL Internals");
char c1 = str[2];    // 'L'
char c2 = str[5];    // 'n'
char c3 = str[13];   // '\0'

CString Operations

CStringT instances can be manipulated and searched in a variety of ways. This section briefly presents the methods CStringT exposes for performing various types of operations. Three methods are designed to facilitate searching for strings and characters within a CStringT instance.

int Find( XCHAR ch, int iStart = 0 ) const     
int Find( PCXSTR pszSub, int iStart = 0 ) const
int FindOneOf( PCXSTR pszCharSet ) const       
int ReverseFind( XCHAR ch ) const

The first version of Find accepts a single character of BaseType and returns the zero-based index of the first occurrence of ch within the CStringT instance. Find starts the search at the index specified by iStart. If the character is not found, -1 is returned. The second version of Find accepts a string of characters and returns either the index of the first character of pszSub within the CStringT or -1 if pszSub does not occur in its entirety within the instance. As with many character-level operations, the character traits class performs the real work. With ChTraitsCRT in use, the first two versions of Find delegate ultimately to the CRT functions _mbschr and _mbsstr, respectively. The FindOneOf method looks for the first occurrence of any character within the pszCharSet parameter. This method invokes the CRT function _mbspbrk to do the search. Finally, the ReverseFind method operates similarly to Find, except that it starts its search at the end of the CStringT and looks "backward." Note that all these operations are case-sensitive. The following examples demonstrate the use of these search operations.

CString str("Show me the money!");

int n = str.Find('o');      // n = 2
n = str.Find('O');          // n = -1, case-sensitivity
n = str.ReverseFind('o');   // n = 13, 'o' in "money" found
                            // first
n = str.Find("the");        // n = 8
n = str.FindOneOf("aeiou"); // n = 2
n = str.Find('o', 4);       // n = 13, started search after
                            // first 'o'

Nine different trim functions enable you to remove characters from the beginning and/or end of a CStringT. The first Trim function removes all leading and trailing whitespace characters from the string. The second overload of Trim accepts a character and removes all leading and trailing instances of chTarget from the string; the third overload of Trim removes leading and trailing occurrences of any character in the pszTargets string parameter. The three overloads for TrimLeft behave similarly to Trim, except that they remove the desired characters only from the beginning of the string. As you might guess, TrimRight removes only trailing instances of the specified characters.

CStringT& Trim()                        
CStringT& Trim( XCHAR chTarget )        
CStringT& Trim( PCXSTR pszTargets )     
CStringT& TrimLeft()                    
CStringT& TrimLeft( XCHAR chTarget )    
CStringT& TrimLeft( PCXSTR pszTargets ) 
CStringT& TrimRight()                   
CStringT& TrimRight( XCHAR chTarget )   
CStringT& TrimRight( PCXSTR pszTargets )
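
To make the behavior of the pszTargets overloads concrete, here is a minimal sketch written against std::string rather than CStringT. The three helper names are hypothetical, and unlike the real methods (which modify the string in place and return a reference to it), these return a trimmed copy.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: drops leading characters that appear in pszTargets.
inline std::string TrimLeftSketch(const std::string& s, const char* pszTargets) {
    assert(pszTargets != nullptr);
    const std::size_t first = s.find_first_not_of(pszTargets);
    return first == std::string::npos ? std::string() : s.substr(first);
}

// Hypothetical helper: drops trailing characters that appear in pszTargets.
inline std::string TrimRightSketch(const std::string& s, const char* pszTargets) {
    assert(pszTargets != nullptr);
    const std::size_t last = s.find_last_not_of(pszTargets);
    return last == std::string::npos ? std::string() : s.substr(0, last + 1);
}

// Hypothetical helper: trims both ends.
inline std::string TrimSketch(const std::string& s, const char* pszTargets) {
    return TrimLeftSketch(TrimRightSketch(s, pszTargets), pszTargets);
}
```

For example, TrimSketch("-=ATL=-", "=-") yields "ATL", whereas TrimLeftSketch("xxATLxx", "x") leaves the trailing characters alone and yields "ATLxx".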

CStringT provides two useful functions for extracting characters from the encapsulated string:

CStringT SpanIncluding( PCXSTR pszCharSet ) const
CStringT SpanExcluding( PCXSTR pszCharSet ) const

SpanIncluding starts from the beginning of the CStringT data and returns a new CStringT instance that contains the leading run of characters that are all included in the pszCharSet string parameter; it stops at the first character that is not in the set. If the very first character is not in pszCharSet, an empty CStringT is returned. Conversely, SpanExcluding returns a new CStringT that contains all the characters in the original CStringT, up to (but not including) the first one that appears in pszCharSet. In this case, if no character in pszCharSet is found, the entire original string is returned.
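
The two span methods behave much like the standard strspn and strcspn functions. The sketch below leans on that similarity; the helper names are hypothetical, and the code operates on std::string rather than on a real CStringT.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: the leading run of characters that all appear in pszCharSet.
inline std::string SpanIncludingSketch(const std::string& s, const char* pszCharSet) {
    assert(pszCharSet != nullptr);
    return s.substr(0, std::strspn(s.c_str(), pszCharSet));
}

// Hypothetical helper: the leading run of characters, none of which appears in pszCharSet.
inline std::string SpanExcludingSketch(const std::string& s, const char* pszCharSet) {
    assert(pszCharSet != nullptr);
    return s.substr(0, std::strcspn(s.c_str(), pszCharSet));
}
```

Given "867-5309" and the digit set "0123456789", SpanIncludingSketch returns "867" because the hyphen ends the run. Given "Name=Jenny" and the set "=;", SpanExcludingSketch returns "Name".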

You can insert individual characters or entire strings into a CStringT instance using the overloaded Insert method:

int Insert( int iIndex, PCXSTR psz )
int Insert( int iIndex, XCHAR ch )

These methods insert the specified character or string into the CStringT instance starting at iIndex. The string manager associated with the CStringT allocates additional storage to accommodate the new data. Similarly, you can delete a character or series of characters from a string using either the Delete or Remove methods:

int Delete( int iIndex, int nCount = 1 )
int Remove( XCHAR chRemove )

Delete removes from the CStringT nCount characters starting at iIndex. Remove deletes all occurrences of the single character specified by chRemove.

CString str("That's a spicy meatball!");
str.Remove('T');    // str contains "hat's a spicy meatball!"
str.Remove('a');    // str contains "ht's  spicy metbll!"
                    // (two spaces remain where " a " was)
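
If you want to experiment with the insert and delete semantics outside of ATL, the following sketch mimics them on a std::string. The helper names are hypothetical. The return values (the new length for the insert and delete helpers, the number of characters removed for the remove helper) and the clamping of out-of-range indexes reflect our reading of the CStringT documentation, so treat them as assumptions rather than a specification.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: inserts text at iIndex (clamped) and returns the new length.
inline int InsertSketch(std::string& s, int iIndex, const std::string& text) {
    const int len = static_cast<int>(s.size());
    iIndex = std::max(0, std::min(iIndex, len));
    s.insert(static_cast<std::size_t>(iIndex), text);
    return static_cast<int>(s.size());
}

// Hypothetical helper: deletes nCount characters at iIndex and returns the new length.
inline int DeleteSketch(std::string& s, int iIndex, int nCount = 1) {
    const int len = static_cast<int>(s.size());
    iIndex = std::max(0, iIndex);
    nCount = std::max(0, nCount);
    if (iIndex < len) {
        s.erase(static_cast<std::size_t>(iIndex), static_cast<std::size_t>(nCount));
    }
    return static_cast<int>(s.size());
}

// Hypothetical helper: removes every occurrence of ch and returns how many were removed.
inline int RemoveSketch(std::string& s, char ch) {
    const std::size_t before = s.size();
    s.erase(std::remove(s.begin(), s.end(), ch), s.end());
    return static_cast<int>(before - s.size());
}
```

Starting from "That's a spicy meatball!", RemoveSketch with 'T' removes one character and RemoveSketch with 'a' then removes four, leaving two adjacent spaces where the word "a" used to be.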

Individual characters or strings can be replaced using the overloaded Replace method:

int Replace( XCHAR chOld, XCHAR chNew )    
int Replace( PCXSTR pszOld, PCXSTR pszNew )

These methods search the CStringT instance for every occurrence of the specified character or string and replace each occurrence with the new character or string provided. The methods return the number of replacements performed, which is 0 if no occurrences were found.
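
A search-and-replace loop of this kind can be sketched in a few lines of standard C++. ReplaceSketch is a hypothetical helper that works on std::string; it is meant to illustrate the counting behavior, not to reproduce the CStringT implementation, which also has to manage its own buffer.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: replaces every non-overlapping occurrence of oldStr
// with newStr and returns the number of replacements made.
inline int ReplaceSketch(std::string& s, const std::string& oldStr,
                         const std::string& newStr) {
    if (oldStr.empty()) {
        return 0;                       // nothing to search for
    }
    int count = 0;
    std::size_t pos = 0;
    while ((pos = s.find(oldStr, pos)) != std::string::npos) {
        s.replace(pos, oldStr.size(), newStr);
        pos += newStr.size();           // resume after the text just written
        ++count;
    }
    return count;
}
```

Resuming the search after the newly written text keeps the loop from rescanning its own output, which matters when newStr happens to contain oldStr.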

You can extract substrings of a CStringT using the Left, Mid, and Right functions:

CStringT Left( int nCount ) const           
CStringT Mid( int iFirst ) const            
CStringT Mid( int iFirst, int nCount ) const
CStringT Right( int nCount ) const

These functions are quite simple. Left returns in a new CStringT instance the first nCount characters of the original CStringT. Mid has two overloads. The first returns a new CStringT instance that contains all characters in the original starting at iFirst and continuing to the end. The second overload of Mid accepts an nCount parameter so that only the specified number of characters starting at iFirst are returned in the new CStringT. Finally, Right returns the rightmost nCount characters of the CStringT instance.
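
The following sketch reproduces the three extraction functions on top of std::string::substr. The helper names are hypothetical, and the clamping of out-of-range arguments is an assumption on our part about what a forgiving implementation would do.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: the first nCount characters.
inline std::string LeftSketch(const std::string& s, int nCount) {
    nCount = std::max(0, nCount);
    return s.substr(0, static_cast<std::size_t>(nCount));
}

// Hypothetical helper: nCount characters starting at iFirst.
inline std::string MidSketch(const std::string& s, int iFirst, int nCount) {
    iFirst = std::max(0, iFirst);
    nCount = std::max(0, nCount);
    if (static_cast<std::size_t>(iFirst) >= s.size()) {
        return std::string();
    }
    return s.substr(static_cast<std::size_t>(iFirst), static_cast<std::size_t>(nCount));
}

// Hypothetical helper: everything from iFirst to the end.
inline std::string MidSketch(const std::string& s, int iFirst) {
    return MidSketch(s, iFirst, static_cast<int>(s.size()));
}

// Hypothetical helper: the last nCount characters.
inline std::string RightSketch(const std::string& s, int nCount) {
    const std::size_t n =
        std::min(static_cast<std::size_t>(std::max(0, nCount)), s.size());
    return s.substr(s.size() - n);
}
```

Applied to "ATL Internals", LeftSketch with 3 gives "ATL", MidSketch with (4, 5) gives "Inter", and RightSketch with 9 gives "Internals".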

CStringT's MakeReverse method enables you to reverse the characters in a CStringT:

CStringT& MakeReverse();                              

CString str("Let's do some ATL");
str.MakeReverse(); // str contains "LTA emos od s'teL"

Tokenize is a very useful method for breaking a CStringT into tokens separated by user-specified delimiters:

CStringT Tokenize( PCXSTR pszTokens, int& iStart ) const

The pszTokens parameter can include any number of characters that will be interpreted as delimiters between tokens. The iStart parameter specifies the starting index of the tokenization process. Note that this parameter is passed by reference so that the Tokenize implementation can update its value to the index of the first character following a delimiter. The function returns a CStringT instance containing the string token found. When no more tokens are found, the function returns an empty CStringT and iStart is set to -1. Tokenize is typically used in code like the following:

CString str("Name=Jenny; Ph: 867-5309");
CString tok;
int nPos = 0;
LPCSTR pszDelims = "; =:-";
tok = str.Tokenize(pszDelims, nPos);
while (tok != "") {
    // cast explicitly when passing a CString through "..."
    printf("Found token: %s\n", (LPCSTR)tok);
    tok = str.Tokenize(pszDelims, nPos);
}
// Prints the following:
// Found token: Name
// Found token: Jenny
// Found token: Ph
// Found token: 867
// Found token: 5309
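
To see how the iStart bookkeeping plays out, here is a portable sketch of the same idea on std::string. TokenizeSketch and SplitSketch are hypothetical names, and the sketch is our approximation of the behavior described above rather than the ATL source: skip any leading delimiters, take characters up to the next delimiter, and leave iStart just past that delimiter (or at -1 when nothing is left).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: returns the next token at or after iStart and advances
// iStart past the delimiter that ended it; sets iStart to -1 when no token remains.
inline std::string TokenizeSketch(const std::string& s, const std::string& delims,
                                  int& iStart) {
    assert(iStart >= 0);
    if (static_cast<std::size_t>(iStart) < s.size()) {
        const std::size_t from =
            s.find_first_not_of(delims, static_cast<std::size_t>(iStart));
        if (from != std::string::npos) {
            std::size_t to = s.find_first_of(delims, from);
            if (to == std::string::npos) {
                to = s.size();          // last token runs to the end
            }
            iStart = static_cast<int>(to) + 1;
            return s.substr(from, to - from);
        }
    }
    iStart = -1;
    return std::string();
}

// Hypothetical helper: the usual loop, collecting tokens until an empty one appears.
inline std::vector<std::string> SplitSketch(const std::string& s,
                                            const std::string& delims) {
    std::vector<std::string> tokens;
    int pos = 0;
    std::string tok = TokenizeSketch(s, delims, pos);
    while (!tok.empty()) {
        tokens.push_back(tok);
        tok = TokenizeSketch(s, delims, pos);
    }
    return tokens;
}
```

Note that runs of adjacent delimiters (such as the "; " and ": " pairs in the example string) never produce empty tokens, because the leading delimiters are skipped before each token is extracted.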

Three methods enable you to populate a CStringT with string data embedded in the component DLL (or EXE) as a Windows resource:

BOOL LoadString( UINT nID )                     
BOOL LoadString( HINSTANCE hInstance, UINT nID )
BOOL LoadString( HINSTANCE hInstance, UINT nID, 
    WORD wLanguageID )

The first overload retrieves the string from the module containing the calling code and stores it in the CStringT instance. The second and third overloads enable you to explicitly pass in a handle to the module from which the resource string should be loaded. Additionally, the third overload enables you to load a string in a specific language by specifying the LANGID via the wLanguageID parameter. The function returns TRUE if the specified resource could be loaded into the CStringT instance; otherwise, it returns FALSE.

CStringT also provides a very thin wrapper function on top of the Win32 function GetEnvironmentVariable:

BOOL GetEnvironmentVariable( PCXSTR pszVar )

With this simple function, you can retrieve the value of the environment variable indicated by pszVar and store it in the CStringT instance. The function returns TRUE if it succeeds and FALSE otherwise.

Formatted Data

One of the most useful features of CStringT is its capability to construct formatted strings using sprintf-style format specifiers. CStringT exposes four methods for building formatted string data. The first two methods wrap underlying calls to the CRT function vsprintf or vswprintf, depending on whether the CStringT's BaseType is char or wchar_t.

void __cdecl Format( PCXSTR pszFormat, ... );
void __cdecl Format( UINT nFormatID, ... );

The first overload for the Format method accepts a format string directly. The second overload retrieves the format string from the module's string table by looking up the resource ID nFormatID.

Two other closely related methods enable you to build formatted strings with CStringT instances. These methods wrap the Win32 API function FormatMessage:

void __cdecl FormatMessage( PCXSTR pszFormat, ... );
void __cdecl FormatMessage( UINT nFormatID, ... );

As with the Format methods, FormatMessage enables you to directly specify the format string by using the first overload or to load it from the module's string table using the second overload. It is important to note that the format strings allowed for Format and FormatMessage are different. Format uses the format strings vsprintf allows; FormatMessage uses the format strings the Win32 function FormatMessage allows. The exact syntax and semantics for the various format specifiers allowed are well documented in the online documentation, so this is not repeated here.

You use these methods in code like the following:

CString strFirst = "John";
CString strLast = "Doe";
CString str;

// str will contain "Doe, John: Age = 45"
// (cast explicitly when passing a CString through "...")
str.Format("%s, %s: Age = %d", (LPCSTR)strLast, (LPCSTR)strFirst, 45);
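
For the curious, the general shape of a Format-style wrapper around the v*printf family can be sketched in portable C++. FormatSketch is a hypothetical free function that returns a std::string. It shows one common approach (measure the output first, and then format into a buffer of exactly the right size) and should not be read as the actual CStringT implementation.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: builds a std::string from a printf-style format string.
inline std::string FormatSketch(const char* pszFormat, ...) {
    assert(pszFormat != nullptr);

    va_list args;
    va_start(args, pszFormat);
    va_list argsCopy;
    va_copy(argsCopy, args);    // each v*printf call consumes its va_list

    // First pass: ask how many characters the output needs.
    const int nLength = std::vsnprintf(nullptr, 0, pszFormat, args);
    va_end(args);

    std::string result;
    if (nLength > 0) {
        // Second pass: format into a buffer with room for the terminator.
        std::vector<char> buffer(static_cast<std::size_t>(nLength) + 1);
        std::vsnprintf(buffer.data(), buffer.size(), pszFormat, argsCopy);
        result.assign(buffer.data(), static_cast<std::size_t>(nLength));
    }
    va_end(argsCopy);
    return result;
}
```

Calling FormatSketch("%s, %s: Age = %d", "Doe", "John", 45) produces "Doe, John: Age = 45". As with any printf-style function, the arguments must match the format specifiers, because the compiler cannot check them through the ellipsis.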

Working with BSTRs and CString

You've seen that CStringT is great for manipulating char or wchar_t strings. Indeed, all the operations we've presented so far operate in terms of these two fundamental character types. However, we're going to be using ATL to build COM components, and that means we'll often be dealing with Automation types such as BSTR. So, we must have a convenient mechanism for returning a BSTR from a method while doing all the processing with our powerful CStringT class. As it happens, CStringT supplies two methods for precisely that purpose:

BSTR AllocSysString() const {                                   
    BSTR bstrResult = StringTraits::AllocSysString( GetString(),
        GetLength() );                                          
    if( bstrResult == NULL ) {                                  
        ThrowMemoryException();                                 
    }                                                           
                                                         
    return( bstrResult );                                       
}                                                               
                                                                
BSTR SetSysString( BSTR* pbstr ) const {                        
    ATLASSERT( AtlIsValidAddress( pbstr, sizeof( BSTR ) ) );    
                                                                
    if( !StringTraits::ReAllocSysString( GetString(), pbstr,    
        GetLength() ) ) {                                       
        ThrowMemoryException();                                 
    }                                                           
                                                                
    ATLASSERT( *pbstr != NULL );                                
    return( *pbstr );                                           
}

AllocSysString allocates a BSTR and copies the CStringT contents into it. CStringT delegates this work to the character traits class, which ultimately uses the COM API function SysAllocStringLen. The resulting BSTR is returned to the caller. Note that AllocSysString transfers ownership of the BSTR, so the burden is on the caller to eventually call SysFreeString. CStringT also offers SetSysString, which provides the same capability as AllocSysString, except that SetSysString works with an existing BSTR: it uses the traits class's ReAllocSysString to resize the storage of the pbstr argument and then copies the CStringT data into it. This process also frees the original BSTR passed in.

The following example demonstrates how AllocSysString can be used to return a BSTR from a method call.

STDMETHODIMP CPhoneBook::LookupName( BSTR* pbstrName ) {
    // ... do some processing

    CString str("Kirk");

    *pbstrName = str.AllocSysString(); // *pbstrName contains "Kirk"

    // caller must eventually call SysFreeString
    return S_OK;
}