The CString Class
CString Overview
For years now, ATL
programmers have glared longingly over the shoulders of their MFC
brethren slinging character data about in their programs with the
grace and dexterity of Baryshnikov himself. MFC developers have
long enjoyed the ubiquitous CString class provided with
the library, so much so that when they ventured into previous
versions of ATL, they often found themselves tempted to check that
wizard option named Support MFC and suck in a 1MB library just to
allow them to continue working with their bread-'n-butter string
class. Sure, ATL programmers have CComBSTR, which is fine
for code at the "edges" of a method's implementation: that is, either
receiving a BSTR input parameter at the beginning of a
method or returning some sort of BSTR output parameter at
the end of a method. But compared to CString's extensive
support for everything from sprintf-style formatting to
search-and-replace, CComBSTR is woefully inadequate for
any serious string processing. And, sure, ATL programmers have had
STL's basic_string<> template class for years, but it also
falls short of CString in functionality. In addition,
because it is a standard, platform-independent class, it can't
possibly provide such useful functionality as integrating with the
Windows resource architecture.
Well, the long wait is over: CString is
available as of ATL 7. In fact, CString is a shared class
between MFC and ATL, along with a number of other classes. You'll
note that there are no longer separate \MFC\Include and
\ATL\Include directories within the Visual Studio file
hierarchy. Instead, both libraries maintain code in
\ATLMFC\Include. I think it's extraordinarily insightful to examine just how
and where the shared CString class is defined. First, all
the header files are under a directory named \ATLMFC,
not \MFCATL.
CString used to be defined in afx.h, the prefix
that has identified MFC from its earliest beginnings. Now the
definition appears in a file that simply defines CString
as a typedef to a template class called CStringT that does
all the work. This template class is actually in the ATL namespace.
That's right: one of the last bastions of MFC supremacy is now found
under the ATL moniker.
CString Anatomy
Now that CString is template-based, it
follows the general ATL design pattern of supporting pluggable
functionality through template parameters that specialize
CString behavior. As the first sections of this chapter
revealed, a number of different types of strings exist, with
different mechanisms for manipulating them. Templates are very well
suited to this kind of scenario, in which exposing flexibility is
important. But usability is also important, so ATL uses a
convenient combination of typedefs and default template parameters
to simplify using CString.
Understanding what's
under the covers of a CString instance is important in
understanding not only how the methods and operators work, but also
how CString can be extended and specialized to fit
particular requirements or to facilitate certain optimizations.
When you declare an instance of CString, you are actually
instantiating a template class called CStringT. The file
atlstr.h provides typedefs for CString, as well
as for ANSI and Unicode versions: CStringA and
CStringW, respectively.
typedef CStringT< wchar_t, StrTraitATL< wchar_t, ChTraitsCRT< wchar_t > > >
    CAtlStringW;
typedef CStringT< char, StrTraitATL< char, ChTraitsCRT< char > > >
    CAtlStringA;
typedef CStringT< TCHAR, StrTraitATL< TCHAR, ChTraitsCRT< TCHAR > > >
    CAtlString;
typedef CAtlStringW CStringW;
typedef CAtlStringA CStringA;
typedef CAtlString CString;
Strictly speaking, these typedefs are generated
only if the ATL project is linking to the CRT, which ATL projects
now do by default. Otherwise, the ChTraitsCRT template
class is not used as a parameter to CStringT because it
relies upon CRT functions to manage character-level
manipulation.
Because the CStringT template class is
the underlying class doing all the work, the remainder of the
discussion is in terms of CStringT. This class is defined
in cstringt.h as follows:
template< typename BaseType, class StringTraits >
class CStringT :
    public CSimpleStringT< BaseType > {
    // ...
};
The behavior of the CStringT class is
governed largely by three things: 1) the CSimpleStringT
base class, 2) the BaseType template parameter, and 3) the
StringTraits template parameter. CSimpleStringT
provides a lot of basic string functionality that CStringT
inherits. The BaseType template parameter is used to
establish the underlying character data type of the string. The
only state CStringT holds is a pointer to a character
string of the type BaseType. This data is held in the
m_pszData private member defined in the
CSimpleStringT base class. The StringTraits
parameter is an interesting one. This
parameter establishes three things: 1) the module from which
resource strings will be loaded, 2) the string manager used to
allocate string data, and 3) the class that will provide low-level
character manipulation. The atlstr.h header file contains
the definition for this template class.
template< typename _BaseType = char,
          class StringIterator = ChTraitsOS< _BaseType > >
class StrTraitATL : public StringIterator {
public:
    static HINSTANCE FindStringResourceInstance(UINT nID) {
        return( AtlFindStringResourceInstance( nID ) );
    }
    static IAtlStringMgr* GetDefaultManager() {
        return( &g_strmgr );
    }
};
StrTraitATL derives from the
StringIterator template parameter passed in. This
parameter implements low-level character operations that
CStringT ultimately will invoke when application code
calls methods on instances of CString. Two choices of
ATL-provided classes encapsulate the character traits:
ChTraitsCRT and ChTraitsOS. The former uses
functions that require you to link to the CRT in your project, so
you would use it if you were already linking to the CRT. The latter
does not require the CRT to implement its character-manipulation
functions. Both expose a common set of functions that
CStringT uses in its internal implementation.
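The pattern of a traits class that derives from its own template parameter can be sketched in portable C++. This is only an illustration of the technique, not ATL code: the names UpperViaToupper, UpperAsciiOnly, StrTraitsSketch, and PolicyString are hypothetical stand-ins for ChTraitsCRT, ChTraitsOS, StrTraitATL, and CStringT, and the sketch stores a std::string rather than a raw character buffer.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical character-operation policy that leans on the C runtime.
struct UpperViaToupper {
    static void StringUppercase(std::string& s) {
        for (char& ch : s) {
            ch = static_cast<char>(
                std::toupper(static_cast<unsigned char>(ch)));
        }
    }
};

// Hypothetical policy that avoids the runtime and handles ASCII only.
struct UpperAsciiOnly {
    static void StringUppercase(std::string& s) {
        for (char& ch : s) {
            if (ch >= 'a' && ch <= 'z') {
                ch = static_cast<char>(ch - 'a' + 'A');
            }
        }
    }
};

// Traits class that inherits the character operations from its parameter
// and adds string-level policy of its own, in the spirit of StrTraitATL.
template <class CharOps>
struct StrTraitsSketch : CharOps {
    static const char* ManagerName() { return "default"; }
};

// String class that never names a concrete policy: every character-level
// operation goes through the StringTraits parameter.
template <class StringTraits>
class PolicyString {
public:
    explicit PolicyString(const std::string& s) : m_str(s) {}

    PolicyString& MakeUpper() {
        StringTraits::StringUppercase(m_str);
        return *this;
    }

    const std::string& Get() const { return m_str; }

private:
    std::string m_str;
};
```

Swapping UpperViaToupper for UpperAsciiOnly changes how the work is done without touching PolicyString, which is the same kind of flexibility the StringIterator parameter gives StrTraitATL.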
Note that in the definition of the
StrTraitATL, we see the first evidence of the
extensibility of CStringT. The GetDefaultManager
method returns a pointer to a string manager via the
IAtlStringMgr interface. This interface enforces a generic
pattern for managing string memory. atlsimpstr.h provides
the definition for this interface.
__interface IAtlStringMgr {
public:
    CStringData* Allocate( int nAllocLength, int nCharSize );
    void Free( CStringData* pData );
    CStringData* Reallocate( CStringData* pData,
        int nAllocLength, int nCharSize );
    CStringData* GetNilString();
    IAtlStringMgr* Clone();
};
ATL supplies a default
string manager that is used if the user does not specify another.
This default string manager is a concrete class called
CAtlStringMgr that implements IAtlStringMgr.
Abstracting string management into a separate class enables you to
customize the behavior of the string-management functions to suit
specific application requirements. Two mechanisms exist for
customizing string management for CStringT. The first
mechanism involves merely using CAtlStringMgr with a
specific memory manager. Chapter 3, "ATL Smart Types," discusses the
IAtlMemMgr interface, a generic interface that
encapsulates heap memory management. Associating a memory manager
with CAtlStringMgr is as simple as passing a pointer to
the memory manager to the CAtlStringMgr constructor.
CStringT must be instructed to use this
CAtlStringMgr in its internal implementation by passing
the string manager pointer to the CStringT constructor.
ATL provides five built-in heap managers that implement
IAtlMemMgr. We use CWin32Heap to demonstrate how
to use an alternate memory manager with CStringT.
// create a thread-safe process heap with zero initial size
// and no max size
// constructor parameters are explained later in this chapter
CWin32Heap heap(0, 0, 0);
// create a string manager that uses this memory manager
CAtlStringMgr strMgr(&heap);
// create a CString instance that uses this string manager
CString str(&strMgr);
// ... perform some string operations as usual
If you want more control over the
string-management functions, you can supply your own custom string
manager that fully implements IAtlStringMgr. Instead of
passing a pointer to CAtlStringMgr to the CString
constructor, as in the previous code, you would simply pass a
pointer to your custom IAtlStringMgr implementation. This
custom string manager might use one of the existing memory managers
or a custom implementation of IAtlMemMgr. Additionally, a
custom string manager might want to enforce a different
buffer-sharing policy than CAtlStringMgr's default
copy-on-write policy. Copy-on-write allows multiple
CStringT instances to read the same string memory, but a
duplicate is created before any writes to the buffer are
performed.
Of course, the simplest thing to do is to use
the defaults that ATL chooses when you use a simple
CString declaration, as in the following:
// declare an empty CString instance
CString str;
With this declaration, ATL will use
CAtlStringMgr to manage the string data.
CAtlStringMgr will use the built-in CWin32Heap
heap manager for supplying string data storage.
Constructors
CStringT provides 19 different
constructors, although one of the constructors is compiled into the
class definition only if you are building a managed C++ project for
the .NET platform. These types of ATL specializations are not
discussed in this book. In general, however, the large number of
constructors present represents the various different sources of
string data with which a CString instance can be
initialized, along with the additional options for supplying
alternate string managers. We examine these constructors in related
groups.
Before going further into the various methods,
let's look at some of the notational shortcuts that
CStringT uses in its method signatures. To properly
understand even the method declarations with CStringT, you
must be comfortable with the typedefs used to represent the
character types in CStringT. Because CStringT
uses template parameters to represent the base character type, the
syntax for expressing the various allowed character types can
become cumbersome or unclear in places. For instance, when you
declare a CStringW, you create an instance of
CStringT that encapsulates a series of wchar_t
characters. From the definition of the CStringT template
class, you can easily see that the BaseType template
parameter can be used in method signatures that need to specify a
wchar_t type parameter, but how would you specify methods
that need to accept a char type parameter? Certainly, I
need to be able to append char strings to a
wchar_t-based CString. Conversely, I must have
the ability to append wchar_t strings to a
char-based CString. Yet I have only one template
class in which to accomplish all this. CStringT provides
six type definitions to deal with this syntactic dichotomy. They
might seem somewhat arbitrary at first, but you'll see as we look
closer into CStringT that their use actually makes a lot
of sense. Table 2.3
summarizes these typedefs.
Table 2.3. CStringT Character Traits Type Definitions
Typedef | BaseType is char | BaseType is wchar_t | Meaning
XCHAR | char | wchar_t | Single character of the same type as the CStringT instance
PXSTR | LPSTR | LPWSTR | Pointer to character string of the same type as the CStringT instance
PCXSTR | LPCSTR | LPCWSTR | Pointer to constant character string of the same type as the CStringT instance
YCHAR | wchar_t | char | Single character of the opposite type as the CStringT instance
PYSTR | LPWSTR | LPSTR | Pointer to character string of the opposite type as the CStringT instance
PCYSTR | LPCWSTR | LPCSTR | Pointer to constant character string of the opposite type as the CStringT instance
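The mapping in Table 2.3 can be sketched in portable C++ with a small helper template. This is an illustrative sketch only, not the ATL source: CharTraitsSketch and StringSketch are hypothetical names, and the WhichOverload function exists purely to show which overload the compiler selects.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper: maps a character type to itself (XCHAR) and to
// the "opposite" character type (YCHAR). Not part of ATL.
template <typename BaseType> struct CharTraitsSketch;

template <> struct CharTraitsSketch<char> {
    typedef char    XCHAR;
    typedef wchar_t YCHAR;
};

template <> struct CharTraitsSketch<wchar_t> {
    typedef wchar_t XCHAR;
    typedef char    YCHAR;
};

// Hypothetical string class showing how the six typedefs let a single
// template declare both same-type and opposite-type overloads.
template <typename BaseType>
class StringSketch {
public:
    typedef typename CharTraitsSketch<BaseType>::XCHAR XCHAR;
    typedef XCHAR*       PXSTR;
    typedef const XCHAR* PCXSTR;
    typedef typename CharTraitsSketch<BaseType>::YCHAR YCHAR;
    typedef YCHAR*       PYSTR;
    typedef const YCHAR* PCYSTR;

    // Returns 0 for the same-type overload and 1 for the opposite-type
    // overload, so a caller can see which one was chosen.
    static int WhichOverload(PCXSTR) { return 0; }
    static int WhichOverload(PCYSTR) { return 1; }
};
```

With these definitions, passing a narrow literal to StringSketch<wchar_t>::WhichOverload selects the PCYSTR overload, which is the spot where a real implementation would convert the characters before storing them.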
Two constructors enable
you to initialize a CString to an empty string:
CStringT();
explicit CStringT( IAtlStringMgr* pStringMgr );
Recall that the data for the CString is
kept in the m_pszData data member. These constructors
simply initialize this member to refer to an empty string, that is, a
lone NUL terminator (1 byte if BaseType is char, 2 bytes if
BaseType is wchar_t). The second constructor
accepts a pointer to a string manager to use with this
CStringT instance. As stated previously, if the first
constructor is used, the CStringT instance will use the
default string manager CAtlStringMgr, which relies upon an
underlying CWin32Heap heap manager to allocate storage
from the process heap.
The next two constructors provide two different
copy constructors that enable you to initialize a new instance from
an existing CStringT or from an existing
CSimpleStringT.
CStringT( const CStringT& strSrc );
CStringT( const CThisSimpleString& strSrc );
The second constructor accepts a
CThisSimpleString reference, but this is simply a typedef
to CSimpleStringT<BaseType>. Exactly what these copy
constructors do depends upon the policy established by the string
manager that is associated with the CStringT instance.
Recall that unless a string manager is supplied, for example through the
constructor shown previously that accepts an IAtlStringMgr
pointer, CAtlStringMgr is used to manage memory
allocation for the instance's string data. This default string
manager implements a copy-on-write policy that allows multiple
CStringT instances to share a string buffer for reading,
but automatically creates a copy of the buffer whenever another
CStringT instance tries to perform a write operation. The
following code demonstrates how these copy semantics work in
practice:
// "Fred" memcpy'd into strOrig buffer
CString strOrig("Fred");
// str1 points to strOrig buffer (no memcpy)
CString str1(strOrig);
// str2 points to strOrig buffer (no memcpy)
CString str2(str1);
// str3 points to strOrig buffer (no memcpy)
CString str3(str2);
// new buffer allocated for str2
// "John" memcpy'd into str2 buffer
str2 = "John";
As the comments indicate, CAtlStringMgr
creates no additional copies of the internal string buffer until a
write operation is performed with the assignment statement of
str2. The storage to hold the new data in str2 is
obtained from CAtlStringMgr. If we had specified another
custom string manager to use via a constructor, that implementation
would have determined how and when data is allocated. Actually,
CAtlStringMgr simply increments str2's buffer
pointer to "allocate" memory within its internal heap. As long as
there is room in the CAtlStringMgr's heap, no expansion of
the heap is required and the string allocation is fast and
efficient.
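The copy-on-write behavior traced in the comments above can be modeled with a short, portable sketch. This is not how ATL implements it: CowString is a hypothetical class that shares a std::string through std::shared_ptr instead of reference-counting a CStringData block, and it makes no attempt at thread safety.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical copy-on-write string; a portable stand-in, not ATL code.
class CowString {
public:
    CowString(const char* psz = "")
        : m_data(std::make_shared<std::string>(psz)) {}

    // Copying shares the buffer: only the reference count changes.
    CowString(const CowString&) = default;
    CowString& operator=(const CowString&) = default;

    // Writing detaches first if any other instance shares the buffer.
    CowString& operator=(const char* psz) {
        if (m_data.use_count() > 1) {
            m_data = std::make_shared<std::string>(psz);  // new buffer
        } else {
            *m_data = psz;                                // reuse buffer
        }
        return *this;
    }

    const char* c_str() const { return m_data->c_str(); }

    // True when both strings read from the same underlying buffer.
    bool SharesBufferWith(const CowString& other) const {
        return m_data == other.m_data;
    }

private:
    std::shared_ptr<std::string> m_data;
};
```

Copies made from a CowString report SharesBufferWith as true until one of them is assigned new text, at which point only the written instance gets its own buffer and the others keep sharing the original.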
Several constructors accept a pointer to a
character string of the same type as the CStringT
instance, that is, a character string of type BaseType.
CStringT( const XCHAR* pszSrc );
CStringT( const XCHAR* pch, int nLength );
CStringT( const XCHAR* pch, int nLength, IAtlStringMgr* pStringMgr );
The first constructor should be used when the
character string provided is NUL terminated.
CStringT determines the size of the buffer needed by
simply looking for the terminating NUL. However, the
second and third forms of the constructor can accept an array of
characters that is not NUL terminated. In this case, the
length of the character array (in characters, not bytes), not
including the terminating NUL that will be added, must be
provided. You can improperly initialize your CString if
you don't feed these constructors the proper length or if you use
the first form with a string that's not NUL terminated.
For instance:
char rg[4] = { 'F', 'r', 'e', 'd' };
// Wrong! Wrong! rg not NUL-terminated
// str1 contains junk
CString str1(rg);
// ok, length provided to invoke correct ctor
CString str2(rg, 4);
const char* sz = "Fred";
// ok, sz NUL-terminated => no length parameter needed
CString str3(sz);
You can also initialize a CStringT
instance with a character string of the opposite type of
BaseType.
CSTRING_EXPLICIT CStringT( const YCHAR* pszSrc );
CStringT( const YCHAR* pch, int nLength );
CStringT( const YCHAR* pch, int nLength,
IAtlStringMgr* pStringMgr );
These constructors work in an analogous manner
to the XCHAR-based constructors just shown. The difference
is that these constructors convert the source string to the
BaseType declared for the CStringT instance, if
it is required. For example, if the BaseType is
wchar_t, such as when you explicitly declare a
CStringW instance, and you pass the constructor a
char*, CStringT will use the Windows API function
MultiByteToWideChar to convert the source string.
Two more constructors accept a NUL-terminated ANSI or Unicode source
string along with the string manager to use:
CStringT( LPCSTR pszSrc, IAtlStringMgr* pStringMgr );
CStringT( LPCWSTR pszSrc, IAtlStringMgr* pStringMgr );
You can also initialize a CStringT
instance with a repeated series of characters using the following
constructors:
CSTRING_EXPLICIT CStringT( char ch, int nLength = 1 );
CSTRING_EXPLICIT CStringT( wchar_t ch, int nLength = 1 );
Here, the nLength specifies the number
of copies of the ch character to replicate in the
CStringT instance, as in the following:
CString str('z', 5); // str contains "zzzzz"
CStringT also enables you to initialize a
CStringT instance from an unsigned char string,
which is how MBCS strings are represented.
CSTRING_EXPLICIT CStringT( const unsigned char* pszSrc );
CStringT( const unsigned char* pszSrc,
IAtlStringMgr* pStringMgr );
Finally, CStringT provides two
constructors that accept a VARIANT as the string
source:
CStringT( const VARIANT& varSrc );
CStringT( const VARIANT& varSrc, IAtlStringMgr* pStringMgr );
Internally, CStringT uses the COM API
function VariantChangeType to attempt to convert
varSrc to a BSTR. VariantChangeType
handles simple conversion between basic types, such as
numeric-to-string conversions. However, the varSrc VARIANT
cannot contain a complex type, such as an array of double. In
addition, these two constructors truncate a BSTR that
contains an embedded NUL.
// BSTR bstr contains "This is part one\0and here's part two"
VARIANT var;
var.vt = VT_BSTR;
var.bstrVal = bstr;
// var contains "This is part one\0and here's part two"
CString str(var); // str contains "This is part one"
Assignment
CStringT defines eight assignment
operators. The first two enable you to initialize an instance from
an existing CStringT or CSimpleStringT:
CStringT& operator=( const CStringT& strSrc );
CStringT& operator=( const CThisSimpleString& strSrc );
With both of these operators, the copy policy
of the string manager in use dictates how these operators behave.
By default, CStringT instances use the copy-on-write
policy of the CAtlStringMgr class. See the previous
discussion of the CStringT constructors for more
information.
The next two assignment operators accept
pointers to string literals of the same type as the
CStringT instance or of the opposite type, as indicated by
the PCXSTR and PCYSTR source string types:
CStringT& operator=( PCXSTR pszSrc );
CStringT& operator=( PCYSTR pszSrc );
Of course, no conversions
are necessary with the first operator. However, CStringT
invokes the appropriate Win32 conversion function when the second
operator is used, as in the following code:
CStringA str; // declare an empty ANSI CString
str = L"Hello World"; // operator=(PCYSTR) invoked
// characters converted via
// WideCharToMultiByte
CStringT also enables you to assign
individual characters to instances. In these cases,
CStringT actually creates a string of one character and
appends either a 1- or 2-byte NUL terminator, depending on
the type of character specified and the BaseType of the
CStringT instance. These operators then delegate to either
operator=(PCXSTR) or operator=(PCYSTR) so that
any necessary conversions are performed.
CStringT& operator=( char ch );
CStringT& operator=( wchar_t ch );
Yet another CStringT assignment
operator accepts an unsigned char* as its argument to
support MBCS strings. This operator simply casts pszSrc to
a char* and invokes either operator=(PCXSTR) or
operator=(PCYSTR):
CStringT& operator=( const unsigned char* pszSrc );
Finally, VARIANT values can be
assigned to instances of CStringT. The use and behavior here are
identical to that described previously for the corresponding
CStringT constructor:
CStringT& operator=( const VARIANT& var );
String Concatenation Using CString
CStringT defines eight operators used
to append string data to the end of an existing string buffer. In
all cases, storage for the new data appended is allocated using the
underlying string manager and its encapsulated heap. By default,
this means that CAtlStringMgr is employed; its underlying
CWin32Heap instance will be used to invoke the Win32
HeapReAlloc API function as necessary to grow the
CStringT buffer to accommodate the data appended by these
operators.
CStringT& operator+=( const CThisSimpleString& str );
CStringT& operator+=( PCXSTR pszSrc );
CStringT& operator+=( PCYSTR pszSrc );
template< int t_nSize >
CStringT& operator+=( const CStaticString<
XCHAR, t_nSize >& strSrc );
CStringT& operator+=( char ch );
CStringT& operator+=( unsigned char ch );
CStringT& operator+=( wchar_t ch );
CStringT& operator+=( const VARIANT& var );
The first operator accepts an existing
CStringT instance, and two operators accept
PCXSTR strings or PCYSTR strings. Three other
operators enable you to append individual characters to an existing
CStringT. You can append a char,
wchar_t, or unsigned char. One operator enables
you to append the string contained in an instance of
CStaticString. You can use this template class to
efficiently store immutable string data; it performs no copying of
the data with which it is initialized and merely serves as a
convenient container for a string constant. Finally, you can append
a VARIANT to an existing CStringT instance. As
with the VARIANT constructor and assignment operator
discussed previously, this operator relies upon
VariantChangeType to convert the underlying
VARIANT data into a BSTR. To the compiler, a
BSTR looks just like an OLECHAR*, so this
operator will ultimately end up calling either
operator+=(PCXSTR) or operator+=(PCYSTR),
depending on the BaseType of the CStringT
instance. The same issues with embedded NULs in the source
BSTR that we discussed earlier in the "Constructors" section apply here.
Three overloads of operator+() enable
you to concatenate multiple strings conveniently.
friend CSimpleStringT operator+(
const CSimpleStringT& str1,
const CSimpleStringT& str2 );
friend CSimpleStringT operator+(
const CSimpleStringT& str1,
PCXSTR psz2 );
friend CSimpleStringT operator+(
PCXSTR psz1,
const CSimpleStringT& str2 );
These operators are invoked when you write code
such as the following:
CString str1("Every good "); // str1: "Every good "
CString str2("boy does "); // str2: "boy does "
CString str3; // str3: empty
str3 = str1 + str2 + "fine"; // str3: "Every good boy does fine"
String concatenation is also supported through
several Append methods. Four of these methods are defined
on the CSimpleStringT base class and actually do the real
work for the operators just discussed. Indeed, the only additional
functionality offered by these four Append methods over
the operators appears in the overload that accepts an
nLength parameter. This enables you to append only a
portion of an existing string. If you specify an nLength
greater than the length of the source string, space will be
allocated to accommodate nLength characters. However, the
resulting CStringT data will be NUL terminated in
the same place as pszSrc.
void Append( PCXSTR pszSrc );
void Append( PCXSTR pszSrc, int nLength );
void AppendChar( XCHAR ch );
void Append( const CSimpleStringT& strSrc );
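The behavior of the Append overload that takes nLength can be approximated in portable C++. This is a sketch under stated assumptions, not the ATL implementation: AppendSketch is a hypothetical helper that works on std::string, and it uses reserve to mimic the storage that is set aside for nLength characters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical counterpart of Append( PCXSTR pszSrc, int nLength ).
// Appends at most nLength characters of pszSrc, stopping early if the
// source string's NUL terminator is reached first.
inline void AppendSketch(std::string& dest, const char* pszSrc, int nLength) {
    if (nLength <= 0) {
        return;  // nothing requested, nothing appended
    }

    // Set aside room for the full request, even if less ends up copied.
    dest.reserve(dest.size() + static_cast<std::size_t>(nLength));

    // Count how many characters are really available, up to nLength.
    std::size_t nCopy = 0;
    while (nCopy < static_cast<std::size_t>(nLength) && pszSrc[nCopy] != '\0') {
        ++nCopy;
    }

    dest.append(pszSrc, nCopy);
}
```

Asking for two characters of "cdef" appends "cd", whereas asking for ten characters of "xy" appends only "xy", so the result ends where the source string does.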
Three additional methods defined on
CStringT enable you to append formatted strings to
existing CStringT instances. Formatted strings are
discussed more later in this section when we cover
CStringT's Format operation. In short, these
types of operations enable you to apply sprintf-style
formatting to CStringT instances. The three methods shown
here differ from their Format counterparts only in that the
constructed string is appended to the CStringT instance
instead of overwriting it.
void __cdecl AppendFormat( UINT nFormatID, ... );
void __cdecl AppendFormat( PCXSTR pszFormat, ... );
void AppendFormatV( PCXSTR pszFormat, va_list args );
Character Case Conversion
Two CStringT methods support case
conversion: MakeUpper and MakeLower.
CStringT& MakeUpper() {
    int nLength = GetLength();
    PXSTR pszBuffer = GetBuffer( nLength );
    StringTraits::StringUppercase( pszBuffer );
    ReleaseBufferSetLength( nLength );
    return( *this );
}
CStringT& MakeLower() {
    int nLength = GetLength();
    PXSTR pszBuffer = GetBuffer( nLength );
    StringTraits::StringLowercase( pszBuffer );
    ReleaseBufferSetLength( nLength );
    return( *this );
}
Both of these methods delegate their work to the
ChTraitsOS or ChTraitsCRT class, depending on
which of these was specified as the template parameter when the
CStringT instance was declared. Simply instantiating a
variable of type CString uses the default character traits
class supplied in the typedef for CString. If the
preprocessor symbol _ATL_CSTRING_NO_CRT is defined, the
ChTraitsOS class is used; and the Win32 functions
CharLower and CharUpper are invoked to perform
the conversion. If _ATL_CSTRING_NO_CRT is not defined, the
ChTraitsCRT class is used by default, and it uses the
appropriate CRT function: _mbslwr, _mbsupr,
_wcslwr, or _wcsupr.
CString Comparison Operators
CString defines a whole slew of
comparison operators (that's a metric slew, not an imperial slew). Seven
versions of operator== enable you to compare
CStringT instances with other instances, with ANSI and
Unicode string literals, and with individual characters.
friend bool operator==( const CStringT& str1,
const CStringT& str2 );
friend bool operator==( const CStringT& str1, PCXSTR psz2 );
friend bool operator==( PCXSTR psz1, const CStringT& str2 );
friend bool operator==( const CStringT& str1, PCYSTR psz2 );
friend bool operator==( PCYSTR psz1, const CStringT& str2 );
friend bool operator==( XCHAR ch1, const CStringT& str2 );
friend bool operator==( const CStringT& str1, XCHAR ch2 );
As you might expect, a corresponding set of
overloads for operator!= is also provided.
friend bool operator!=( const CStringT& str1,
const CStringT& str2 );
friend bool operator!=( const CStringT& str1, PCXSTR psz2 );
friend bool operator!=( PCXSTR psz1, const CStringT& str2 );
friend bool operator!=( const CStringT& str1, PCYSTR psz2 );
friend bool operator!=( PCYSTR psz1, const CStringT& str2 );
friend bool operator!=( XCHAR ch1, const CStringT& str2 );
friend bool operator!=( const CStringT& str1, XCHAR ch2 );
And, of course, a full battalion of relational
comparison operators is available in CStringT.
friend bool operator<( const CStringT& str1,
const CStringT& str2 );
friend bool operator<( const CStringT& str1, PCXSTR psz2 );
friend bool operator<( PCXSTR psz1, const CStringT& str2 );
friend bool operator>( const CStringT& str1,
const CStringT& str2 );
friend bool operator>( const CStringT& str1, PCXSTR psz2 );
friend bool operator>( PCXSTR psz1, const CStringT& str2 );
friend bool operator<=( const CStringT& str1,
const CStringT& str2 );
friend bool operator<=( const CStringT& str1, PCXSTR psz2 );
friend bool operator<=( PCXSTR psz1, const CStringT& str2 );
friend bool operator>=( const CStringT& str1,
const CStringT& str2 );
friend bool operator>=( const CStringT& str1, PCXSTR psz2 );
friend bool operator>=( PCXSTR psz1, const CStringT& str2 );
All the operators use the same method to perform
the actual comparison: CStringT::Compare. A brief
inspection of the operator== overload that takes two
CStringT instances reveals how this is accomplished:
friend bool operator==( const CStringT& str1,
    const CStringT& str2 ) {
    return( str1.Compare( str2 ) == 0 );
}
Similarly, the same overload for
operator!= is defined as follows:
friend bool operator!=( const CStringT& str1,
    const CStringT& str2 ) {
    return( str1.Compare( str2 ) != 0 );
}
The relational operators use Compare
like this:
friend bool operator<( const CStringT& str1,
    const CStringT& str2 ) {
    return( str1.Compare( str2 ) < 0 );
}
Compare returns -1 if
str1 is lexicographically (say that ten times fast while standing on your
head) less than str2, and 1 if str1 is
lexicographically greater than str2. Strings are compared
character by character until an inequality occurs or the end of one
of the strings is reached. If no inequalities are detected and the
strings are the same length, they are considered equal.
Compare returns 0 in this case. If an inequality is found
between two characters, the result of a lexical comparison between
the two characters is returned as the result of the string
comparison. If the characters in the strings are the same except
that one string is longer, the shorter string is considered to be
less than the longer string. It is important to note that all these
comparisons are case-sensitive. If you want to perform
case-insensitive comparisons, you must resort to using the
CompareNoCase method directly, as discussed in a
moment.
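The comparison rules just described can be written out as a small portable function. This is a sketch of the rules only, not ATL's implementation, which hands the work to the character traits class; CompareSketch is a hypothetical name, and it handles single-byte char strings only.

```cpp
#include <cassert>

// Hypothetical three-way comparison that follows the rules in the text:
// -1 if a sorts before b, 1 if a sorts after b, 0 if they are equal.
inline int CompareSketch(const char* a, const char* b) {
    // Walk both strings until they differ or a ends.
    while (*a != '\0' && *a == *b) {
        ++a;
        ++b;
    }

    // Compare as unsigned so characters above 0x7F order consistently.
    const unsigned char ca = static_cast<unsigned char>(*a);
    const unsigned char cb = static_cast<unsigned char>(*b);

    if (ca < cb) return -1;  // also covers "a is a prefix of b"
    if (ca > cb) return 1;   // also covers "b is a prefix of a"
    return 0;                // same characters, same length
}
```

Because the terminating NUL compares lower than every other character, the shorter-string rule needs no special case: when one string is a prefix of the other, the loop stops at its terminator and the final comparison settles the result.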
As with many of the character-level operations
invoked by various CStringT methods and operators, the
character traits class does the real heavy lifting. The
CStringT::Compare method delegates to either
ChTraitsOS or ChTraitsCRT, as discussed
previously.
int Compare( PCXSTR psz ) const {
    ATLASSERT( AtlIsValidString( psz ) );
    return( StringTraits::StringCompare( GetString(), psz ) );
}
int CompareNoCase( PCXSTR psz ) const {
    ATLASSERT( AtlIsValidString( psz ) );
    return( StringTraits::StringCompareIgnore(
        GetString(), psz ) );
}
Assuming that CString is used to
declare the instance and the project defaults are in use
(_ATL_CSTRING_NO_CRT is not defined), the Compare
method delegates to ChTraitsCRT::StringCompare. This
function uses one of the CRT functions _mbscmp or
wcscmp. Correspondingly, CompareNoCase invokes
either _mbsicmp or _wcsicmp.
Two additional comparison methods provide the
same functionality as Compare and CompareNoCase,
except that they perform the comparison using language rules. The
CRT functions underlying these methods are _mbscoll and
_mbsicoll, or their Unicode equivalents, depending again
on the underlying character type of the CStringT.
int Collate( PCXSTR psz ) const
int CollateNoCase( PCXSTR psz ) const
One final operator that
bears mentioning is operator[]. This operator enables you
to use convenient arraylike syntax to access individual characters
in the CStringT string buffer. This operator is defined on
the CSimpleStringT base class as follows:
XCHAR operator[]( int iChar ) const {
    ATLASSERT( (iChar >= 0) && (iChar <= GetLength()) );
    return( m_pszData[iChar] );
}
This function merely does some simple bounds
checking (note that you can index the NUL terminator if
you want) and then returns the character located at the specified
index. This enables you to write code like the following:
CString str("ATL Internals");
char c1 = str[2]; // 'L'
char c2 = str[5]; // 'n'
char c3 = str[13]; // '\0'
CString Operations
CStringT instances can be manipulated
and searched in a variety of ways. This section briefly presents
the methods CStringT exposes for performing various types
of operations. Three methods are designed to facilitate searching
for strings and characters within a CStringT instance.
int Find( XCHAR ch, int iStart = 0 ) const
int Find( PCXSTR pszSub, int iStart = 0 ) const
int FindOneOf( PCXSTR pszCharSet ) const
int ReverseFind( XCHAR ch ) const
The first version of Find accepts a
single character of BaseType and returns the zero-based
index of the first occurrence of ch within the
CStringT instance. Find starts the search at the
index specified by iStart. If the character is not found,
-1 is returned. The second version of Find
accepts a string of characters and returns either the index of the
first character of pszSub within the CStringT or
-1 if pszSub does not occur in its entirety
within the instance. As with many character-level operations, the
character traits class performs the real work. With
ChTraitsCRT in use, the first two versions of
Find delegate ultimately to the CRT functions
_mbschr and _mbsstr, respectively. The
FindOneOf method looks for the first occurrence of any
character within the pszCharSet parameter. This method
invokes the CRT function _mbspbrk
to do the search. Finally, the ReverseFind method operates
similarly to Find, except that it starts its search at the
end of the CStringT and looks "backward." Note that all
these operations are case-sensitive. The following examples
demonstrate the use of these search operations.
CString str("Show me the money!");
int n = str.Find('o'); // n = 2
n = str.Find('O'); // n = -1, case-sensitivity
n = str.ReverseFind('o'); // n = 13, 'o' in "money" found
// first
n = str.Find("the"); // n = 8
n = str.FindOneOf("aeiou"); // n = 2
n = str.Find('o', 4); // n = 13, started search after
// first 'o'
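To make the index-or-minus-one contract of these search methods concrete, here is a minimal portable sketch. It is not the ATL implementation: the helper names (to_index, find_char, find_sub, find_one_of, reverse_find) are hypothetical, std::string stands in for CStringT, and the single-byte functions strchr, strstr, strpbrk, and strrchr stand in for the _mbs* functions mentioned in the text.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helpers (not part of ATL) that mirror the "index or -1"
// contract of Find, FindOneOf, and ReverseFind for plain char strings.

// Convert a pointer into s to a zero-based index, or -1 for "not found".
static int to_index(const std::string& s, const char* p) {
    return p ? static_cast<int>(p - s.c_str()) : -1;
}

// Models Find( XCHAR ch, int iStart ): first occurrence at or after iStart.
int find_char(const std::string& s, char ch, int iStart = 0) {
    assert(iStart >= 0);
    if (iStart >= static_cast<int>(s.size())) return -1;
    return to_index(s, std::strchr(s.c_str() + iStart, ch));
}

// Models Find( PCXSTR pszSub, int iStart ): index of the whole substring.
int find_sub(const std::string& s, const char* pszSub, int iStart = 0) {
    assert(iStart >= 0 && pszSub != nullptr);
    if (iStart > static_cast<int>(s.size())) return -1;
    return to_index(s, std::strstr(s.c_str() + iStart, pszSub));
}

// Models FindOneOf: first character that appears anywhere in pszCharSet.
int find_one_of(const std::string& s, const char* pszCharSet) {
    assert(pszCharSet != nullptr);
    return to_index(s, std::strpbrk(s.c_str(), pszCharSet));
}

// Models ReverseFind: last occurrence of ch.
int reverse_find(const std::string& s, char ch) {
    return to_index(s, std::strrchr(s.c_str(), ch));
}
```

Each helper does nothing more than turn the pointer returned by the C runtime into an index, which is essentially the job the text ascribes to the character traits class.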
Nine different trim functions enable
you to remove characters from the beginning and/or end of a
CStringT. The first Trim function removes all
leading and trailing whitespace characters from the string. The
second overload of Trim accepts a character and removes
all leading and trailing instances of chTarget from the
string; the third overload of Trim removes leading and
trailing occurrences of any character in the pszTargets
string parameter. The three overloads for TrimLeft behave
similarly to Trim, except that they remove the desired
characters only from the beginning of the string. As you might
guess, TrimRight removes only trailing instances of the
specified characters.
CStringT& Trim()
CStringT& Trim( XCHAR chTarget )
CStringT& Trim( PCXSTR pszTargets )
CStringT& TrimLeft()
CStringT& TrimLeft( XCHAR chTarget )
CStringT& TrimLeft( PCXSTR pszTargets )
CStringT& TrimRight()
CStringT& TrimRight( XCHAR chTarget )
CStringT& TrimRight( PCXSTR pszTargets )
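The behavior of the pszTargets overloads can be sketched with the standard library. This is an illustration under stated assumptions rather than ATL's code: trim_left, trim_right, and trim are hypothetical names, std::string stands in for CStringT, and the helpers return a new string instead of modifying the instance in place and returning a reference to it as the real methods do.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for TrimLeft( pszTargets ): drop leading characters
// for as long as they appear in pszTargets.
std::string trim_left(const std::string& s, const char* pszTargets) {
    assert(pszTargets != nullptr);
    const std::size_t first = s.find_first_not_of(pszTargets);
    return first == std::string::npos ? std::string() : s.substr(first);
}

// Hypothetical stand-in for TrimRight( pszTargets ): drop trailing characters
// for as long as they appear in pszTargets.
std::string trim_right(const std::string& s, const char* pszTargets) {
    assert(pszTargets != nullptr);
    const std::size_t last = s.find_last_not_of(pszTargets);
    return last == std::string::npos ? std::string() : s.substr(0, last + 1);
}

// Hypothetical stand-in for Trim( pszTargets ): trim both ends.
std::string trim(const std::string& s, const char* pszTargets) {
    return trim_left(trim_right(s, pszTargets), pszTargets);
}
```

With these helpers, trim("--==ATL==--", "-=") yields "ATL". The single-character and whitespace overloads follow the same pattern with a different target set.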
CStringT provides two useful functions
for extracting characters from the encapsulated string:
CStringT SpanIncluding( PCXSTR pszCharSet ) const
CStringT SpanExcluding( PCXSTR pszCharSet ) const
SpanIncluding
starts from the beginning of the CStringT data and returns
a new CStringT instance that contains the leading run of
characters that all appear in the pszCharSet string
parameter, stopping at the first character that is not in the set.
If the first character of the string is not in pszCharSet, an empty
CStringT is returned. Conversely, SpanExcluding returns a new
CStringT that contains all the characters in the original
CStringT, up to but not including the first one that appears in
pszCharSet. In this case, if no character in pszCharSet is found, the
entire original string is returned.
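One way to picture the two span methods is as prefix extraction built on the C runtime's strspn and strcspn. The sketch below assumes that model. The names span_including and span_excluding are hypothetical, and std::string stands in for CStringT.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical model of SpanIncluding: the longest prefix made up only of
// characters that appear in pszCharSet.
std::string span_including(const std::string& s, const char* pszCharSet) {
    assert(pszCharSet != nullptr);
    return s.substr(0, std::strspn(s.c_str(), pszCharSet));
}

// Hypothetical model of SpanExcluding: the longest prefix that contains no
// character from pszCharSet.
std::string span_excluding(const std::string& s, const char* pszCharSet) {
    assert(pszCharSet != nullptr);
    return s.substr(0, std::strcspn(s.c_str(), pszCharSet));
}
```

For example, span_including("867-5309", "0123456789") gives "867", and span_excluding("Name=Jenny", "=") gives "Name".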
You can insert individual characters or
entire strings into a CStringT instance using the
overloaded Insert method:
int Insert( int iIndex, PCXSTR psz )
int Insert( int iIndex, XCHAR ch )
These methods insert the specified character or
string into the CStringT instance starting at
iIndex. The string manager associated with the
CStringT allocates additional storage to accommodate the
new data. Similarly, you can delete a character or series of
characters from a string using either the Delete or
Remove methods:
int Delete( int iIndex, int nCount = 1 )
int Remove( XCHAR chRemove )
Delete removes from the CStringT
nCount characters starting at iIndex. Remove
deletes all occurrences of the single character specified by
chRemove.
CString str("That's a spicy meatball!");
str.Remove('T'); // str contains "hat's a spicy meatball!"
str.Remove('a'); // str contains "ht's  spicy metbll!" (two spaces remain)
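The editing operations above map naturally onto std::string, which makes their effect easy to experiment with outside ATL. The following is a sketch only. The names insert_at, delete_chars, and remove_char are hypothetical, and the return values (new length for the insert and delete helpers, number of characters removed for the remove helper) reflect my reading of the Microsoft documentation rather than anything stated in this section. Clamping an insertion index that lies past the end is likewise an assumption of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical model of Insert( iIndex, psz ); returns the new length.
int insert_at(std::string& s, int iIndex, const char* psz) {
    assert(iIndex >= 0 && psz != nullptr);
    // Assumption: an index past the end appends to the string.
    const std::size_t at = std::min(static_cast<std::size_t>(iIndex), s.size());
    s.insert(at, psz);
    return static_cast<int>(s.size());
}

// Hypothetical model of Delete( iIndex, nCount ); returns the new length.
int delete_chars(std::string& s, int iIndex, int nCount = 1) {
    assert(iIndex >= 0 && nCount >= 0);
    if (iIndex < static_cast<int>(s.size())) {
        // erase() stops at the end of the string if nCount runs past it.
        s.erase(static_cast<std::size_t>(iIndex), static_cast<std::size_t>(nCount));
    }
    return static_cast<int>(s.size());
}

// Hypothetical model of Remove( chRemove ); returns how many were removed.
int remove_char(std::string& s, char chRemove) {
    const std::string::iterator newEnd = std::remove(s.begin(), s.end(), chRemove);
    const int removed = static_cast<int>(s.end() - newEnd);
    s.erase(newEnd, s.end());
    return removed;
}
```

Note the difference in scope: the delete helper works on one contiguous range, whereas the remove helper scans the whole string for a single character value.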
Individual characters or strings can be replaced
using the overloaded Replace method:
int Replace( XCHAR chOld, XCHAR chNew )
int Replace( PCXSTR pszOld, PCXSTR pszNew )
These methods search the CStringT
instance for every occurrence of the specified character or string
and replace each occurrence with the new character or string
provided. The methods return the number of replacements
performed, or 0 if no occurrences were found.
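The string overload of Replace can be modeled in a few lines of portable C++. This is a sketch under assumptions, not ATL's algorithm: replace_all is a hypothetical name, std::string stands in for CStringT, and the search resumes after each inserted replacement so that the new text is never rescanned.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of Replace( pszOld, pszNew ): replaces every occurrence
// of pszOld with pszNew and returns the number of replacements performed.
int replace_all(std::string& s, const char* pszOld, const char* pszNew) {
    assert(pszOld != nullptr && pszNew != nullptr);
    const std::string oldSub(pszOld);
    const std::string newSub(pszNew);
    if (oldSub.empty()) return 0;   // nothing sensible to search for
    int count = 0;
    std::size_t pos = 0;
    while ((pos = s.find(oldSub, pos)) != std::string::npos) {
        s.replace(pos, oldSub.size(), newSub);
        pos += newSub.size();       // continue after the text just inserted
        ++count;
    }
    return count;
}
```

Applied to "a-b-c" with "-" replaced by "--", the helper reports two replacements and leaves "a--b--c" in the string.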
You can extract substrings of a
CStringT using the Left, Mid, and
Right functions:
CStringT Left( int nCount ) const
CStringT Mid( int iFirst ) const
CStringT Mid( int iFirst, int nCount ) const
CStringT Right( int nCount ) const
These functions are quite simple. Left
returns in a new CStringT instance the first
nCount characters of the original CStringT.
Mid has two overloads. The first returns a new
CStringT instance that contains all characters in the
original starting at iFirst and continuing to the end. The
second overload of Mid accepts an nCount
parameter so that only the specified number of characters starting
at iFirst are returned in the new CStringT.
Finally, Right returns the rightmost nCount
characters of the CStringT instance.
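A small std::string model shows how the three extraction functions relate to one another. The names left, mid, and right are hypothetical stand-ins. The handling of out-of-range arguments is an assumption of this sketch and is not taken from the ATL source: counts are clamped to the available text, and a negative starting index is treated as a caller error.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of Left( nCount ): the first nCount characters.
std::string left(const std::string& s, int nCount) {
    if (nCount <= 0) return std::string();
    return s.substr(0, static_cast<std::size_t>(nCount)); // substr clamps
}

// Hypothetical model of Mid( iFirst, nCount ).
std::string mid(const std::string& s, int iFirst, int nCount) {
    assert(iFirst >= 0); // the sketch treats a negative start as a caller bug
    if (nCount <= 0 || iFirst >= static_cast<int>(s.size())) return std::string();
    return s.substr(static_cast<std::size_t>(iFirst),
                    static_cast<std::size_t>(nCount));
}

// Hypothetical model of Mid( iFirst ): everything from iFirst to the end.
std::string mid(const std::string& s, int iFirst) {
    return mid(s, iFirst, static_cast<int>(s.size()) - iFirst);
}

// Hypothetical model of Right( nCount ): the last nCount characters.
std::string right(const std::string& s, int nCount) {
    if (nCount <= 0) return std::string();
    if (nCount >= static_cast<int>(s.size())) return s;
    return s.substr(s.size() - static_cast<std::size_t>(nCount));
}
```

With "ATL Internals" as input, left(s, 3) is "ATL", mid(s, 4, 5) is "Inter", mid(s, 4) is "Internals", and right(s, 4) is "nals".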
CStringT's MakeReverse method enables
you to reverse the characters in a CStringT:
CStringT& MakeReverse();
CString str("Let's do some ATL");
str.MakeReverse(); // str contains "LTA emos od s'teL"
Tokenize is a very useful method for
breaking a CStringT into tokens separated by
user-specified delimiters:
CStringT Tokenize( PCXSTR pszTokens, int& iStart ) const
The pszTokens parameter can include any
number of characters that will be interpreted as delimiters between
tokens. The iStart parameter specifies the starting index
of the tokenization process. Note that this parameter is passed by
reference so that the Tokenize implementation can update
its value to the index of the first character following a
delimiter. The function returns a CStringT instance
containing the string token found. When no more tokens are found,
the function returns an empty CStringT and iStart
is set to -1. Tokenize is typically used in code
like the following:
CString str("Name=Jenny; Ph: 867-5309");
CString tok;
int nPos = 0;
LPCSTR pszDelims = "; =:-";
tok = str.Tokenize(pszDelims, nPos);
while (tok != "") {
printf("Found token: %s\n", (LPCSTR)tok); // cast explicitly for varargs
tok = str.Tokenize(pszDelims, nPos);
}
// Prints the following:
// Found token: Name
// Found token: Jenny
// Found token: Ph
// Found token: 867
// Found token: 5309
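The role of the by-reference iStart parameter is easier to see in a stripped-down model. The sketch below is an approximation built on strspn and strcspn, not a copy of the ATL source. The name tokenize is hypothetical, and std::string stands in for CStringT.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical model of Tokenize( pszTokens, iStart ). Returns the next token
// and advances iStart; returns an empty string and sets iStart to -1 when no
// tokens remain.
std::string tokenize(const std::string& s, const char* pszTokens, int& iStart) {
    assert(pszTokens != nullptr);
    if (iStart >= 0 && iStart < static_cast<int>(s.size())) {
        const char* begin = s.c_str() + iStart;
        begin += std::strspn(begin, pszTokens);      // skip leading delimiters
        if (*begin != '\0') {
            const std::size_t len = std::strcspn(begin, pszTokens); // token length
            const int from = static_cast<int>(begin - s.c_str());
            // Step past the token and the one delimiter that ended it.
            iStart = from + static_cast<int>(len) + 1;
            return s.substr(static_cast<std::size_t>(from), len);
        }
    }
    iStart = -1; // signal that the input is exhausted
    return std::string();
}
```

Because the position lives in the caller's variable rather than in hidden static state, several tokenizations can be in progress at once, which is not true of the classic strtok function.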
Three methods enable you to
populate a CStringT with string data embedded in the
component DLL (or EXE) as a Windows resource:
BOOL LoadString( UINT nID )
BOOL LoadString( HINSTANCE hInstance, UINT nID )
BOOL LoadString( HINSTANCE hInstance, UINT nID,
WORD wLanguageID )
The first overload retrieves the string from the
module containing the calling code and stores it in
CStringT. The second and third overloads enable you to
explicitly pass in a handle to the module from which the resource
string should be loaded. Additionally, the third overload enables
you to load a string in a specific language by specifying the
LANGID via the wLanguageID parameter. The
function returns TRUE if the specified resource could be
loaded into the CStringT instance; otherwise, it returns
FALSE.
CStringT also provides a very thin
wrapper function on top of the Win32 function
GetEnvironmentVariable:
BOOL GetEnvironmentVariable( PCXSTR pszVar )
With this simple function, you can retrieve the
value of the environment variable indicated by pszVar and
store it in the CStringT instance. The function returns
TRUE if it succeeds and FALSE otherwise.
Formatted
Data
One of the most useful features of
CStringT is its capability to construct formatted strings
using sprintf-style format specifiers. CStringT
exposes four methods for building formatted string data. The first
two methods wrap underlying calls to the CRT function
vsprintf or vswprintf, depending on whether the
CStringT's BaseType is char or
wchar_t.
void __cdecl Format( PCXSTR pszFormat, ... );
void __cdecl Format( UINT nFormatID, ... );
The first overload for the
Format method accepts a format string directly. The second
overload retrieves the format string from the module's string table
by looking up the resource ID nFormatID.
Two other closely related methods enable you to
build formatted strings with CStringT instances. These
methods wrap the Win32 API function FormatMessage:
void __cdecl FormatMessage( PCXSTR pszFormat, ... );
void __cdecl FormatMessage( UINT nFormatID, ... );
As with the Format methods,
FormatMessage enables you to directly specify the format
string by using the first overload or to load it from the module's
string table using the second overload. It is important to note
that the format strings allowed for Format and
FormatMessage are different. Format uses the
format strings vsprintf allows; FormatMessage
uses the format strings the Win32 function FormatMessage
allows. The exact syntax and semantics for the various format
specifiers allowed are well documented in the online documentation,
so this is not repeated here.
You use these methods in code like the
following:
CString strFirst = "John";
CString strLast = "Doe";
CString str;
// str will contain "Doe, John: Age = 45"
str.Format("%s, %s: Age = %d", (LPCTSTR)strLast, (LPCTSTR)strFirst, 45);
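For readers curious about what a printf-style wrapper around a growable string involves, here is a portable sketch. It is not how CStringT implements Format: format_string is a hypothetical name, std::string stands in for CStringT, and the two-pass approach (measure with vsnprintf, then format into a buffer of exactly that size) is one common technique chosen for this illustration.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: builds a std::string from a printf-style format.
std::string format_string(const char* pszFormat, ...) {
    assert(pszFormat != nullptr);

    va_list args;
    va_start(args, pszFormat);
    va_list argsCopy;
    va_copy(argsCopy, args);   // the argument list is consumed once per pass

    // Pass 1: ask vsnprintf how many characters the result needs.
    const int len = std::vsnprintf(nullptr, 0, pszFormat, args);
    va_end(args);
    if (len < 0) {             // encoding or format error
        va_end(argsCopy);
        return std::string();
    }

    // Pass 2: format into a buffer with room for the terminating NUL.
    std::vector<char> buf(static_cast<std::size_t>(len) + 1);
    std::vsnprintf(buf.data(), buf.size(), pszFormat, argsCopy);
    va_end(argsCopy);
    return std::string(buf.data(), static_cast<std::size_t>(len));
}
```

Called as format_string("%s, %s: Age = %d", "Doe", "John", 45), the helper produces "Doe, John: Age = 45". As with any varargs function, the arguments must match the format specifiers exactly, because the compiler cannot check them through the ellipsis.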
Working with BSTRs
and CString
You've seen that CStringT is great for
manipulating char or wchar_t strings. Indeed, all
the operations we've presented so far operate in terms of these two
fundamental character types. However, we're going to be using ATL
to build COM components, and that means we'll often be dealing with
Automation types such as BSTR. So, we must have a
convenient mechanism for returning a BSTR from a method
while doing all the processing with our powerful CStringT
class. As it happens, CStringT supplies two methods for
precisely that purpose:
BSTR AllocSysString() const {
BSTR bstrResult = StringTraits::AllocSysString( GetString(),
GetLength() );
if( bstrResult == NULL ) {
ThrowMemoryException();
}
return( bstrResult );
}
BSTR SetSysString( BSTR* pbstr ) const {
ATLASSERT( AtlIsValidAddress( pbstr, sizeof( BSTR ) ) );
if( !StringTraits::ReAllocSysString( GetString(), pbstr,
GetLength() ) ) {
ThrowMemoryException();
}
ATLASSERT( *pbstr != NULL );
return( *pbstr );
}
AllocSysString allocates a
BSTR and copies the CStringT contents into it.
CStringT delegates this work to the character traits
class, which ultimately uses the COM API function
SysAllocString. The resulting BSTR is returned to
the caller. Note that AllocSysString transfers ownership
of the BSTR, so the burden is on the caller to eventually
call SysFreeString. CStringT also provides
SetSysString, which provides the same capability as
AllocSysString, except that SetSysString works
with an existing BSTR and uses ReAllocSysString
to expand the storage of the pbstr argument and then
copies the CStringT data into it. This process also frees
the original BSTR passed in.
The following example demonstrates how
AllocSysString can be used to return a BSTR from
a method call.
STDMETHODIMP CPhoneBook::LookupName( BSTR* pbstrName ) {
// ... do some processing
CString str("Kirk");
*pbstrName = str.AllocSysString(); // *pbstrName contains "Kirk"
// caller must eventually call SysFreeString
return S_OK;
}