COM Collection and Enumeration Interfaces

Standard C++ Containers and Iterators

C++ programmers long ago learned to separate their collections into three pieces: the data itself, the container of the data, and an iterator for accessing the data. This separation is useful for building pieces separately from each other. The container's job is to enable the user to affect the contents of the collection. The iterator's job is to enable the user to access the contents of the container. And although the iterator implementation depends on how the container stores the data, the implementation details are hidden from the client of the container and the iterator. For example, imagine the following code for populating a container and then accessing it via an iterator:

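#include <iostream>
#include <vector>
using namespace std;

bool IsPrime(long n); // Assumed to be defined elsewhere

int main() {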
  // Populate the collection
  vector<long> rgPrimes;
  for (long n = 0; n != 1000; ++n) {
    if (IsPrime(n)) rgPrimes.push_back(n);
  }

  // Count the number of items in the collection
  cout << "Primes: " << rgPrimes.size() << endl;

  // Iterate over the collection using sequential access
  vector<long>::iterator begin = rgPrimes.begin();
  vector<long>::iterator end = rgPrimes.end();
  for (vector<long>::iterator it = begin; it != end; ++it) {
    cout << *it << " ";
  }
  cout << endl;
}

Because the container provides a well-known C++ interface, the client does not need to know the implementation details. In fact, C++ container classes are so uniform that this simple example would work just as well with a list or a deque as it does with a vector. Likewise, because the iterators that the container provides are uniform, the client doesn't need to know the implementation details of the iterator.
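To make the uniformity claim concrete, here is a minimal sketch that runs the same populate-and-iterate code against any standard sequence container. The helper name fill_and_sum is my own invention for illustration and is not part of the original example:

```cpp
#include <cassert>
#include <deque>
#include <list>
#include <vector>

// Hypothetical helper: fills any standard sequence container with
// 0..n-1, then walks it with the container's own iterators.
template <typename Container>
long fill_and_sum(long n) {
  assert(n >= 0);
  Container c;
  for (long i = 0; i != n; ++i) c.push_back(i);

  long sum = 0;
  for (typename Container::iterator it = c.begin();
       it != c.end(); ++it) {
    sum += *it;
  }
  return sum;
}
```

Calling fill_and_sum<std::vector<long> >(5), fill_and_sum<std::list<long> >(5), or fill_and_sum<std::deque<long> >(5) exercises exactly the same client code; only the container type changes.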

For the client to enjoy these benefits, the container and the iterator have certain responsibilities. The responsibilities of the container include the following:

Can allow the user to manipulate the data. Most containers are of variable size and are populated by the client. However, some containers represent a fixed data set or a set of data that is calculated instead of stored.
Can allow the user to obtain the count of items. Containers have a size method for this purpose.
Can allow random access. The std::vector class allows this using operator[], whereas the std::list class does not.
Must allow the user to access the data at least sequentially, if not randomly. C++ containers provide this facility by exposing iterators.

Likewise, the responsibilities of the iterator entail the following:

Must be capable of accessing the container's data. That data might be in some shared spot (such as memory, file, or database) where the collection and iterator can both access the data. Alternatively, the iterator might have its own copy of the data. This would allow one client to access a snapshot of the data while another client modified the data using the container. Finally, the iterator could generate the data on demand, for example, by generating the next prime number.
Must keep track of its current position in the collection of data. Every call to the iterator's operator++ advances that position. Every call to the iterator's operator* hands out the data at the current position.
Must be capable of indicating the end of the data to the client.
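As an illustration of the "generate on demand" case, the following sketch shows an iterator that meets all three responsibilities without any stored container behind it. The names prime_iterator, is_prime, and count_primes are hypothetical, and the trial-division test is only meant for small ranges:

```cpp
#include <cassert>

// Hypothetical trial-division test; adequate for small n.
inline bool is_prime(long n) {
  if (n < 2) return false;
  for (long d = 2; d * d <= n; ++d)
    if (n % d == 0) return false;
  return true;
}

// Iterates over the primes in [start, limit) without storing them.
class prime_iterator {
public:
  prime_iterator(long start, long limit)
    : m_cur(start), m_limit(limit) {
    assert(start <= limit);
    settle();
  }

  // Hands out the data at the current position
  long operator*() const { return m_cur; }

  // Advances the current position to the next prime (or the end)
  prime_iterator& operator++() { ++m_cur; settle(); return *this; }

  // Two iterators over the same range are equal at the same position
  bool operator!=(const prime_iterator& rhs) const {
    return m_cur != rhs.m_cur;
  }

private:
  // Moves forward until m_cur is prime or the limit is reached
  void settle() {
    while (m_cur < m_limit && !is_prime(m_cur)) ++m_cur;
  }

  long m_cur;
  long m_limit;
};

// The end of the data is indicated by an iterator parked at the limit.
inline long count_primes(long limit) {
  long n = 0;
  prime_iterator end(limit, limit);
  for (prime_iterator it(0, limit); it != end; ++it) ++n;
  return n;
}
```

In this sketch, the position is a single long, and the end marker is simply an iterator constructed at the limit, which is one of several reasonable ways to signal the end of the data.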

Although C++ containers and iterators are handy in your C++ code, neither is useful as a way of communicating data via a COM interface. Instead, we turn to the COM equivalent of containers and iterators: COM collections and enumerators.

COM Collections and Enumerators

A COM collection is a COM object that holds a set of data and allows the client to manipulate its contents via a COM interface. In many ways, a COM collection is similar to a C++ container. Unfortunately, IDL doesn't support templates, so it's impossible to define a generic ICollection interface. Instead, COM defines collections through coding conventions.

By convention, a COM collection interface takes a minimum form. This form is shown here, pretending that IDL supported templates:

[ object, dual ]
template <typename T>
interface ICollection : IDispatch {
  [propget]
  HRESULT Count([out, retval] long* pnCount);

  [id(DISPID_VALUE), propget]
  HRESULT Item([in] long n, [out, retval] T* pnItem);

  [id(DISPID_NEWENUM), propget]
  HRESULT _NewEnum([out, retval] IUnknown** ppEnum);
};

Several features about this interface are worth noting:

Although this minimal collection interface doesn't show any methods for adding or removing elements from the collection, most collections include such methods.
Most collection interfaces are dual interfaces. An IDispatch-based interface is required for some convenient language-mapping features that I discuss later.
Most collection interfaces have a read-only Count property that provides a count of the current elements in the collection. Not all collections can calculate a reliable count, however. Examples include a collection of all prime numbers and a collection of rows from a database query that hasn't yet been completed.
Most collection interfaces have a read-only Item property for random access to a specific element. The first parameter is the index of the element to access, which I've shown as a long. It's also common for this to be a VARIANT so that a number index or a string name can be used. If the index is a number, it is often 1-based, but the creator of the container can choose any indexing scheme desired. Furthermore, the Item property should be given the standard DISPID DISPID_VALUE. This marks the property as the "default" property, which certain language mappings use to provide more convenient access. I show you how this works later.
An interface is a collection interface when it exposes an enumerator via the read-only property _NewEnum, which must be assigned the standard DISPID DISPID_NEWENUM. Visual Basic uses this DISPID to implement its For-Each syntax, as I show you soon.

None of the methods specified earlier is actually required; you need to add only the methods you expect to support. However, it's highly recommended to have all three. Without them, you've got a container with inaccessible contents, and you can't even tell how many things are trapped in there.
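To show what the Count and Item conventions ask of an implementer, here is a portable, non-COM model of the two properties. Everything in it is an assumption made for illustration: CollectionModel is not a real interface, the Status enum merely stands in for the HRESULT codes S_OK, E_INVALIDARG, and E_POINTER, and the 1-based index is only one of the choices the convention allows:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-ins for S_OK, E_INVALIDARG, and E_POINTER
enum Status { kOk, kInvalidArg, kPointer };

// Hypothetical model of the minimal collection form (no COM plumbing)
class CollectionModel {
public:
  // Most real collections add a method like this one
  void Add(long value) { m_items.push_back(value); }

  // Read-only Count property
  Status get_Count(long* pnCount) const {
    if (!pnCount) return kPointer;
    *pnCount = static_cast<long>(m_items.size());
    return kOk;
  }

  // Read-only Item property; this sketch chooses 1-based indexing
  Status get_Item(long n, long* pnItem) const {
    if (!pnItem) return kPointer;
    if (n < 1 || static_cast<std::size_t>(n) > m_items.size())
      return kInvalidArg;
    *pnItem = m_items[static_cast<std::size_t>(n - 1)];
    return kOk;
  }

private:
  std::vector<long> m_items;
};
```

With three items added, get_Item accepts the indexes 1 through 3 and rejects 0 and 4, which is the kind of detail a client can only learn from the collection's documentation.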

A COM enumerator is to a COM collection as an iterator is to a container. The collection holds the data and allows the client to manipulate it, and the enumerator allows the client sequential access. However, instead of providing sequential access one element at a time, as with an iterator, an enumerator allows the client to decide how many elements it wants. This enables the client to balance the cost of round-trips with the memory requirements to handle more elements at once. A COM enumerator interface takes the following form (again, pretending that IDL supported templates):

template <typename T>
interface IEnum : IUnknown {
  [local]
  HRESULT Next([in] ULONG celt,
               [out] T* rgelt,
               [out] ULONG *pceltFetched);

  [call_as(Next)] // Discussed later...
  HRESULT RemoteNext([in] ULONG celt,
                     [out, size_is(celt),
                      length_is(*pceltFetched)] T* rgelt,
                     [out] ULONG *pceltFetched);

  HRESULT Skip([in] ULONG celt);
  HRESULT Reset();
  HRESULT Clone([out] IEnum<T> **ppenum);
};

A COM enumerator interface has the following properties:

The enumerator must be capable of accessing the data of the collection and maintaining a logical pointer to the next element to retrieve. All operations on an enumerator manage this logical pointer in some manner.
The Next method allows the client to decide how many elements to retrieve in a single round-trip. A result of S_OK indicates that the exact number of elements requested by the celt parameter has been returned in the rgelt array. A result of S_FALSE indicates that the end of the collection has been reached and that the pceltFetched argument holds the number of elements actually retrieved. In addition to retrieving the elements, the Next method implementation must advance the logical pointer internally so that subsequent calls to Next retrieve additional data.
The Skip method moves the logical pointer but retrieves no data. Notice that celt is an unsigned long, so there is no skipping backward. You can think of an enumerator as modeling a singly linked list, although, of course, it can be implemented any number of ways.
The Reset method moves the logical pointer back to the beginning of the collection.
The Clone method returns a copy of the enumerator object. The copy refers to the same data (although it can have its own copy) and points to the same logical position in the collection. The combination of Skip, Reset, and Clone makes up for the lack of a Back method.
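The behavior of the four methods can be modeled in portable C++ without any COM machinery. The sketch below is only a model under stated assumptions: EnumModel is a hypothetical class, kOk and kFalse stand in for S_OK and S_FALSE, and Skip is assumed to report kFalse when it runs out of elements before skipping celt of them:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Stand-ins for the HRESULT success codes S_OK and S_FALSE
enum Status { kOk, kFalse };

// Hypothetical, non-COM model of an enumerator over shared data
class EnumModel {
public:
  explicit EnumModel(std::shared_ptr<const std::vector<long> > data)
    : m_data(data), m_pos(0) {
    assert(m_data != nullptr);
  }

  // Copies up to celt elements and advances the logical pointer.
  // kOk means exactly celt elements were copied; kFalse means fewer.
  Status Next(unsigned long celt, long* rgelt,
              unsigned long* pceltFetched) {
    unsigned long n = 0;
    while (n < celt && m_pos < m_data->size())
      rgelt[n++] = (*m_data)[m_pos++];
    if (pceltFetched) *pceltFetched = n;
    return n == celt ? kOk : kFalse;
  }

  // Advances the logical pointer without copying any data
  Status Skip(unsigned long celt) {
    std::size_t left = m_data->size() - m_pos;
    if (celt > left) { m_pos = m_data->size(); return kFalse; }
    m_pos += celt;
    return kOk;
  }

  // Moves the logical pointer back to the beginning
  void Reset() { m_pos = 0; }

  // The copy shares the data and starts at the same position
  EnumModel Clone() const { return *this; }

private:
  std::shared_ptr<const std::vector<long> > m_data;
  std::size_t m_pos;
};
```

Because a clone keeps its own logical pointer, a client can clone before reading ahead and then return to the saved position, which is how Skip, Reset, and Clone together compensate for the missing Back method.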

Custom Collection and Enumerator Example

For example, let's model a collection of prime numbers as a COM collection:

[dual]
interface IPrimeNumbers : IDispatch {
  HRESULT CalcPrimes([in] long min, [in] long max);

  [propget]
  HRESULT Count([out, retval] long* pnCount);

  [propget, id(DISPID_VALUE)]
  HRESULT Item([in] long n, [out, retval] long* pnPrime);

  [propget, id(DISPID_NEWENUM)] // Not quite right...
  HRESULT _NewEnum([out, retval] IEnumPrimes** ppEnumPrimes);
};

The corresponding enumerator looks like this:

interface IEnumPrimes : IUnknown {
  [local]
  HRESULT Next([in] ULONG celt,
               [out] long* rgelt,
               [out] ULONG *pceltFetched);

  [call_as(Next)]
  HRESULT RemoteNext([in] ULONG celt,
                     [out, size_is(celt),
                       length_is(*pceltFetched)] long* rgelt,
                     [out] ULONG *pceltFetched);

  HRESULT Skip([in] ULONG celt);
  HRESULT Reset();
  HRESULT Clone([out] IEnumPrimes **ppenum);
};

Porting the previous C++ client to use the collection and enumerator looks like this:

int main() {
  CoInitialize(0);

  CComPtr<IPrimeNumbers> spPrimes;
  if (SUCCEEDED(spPrimes.CoCreateInstance(CLSID_PrimeNumbers))) {
    // Populate the collection
    HRESULT hr = spPrimes->CalcPrimes(0, 1000);

    // Count the number of items in the collection
    long nPrimes;
    hr = spPrimes->get_Count(&nPrimes);
    cout << "Primes: " << nPrimes << endl;

    // Enumerate over the collection using sequential access
    CComPtr<IEnumPrimes> spEnum;
    hr = spPrimes->get__NewEnum(&spEnum);

    const size_t PRIMES_CHUNK = 64;
    long         rgnPrimes[PRIMES_CHUNK];

    do {
      ULONG celtFetched = 0;
      hr = spEnum->Next(PRIMES_CHUNK, rgnPrimes, &celtFetched);
      if (SUCCEEDED(hr)) {
        // S_OK means the full chunk was returned
        if (hr == S_OK) celtFetched = PRIMES_CHUNK;
        for (long* pn = &rgnPrimes[0];
             pn != &rgnPrimes[celtFetched]; ++pn) {
          cout << *pn << " ";
        }
      }
    }
    while (hr == S_OK);
    cout << endl;

    spPrimes.Release();
  }

  CoUninitialize();
}

This client code asks the collection object to populate itself via the CalcPrimes method instead of adding each prime number one at a time. Of course, this procedure reduces round-trips. The client further reduces round-trips when retrieving the data in chunks of 64 elements. A chunk size of any number greater than 1 reduces round-trips but increases the data requirement of the client. Only profiling can tell you the right number for each client/enumerator pair, but larger numbers are preferred to reduce round-trips.
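The trade-off can be put into rough numbers. The helper below, whose name next_calls is my own, counts how many Next calls the do/while loop above makes, assuming the enumerator returns S_OK for every full chunk and S_FALSE for the first short (possibly empty) one:

```cpp
#include <cassert>

// Number of Next calls needed to drain n elements when asking for
// `chunk` at a time: one per full chunk, plus a final call that
// comes back short (possibly empty) and ends the loop.
inline unsigned long next_calls(unsigned long n, unsigned long chunk) {
  assert(chunk > 0);
  return n / chunk + 1;
}
```

There are 168 primes below 1,000, so under these assumptions a chunk size of 1 costs 169 calls, a chunk size of 64 costs 3, and a chunk size of 256 costs 1, at the price of a client buffer of 1, 64, or 256 elements, respectively.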

Dealing with the Enumerator local/call_as Oddity

One thing that's rather odd about the client side of enumeration is the pceltFetched parameter filled by the Next method. The COM documentation is ambiguous, but it boils down to this: When only a single element is requested, the client doesn't have to provide storage for the number of elements fetched; that is, pceltFetched is allowed to be NULL. Normally, however, MIDL doesn't allow an [out] parameter to be NULL. So, to support the documented behavior for enumeration interfaces, all of them are defined with two versions of the Next method. The [local] Next method is for use by the client and allows the pceltFetched parameter to be NULL. The [call_as] RemoteNext method doesn't allow the pceltFetched parameter to be NULL and is the method that performs the marshaling. Although the MIDL compiler implements the RemoteNext method, we have to implement Next manually because we've marked the Next method as [local]. In fact, we're responsible for implementing two versions of the Next method. One version is called by the client and, in turn, calls the RemoteNext method implemented by the proxy. The other version is called by the stub and calls the Next method implemented by the object. Figure 8.1 shows the progression of calls from client to object through the proxy, the stub, and our custom code. The canonical implementation is as follows:

static HRESULT STDMETHODCALLTYPE
IEnumPrimes_Next_Proxy(
  IEnumPrimes* This, ULONG celt, long* rgelt,
  ULONG* pceltFetched) {
  ULONG cFetched;
  if (!pceltFetched && celt != 1) return E_INVALIDARG;
  return IEnumPrimes_RemoteNext_Proxy(This, celt, rgelt,
    pceltFetched ? pceltFetched : &cFetched);
}

static HRESULT STDMETHODCALLTYPE
IEnumPrimes_Next_Stub(
  IEnumPrimes* This, ULONG celt, long* rgelt,
  ULONG* pceltFetched) {
  HRESULT hr = This->lpVtbl->Next(This, celt, rgelt,
    pceltFetched);
  if (hr == S_OK && celt == 1) *pceltFetched = 1;
  return hr;
}

Figure 8.1. Call progression from client, through proxy and stub, to implementation of `IEnumPrimes`

Every enumeration interface includes this code in the proxy/stub implementation, including all the standard ones, such as IEnumUnknown, IEnumString, and IEnumVARIANT. The only difference in implementation is the name of the interface and the type of data being enumerated over (IEnumPrimes and long in this example).
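The NULL-handling rule in the proxy-side shim is easy to check in isolation. The following sketch is a portable model, not proxy/stub code: remote_next and next_shim are hypothetical names, the Status enum stands in for S_OK and E_INVALIDARG, and the toy remote_next simply fills the buffer with the value 7:

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins for S_OK and E_INVALIDARG
enum Status { kOk, kInvalidArg };

// Models the marshaled RemoteNext: pceltFetched must not be NULL.
// This toy version "fetches" celt copies of the value 7.
inline Status remote_next(unsigned long celt, long* rgelt,
                          unsigned long* pceltFetched) {
  assert(pceltFetched != NULL);
  for (unsigned long i = 0; i != celt; ++i) rgelt[i] = 7;
  *pceltFetched = celt;
  return kOk;
}

// Models the client-facing Next: NULL is tolerated only when
// exactly one element is requested.
inline Status next_shim(unsigned long celt, long* rgelt,
                        unsigned long* pceltFetched) {
  unsigned long cFetched = 0;
  if (!pceltFetched && celt != 1) return kInvalidArg;
  return remote_next(celt, rgelt,
                     pceltFetched ? pceltFetched : &cFetched);
}
```

A call such as next_shim(1, buf, NULL) succeeds because the shim substitutes its own local counter, whereas next_shim(2, buf, NULL) is rejected before anything is sent across the wire.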

When you're building the proxy/stub for your project using the <project>PS project generated by the ATL project template, and you have a custom enumeration interface, it's your job to inject that code into your proxy/stub. One way is to edit the <project>_p.c file, but if you were to recompile the IDL, the implementation would be lost. Another way is to add another .c file to the proxy/stub project. This is rather unpleasant and requires that you remember to update this code every time you edit the IDL file. The technique I prefer relies on macro definitions used during the proxy-/stub-building process and makes heavy use of the cpp_quote statement in IDL.^[1] Whenever you have a custom enumeration interface, insert code like this at the bottom of the IDL file, and all will be right with the world (the interface name and element type, here IEnumPrimes and long, change based on the enumeration interface):

^[1] I learned these tricks from Don Box and his enumeration-generation macros. These are available at www.sellsbrothers.com/links/dbox/enumgen.zip (http://tinysells.com/51).

cpp_quote("#ifdef __midl_proxy")
cpp_quote("static HRESULT STDMETHODCALLTYPE")
cpp_quote("IEnumPrimes_Next_Proxy")
cpp_quote("(IEnumPrimes* This, ULONG celt, long* rgelt, ULONG* pceltFetched)")
cpp_quote("{")
cpp_quote(" ULONG cFetched;")
cpp_quote(" if( !pceltFetched && celt != 1 ) return E_INVALIDARG;")
cpp_quote(" return IEnumPrimes_RemoteNext_Proxy(This, celt, rgelt,")
cpp_quote("                    pceltFetched ? pceltFetched : &cFetched);")
cpp_quote("}")
cpp_quote("")
cpp_quote("static HRESULT STDMETHODCALLTYPE")
cpp_quote("IEnumPrimes_Next_Stub")
cpp_quote("(IEnumPrimes* This, ULONG celt, long* rgelt, ULONG* pceltFetched)")
cpp_quote("{")
cpp_quote(" HRESULT hr = This->lpVtbl->Next(This, celt, rgelt,")
cpp_quote("                                 pceltFetched);")
cpp_quote(" if( hr == S_OK && celt == 1 ) *pceltFetched = 1;")
cpp_quote(" return hr;")
cpp_quote("}")
cpp_quote("#endif // __midl_proxy")

All the code within the cpp_quote statements is deposited into the <project>.h file, but because the __midl_proxy symbol is used, the code is compiled only when building the proxy/stub.

An Enumeration Iterator

One other niggling problem with COM enumerators is their ease of use or, rather, the lack thereof. It's good that a client has control of the number of elements to retrieve in a single round-trip, but logically the client is still processing the data one element at a time. This is obfuscated by the fact that we're using two loops instead of one. Of course, C++ being C++, there's no reason that a wrapper can't be built to remove this obfuscation.^[2] Such a wrapper is included with the source code examples for this book. It's called the enum_iterator and is declared like this:

^[2] Or replace one obfuscation for another, depending on your point of view.

#ifndef ENUM_CHUNK
#define ENUM_CHUNK 64
#endif

template <typename EnumItf, const IID* pIIDEnumItf,
  typename EnumType, typename CopyClass = _Copy<EnumType> >
class enum_iterator {
public:
  enum_iterator(IUnknown* punkEnum = 0,
    ULONG nChunk = ENUM_CHUNK);
  enum_iterator(const enum_iterator& i);
  ~enum_iterator();

  enum_iterator& operator=(const enum_iterator& rhs);
  bool operator!=(const enum_iterator& rhs);
  bool operator==(const enum_iterator& rhs);

  enum_iterator& operator++();
  enum_iterator operator++(int);
  EnumType& operator*();

private:
  ...
};

The enum_iterator class provides a standard C++-like forward iterator that wraps a COM enumerator. The type of the enumeration interface and the type of data that it enumerates are specified as template parameters. The buffer size is passed, along with the pointer to the enumeration interface, as a constructor argument. The first constructor allows for the common use of forward iterators. Instead of asking a container for the beginning and ending iterators, the beginning iterator is created by passing a non-NULL enumeration interface pointer. The end iterator is created by passing NULL. The copy constructor is used when forming a looping statement. This iterator simplifies the client enumeration code considerably:

...
// Enumerate over the collection using sequential access
CComPtr<IEnumPrimes> spEnum;
hr = spPrimes->get__NewEnum(&spEnum);

// Using a C++-like forward iterator
typedef enum_iterator<IEnumPrimes, &IID_IEnumPrimes, long>
  primes_iterator;
primes_iterator begin(spEnum, 64);
primes_iterator end;
for (primes_iterator it = begin; it != end; ++it) {
  cout << *it << " ";
}
cout << endl;
...

Or if you'd like to get a little more fancy, you can use the enum_iterator with a function object and a standard C++ algorithm, which helps you avoid writing the looping code altogether:

struct OutputPrime {
  void operator()(const long& nPrime) {
    cout << nPrime << " ";
  }
};

  ...
  // Using a standard C++ algorithm
  typedef enum_iterator<IEnumPrimes, &IID_IEnumPrimes, long>
    primes_iterator;
  for_each(primes_iterator(spEnum, 64), primes_iterator(),
    OutputPrime());

  ...

This example might not be as clear to you as the looping example, but it warms the cockles of my C++ heart.
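The private section of enum_iterator is elided in the declaration shown earlier, so the following is not its implementation. It is only a sketch of one way such a wrapper could hide the chunking, written against a hypothetical, non-COM stand-in (VectorEnum) whose Next simply returns the number of elements it copied:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a COM enumerator: hands out values from
// a vector in chunks and returns how many it copied (0 at the end).
class VectorEnum {
public:
  explicit VectorEnum(const std::vector<long>& data)
    : m_data(data), m_pos(0) {}

  unsigned long Next(unsigned long celt, long* rgelt) {
    unsigned long n = 0;
    while (n < celt && m_pos < m_data.size())
      rgelt[n++] = m_data[m_pos++];
    return n;
  }

private:
  const std::vector<long>& m_data;
  std::size_t m_pos;
};

// Hypothetical single-pass iterator that refills a buffer as needed
class chunk_iterator {
public:
  // A null enumerator yields the end iterator
  explicit chunk_iterator(VectorEnum* penum = 0,
                          unsigned long nChunk = 64)
    : m_penum(penum), m_buf(penum ? nChunk : 0),
      m_count(0), m_index(0) {
    assert(!penum || nChunk > 0);
    if (m_penum) refill();
  }

  long& operator*() { return m_buf[m_index]; }

  chunk_iterator& operator++() {
    if (++m_index == m_count) refill();
    return *this;
  }

  // Only comparison against the end iterator is meaningful here
  bool operator!=(const chunk_iterator& rhs) const {
    return m_penum != rhs.m_penum;
  }

private:
  // Fetches the next chunk; an empty chunk marks the end
  void refill() {
    m_count = m_penum->Next(
      static_cast<unsigned long>(m_buf.size()), &m_buf[0]);
    m_index = 0;
    if (m_count == 0) m_penum = 0;
  }

  VectorEnum* m_penum;
  std::vector<long> m_buf;
  unsigned long m_count;
  unsigned long m_index;
};
```

The client sees one loop, and the second loop lives inside refill. Note that copies of this sketch share a single enumerator, so only one copy should be advanced at a time; a production wrapper would have to decide how copies behave.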

Enumeration and Visual Basic 6.0

In the discussion that follows and in all references to Visual Basic in this chapter, we talk specifically about Visual Basic 6.0, not the latest version, VB .NET. COM collections and enumerations evolved with VB6 in mind, so it's instructive to examine client-side programming with VB6 and collections. VB .NET, of course, is an entirely different subject and squarely outside the scope of this book.

The C++ for_each algorithm might seem a lot like the Visual Basic 6.0 (VB) For-Each statement, and it is. The For-Each statement allows a VB programmer to access each element in a collection, whether it's an intrinsic collection built into VB or a custom collection developed using COM. Just as the for_each algorithm is implemented using iterators, the For-Each syntax is implemented using a COM enumerator, specifically, IEnumVARIANT. To support the For-Each syntax, the collection interface must be based on IDispatch and must have the _NewEnum property marked with the special DISPID value DISPID_NEWENUM. Because our prime number collection object exposes such a method, you might be tempted to write the following code to exercise the For-Each statement:

Private Sub Command1_Click()
    Dim primes As IPrimeNumbers
    Set primes = New PrimeNumbers
    primes.CalcPrimes 0, 1000

    MsgBox "Primes: " & primes.Count

    Dim sPrimes As String
    Dim prime As Variant

    For Each prime In primes ' Calls Invoke(DISPID_NEWENUM)
        sPrimes = sPrimes & prime & " "
    Next prime

    MsgBox sPrimes
End Sub

When VB sees the For-Each statement, it invokes the _NewEnum property, looking for an enumerator that implements IEnumVARIANT. To support this use, our prime number collection interface must change from exposing IEnumPrimes to exposing IEnumVARIANT. Here's the twist: The signature of the method is actually _NewEnum(IUnknown**), not _NewEnum(IEnumVARIANT**). VB takes the IUnknown* returned from _NewEnum and queries for IEnumVARIANT. It would've been nice for VB to avoid an extra round-trip, but perhaps at one point, the VB team expected to support other enumeration types.

Modifying IPrimeNumbers to support the VB For-Each syntax looks like this:

[dual]
interface IPrimeNumbers : IDispatch {
  HRESULT CalcPrimes([in] long min, [in] long max);

  [propget]
  HRESULT Count([out, retval] long* pnCount);

  [propget, id(DISPID_VALUE)]
  HRESULT Item([in] long n, [out, retval] long* pnPrime);

  [propget, id(DISPID_NEWENUM)]
  HRESULT _NewEnum([out, retval] IUnknown** ppunkEnum);
};

This brings the IPrimeNumbers interface into line with the ICollection template form we showed you earlier. In fact, it's fair to say that the ICollection template form was defined to work with VB.

Note one important thing about VB's For-Each statement. If your container contains objects (your returned variants contain VT_UNKNOWN or VT_DISPATCH), the contained objects must implement the IDispatch interface. If they don't, you'll get an "item not an object" error at runtime from VB 6.

The VB Subscript Operator

Using the Item method, a VB client can access each individual item in the collection one at a time:

...
Dim i As Long
For i = 1 To primes.Count
    sPrimes = sPrimes & primes.Item(i) & " "
Next i
...

Because I marked the Item method with DISPID_VALUE, VB allows the following abbreviated syntax that makes a collection seem like an array (if only for a second):

...
Dim i As Long
For i = 1 To primes.Count
    sPrimes = sPrimes & primes(i) & " " ' Invoke(DISPID_VALUE)
Next i
...

Assigning a property the DISPID_VALUE dispatch identifier makes it the default property, as far as VB is concerned. Using this syntax results in VB getting the default property, that is, calling Invoke with DISPID_VALUE. However, because we're dealing with array syntax in VB, we have two problems. The first is knowing where to start the index: 1 or 0? A majority of existing code suggests making collections 1-based, but only a slight majority. As a collection implementer, you get to choose. As a collection user, you get to guess. In general, if you anticipate a larger number of VB clients for your collection, choose 1-based, and whatever you do, please document the decision.

The other concern with using array-style access is round-trips. Using the Item property puts us smack dab in the middle of what we're trying to avoid by using enumerators: one round-trip per data element. If you think that using the For-Each statement and, therefore, enumerators under VB solves both these problems, you're half right. Unfortunately, Visual Basic 6.0 continues to access elements one at a time, even though it's using IEnumVARIANT::Next and is perfectly capable of providing a larger buffer. However, using the For-Each syntax does allow you to disregard whether the Item method is 1-based or 0-based.

The Server Side of Enumeration

Because the semantics of enumeration interfaces are loose, you are free to implement them however you like. The data can be pulled from an array, a file, a database result set, or wherever it is stored. Even better, you might want to calculate the data on demand, saving yourself calculations and storage for elements in which the client isn't interested. Either way, if you're doing it by hand, you have some COM grunge code to write. Or, if you like, ATL is there to help write that grunge code.