Implementing ISAPI in ATL Server

The CIsapiExtension class is the heart of ATL's implementation of the ISAPI interface.

template <class ThreadPoolClass=CThreadPool<CIsapiWorker>,  
  class CRequestStatClass=CNoRequestStats,                  
  class HttpUserErrorTextProvider=CDefaultErrorProvider,    
  class WorkerThreadTraits=DefaultThreadTraits,             
  class CPageCacheStats=CNoStatClass,                       
  class CStencilCacheStats=CNoStatClass>                    
class CIsapiExtension :                                     
  public IServiceProvider,                                  
  public IIsapiExtension,                                   
  public IRequestStats {                                    
protected:                                                  
  CIsapiExtension();                                        
                                                            
  DWORD HttpExtensionProc(LPEXTENSION_CONTROL_BLOCK lpECB) ;
  BOOL GetExtensionVersion(__out HSE_VERSION_INFO* pVer) ;  
  BOOL TerminateExtension(DWORD /*dwFlags*/) ;              
                                                            
  // ...                                                    
};

As you can see, this class is heavily templated. Three of the template parameters (CRequestStatClass, CPageCacheStats, and CStencilCacheStats) are used for performance tracking and logging. The defaults turn off all logging and performance counters. ATL Server provides other implementations that gather statistics for you, but because that logging can have a significant performance impact, it's turned off by default.
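
To see why a do-nothing statistics class costs nothing, it helps to look at the same policy-class idea outside ATL. The following is a minimal sketch, not ATL Server code: NoStats, CountingStats, and Extension are hypothetical names invented for illustration, and only the shape (a template parameter with a no-op default) mirrors CIsapiExtension.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical "no statistics" policy: every hook is an empty inline
// function, so the compiler is free to remove the calls entirely.
struct NoStats {
    void OnRequestReceived() {}
    long Total() const { return 0; }
};

// Hypothetical counting policy: pays for one atomic increment per request.
struct CountingStats {
    void OnRequestReceived() { ++m_total; }
    long Total() const { return m_total.load(); }
private:
    std::atomic<long> m_total{0};
};

// The host class takes the policy as a defaulted template parameter, in the
// same spirit as the CRequestStatClass parameter of CIsapiExtension.
template <class StatsPolicy = NoStats>
class Extension {
public:
    void HandleRequest() { m_stats.OnRequestReceived(); /* real work here */ }
    long RequestsSeen() const { return m_stats.Total(); }
private:
    StatsPolicy m_stats;
};

// Small self-check: the calling code is identical, only the policy differs.
inline void DemoStatsPolicies() {
    Extension<> quiet;
    Extension<CountingStats> counted;
    quiet.HandleRequest();
    counted.HandleRequest();
    assert(quiet.RequestsSeen() == 0);
    assert(counted.RequestsSeen() == 1);
}
```

Switching from Extension<> to Extension<CountingStats> is a one-line change at the point of instantiation, which is the trade the template parameters offer: you opt in to the cost only when you want the numbers.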

The three CIsapiExtension methods contain the actual implementations of the three ISAPI functions. The GetExtensionVersion method is long but fairly straightforward. Because this is the method called when the ISAPI extension is first loaded, the class does most of its initialization here:

BOOL GetExtensionVersion( HSE_VERSION_INFO* pVer) {                       
  // allocate a Tls slot for storing per thread data                      
  m_dwTlsIndex = TlsAlloc();                                              
                                                                          
  // create a private heap for request data                               
  // this heap has to be thread safe to allow for                         
  // async processing of requests                                         
  m_hRequestHeap = HeapCreate(0, 0, 0);                                   
  if (!m_hRequestHeap) {                                                  
    m_hRequestHeap = GetProcessHeap();                                    
    if (!m_hRequestHeap) {                                                
      return SetCriticalIsapiError(IDS_ATLSRV_CRITICAL_HEAPCREATEFAILED); 
    }                                                                     
  }                                                                       
                                                                          
  // create a private heap (synchronized) for                             
  // allocations. This reduces fragmentation overhead                     
  // as opposed to the process heap                                       
  HANDLE hHeap = HeapCreate(0, 0, 0);                                     
  if (!hHeap) {                                                           
    hHeap = GetProcessHeap();                                             
    m_heap.Attach(hHeap, false);                                          
  } else {                                                                
    m_heap.Attach(hHeap, true);                                           
  }                                                                       
  hHeap = NULL;                                                           
                                                                          
  if (S_OK != m_WorkerThread.Initialize()) {                              
      return SetCriticalIsapiError(IDS_ATLSRV_CRITICAL_WORKERINITFAILED); 
  }                                                                       
                                                                          
  if (m_critSec.Init() != S_OK) {                                         
      HRESULT hrIgnore=m_WorkerThread.Shutdown();                         
      return SetCriticalIsapiError(IDS_ATLSRV_CRITICAL_CRITSECINITFAILED);
  }                                                                       
  if (S_OK != m_ThreadPool.Initialize(                                    
    static_cast<IIsapiExtension*>(this), GetNumPoolThreads(),             
    GetPoolStackSize(), GetIOCompletionHandle())) {                       
    HRESULT hrIgnore=m_WorkerThread.Shutdown();                           
    m_critSec.Term();                                                     
    return SetCriticalIsapiError(                                         
      IDS_ATLSRV_CRITICAL_THREADPOOLFAILED);                              
  }                                                                       
                                                                          
  if (FAILED(m_DllCache.Initialize(&m_WorkerThread,                       
    GetDllCacheTimeout()))) {                                             
    HRESULT hrIgnore=m_WorkerThread.Shutdown();                           
    m_ThreadPool.Shutdown();                                              
    m_critSec.Term();                                                     
    return SetCriticalIsapiError(                                         
      IDS_ATLSRV_CRITICAL_DLLCACHEFAILED);                                
  }                                                                       
                                                                          
  if (FAILED(m_PageCache.Initialize(&m_WorkerThread))) {                  
    HRESULT hrIgnore=m_WorkerThread.Shutdown();                           
    m_ThreadPool.Shutdown();                                              
    m_DllCache.Uninitialize();                                            
    m_critSec.Term();                                                     
    return SetCriticalIsapiError(                                         
      IDS_ATLSRV_CRITICAL_PAGECACHEFAILED);                               
  }                                                                       
                                                                          
  if (S_OK != m_StencilCache.Initialize(                                  
    static_cast<IServiceProvider*>(this),                                 
    &m_WorkerThread,                                                      
    GetStencilCacheTimeout(),                                             
    GetStencilLifespan())) {                                              
    HRESULT hrIgnore=m_WorkerThread.Shutdown();                           
    m_ThreadPool.Shutdown();                                              
    m_DllCache.Uninitialize();                                            
    m_PageCache.Uninitialize();                                           
    m_critSec.Term();                                                     
    return SetCriticalIsapiError(IDS_ATLSRV_CRITICAL_STENCILCACHEFAILED); 
  }                                                                       
                                                                          
  pVer->dwExtensionVersion = HSE_VERSION;                                 
  Checked::strncpy_s(pVer->lpszExtensionDesc,                             
    HSE_MAX_EXT_DLL_NAME_LEN, GetExtensionDesc(), _TRUNCATE);             
  pVer->lpszExtensionDesc[HSE_MAX_EXT_DLL_NAME_LEN - 1] = '\0';           
                                                                          
  return TRUE;                                                            
}

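This method allocates a thread-local storage slot, creates two Win32 heaps for use during request processing, initializes a worker thread and a thread pool, and initializes the DLL, page, and stencil caches. If any step fails, the method shuts down the pieces that were already initialized and reports a critical error.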
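
The repeated cleanup code in GetExtensionVersion follows a pattern that can be expressed more compactly. The sketch below is one way to write it in portable C++, under some loud assumptions: Stage and InitializeAll are hypothetical names that do not exist in ATL Server, and the sketch undoes stages in strict reverse order, which is a simplification of the order the real method uses.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// One initialization stage: how to start it and how to undo it.
// (Hypothetical type for illustration only.)
struct Stage {
    std::string name;
    std::function<bool()> init;
    std::function<void()> undo;
};

// Runs the stages in order. If one fails, undoes the stages that already
// succeeded (most recent first) and returns the name of the failing stage.
// Returns an empty string when every stage succeeds.
inline std::string InitializeAll(const std::vector<Stage>& stages) {
    for (std::size_t i = 0; i < stages.size(); ++i) {
        assert(stages[i].init && stages[i].undo);  // both callbacks must be set
        if (!stages[i].init()) {
            for (std::size_t j = i; j-- > 0;)
                stages[j].undo();
            return stages[i].name;
        }
    }
    return std::string();
}
```

As in the ATL code, the stage that failed is not undone; only the stages that completed before it are.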

The real action takes place in the HttpExtensionProc method. This is called for every HTTP request that IIS routes to our extension DLL. Before we look at the implementation of this method, we need to look at how to achieve high performance in a server environment.

Performance and Multithreading

Any production web server needs to handle many simultaneous network requests. In the original web extension platform, the Common Gateway Interface (CGI), each request was handled by spawning a new process. This process handled that one request and then exited. This worked acceptably on UNIX for small sites, but process creation overhead soon limited the number of simultaneous requests that could be processed.

This process-creation model was made even worse on Windows, where creating processes is much more expensive. However, there's a fairly obvious alternative in Win32: use a thread per request instead of a process. Threads are much, much cheaper to start. Unfortunately, the obvious solution is somewhat less obviously wrong in large systems. Threads might be cheap, but they're not free. As the number of threads increases, the CPU spends more time on thread management and less time actually doing the work of serving your web site.

The solution comes from the stateless nature of HTTP. Because each request is independent, it doesn't matter which specific thread processes a request. More usefully, when a thread is done processing a request, instead of dying, it can be reused to process another request. This design is called a thread pool.

IIS uses a thread pool internally to handle incoming traffic. Each request is handed off to a thread in the pool. The thread services the request (by either returning static content off the disk or executing the HttpExtensionProc of the appropriate ISAPI extension DLL). In general, this works well, but the thread has to finish its processing quickly. If all the threads in the IIS pool are busy, new requests start getting dropped. Serving static content is a low-overhead process. But when you start executing arbitrary code (to generate dynamic HTML, for example), suddenly the time it takes for the thread to return to the pool is much less predictable, and it could be much longer.

So, we need to return the IIS thread back to the pool as soon as possible. But we also need to actually perform our processing to handle the request. Instead of forcing every developer to micro-optimize every statement of the ISAPI extension to get the thread back to the pool, ATL Server provides its own thread pool. On a request, the HttpExtensionProc (which is running on the IIS thread) places the request into the internal thread pool. The IIS thread then returns, ready to process another request. The code follows:

DWORD HttpExtensionProc(LPEXTENSION_CONTROL_BLOCK lpECB) {       
  AtlServerRequest *pRequestInfo = NULL;                         
  _ATLTRY {                                                      
    pRequestInfo = CreateRequest();                              
    if (pRequestInfo == NULL)                                    
      return HSE_STATUS_ERROR;                                   
                                                                 
    CServerContext *pServerContext = NULL;                       
    ATLTRY(pServerContext = CreateServerContext(m_hRequestHeap));
    if (pServerContext == NULL) {                                
      FreeRequest(pRequestInfo);                                 
      return HSE_STATUS_ERROR;                                   
    }                                                            
                                                                 
    pServerContext->Initialize(lpECB);                           
    pServerContext->AddRef();                                    
                                                                 
    pRequestInfo->pServerContext = pServerContext;               
    pRequestInfo->dwRequestType = ATLSRV_REQUEST_UNKNOWN;        
    pRequestInfo->dwRequestState = ATLSRV_STATE_BEGIN;           
    pRequestInfo->pExtension =                                   
      static_cast<IIsapiExtension *>(this);                      
    pRequestInfo->pDllCache =                                    
      static_cast<IDllCache *>(&m_DllCache);                     
#ifndef ATL_NO_MMSYS                                             
    pRequestInfo->dwStartTicks = timeGetTime();                  
#else                                                            
    pRequestInfo->dwStartTicks = GetTickCount();                 
#endif                                                           
    pRequestInfo->pECB = lpECB;                                  
                                                                 
    m_reqStats.OnRequestReceived();                              
                                                                 
    if (m_ThreadPool.QueueRequest(pRequestInfo))                 
      return HSE_STATUS_PENDING;                                 
                                                                 
    if (pRequestInfo != NULL) {                                  
      FreeRequest(pRequestInfo);                                 
    }                                                            
  }                                                              
  _ATLCATCHALL() { }                                             
  return HSE_STATUS_ERROR;                                       
}
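
Stripped of the ISAPI details, the hand-off has a simple shape: package the request, queue it, and return a "pending" status without doing any of the work. Here is a minimal portable sketch of that shape. Status, Request, RequestQueue, and OnRequest are hypothetical stand-ins, not ATL or ISAPI names, and a bounded in-memory queue is assumed in place of the real thread pool.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <mutex>
#include <utility>

enum class Status { Pending, Error };   // stand-ins for the HSE_STATUS_* codes

struct Request { int id; };             // hypothetical per-request record

// Hypothetical bounded queue standing in for the extension's thread pool.
class RequestQueue {
public:
    explicit RequestQueue(std::size_t capacity) : m_capacity(capacity) {
        assert(capacity > 0);
    }

    // Takes ownership of the request only when it is accepted.
    bool Queue(std::unique_ptr<Request>& req) {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_items.size() >= m_capacity) return false;
        m_items.push_back(std::move(req));
        return true;
    }

    std::size_t Size() const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_items.size();
    }

private:
    mutable std::mutex m_mutex;
    std::size_t m_capacity;
    std::deque<std::unique_ptr<Request>> m_items;
};

// Runs on the caller's (web server's) thread: build the request record,
// hand it off, and return immediately. No request processing happens here.
inline Status OnRequest(RequestQueue& queue, int id) {
    std::unique_ptr<Request> req(new Request{id});
    if (queue.Queue(req)) return Status::Pending;  // a pool thread finishes it
    return Status::Error;                          // req is freed on return
}
```

The unique_ptr plays the role of the explicit FreeRequest call in the ATL code: if the queue refuses the request, the record is released before the error is returned.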

The CreateRequest method simply allocates a chunk of memory from the request heap to store the information about the request:

struct AtlServerRequest {                                       
  // For future compatibility                                   
  DWORD cbSize;                                                 
                                                                
  // Necessary because it wraps the ECB                         
  IHttpServerContext *pServerContext;                           
                                                                
  // Indicates whether it was called through an .srf file or    
  // through a .dll file                                        
  ATLSRV_REQUESTTYPE dwRequestType;                             
  // Indicates what state of completion the request is in       
  ATLSRV_STATE dwRequestState;                                  
  // Necessary because the callback (for async calls) must      
  // know where to route the request                            
  IRequestHandler *pHandler;                                    
  // Necessary in order to release the dll properly             
  // (for async calls)                                          
  HINSTANCE hInstDll;                                           
  // Necessary to requeue the request (for async calls)         
  IIsapiExtension *pExtension;                                  
  // Necessary to release the dll in async callback             
  IDllCache* pDllCache;                                         
                                                                
  HANDLE hFile;                                                 
  HCACHEITEM hEntry;                                            
  IFileCache* pFileCache;                                       
                                                                
  // necessary to synchronize calls to HandleRequest            
  // if HandleRequest could potentially make an                 
  // async call before returning. only used                     
  // if indicated with ATLSRV_INIT_USEASYNC_EX                  
  HANDLE m_hMutex;                                              
  // Tick count when the request was received                   
  DWORD dwStartTicks;                                           
  EXTENSION_CONTROL_BLOCK *pECB;                                
  PFnHandleRequest pfnHandleRequest;                            
  PFnAsyncComplete pfnAsyncComplete;                            
  // buffer to be flushed asynchronously                        
  LPCSTR pszBuffer;                                             
  // length of data in pszBuffer                                
  DWORD dwBufferLen;                                            
  // value that can be used to pass user data between           
  // parent and child handlers                                  
  void* pUserData;                                              
};                                                              
                                                                
AtlServerRequest *CreateRequest() {                             
    // Allocate a fixed block size to avoid fragmentation       
    AtlServerRequest *pRequest = (AtlServerRequest *) HeapAlloc(
      m_hRequestHeap, HEAP_ZERO_MEMORY,                         
      __max(sizeof(AtlServerRequest),                           
        sizeof(_CComObjectHeapNoLock<CServerContext>)));  
    if (!pRequest) return NULL;                                 
                                                                
    pRequest->cbSize = sizeof(AtlServerRequest);             
    return pRequest;                                            
}

As you can see, the structure holds all the information that IIS supplies about the request (the ECB pointer), plus a whole lot more.
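
The __max call in CreateRequest deserves a second look. One plausible reading, given that the server context is also created from m_hRequestHeap, is that every block taken from that heap has the same size no matter which of the two objects it holds. The sketch below shows that idea in portable C++. RequestRecord and ContextObject are hypothetical stand-ins for the two ATL types, and calloc stands in for HeapAlloc with HEAP_ZERO_MEMORY.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-ins for AtlServerRequest and the server context object.
struct RequestRecord { std::size_t cbSize; void* context; unsigned long startTicks; };
struct ContextObject { void* owner; long refCount; char scratch[64]; };

// One block size that fits either type, so all allocations are uniform.
constexpr std::size_t kBlockSize =
    sizeof(RequestRecord) > sizeof(ContextObject) ? sizeof(RequestRecord)
                                                  : sizeof(ContextObject);

// Allocates a zero-filled, fixed-size block and stamps the record size
// into it, as CreateRequest does with cbSize.
inline RequestRecord* CreateRecord() {
    void* block = std::calloc(1, kBlockSize);
    if (!block) return nullptr;
    RequestRecord* rec = new (block) RequestRecord();  // value-initialized
    rec->cbSize = sizeof(RequestRecord);
    assert(rec->context == nullptr);
    return rec;
}

// RequestRecord is trivially destructible, so releasing the block is enough.
inline void FreeRecord(RequestRecord* rec) { std::free(rec); }
```

Uniform block sizes mean that a freed block can always satisfy the next allocation, which is the fragmentation benefit the comment in CreateRequest refers to.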

The ATL Server Thread Pool

ATL Server provides a thread pool implementation in the CThreadPool class:

template <class Worker,                       
  class ThreadTraits=DefaultThreadTraits,     
  class WaitTraits=DefaultWaitTraits>         
class CThreadPool : public IThreadPoolConfig {
    // ...                                    
};

The template parameters give you control over how threads are created and what they do. The Worker template parameter lets you specify what class will actually do the processing of the request. The ThreadTraits class controls how a thread is created. Depending on the _ATL_MIN_CRT and _MT settings, DefaultThreadTraits is a typedef for one of two other classes:

class CRTThreadTraits {                                          
public:                                                          
  static HANDLE CreateThread(LPSECURITY_ATTRIBUTES lpsa,         
      DWORD dwStackSize, LPTHREAD_START_ROUTINE pfnThreadProc,   
      void *pvParam, DWORD dwCreationFlags, DWORD *pdwThreadId) {
    // _beginthreadex calls CreateThread                         
    // which will set the last error value                       
    // before it returns.                                        
    return (HANDLE) _beginthreadex(lpsa, dwStackSize,            
      (unsigned int (__stdcall *)(void *)) pfnThreadProc,        
      pvParam, dwCreationFlags, (unsigned int *) pdwThreadId);   
  }                                                              
};                                                               
                                                                 
class Win32ThreadTraits {                                        
public:                                                          
  static HANDLE CreateThread(LPSECURITY_ATTRIBUTES lpsa,         
      DWORD dwStackSize, LPTHREAD_START_ROUTINE pfnThreadProc,   
      void *pvParam, DWORD dwCreationFlags, DWORD *pdwThreadId) {
    return ::CreateThread(lpsa, dwStackSize, pfnThreadProc,      
      pvParam, dwCreationFlags, pdwThreadId);                    
  }                                                              
};                                                               
                                                                 
#if !defined(_ATL_MIN_CRT) && defined(_MT)                       
    typedef CRTThreadTraits DefaultThreadTraits;                 
#else                                                            
    typedef Win32ThreadTraits DefaultThreadTraits;               
#endif
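
The traits idea is easy to try without Win32. The sketch below uses std::thread so that it runs anywhere; StdThreadTraits, CountingThreadTraits, and RunOnThreads are hypothetical names made up for this illustration. The point is only that the code that needs threads never decides how they start; the traits class does.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical traits class: its only job is to decide HOW a thread starts.
struct StdThreadTraits {
    static std::thread CreateThread(std::function<void()> proc) {
        return std::thread(std::move(proc));
    }
};

// A second traits class with the same interface that also counts launches,
// the kind of hook a custom traits class could add.
struct CountingThreadTraits {
    static std::atomic<int>& Launches() {
        static std::atomic<int> count{0};
        return count;
    }
    static std::thread CreateThread(std::function<void()> proc) {
        ++Launches();
        return std::thread(std::move(proc));
    }
};

// Starts `count` threads through the traits class and waits for them all.
template <class ThreadTraits = StdThreadTraits>
void RunOnThreads(int count, const std::function<void()>& proc) {
    assert(count >= 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < count; ++i)
        threads.push_back(ThreadTraits::CreateThread(proc));
    for (auto& t : threads) t.join();
}
```

Calling RunOnThreads<CountingThreadTraits>(...) instead of RunOnThreads<>(...) changes how threads are launched without touching the code that uses them, which is what the ThreadTraits parameter of CThreadPool allows.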

As part of initialization, the CThreadPool class uses the ThreadTraits class to create the initial set of threads. The threads in the pool all run this thread proc:

DWORD ThreadProc() {                                             
  DWORD dwBytesTransfered;                                       
  ULONG_PTR dwCompletionKey;                                     
                                                                 
  OVERLAPPED* pOverlapped;                                       
                                                                 
  // this block is to ensure theWorker gets destructed before the
  // thread handle is closed
  {
    // We instantiate an instance of the worker class on the     
    // stack for the life time of the thread.                    
    Worker theWorker;                                            
    if (theWorker.Initialize(m_pvWorkerParam) == FALSE) {        
      return 1;                                                  
    }                                                            
                                                                 
    SetEvent(m_hThreadEvent);                                    
    // Get the request from the IO completion port               
    while (GetQueuedCompletionStatus(m_hRequestQueue,            
      &dwBytesTransfered, &dwCompletionKey, &pOverlapped,        
      INFINITE)) {                                               
      if (pOverlapped == ATLS_POOL_SHUTDOWN) { // Shut down
        LONG bResult = InterlockedExchange(&m_bShutdown, FALSE);
        if (bResult) // Shutdown has not been cancelled
          break;

        // else, shutdown has been cancelled; continue as before
      }
      else {                                                     
        // Do work                                               
        typename Worker::RequestType request =
          (typename Worker::RequestType) dwCompletionKey;
                                                                 
        // Process the request. Notice the following:            
        // (1) It is the worker's responsibility to free any     
        // memory associated with the request if the request is  
        // complete                                              
        // (2) If the request still requires some more processing
        // the worker should queue the request again for         
        // dispatching                                           
        theWorker.Execute(request, m_pvWorkerParam, pOverlapped);
      }                                                          
    }                                                            
                                                                 
    theWorker.Terminate(m_pvWorkerParam);                        
  }                                                              
                                                                 
  m_dwThreadEventId = GetCurrentThreadId();                      
  SetEvent(m_hThreadEvent);                                      
                                                                 
  return 0;                                                      
}

The overall logic is fairly common in a thread pool. The thread sits waiting on the I/O completion port for requests to come in. A special value (ATLS_POOL_SHUTDOWN) tells the thread to shut down; any other request is passed off to the worker object to do the actual work.

The worker class can be anything with a RequestType typedef and the appropriate Execute method.
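
To make the worker contract concrete, here is a portable sketch of a pool that drives a worker class in the same way. It is not ATL code: MiniPool, SummingWorker, and SumWithPool are hypothetical names, a mutex and condition variable stand in for the I/O completion port, and Execute drops the OVERLAPPED argument that the real method receives.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Portable sketch of a worker-driven pool (hypothetical, for illustration).
template <class Worker>
class MiniPool {
public:
    typedef typename Worker::RequestType RequestType;

    MiniPool(std::size_t threads, void* workerParam) : m_param(workerParam) {
        for (std::size_t i = 0; i < threads; ++i)
            m_threads.emplace_back([this] { ThreadProc(); });
    }

    // Queues one shutdown marker per thread, then waits for them to exit.
    ~MiniPool() {
        for (std::size_t i = 0; i < m_threads.size(); ++i)
            Push(Item{true, RequestType()});
        for (auto& t : m_threads) t.join();
    }

    void QueueRequest(RequestType request) { Push(Item{false, request}); }

private:
    struct Item { bool shutdown; RequestType request; };

    void Push(Item item) {
        { std::lock_guard<std::mutex> lock(m_mutex); m_queue.push_back(item); }
        m_ready.notify_one();
    }

    void ThreadProc() {
        Worker worker;                    // one worker per thread, on the stack
        if (!worker.Initialize(m_param)) return;
        for (;;) {
            std::unique_lock<std::mutex> lock(m_mutex);
            m_ready.wait(lock, [this] { return !m_queue.empty(); });
            Item item = m_queue.front();
            m_queue.pop_front();
            lock.unlock();
            if (item.shutdown) break;     // special value: stop this thread
            worker.Execute(item.request, m_param);
        }
        worker.Terminate(m_param);
    }

    void* m_param;
    std::mutex m_mutex;
    std::condition_variable m_ready;
    std::deque<Item> m_queue;
    std::vector<std::thread> m_threads;
};

// Hypothetical worker: a RequestType typedef plus the three methods.
struct SummingWorker {
    typedef int RequestType;
    bool Initialize(void*) { return true; }
    void Execute(RequestType n, void* param) {
        static_cast<std::atomic<long>*>(param)->fetch_add(n);
    }
    void Terminate(void*) {}
};

// Queues 1..count and returns the total once every thread has finished.
inline long SumWithPool(int count) {
    assert(count >= 0);
    std::atomic<long> total{0};
    {
        MiniPool<SummingWorker> pool(4, &total);
        for (int i = 1; i <= count; ++i) pool.QueueRequest(i);
    }   // destructor queues the shutdown markers behind the work, then joins
    return total.load();
}
```

Because the queue is first-in, first-out, the shutdown markers are reached only after all queued work has been taken, so SumWithPool(100) adds up every request before the pool goes away.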

At this point, ATL Server has already provided a greatly improved ISAPI development experience. The hard work to maintain the performance of the server has been done; all you need to do is write a worker class and implement your logic in the Execute method. This still leaves you with the job of generating the HTML to send to the client. This isn't too hard in C++,^[1] but it is tedious, and building HTML in code means that you have to recompile to change a spelling error. What's really needed is some way to generate the HTML based on a template. ATL Server does this via Server Response Files.

^[1] ATL Server actually provides a framework of classes to assist in HTML generation. Take a look at the CHtmlGen class in the documentation.