A Bug in _CrtMemCheckpoint


Where is my memory going?

Memory leaks are a really tedious class of problems to track down.  The job is not made any easier by the fact that one of the tools that allegedly helps find them is itself buggy.

You might expect, based on the description of _CrtMemCheckpoint and the associated _CrtMemState structure, that this might be useful in helping you find memory leaks.  For example, you might be led to suspect that if you did

_CrtMemState ms;
...do stuff here to set up
_CrtMemCheckpoint(&ms);
UINT start = ms.lTotalCount;

...do stuff whose expected net consumption is 0

_CrtMemCheckpoint(&ms);
ASSERT(start == ms.lTotalCount); // net consumption not zero!

that you would detect if there was a memory leak between the two calls.

Unfortunately, this won't work.  There is a serious bug in the dbgheap.c logic, or its documentation, that means that the number for lTotalCount is irrelevant.

This bug is present in all versions of VS from at least VS6 through VS2005.

The documentation for _CrtMemState says:


_CrtMemState

To capture a summary snapshot of the state of the heap at a given time, use the _CrtMemState structure defined in CRTDBG.H:
 

typedef struct _CrtMemState
{
 // Pointer to the most recently allocated block:
 struct _CrtMemBlockHeader * pBlockHeader;
 // A counter for each of the 5 types of block:
 size_t lCounts[_MAX_BLOCKS];
 // Total bytes allocated in each block type:
 size_t lSizes[_MAX_BLOCKS];
 // The most bytes allocated at a time up to now:
 size_t lHighWaterCount;
 // The total bytes allocated at present:
 size_t lTotalCount;
} _CrtMemState;

You would suspect from this description that lTotalCount represents the total number of bytes currently allocated.

Unfortunately, you would be wrong.

A simple examination of the code of dbgheap.c shows the following manipulation of the variable that is used to hold lTotalCount:

Version      Line  Function                    Code
VS6            61  (module-level declaration)  static unsigned long _lTotalAlloc;
                                               /*Grand total - sum of all allocations */
              398  _heap_alloc_debug           _lTotalAlloc += nSize;
              664  realloc_help                _lTotalAlloc -= pNewBlock->nDataSize;
              665                              _lTotalAlloc += nNewSize;
             1850  _CrtMemCheckpoint           state->lTotalCount = _lTotalAlloc;
VS.NET 2003    70  (module-level declaration)  static unsigned long _lTotalAlloc;
                                               /*Grand total - sum of all allocations */
              420  _heap_alloc_debug           _lTotalAlloc += nSize;
              715  realloc_help                _lTotalAlloc -= pNewBlock->nDataSize;
              716                              _lTotalAlloc += nNewSize;
             1986  _CrtMemCheckpoint           state->lTotalCount = _lTotalAlloc;
VS2005         81  (module-level declaration)  static unsigned long _lTotalAlloc;
                                               /*Grand total - sum of all allocations */
              434  _heap_alloc_debug           _lTotalAlloc += nSize;
              753  realloc_help                _lTotalAlloc -= pNewBlock->nDataSize;
              754                              _lTotalAlloc += nNewSize;
             2174  _CrtMemCheckpoint           state->lTotalCount = _lTotalAlloc;

This appears to be an error caused by someone misreading the code or misunderstanding what it meant.  The result is that the documentation produced for end-user consumption would indicate that this number is the total current allocation, but in fact it is, as indicated in the comment, the sum of all allocations throughout the history of the execution.  For example, a loop that allocates and frees 20 bytes would appear to be leaking 20 bytes on each iteration, when in fact there is no net loss of storage.
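The difference between the two kinds of bookkeeping can be sketched in a few lines of portable C++.  This is only a model, not the CRT's code: HeapCounters, trackedAlloc, trackedFree and runBalancedLoop are hypothetical names invented for the illustration.  One counter behaves the way the table shows _lTotalAlloc being handled (increased on allocation, never decreased on free), and the other behaves the way the documentation would lead you to expect.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical model of two ways a heap could keep score.
// These names are illustrative; they are not CRT identifiers.
struct HeapCounters {
    std::size_t everAllocated = 0; // only ever grows, like _lTotalAlloc in the table
    std::size_t liveBytes = 0;     // grows on allocation, shrinks on free
};

// Allocate nSize bytes and record them in both counters.
void* trackedAlloc(HeapCounters& c, std::size_t nSize) {
    void* p = std::malloc(nSize);
    assert(p != nullptr);
    c.everAllocated += nSize;
    c.liveBytes += nSize;
    return p;
}

// Free a block of nSize bytes.  Only the live counter is decremented,
// mirroring the absence of any "-=" on the free path in the table.
void trackedFree(HeapCounters& c, void* p, std::size_t nSize) {
    assert(c.liveBytes >= nSize);
    std::free(p);
    c.liveBytes -= nSize;
}

// Allocate and release a 20-byte block `iterations` times: no net consumption.
HeapCounters runBalancedLoop(int iterations) {
    HeapCounters c;
    for (int i = 0; i < iterations; ++i) {
        void* p = trackedAlloc(c, 20);
        trackedFree(c, p, 20);
    }
    return c;
}
```

After runBalancedLoop(1000), everAllocated is 20000 while liveBytes is 0.  Anyone reading the first counter as "bytes allocated at present" would conclude that the loop leaks 20 bytes per iteration.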

The particular example was prompted by a question in the newsgroup, where the code given was in support of the claim that GetTextExtent did not free the CString that was being created.  A slight adaptation of the code is shown here

    _CrtMemState ms;
    CClientDC dc(this);

    for(int i = 0; i < 1000; i++)
       { /* use storage */
        CSize sz = dc.GetTextExtent(_T("abcdefg"));
        _CrtMemCheckpoint(&ms);
        TRACE("Total space allocated: %i\n", ms.lTotalCount);
       } /* use storage */

and a piece of the observed output was

Total space allocated: 9249
Total space allocated: 9269
Total space allocated: 9289
Total space allocated: 9309
Total space allocated: 9329

However, if you write a function that actually adds up the bytes in the allocated blocks of the heap, such as this one

/****************************************************************************
*                                  CountWalk
* Result: UINT
*       Amount of space allocated as revealed by _heapwalk
****************************************************************************/

UINT CountWalk()
    {
     int HeapStatus;
     BOOL running = TRUE;
     _HEAPINFO info;
     info._pentry = NULL;
     UINT UsedBytes = 0;
     
     while(running)
        { /* scan heap */
         HeapStatus = _heapwalk(&info);
         switch(HeapStatus)
            { /* check status */
             case _HEAPOK:
                break;
             case _HEAPEND:
                running = FALSE;
                continue; // no new entry at end of heap; skip the tally below
             default:
                ASSERT(FALSE);
                running = FALSE;
                continue;
            } /* check status */

         if(info._useflag == _USEDENTRY)
            { /* used block */
             UsedBytes += info._size;
            } /* used block */
        } /* scan heap */
     return UsedBytes;
    } // CountWalk

and the code is modified as shown below

    for(int i = 0; i < 1000; i++)
       { /* use storage */
        CSize sz = dc.GetTextExtent(_T("abcdefg"));
        _CrtMemCheckpoint(&ms);
        TRACE("%4d: Total space allocated (_CrtMemCheckpoint): %i\n", i, ms.lTotalCount);
        TRACE("%4d: Total space allocated (_heapwalk): %i\n", i, CountWalk());
       } /* use storage */
 

then the new output is

   0: Total space allocated (_CrtMemCheckpoint): 9249
   0: Total space allocated (_heapwalk): 17424
   1: Total space allocated (_CrtMemCheckpoint): 9269
   1: Total space allocated (_heapwalk): 17424
   2: Total space allocated (_CrtMemCheckpoint): 9289
   2: Total space allocated (_heapwalk): 17424
   3: Total space allocated (_CrtMemCheckpoint): 9309
   3: Total space allocated (_heapwalk): 17424
   4: Total space allocated (_CrtMemCheckpoint): 9329
   4: Total space allocated (_heapwalk): 17424
...
 195: Total space allocated (_CrtMemCheckpoint): 13149
 195: Total space allocated (_heapwalk): 17424
 196: Total space allocated (_CrtMemCheckpoint): 13169
 196: Total space allocated (_heapwalk): 17424
 197: Total space allocated (_CrtMemCheckpoint): 13189
...
 473: Total space allocated (_CrtMemCheckpoint): 18709
 473: Total space allocated (_heapwalk): 17424
 474: Total space allocated (_CrtMemCheckpoint): 18729
 474: Total space allocated (_heapwalk): 17424
 475: Total space allocated (_CrtMemCheckpoint): 18749
 475: Total space allocated (_heapwalk): 17424
...
 931: Total space allocated (_CrtMemCheckpoint): 27869
 931: Total space allocated (_heapwalk): 17424
 932: Total space allocated (_CrtMemCheckpoint): 27889
 932: Total space allocated (_heapwalk): 17424
 933: Total space allocated (_CrtMemCheckpoint): 27909
 933: Total space allocated (_heapwalk): 17424
 934: Total space allocated (_CrtMemCheckpoint): 27929
 934: Total space allocated (_heapwalk): 17424
 935: Total space allocated (_CrtMemCheckpoint): 27949
 935: Total space allocated (_heapwalk): 17424
... 

Note that while the _CrtMemCheckpoint value keeps going up, the actual count of bytes allocated remains constant.

And how long is your ruler, exactly?

Back when I used to give lectures on such topics, one of the things I would do was hand out a set of rulers (which had been printed on transparencies) and a piece of paper with two dots on it.  I handed these out myself, one at a time, walking among the students, or, in other contexts, had a student assistant do it.  The dots were deliberately made 1/4" in diameter.  The question was "How far apart are these dots?"  That was the whole question.  The answer was specified as needing to be accurate to 1/32".  The students had to measure the distance, write it down, and then we compared the answers.  The rule was that there was to be no talking to each other during the experiment, no communication at all, and I would answer no questions.  (The dots were exactly two inches apart, center-to-center).

The rulers I handed out were largely erroneous.  They were identified by letters.  Each ruler was four inches long.  I can no longer find the file I used to create these, but you can assume that something like this was true

Ruler  Characteristic
A      Marks are 1/16" apart.  There are only inch marks and 1/16" marks.  There are 15 marks per inch.
B      Marks are 1/16" apart.  There are only inch marks and 1/16" marks.  There are 16 marks per inch.  One inch on the ruler is 15/16 of a standard inch.
C      Marks are 1/16" apart.  There are only inch marks and 1/16" marks.  There are 16 marks per inch.  One inch on the ruler is 11/16" of a standard inch.
D      Same as B, but one inch on the ruler is 7/8 of a standard inch.
E      Same as C, but one inch on the ruler is 11/8 of a standard inch.
F      Same as B, except the first inch is 31/32", the second inch is , the third inch is 29/32", the fourth inch is 7/8".
G      Marks are 1/16" apart.  There are only inch marks and 1/16" marks.  The first inch has only 15 marks.

I polled the students for their results and wrote them down.  The results were wonderful.  No two people got the same result!

The protocol was that all they could say when asked was the measurement they got; the experiment was not over yet.

Next, I asked the people on the left to exchange their rulers with the people on the right, and measure again.  I then wrote down the results of the second set of measurements.

Now they're allowed to talk.

First, one of the questions was "did you want the distance between the dots, or the center-to-center distance, or the edge-to-edge distance?"  Of course, this is a valid question, and I had carefully avoided specifying it.  The importance of this question is that you first have to be able to specify what you are measuring!  Lacking that specification, all answers are meaningless.  So the first important question is:

1. What are you measuring?

But the real problem is that there is no real calibration of the rulers.  There is no guarantee that they are actually measuring an inch.  This gets back to the question of calibration of the measuring device.  This is the second important question:

2. What are you measuring with?
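To put rough numbers on the calibration problem, here is a minimal sketch of what a mis-scaled ruler reports.  The helper rulerReading is a hypothetical name, and it assumes a ruler whose marks are evenly spaced, so that the only error is the length of its printed "inch".

```cpp
#include <cassert>

// Reading obtained when a true distance (in standard inches) is measured
// with a ruler whose printed "inch" is really rulerInchInStdInches long.
// Hypothetical helper; assumes the ruler's marks are evenly spaced.
double rulerReading(double trueInches, double rulerInchInStdInches) {
    assert(rulerInchInStdInches > 0.0);
    return trueInches / rulerInchInStdInches;
}
```

For the two-inch distance between the dots, a ruler like B (one ruler inch is 15/16 of a standard inch) reads about 2.13, and a ruler like D (7/8 of a standard inch) reads about 2.29.  When the answer is supposed to be good to 1/32", those are very different answers to the same question.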

I noted that everyone made exactly one measurement by which they arrived at their conclusion.  I'm not sure what validity this has.  In a course I used to teach to 11-14-year-olds, we had to measure the height of the Museum ceiling, which was three stories above the floor.  The way we did this was to take a laser pointer and place it in a block of wood, which I had carefully drilled so the hole was as precisely vertical as I could manage, and the laser pointer fit snugly into it.  Then a pair of students would take another laser pointer and a protractor, measure out 20 feet from the vertical laser, and point the second laser so that the two dots were as close as they could get.  They would then write down the angle.

When we got back to the classroom, they would plot, on graph paper, the vertical, the horizontal, and the angle.  The intercept of the angle line with the vertical is the height of the ceiling.  Of course, we got a large number of results.  The ceiling was anywhere from 48 to 72 feet high!  I used this to teach them that science is never precise; we worked out the standard error, and no competent scientist would say "The answer is precisely x" but would say "based on a series of measurements, the answer is x, ±e".  And that's the lesson.  This is the third important question:

3. How accurate are your measurements?
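The arithmetic behind the ceiling measurement can be sketched as follows.  This is an illustration, not the classroom procedure itself (the students plotted it on graph paper); the function names are hypothetical, and the sketch assumes the angle is measured from the horizontal at floor level, 20 feet from the vertical beam.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Height at which a beam aimed angleDegrees above the horizontal, from
// baselineFeet away, meets the vertical beam: baseline * tan(angle).
double ceilingHeight(double baselineFeet, double angleDegrees) {
    assert(angleDegrees > 0.0 && angleDegrees < 90.0);
    const double kPi = 3.14159265358979323846;
    return baselineFeet * std::tan(angleDegrees * kPi / 180.0);
}

// Arithmetic mean of the samples.
double mean(const std::vector<double>& xs) {
    assert(!xs.empty());
    double sum = 0.0;
    for (double x : xs) sum += x;
    return sum / static_cast<double>(xs.size());
}

// Standard error of the mean: sample standard deviation / sqrt(n).
double standardError(const std::vector<double>& xs) {
    assert(xs.size() >= 2);
    const double m = mean(xs);
    double sumSq = 0.0;
    for (double x : xs) sumSq += (x - m) * (x - m);
    const double n = static_cast<double>(xs.size());
    return std::sqrt(sumSq / (n - 1.0)) / std::sqrt(n);
}
```

With a 20-foot baseline, a reading near 67.4 degrees works out to about 48 feet and a reading near 74.5 degrees to about 72 feet, so roughly seven degrees of protractor error is enough to account for the entire spread.  Feeding the individual heights to mean and standardError gives the "x, ±e" form of the answer.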

Applying the rules

So when I see a result that looks wrong, I go looking for independent verification of the behavior.  My first approach was simply to examine the relevant code.  To make sure I was reading the right code, I looked at what the compiler had actually generated.

The original poster had used VS6, so this is the code from VS6.  The argument was that it appeared to happen in debug mode, not in release mode, so I examined the code from both:

CSize sz = dc.GetTextExtent(_T("ABCDEF"));
VS6 DEBUG:

00401991 68 D4 53 41 00       push        offset string "ABCDEF" (004153d4)
00401996 8D 4D C8             lea         ecx,[ebp-38h]
00401999 E8 4A 05 00 00       call        CString::CString (00401ee8)
0040199E C6 45 FC 02          mov         byte ptr [ebp-4],2
004019A2 8D 45 C8             lea         eax,[ebp-38h]
004019A5 50                   push        eax
004019A6 8D 4D C0             lea         ecx,[ebp-40h]
004019A9 51                   push        ecx
004019AA 8D 4D D4             lea         ecx,[ebp-2Ch]
004019AD E8 30 05 00 00       call        CDC::GetTextExtent (00401ee2)
004019B2 8B 10                mov         edx,dword ptr [eax]
004019B4 8B 40 04             mov         eax,dword ptr [eax+4]
004019B7 89 55 CC             mov         dword ptr [ebp-34h],edx
004019BA 89 45 D0             mov         dword ptr [ebp-30h],eax
004019BD C6 45 FC 01          mov         byte ptr [ebp-4],1
004019C1 8D 4D C8             lea         ecx,[ebp-38h]
004019C4 E8 37 05 00 00       call        CString::~CString (00401f00)

VS6 RELEASE:

00401379 68 20 30 40 00       push        offset string "ABCDEF" (00403020)
0040137E 8D 4C 24 10          lea         ecx,[esp+10h]
00401382 C7 44 24 38 01 00 00 mov         dword ptr [esp+38h],1
0040138A E8 95 03 00 00       call        CString::CString (00401724)
0040138F 8B 44 24 0C          mov         eax,dword ptr [esp+0Ch]
00401393 8D 54 24 10          lea         edx,[esp+10h]
00401397 52                   push        edx
00401398 8B 48 F8             mov         ecx,dword ptr [eax-8]
0040139B 51                   push        ecx
0040139C 50                   push        eax
0040139D 8B 44 24 2C          mov         eax,dword ptr [esp+2Ch]
004013A1 50                   push        eax
004013A2 FF 15 00 20 40 00    call        dword ptr [__imp__GetTextExtentPoint32A@16 (00402000)]
004013A8 8D 4C 24 0C          lea         ecx,[esp+0Ch]
004013AC E8 7F 03 00 00       call        CString::~CString (00401730) 

The code seems completely obvious: a temporary CString object is created, used, and freed.  So there should be no storage leak.

I posted this, and the argument was made that CString::~CString obviously wasn't releasing the storage.  Now, this seemed very unlikely to me.  Any bug this egregious would have been fixed a decade ago.

Of course, my immediate question was to ask what the basis for this allegation was (What are you measuring?  What are you measuring with?)

The answer was the code shown above that uses _CrtMemCheckpoint.  Now, this didn't make any sense, at least if you believe the specification.  So it meant that there was either something very deeply wrong, or the results were a lie.  So I just single-stepped into the code and watched the _CrtMemCheckpoint code execute.  I saw what variable it was storing.  That variable was clearly incrementing.  So I next watched the creation of the string, and saw the line of code that incremented _lTotalAlloc.  Then I watched the code that freed the string, and saw no line of code that decremented _lTotalAlloc.

So next, I wrote the CountWalk function to enumerate the blocks of storage and total the bytes allocated.  It gave a correct value that did not increase.

End of story.  The lTotalCount field is simply wrong. 

But the key when making any measurement is simply to make sure that the measurement is measuring what you think it is measuring, and that it is measuring it correctly, and reporting it correctly.


The views expressed in these essays are those of the author, and in no way represent, nor are they endorsed by, Microsoft.

Send mail to newcomer@flounder.com with questions or comments about this web site.
Copyright © 2007 Joseph M. Newcomer, All Rights Reserved.
Last modified: May 14, 2011