volatile: the great debate


There has been a raging debate about the need for the volatile keyword. I have had many years of experience building and using optimizing compilers (not to mention my Ph.D. dissertation on optimizing compilers), but this apparently is thought to be insufficient qualification. I have been personally attacked as being incompetent in the profession, and accused of being irresponsible in asserting that the volatile keyword is required.

My qualifications

Nonetheless, I am considered by some to be unqualified to offer any opinions on optimizing compiler technology, caching technology, or apparently anything much dealing with concurrency.

Normally, I tend to be laid-back, but in this case, the personal attack was just a little too personal, including an allegation made without specific details, making it hard to prove my innocence, if I am indeed innocent.

This debate had degenerated to a "you're wrong" finger-pointing exercise. Until the personal attack, I was willing to drop it; I'm allowed to be wrong. It happens. In this case, I did not believe I was wrong. But everyone is entitled to his or her opinion.

But given the opinion that I know nothing about what I'm talking about, I decided that the only recourse was to ask an expert, one who is recognized as a world-class authority on the C language.

I consulted with Dr. Samuel P. Harbison, co-author of the highly-respected book C: A Reference Manual. It is now in its fifth edition.

Sam has been an implementor of C compilers for a couple of decades. He was project manager of several C and C++ projects, has been a member of the C Standards group, was chairman of the C++ standards committee for three years, and has more years of detailed C and C++ experience than a considerable number of self-declared C or C++ experts. He is a former Texas Instruments Fellow whose specialty was C and C++ compiler technology. This is not an award that TI bestows lightly.

He has given me permission to print his response to my question to him. I am reproducing the email in its entirety below.

If you don't want to read the whole thing, the summary is this: one of the world's experts on the C language says that volatile is necessary in a multithreaded environment. Not sufficient, but necessary. So in this case, it is no longer my "opinion", as some easy-to-discredit mere ex-academic; this is the opinion of someone whose credentials are absolutely impeccable.

So make your own decision as to who knows what they're talking about here. I will no longer participate in this debate, except to point to this Web page each time the issue comes up.

Note that I kept my position fairly disguised; I merely stated a set of opinions without any attributions. So Sam is not just agreeing with me. He is expressing his opinion as a C/C++ expert.

One of the challenges made to me is "prove me wrong". I leave it to the judgment of the readers to decide if this constitutes sufficient proof.

Hi Joe,

I reviewed the standard and Rationale and it is clear that you need _volatile_.

In C99 5.1.2.3: "The least requirements on a conforming implementation are: At sequence points, volatile objects are stable in the sense that previous accesses are complete and subsequent accesses have not yet occurred....". (Note the use of _volatile_ in the statement.)

Sequence points define the behavior of the abstract machine. They are not binding on actual implementations, which are still free to operate "as if" the precise rules of the abstract machine were followed. One of these "as ifs" (from C99 5.1.2.3): "An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object)."
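As an illustration of the "as if" latitude (this sketch is not part of Sam's message, and the function names are hypothetical): for a plain int the compiler may satisfy both reads with a single load, while for a volatile-qualified object each read counts as an access it has to perform.

```c
/* Non-volatile: the compiler may load *p once and reuse the value,
   because the result is the same "as if" it had loaded twice. */
int read_twice(const int *p)
{
    int first = *p;
    int second = *p;    /* may be folded into the first load */
    return first + second;
}

/* Volatile: every read is an access the implementation must perform,
   so two separate loads are expected in the generated code. */
int read_twice_volatile(const volatile int *p)
{
    int first = *p;
    int second = *p;    /* must remain a separate load */
    return first + second;
}
```

Both functions return the same value when nothing else touches the object. The difference is only in the generated loads, which is what matters when something outside the compiler's view can change the object between them.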

Example 1 in 5.1.2.3 makes it clear that the other behavior in your email is only an option: "[A]n implementation might perform various optimizations within each translation unit, such that the actual semantics would agree with the abstract semantics only when making function calls across translation unit boundaries. In such an implementation, at the time of each function entry and function return where the calling function and the called function are in different translation units, the values of all externally linked objects and of all objects accessible via pointers therein would agree with the abstract semantics.... In this type of implementation, objects referred to by interrupt service routines activated by the signal function would require explicit specification of volatile storage..."  Although it refers to interrupt service routines, I think threads are the same case.  The standard does not mention processes or threads.
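A minimal sketch of the interrupt-style case the standard's example has in mind (again, not part of the message): a flag written by a signal handler and polled by the main flow, declared volatile sig_atomic_t. The names got_signal, on_signal, and wait_for_signal are made up for illustration.

```c
#include <signal.h>

/* Written by the handler, read by the main flow. volatile keeps the
   read inside the loop from being hoisted out of it; sig_atomic_t
   makes the access atomic with respect to signal delivery. */
static volatile sig_atomic_t got_signal = 0;

static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* Hypothetical helper: installs the handler, raises the signal on the
   calling program, and polls the flag. Returns -1 if the handler
   could not be installed, otherwise the value of the flag. */
int wait_for_signal(void)
{
    if (signal(SIGINT, on_signal) == SIG_ERR)
        return -1;
    raise(SIGINT);              /* returns after the handler has run */
    while (!got_signal) {
        /* without volatile, this test could legally be evaluated once */
    }
    return got_signal;
}
```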

The Rationale says (6.7.3): "A volatile object is also an appropriate model for a variable shared among multiple processes."

Sam
-----Original Message-----

From: Joseph M. Newcomer [mailto:newcomer@flounder.com]
Sent: Sunday, January 04, 2004 12:42 AM
To: harbison@acm.org; Guy.Steele@Sun.com

Subject: A C question


Hi, Happy New Year and all that!

Sam, Guy,

There has been a debate raging on a newsgroup about the role of the 'volatile' keyword in C. There are two radically different views being expressed, and I was hoping either or both of you might be able to shed some light on this problem. The key issue deals with the use of the keyword 'volatile' to deal with variables in a multithreaded, potentially multiprocessor, environment. It applies to the use of this qualifier on variables that are used by more than one thread.

One view holds that it should always be used to defeat any compiler optimizations that may occur. Another view holds that it is never necessary if proper synchronization is performed.

The synchronization view states that if you write code

        lock(mutex);
        ...access and modify shared variables
        unlock(mutex);

that volatile is never needed on the variables because the semantics of the C language demand that all side effects be consolidated at the sequence points. This view also holds that if you have a program of the form

        lock(mutex);
        x = value;      // [1]
        unlock(mutex);
        
        lock(mutex);
        something = x;  // [2]
        unlock(mutex);
        whatever = something; // [3]

that the compiler is obliged, because the lock and unlock functions are being called, to never perform a code motion that would assume the value x can be propagated, and in fact a code motion that allowed the compiler to do value propagation that produced the assignment for [3] of

        whatever = value;

is an illegal optimization.
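To make the disputed transformation concrete, here is a sketch (added for illustration, not part of the original message) with the fragment above wrapped in a function. The mutex_t type and the bodies of lock() and unlock() are hypothetical stand-ins, deliberately visible to the compiler so that it could prove they never touch x.

```c
/* Hypothetical stand-ins for the lock()/unlock() calls in the message.
   A real program would call an OS or library mutex API instead. */
typedef struct { int held; } mutex_t;
static mutex_t mutex;
static void lock(mutex_t *m)   { m->held = 1; }
static void unlock(mutex_t *m) { m->held = 0; }

static int x;   /* shared variable, deliberately not volatile */

/* The code as the programmer wrote it. */
int as_written(int value)
{
    int something, whatever;

    lock(&mutex);
    x = value;              /* [1] */
    unlock(&mutex);

    lock(&mutex);
    something = x;          /* [2] */
    unlock(&mutex);
    whatever = something;   /* [3] */
    return whatever;
}

/* What a compiler could emit if it concluded that lock() and unlock()
   cannot modify x: the reload at [2] disappears and [3] uses value. */
int as_propagated(int value)
{
    lock(&mutex);
    x = value;
    unlock(&mutex);

    lock(&mutex);
    unlock(&mutex);
    return value;
}
```

In a single thread the two functions are indistinguishable, which is the sense in which the rewrite is "computationally equivalent". They differ only if another thread stores to x between the two critical sections.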

From H&S 4th edition, §7.14: "As a general rule, a compiler is free to generate any code equivalent in computational behavior to the program written"; hence the need to use volatile is suggested. In particular, one view holds that a compiler which is doing aggressive optimizations may detect, via either scope knowledge (as particularly might be available in C++), global optimization techniques up to and including inlining (whether explicitly requested by the programmer, or implicitly determined by the compiler), or whatever, that the call 'unlock(mutex)' in no way can affect the computation, and would feel free to perform code motions that would move the memory accesses which are above the unlock call to below the unlock call. This view holds that the use of volatile necessarily defeats such code motions, because of the semantics of volatile, which "should not participate in optimizations that would increase, decrease, or delay any references to, or modifications of, the object" (H&S §4.4.5).

The volatile-necessary view holds that in a case where, for example, x is a member of a C++ class, and lock() and unlock() are global (perhaps OS API) functions, it could be asserted by an aggressive compiler that neither call could possibly modify the value of x, and consequently the optimization suggested by [3] would be legal. A conservative compiler may or may not make such an assumption, but would be free to do so. A very conservative compiler, one which rigidly adhered to the semantics of sequence points, would generate correct code. The volatile-necessary view holds that the programmer may not assume the nature of the optimizer.

There seems to be additional confusion about the behavior of caches, pipelines, etc. One assertion is that volatile solves all this. Another assertion is that volatile is unrelated to this problem. Yet another assertion states that as long as the memory accesses all take place within the scope of a lock/unlock, there is no problem: because on at least one architecture the locking code forces a cache/pipeline-to-memory flush, it must necessarily be true on all architectures, and in fact it is the responsibility of the unlock code to ensure this correctness.

Based on the issues of code motion, there is an assertion made that even an explicit call to a memory-fence operation would not stop an aggressive compiler from performing code motions that moved accesses (reads or writes) to beyond the call. Therefore, unless volatile is specified, the compiler has permission to perform such optimizations. This means that the memory-fence request, unless it is explicitly understood by the compiler (e.g., a compiler intrinsic or other construct that blocks code motions), is indistinguishable from any other function call, and therefore an aggressive compiler would not treat it specially.

An argument is made that, because of the rules of sequence points, the compiler must accumulate all side effects at each sequence point, and consequently that at the unlock() call, which contains two sequence points (completion of argument evaluation, and the end of a full expression), all the side effects will have taken place, so the compiler is forced to ensure that all memory references take place before the completion of the unlock() call. The contrarian view holds that an aggressive optimizing compiler making a single-thread assumption is only required to produce computationally-equivalent code within that single-thread assumption, and consequently would miss the semantic implications of a mutex unlock in a multithreaded environment.
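For illustration only (not part of the original message): a fence that the compiler itself understands. The sketch assumes a C11 compiler with <stdatomic.h>, which postdates this discussion. The names payload, ready, publish, and try_consume are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;         /* plain data, handed over via the flag */
static atomic_bool ready;   /* static storage: starts out false */

/* Writer side: store the data, then raise the flag. */
void publish(int value)
{
    payload = value;
    /* Release fence: the payload write above is ordered before the
       flag store below, for the compiler and the hardware alike. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Reader side: returns true and fills *out once the flag is seen. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return false;
    /* Acquire fence: the payload read below is ordered after the
       flag load above. */
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return true;
}
```

Because atomic_thread_fence is defined by the language rather than being an opaque library call, a conforming compiler has to respect it when reordering the surrounding accesses, which is the "explicitly understood by the compiler" case described above.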
A further contrarian view holds that, because of pipelines, caches, etc., memory writes could be delayed by the hardware for arbitrary periods of time, that volatile in no way solves this, and that consequently a portable program requires both volatile and the explicit placement of memory fences.

Here's an excerpt from one of the discussions:

============================================================================

Here's a simplistic explanation which is probably not far off the mark  for VC. (To simplify the discussion, I'm going to talk about globals, but if you can find a counter-example which, say, plays games with "static", let me know.) The mutex lock/unlock operations are function calls, and the compiler knows nothing about these functions, so it can't do any interprocedural optimization. Global variables are reachable through functions called by the current function, including lock/unlock. (Surprisingly, it seems incidental that other threads can access them.) The compiler can't see into the lock/unlock functions to determine that they don't access the globals or call other functions which ultimately do access them. Thus, when you have the sequence below, for non-volatile, global variables x and y:

    m.lock();
    y = x;
    x = 2;
    m.unlock();

The compiler cannot optimize the assignment to x out of existence, because it can't tell that unlock() won't refer to x. It can't move the y and x assignments before or after the lock/unlock calls, because that can change the values those functions observe. It can't cache the value of x, call lock(), and assign the cached value to y, because lock() may have modified x. Before calling unlock(), it must flush x and y out of registers to memory, so that unlock() will observe their current values. And so on. The only way I know to screw this up is to write to the variables outside of the critical section, but that's a violation of the locking protocol. So at the compiler level, the variables don't need to be volatile.

In addition to providing mutual exclusion, the mutex lock/unlock operations issue whatever memory barrier instructions are necessary, so that the writes are visible to other threads observing the locking protocol. So at the hardware level, there's no need for the variables to be volatile, assuming volatile implies MB instructions, because they're implicit in the mutex lock/unlock operations.

What in the world do you think volatile adds to this? As already mentioned, all I see is volatile slowing down execution here, while making you cast away volatile to use member functions of classes like CString, which now that I'm thinking about it, is undefined per the C++ Standard, 7.1.5.1/7. So unless a class X provides volatile member functions, you can't declare a volatile X and call member functions on it, because casting away volatile and referring to non-volatile members is undefined. (And I defy you to name a class which defines volatile member functions.)

(NB: A compiler which can see into the locking operations would have to mark them somehow to suppress optimizations which can violate the expected semantics. There's no other reasonable choice.)

>Given that volatile is strictly dealing with optimizations of the compiler, I'm not sure
>how its absence could be ignored in an optimized build. I think you misunderstand the role
>of volatile.

Nope. However, you clearly believe the common misconception that volatile is necessary in MT programming.

============================================================================
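For reference, the sequence in the excerpt might look like this against POSIX threads (a sketch added for illustration; the excerpt itself uses an unnamed m.lock()/m.unlock() interface, and the function name critical_section is hypothetical). POSIX lists pthread_mutex_lock and pthread_mutex_unlock among the functions that synchronize memory with respect to other threads. Whether x and y additionally need volatile is the point in dispute.

```c
#include <pthread.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

/* Shared between threads and, as in the excerpt, not volatile. */
static int x = 1;
static int y = 0;

/* The excerpt's four-line sequence, with the mutex calls spelled out. */
void critical_section(void)
{
    pthread_mutex_lock(&m);
    y = x;      /* read the shared value under the lock */
    x = 2;      /* update it before releasing the lock */
    pthread_mutex_unlock(&m);
}
```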

One of the key issues being debated appears to relate to whether or not a compiler should optimize based on a single-thread or multi-thread model. The notion of sequence points represents an abstract concept of semantics; the notion of optimization allows a compiler to reorder pretty much anything it feels like, as long as the resulting computation is computationally equivalent. Consequently, a compiler which treats sequence points with a single-thread model will fail if placed in a preemptive multithreaded (or multiprocessor) environment, where its key assumption would be violated. The assumption that the compiler must necessarily assume that a global function will modify variables is key here. The volatile-necessary view holds that this is not a safe assumption, that scope rules, particularly in C++, could allow an aggressive compiler to violate program semantics.

Most of the justification for the "sequence points rule" and "synchronization means volatile is not required" views seems to be based on observation of one particular compiler which is fairly conservative in its default optimizations, at least in one particular release. The contrarian opinion states that volatile should always be used both for portability across platforms and for portability across possible compiler changes, up to and including user-selectable optimization choices. The assertion is made that "because this compiler does not show the generation of incorrect code when volatile is omitted and proper synchronization is used, volatile is never necessary". The contrarian view holds that this observation is based solely on a compiler that is very conservative in its optimizations, with a conjecture that it must be conservative because a large body of volatile-free (and by this view, incorrectly-written) code used in multiprocessor environments already exists, and aggressive optimizations would result in difficult-to-handle bugs in the existing code body.

If you could offer any clarifications of this, I would appreciate it. I would also like permission to post your replies to a newsgroup, with appropriate credit given.

                        thanks

                                        joe

 

Sure

-----Original Message-----
From: Joseph M. Newcomer [mailto:newcomer@flounder.com]
Sent: Monday, January 05, 2004 4:17 PM
To: 'Dr. Sam Harbison'
Subject: RE: A C question

Thank you for the quick response. Can I post your reply to the newsgroup?

Joe


The views expressed in these essays are those of the author, and in no way represent, nor are they endorsed by, Microsoft.

Send mail to newcomer@flounder.com with questions or comments about this web site.
Copyright © 2001-2003, The Joseph M. Newcomer Co. All Rights Reserved.
Last modified: May 14, 2011