volatile: the great debate

There has been a raging debate about the need for the volatile keyword. I have had many years of experience building and using optimizing compilers (not to mention my Ph.D. dissertation on optimizing compilers), but this apparently is thought to be insufficient qualification. I have been personally attacked as being incompetent in the profession, and accused of being irresponsible in asserting that the volatile keyword is required.

My qualifications

I have been working with optimizing compilers since 1969. I know most of the bugs an optimizing compiler can have.
From 1973-1975 I did research on optimizing compilers, culminating in a Ph.D. dissertation in the area, which is now regarded as one of the founding documents on the topic of automating compiler code generation for optimizing compilers [J.M. Newcomer, Machine-Independent Generation of Optimal Local Code, Ph.D. Dissertation, Carnegie-Mellon University, 1975].
From 1975-1978 I worked on a multithreaded, multiprocessor operating system using an optimizing compiler [William A. Wulf, Roy Levin, and Samuel P. Harbison. HYDRA/C.mmp: An Experimental Computer System. McGraw Hill, 1981]. I was manager on that project, and had considerable technical coding responsibilities for multithreaded applications on that system. Among other problems we had to face were issues of cache coherency (one of the books cited by the linked book is the Wulf/Harbison book on the Hydra operating system).
From 1978-1981 I did research on optimizing compilers and compiler architectures. A paper co-authored with Rick Cattell and Bruce Leverett during that period has been selected as one of the 50 most important papers in compiler technology in the last 20 years (check out item 1 in that listing).
I served as a thesis co-advisor to Rick Cattell, whose work is also considered seminal in the field [R.G.G. Cattell: Formalization and Automatic Derivation of Code Generators. PhD dissertation, Carnegie-Mellon University, April 1978] (Bill Wulf is the de jure advisor, because I was research faculty, not tenure-track faculty, and the University is fussy about these fine points; but Bill and I were co-advisors on this thesis).
In 1980, work that Ben Brosgol and I did on describing the internal data structures of an Ada compiler became the basis of what eventually became a formal intermediate language specification for Ada compilers.
I have worked with a few of the world's experts on optimizing compilers, including Chuck Geschke, Bill Wulf, Sam Harbison, Bruce Leverett, Steve Hobbs, Rick Cattell, and Guy L. Steele Jr. [e.g., B. W. Leverett, R. G. G. Cattell, S. O. Hobbs, J. M. Newcomer, A. H. Reiner, B. R. Schatz, and W. A. Wulf, "An Overview of the Production-Quality Compiler-Compiler Project," Computer 13:8 (August 1980), pp. 38-49.]
From 1981-1983 I worked for a company that built optimizing compilers, including an optimizing C compiler. Sam Harbison was the manager of the C compiler project. Guy L. Steele Jr. was one of the many people on the project. Bill Wulf was the head of the compiler project effort. I worked on a number of tools involved with that effort, including key components of the automated optimization technology (the "Machine Description Compiler", which built the tables used by the optimizer).
During that period I served as the outside committee member on another optimization-related Ph.D. dissertation [Vegdahl, S. R., Local Code Generation and Compaction in Optimizing Microcode Compilers, PhD Dissertation, December 1982], and a Master's thesis [Reiner, A. Cost Minimization in Register Assignment, Masters paper, July 1983].
In 1999, I was co-author of a book on device drivers. I did this with someone who had been writing device drivers for nearly a decade at that point (starting with Windows 3.0, if I recall correctly). You cannot write a device driver without a lot of concerns about memory fences, cache coherency, write pipes, and a lot of other concurrency-related issues that are down at the hardware level. I now teach a course on device drivers that we developed.

Nonetheless, I am considered by some to be unqualified to offer any opinions on optimizing compiler technology, caching technology, or apparently anything much dealing with concurrency.

Normally, I tend to be laid-back, but in this case, the personal attack was just a little too personal, including an allegation made without specific details, making it hard to prove my innocence, if I am indeed innocent.

This debate had degenerated to a "you're wrong" finger-pointing exercise. Until the personal attack, I was willing to drop it; I'm allowed to be wrong. It happens. In this case, I did not believe I was wrong. But everyone is entitled to his or her opinion.

But given the opinion that I have no idea what I'm talking about, I decided that the only recourse was to ask an expert: one who is recognized as world-class in the C language.

I consulted with Dr. Samuel P. Harbison, co-author of the highly-respected book C: A Reference Manual. It is now in its fifth edition.

Sam has been an implementor of C compilers for a couple of decades. He was project manager of several C and C++ projects, has been a member of the C Standards group, was chairman of the C++ standards committee for three years, and has more years of detailed C and C++ experience than a considerable number of self-declared C or C++ experts. He is a former Texas Instruments Fellow whose specialty was C and C++ compiler technology. This is not an award that TI bestows lightly.

He has given me permission to print his response to my question to him. I am reproducing the email in its entirety below.

If you don't want to read the whole thing, the summary is this: one of the world's experts on the C language says that volatile is necessary in a multithreaded environment. Not sufficient, but necessary. So in this case, it is no longer just my "opinion", the opinion of some easy-to-discredit mere ex-academic; this is the opinion of someone whose credentials are absolutely impeccable.

So make your own decision as to who knows what they're talking about here. I will no longer participate in this debate, except to point to this Web page each time the issue comes up.

Note that I kept my position fairly disguised; I merely stated a set of opinions without any attributions. So Sam is not just agreeing with me. He is expressing his opinion as a C/C++ expert.

One of the challenges made to me is "prove me wrong". I leave it to the judgment of the readers to decide if this constitutes sufficient proof.

Hi Joe,

I reviewed the standard and Rationale and it is clear that you need _volatile_.

In C99 5.1.2.3: "The least requirements on a conforming implementation are: At sequence points, volatile objects are stable in the sense that previous accesses are complete and subsequent accesses have not yet occurred....". (Note the use of _volatile_ in the statement.)

Sequence points define the behavior of the abstract machine. They are not binding on actual implementations, which is still free to operate "as if" the precise rules of the abstract machine were followed. One of these "as ifs" (from C99 5.1.2.3): "An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object)."

Example 1 in 5.1.2.3 makes it clear that the other behavior in your email is only an option: "[A]n implementation might perform various optimizations within each translation unit, such that the actual semantics would agree with the abstract semantics only when making function calls across translation unit boundaries. In such an implementation, at the time of each function entry and function return where the calling function and the called function are in different translation units, the values of all externally linked objects and of all objects accessible via pointers therein would agree with the abstract semantics.... In this type of implementation, objects referred to by interrupt service routines activated by the signal function would require explicit specification of volatile storage..." Although it refers to interrupt service routines, I think threads are the same case. The standard does not mention processes or threads.

The Rationale says (6.7.3): "A volatile object is also an appropriate model for a variable shared among multiple processes."

Sam
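
Editorial illustration, not part of the correspondence: the signal-handler case described in the quoted Example 1 can be sketched in standard C roughly as follows. The names got_signal, on_signal, and handle_one_signal are hypothetical, and SIGINT is an arbitrary choice of signal. The sketch only shows the volatile declaration that the standard's example calls for on an object shared with a handler; it makes no claim about threads.

```c
#include <assert.h>
#include <signal.h>

/* Written by the handler, read by ordinary code. volatile sig_atomic_t is
   the type the C standard designates for objects shared with a handler. */
static volatile sig_atomic_t got_signal = 0;

/* Hypothetical handler: records that the signal arrived and nothing else. */
static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* Hypothetical driver: installs the handler, raises the signal once, and
   returns the flag value observed afterwards (or -1 on failure). */
int handle_one_signal(void)
{
    got_signal = 0;
    if (signal(SIGINT, on_signal) == SIG_ERR)
        return -1;
    if (raise(SIGINT) != 0)
        return -1;
    /* raise() does not return until the handler has returned, so the
       handler's store to got_signal has already happened here. */
    assert(got_signal == 1);
    signal(SIGINT, SIG_DFL);      /* restore the default disposition */
    return (int)got_signal;
}
```

Calling handle_one_signal() from a test program should return 1. Without the volatile qualifier, the text quoted from 5.1.2.3 suggests an implementation would be permitted to keep got_signal in a register across the raise() call in the kind of implementation the example describes.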

-----Original Message-----

From: Joseph M. Newcomer [mailto:newcomer@flounder.com]
Sent: Sunday, January 04, 2004 12:42 AM
To: harbison@acm.org; Guy.Steele@Sun.com

Subject: A C question

Hi, Happy New Year and all that!

Sam, Guy,

There has been a debate raging on a newsgroup about the role of the 'volatile' keyword in C. There are two radically different views being expressed, and I was hoping either or both of you might be able to shed some light on this problem. The key issue deals with the use of the keyword 'volatile' to deal with variables in a multithreaded, potentially multiprocessor, environment. It applies to the use of this qualifier on variables that are used by more than one thread.

One view holds that it should always be used to defeat any compiler optimizations that may occur. Another view holds that it is never necessary if proper synchronization is performed.

The synchronization view states that if you write code

        lock(mutex);

        ...access and modify shared variables

        unlock(mutex);

that volatile is never needed on the variables because the semantics of the C language demand that all side effects be consolidated at the sequence points. This view also holds that if you have a program of the form

        lock(mutex);

        x = value;      // [1]

        unlock(mutex);

        lock(mutex);

        something = x;  // [2]

        unlock(mutex);

        whatever = something; // [3]

that the compiler is obliged, because the lock and unlock functions are being called, to never perform a code motion that would assume the value x can be propagated, and in fact a code motion that allowed the compiler to do value propagation that produced the assignment for [3] of

        whatever = value;

is an illegal optimization.

From H&S 4th edition, §7.14 "As a general rule, a compiler is free to generate any code equivalent in computational behavior to the program written", hence the need to use volatile is suggested. In particular, one view holds that a compiler which is doing aggressive optimizations may detect, via either scope knowledge (as particularly might be available in C++), global optimization techniques up to and including inlining (whether explicitly requested by the programmer, or implicitly determined by the compiler), or whatever, that the call 'unlock(mutex)' in no way can affect the computation, and would feel free perform code motions that would move the memory accesses which are above the unlock call to below the unlock call. This view holds that the use of volatile necessarily defeats such code motions, because of the semantics of volatile which "should not participate in optimizations that would increase, decrease, or delay any references to, or modifications of, the object" (H&S §4.4.5).

The volatile-necessary view holds that in a compiler where, for example, x is a member of a C++ class, and lock() and unlock() are global (perhaps OS API) functions, it could be asserted by an aggressive compiler that neither call could possibly modify the value of x, and consequently the optimization suggested by [3] would be legal. A conservative compiler may or may not make such an assumption, but would be free to do so. A very conservative compiler, one which rigidly adhered to the semantics of sequence points, would generate correct code. The volatile-necessary view holds that the programmer may not assume the nature of the optimizer.

There seems to be additional confusion about the behavior of caches, pipelines, etc. One assertion is that volatile solves all this. Another assertion is that volatile is unrelated to this problem. Yet another assertion states that as long as the memory accesses all take place within the scope of a lock/unlock, there is no problem, since because on at least one architecture the locking code forces a cache/pipeline-to-memory flush, it must be necessarily true on all architectures, and in fact it is the responsibility of the unlock code to ensure this correctness.

Based on the issues of code motion, there is an assertion made that even an explicit call to a memory-fence operation would not stop an aggressive compiler from performing code motions that moved accesses (reads or writes) to beyond the call. Therefore, unless volatile is specified, the compiler has the permission to perform such optimizations. This means that the memory-fence request, unless it is explicitly understood by the compiler (e.g., a compiler intrinsic or other construct that blocks code motions) is indistinguishable from other function call, and therefore an aggressive compiler would not treat it specially.

An argument is made that because of the rules of sequence points, the compiler must accumulate all side effects at each sequence point, and consequently the unlock(), which contains two sequence points (completion of argument evaluation, and the end of a full expression) that all the side effects will take place, and the compiler is therefore forced to ensure that all memory references are forced to take place before the completion of the unlock() call. The contrarian view holds that an aggressive optimizing compiler making a single-thread assumption is only required to produce computationally-equivalent code within that single-thread assumption, and consequently would miss the semantic implications of a mutex unlock in a multithreaded environment. A further contrarian view holds that because of pipelines, caches, etc., that memory writes could be delayed by the hardware for arbitrary periods of time, and that volatile in no way solves this, and consequently a portable program requires both volatile and the explicit placement of memory fences.

Here's an excerpt from one of the discussions:

============================================================================

Here's a simplistic explanation which is probably not far off the mark for VC. (To simplify the discussion, I'm going to talk about globals, but if you can find a counter-example which, say, plays games with "static", let me know.) The mutex lock/unlock operations are function calls, and the compiler knows nothing about these functions, so it can't do any interprocedural optimization. Global variables are reachable through functions called by the current function, including lock/unlock. (Surprisingly, it seems incidental that other threads can access them.) The compiler can't see into the lock/unlock functions to determine that they don't access the globals or call other functions which ultimately do access them. Thus, when you have the sequence below, for non-volatile, global variables x and y:

    m.lock();

    y = x;

    x = 2;

    m.unlock();

The compiler cannot optimize the assignment to x out of existence, because it can't tell that unlock() won't refer to x. It can't move the y and x assignments before or after the lock/unlock calls, because that can change the values those functions observe. It can't cache the value of x, call lock(), and assign the cached value to y, because lock() may have modified x. Before calling unlock(), it must flush x and y out of registers to memory, so that unlock() will observe their current values. And so on. The only way I know to screw this up is to write to the variables outside of the critical section, but that's a violation of the locking protocol. So at the compiler level, the variables don't need to be volatile. In addition to providing mutual exclusion, the mutex lock/unlock operations issue whatever memory barrier instructions are necessary, so that the writes are visible to other threads observing the locking protocol. So at the hardware level, there's no need for the variables to be volatile, assuming volatile implies MB instructions, because they're implicit in the mutex lock/unlock operations.

What in the world do you think volatile adds to this? As already mentioned, all I see is volatile slowing down execution here, while making you cast away volatile to use member functions of classes like CString, which now that I'm thinking about it, is undefined per the C++ Standard, 7.1.5.1/7. So unless a class X provides volatile member functions, you can't declare a volatile X and call member functions on it, because casting away volatile and referring to non-volatile members is undefined. (And I defy you to name a class which defines volatile member functions.) (NB: A compiler which can see into the locking operations would have to mark them somehow to suppress optimizations which can violate the expected semantics. There's no other reasonable choice.)

>Given that volatile is strictly dealing with optimizations of the compiler, I'm not sure

>how its absence could be ignored in an optimized build. I think you misunderstand the role

>of volatile.

Nope. However, you clearly believe the common misconception that volatile is necessary in MT programming.

============================================================================

One of the key issues being debated appears to relate to whether or not a compiler should optimize based on a single-thread or multi-thread model. The notion of sequence points represents an abstract concept of semantics; the notion of optimization allows a compiler to reorder pretty much anything it feels like, as long as the resulting computation is computationally equivalent. Consequently, a compiler which treats sequence points with a single-thread model will fail if placed in a preemptive multithreaded (or multiprocessor) environment, where its key assumption would be violated. The assumption that the compiler must necessarily assume that a global function will modify variables is key here. The volatile-necessary view holds that this is not a safe assumption, that scope rules, particularly in C++, could allow an aggressive compiler to violate program semantics.

Most of the justification for the "sequence points rule" and "synchronization means volatile is not required" views seem to be based on observation of one particular compiler which is fairly conservative in its default optimizations, at least in one particular release. The contrarian opinion states that volatile should always be used both for portability across platforms and for portability across possible compiler changes, up to and including user-selectable optimization choices. The assertion is made that "because this compiler does not show the generation of incorrect code when volatile is omitted and proper synchronization is used, volatile is never necessary". The contrarian view holds that this observation is based solely on a compiler that is very conservative in its optimizations, with a conjecture that it must be conservative because a large body of volatile-free (and by this view, incorrectly-written) code used in multiprocessor environments already exists, and aggressive optimizations would result in difficult-to-handle bugs in the existing code body.

If you could offer any clarifications of this, I would appreciate it. I would also like permission to post your replies to a newsgroup, with appropriate credit given.

thanks

joe

Sure

-----Original Message-----
From: Joseph M. Newcomer [mailto:newcomer@flounder.com]
Sent: Monday, January 05, 2004 4:17 PM
To: 'Dr. Sam Harbison'
Subject: RE: A C question

Thank you for the quick response. Can I post your reply to the newsgroup?

Joe
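
Editorial illustration, not part of the correspondence: the lock/unlock sequence discussed in the question above, written out as a minimal compilable sketch. It assumes POSIX threads (the discussion does not name a threading API), and the names writer and run_exchange are hypothetical. The shared variable x is declared volatile, as the volatile-necessary view would have it; the synchronization-only view would drop the qualifier and leave everything else unchanged. The sketch does not show which view is right; it only makes the statements labeled [1], [2], and [3] concrete.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Shared between threads. Declared volatile here, following the
   volatile-necessary view; the other view would omit the qualifier. */
static volatile int x = 0;
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical writer thread: stores the value it is handed into x. */
static void *writer(void *arg)
{
    int value = *(const int *)arg;

    pthread_mutex_lock(&mutex);
    x = value;                    /* [1] */
    pthread_mutex_unlock(&mutex);
    return NULL;
}

/* Hypothetical driver: runs the writer to completion, then reads x under
   the same mutex and returns the value that reached 'whatever'. */
int run_exchange(int value)
{
    pthread_t thread;
    int something, whatever, rc;

    rc = pthread_create(&thread, NULL, writer, &value);
    assert(rc == 0);
    rc = pthread_join(thread, NULL);   /* writer has finished after this */
    assert(rc == 0);
    (void)rc;

    pthread_mutex_lock(&mutex);
    something = x;                /* [2] */
    pthread_mutex_unlock(&mutex);

    whatever = something;         /* [3] */
    return whatever;
}
```

On most toolchains this needs the -pthread option when compiling and linking. A call such as run_exchange(42) is expected to return 42, because the writer is joined before the read at [2].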

The views expressed in these essays are those of the author, and in no way represent, nor are they endorsed by, Microsoft.

Send mail to newcomer@flounder.com with questions or comments about this web site. Copyright © 2001-2003, The Joseph M. Newcomer Co. All Rights Reserved. Last modified: May 14, 2011
