Are global variables really all that bad?

There has been, for a long time, a nagging in the back of my mind about the issue of global variables. We all know that they are bad, don’t we? I am as much indoctrinated into this belief as anyone… but is it actually a valid view of things?

Here is a page describing some reasons why they are considered bad practice, which I quote:

  • Non-locality — Source code is easiest to understand when the scope of its individual elements are limited. Global variables can be read or modified by any part of the program, making it difficult to remember or reason about every possible use.
  • No Access Control or Constraint Checking — A global variable can be get or set by any part of the program, and any rules regarding its use can be easily broken or forgotten. (In other words, get/set accessors are generally preferable over direct data access, and this is even more so for global data.) By extension, the lack of access control greatly hinders achieving security in situations where you may wish to run untrusted code (such as working with 3rd party plugins).
  • Implicit coupling — A program with many global variables often has tight couplings between some of those variables, and couplings between variables and functions. Grouping coupled items into cohesive units usually leads to better programs.
  • Concurrency issues — if globals can be accessed by multiple threads of execution, synchronization is necessary (and too-often neglected). When dynamically linking modules with globals, the composed system might not be thread-safe even if the two independent modules tested in dozens of different contexts were safe.
  • Namespace pollution — Global names are available everywhere. You may unknowingly end up using a global when you think you are using a local (by misspelling or forgetting to declare the local) or vice versa. Also, if you ever have to link together modules that have the same global variable names, if you are lucky, you will get linking errors. If you are unlucky, the linker will simply treat all uses of the same name as the same object.
  • Memory allocation issues — Some environments have memory allocation schemes that make allocation of globals tricky. This is especially true in languages where “constructors” have side-effects other than allocation (because, in that case, you can express unsafe situations where two globals mutually depend on one another). Also, when dynamically linking modules, it can be unclear whether different libraries have their own instances of globals or whether the globals are shared.
  • Testing and Confinement – source that utilizes globals is somewhat more difficult to test because one cannot readily set up a ‘clean’ environment between runs. More generally, source that utilizes global services of any sort (e.g. reading and writing files or databases) that aren’t explicitly provided to that source is difficult to test for the same reason. For communicating systems, the ability to test system invariants may require running more than one ‘copy’ of a system simultaneously, which is greatly hindered by any use of shared services – including global memory – that are not provided for sharing as part of the test.

Now those of you who are interested in programming and have been following my thoughts will know that I am both a low-level hacker and a software architect. I spent pretty much the first 15 years of my career exploring the art of data hacking, but later wanted to also explore ways of making software development more successful and effective in terms of reusability, maintainability and cross-platform development. This has led me to investigate the use of C++ and explore such issues as object-oriented programming. I am still searching for a better way, and this has led me to question many assumptions and common opinions about software development. I am now trying to develop a new language that solves what I see as the major problems with current languages, and recently this has brought me to seriously question the aversion to globality in software.

Let’s take a look at each of the issues listed above and see what we can make of them.

Non-locality

The argument that code is easier to understand when it is isolated is quite valid. A fundamental concept in software design is encapsulation: the idea that we create a black box that has inputs and outputs that work to a specification, without caring how it works inside. By using global variables, so the argument goes, we break this encapsulation… we expose the inner workings of the black box. However, it is important to consider that one can use global variables at file scope without exposing them to the rest of the code base (in C++, by declaring them static or placing them in an anonymous namespace).

An important point here is that it is not global data itself that is evil. In fact, the non-locality of global data can be a major advantage in maintaining simplicity. The danger is that fully global data can be modified by any code without regulation, which can cause major problems if abused.

Since it is possible to use file-scope global variables and provide functions to access that data, we can conclude that it is erroneous to use the non-locality argument to suggest that we should avoid global variables. It is more correct to say that when using globals, one should be careful not to overexpose the data.
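As a minimal sketch of that idea, assuming C++17 (for std::clamp), here is a global that is visible throughout one source file but reachable from other files only through two functions. The names g_volume, get_volume and set_volume are hypothetical.

```cpp
// volume.cpp: a file-scope global exposed only through accessor functions.
#include <algorithm>

namespace {
// Internal linkage: every function in this file can use g_volume,
// but no other translation unit can name it.
int g_volume = 50;
}  // namespace

// The only way other files can reach the data. In a real project these
// two declarations would live in a header such as volume.h.
int get_volume() { return g_volume; }

void set_volume(int v) {
    // The accessor is also a natural place for constraint checking:
    // out-of-range requests are clamped to 0..100.
    g_volume = std::clamp(v, 0, 100);
}
```

The data is still global in the sense that there is exactly one instance for the whole program; only the ability to modify it directly has been narrowed.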

No Access Control or Constraint Checking

This issue is actually not so much about global variables as about regulating access to data. Interestingly, if one can solve the access control issue for global data, the non-locality argument is greatly diminished (if not eliminated).

A further, very important point is that access control and constraint checking are a problem with all data, not just global data. There is also a current fashion in some circles for the heuristic “don’t use accessor functions”. It may be that those who espouse the anti-accessor viewpoint really mean “choose carefully when to use accessor functions” rather than “don’t use them at all”; however, the way the debate sometimes goes, it is almost as if this movement is ready to lynch anyone who even thinks about using them.

Be that as it may, there are some very strong reasons for maintaining control over when data is accessed and modified. One very relevant example is ensuring that code logic is aware of when data has been modified. It is easy to forget that sending a message only when data changes (which requires an accessor function, as C++ does not directly support notification on data modification) can be a lot more efficient than polling (that is, repeatedly checking the data to observe whether it has changed).
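A rough sketch of that message-on-change approach, using std::function callbacks as the "message" mechanism (one option among several). The score variable and the on_score_changed, get_score and set_score functions are hypothetical names chosen for illustration.

```cpp
#include <functional>
#include <utility>
#include <vector>

namespace {
int g_score = 0;                                          // the global data
std::vector<std::function<void(int)>> g_score_listeners;  // who wants to know
}  // namespace

// Register interest once, instead of polling g_score every frame.
void on_score_changed(std::function<void(int)> listener) {
    g_score_listeners.push_back(std::move(listener));
}

int get_score() { return g_score; }

// Because every write goes through here, listeners are told exactly
// when the value changes, and only then.
void set_score(int value) {
    if (value == g_score) return;  // no change: nobody is notified
    g_score = value;
    for (const auto& listener : g_score_listeners) listener(g_score);
}
```

Setting the same value twice in a row produces a single notification; a poller would have had to compare the value on every check to reach the same conclusion.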

Because this issue is just as relevant for local data, I do not think it is a valid argument against the use of global variables.

Implicit coupling

Implicit coupling means dependencies between data elements that are not clearly defined, but arise as a result of the functionality of the code. However, this is not a problem with global variable usage; once again it is a problem with the organisation of data. You can create very clean code that uses file-scope global variables to group related data together. So once again, we can discard this argument.
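For instance, a sketch under hypothetical names (Viewport, resize_viewport, viewport): values that depend on each other are grouped into one file-scope global struct, and a single function updates them together, so the coupling is written down in one place rather than left implicit.

```cpp
// Coupled values live together: aspect is derived from width and height.
struct Viewport {
    int width = 640;
    int height = 480;
    double aspect = 640.0 / 480.0;
};

namespace {
Viewport g_viewport;  // one global instance, private to this file
}  // namespace

// The only function allowed to change the size, so aspect never goes stale.
void resize_viewport(int width, int height) {
    g_viewport.width = width;
    g_viewport.height = height;
    g_viewport.aspect = static_cast<double>(width) / height;
}

// Read-only access for the rest of the program.
const Viewport& viewport() { return g_viewport; }
```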

Concurrency issues

At first we might think that we find here a valid argument against globals. It might seem that global variables do not generally work well in multi-threaded code, but this is largely because C++ was not designed as a multi-threaded language: before C++11 (which added std::thread, std::mutex, std::atomic and a memory model), the language had no native support for multi-threading constructs and protocols. As a result, we tend to group code into objects that can be operated on by different threads. However, this is not a problem with the concept of globality. It is a problem with the management of global data, and with the lack of facilities in the language for dealing with it.

To get a perspective on how concurrency issues can be solved in a complex multi-threaded environment, think about the architecture of the Internet. At its heart is the client/server model, with the servers existing in a global database of sorts known as the Internet. Yes, the entire Internet is effectively a global database, and it has to cope with sometimes thousands of simultaneous connections. If global works in that instance, then how can we say that global data is the problem? Again, I feel that the argument is weak.
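To illustrate the point that the difficulty lies in managing shared data rather than in globality itself, here is a minimal sketch using the standard C++11 threading library: a global counter whose every access is guarded by a global mutex. The names g_hits, record_hit, hits and hammer are hypothetical.

```cpp
#include <mutex>
#include <thread>
#include <vector>

namespace {
long g_hits = 0;          // shared global data
std::mutex g_hits_mutex;  // guards every read and write of g_hits
}  // namespace

void record_hit() {
    std::lock_guard<std::mutex> lock(g_hits_mutex);  // released at scope exit
    ++g_hits;
}

long hits() {
    std::lock_guard<std::mutex> lock(g_hits_mutex);
    return g_hits;
}

// Starts `threads` workers that each call record_hit() `per_thread` times,
// then waits for all of them to finish.
void hammer(int threads, int per_thread) {
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t)
        workers.emplace_back([per_thread] {
            for (int i = 0; i < per_thread; ++i) record_hit();
        });
    for (auto& w : workers) w.join();
}
```

Without the lock_guard lines the increments would race and the final count could come out short; with them, the global is perfectly usable from many threads. For a single counter, std::atomic<long> would be a lighter alternative.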

Namespace pollution

This reason for avoiding global variables is, again, not very strong. There is a generic problem in all languages concerning the naming of things. If we look at any C++ code that uses classes, it is clear that the classes themselves are global and must be named. What do we do if we need to integrate two different code bases or APIs that both have a class with the same name? C++ solves this problem through namespaces. This is not a complete solution, but it works most of the time, and namespaces apply to global variables just as well as to classes. In any case, classes merely shift the problem from the naming of variables to the naming of classes. Is the namespace concern really valid when it comes down to it?
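A small sketch of namespaces doing that job for both classes and global variables. The physics and ui libraries, their Settings types and the fall_speed_after function are all hypothetical.

```cpp
// Two hypothetical libraries that both want a class called Settings
// and a global variable called settings.
namespace physics {
struct Settings { double gravity = 9.81; };
Settings settings;  // physics::settings
}  // namespace physics

namespace ui {
struct Settings { int font_size = 12; };
Settings settings;  // ui::settings: same names, no clash
}  // namespace ui

// Client code says which one it means by qualifying the name.
double fall_speed_after(double seconds) {
    return physics::settings.gravity * seconds;
}
```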

Memory allocation issues

Could this be our first strong hit? Perhaps, but really only in the context of language characteristics. The point is not that global data causes memory allocation issues, but that data initialised at start-up, through being defined at global scope, has characteristics in C++ that cause complications. One common complication is the order of initialisation: if many objects are initialised by the runtime system before main() is even called, how can we be sure of their order of initialisation? (Within a single source file C++ initialises them in order of definition, but across source files the order is unspecified.) What happens if such objects depend on each other?
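One well-known C++ workaround for the ordering problem is the "construct on first use" idiom: wrap the global in a function that returns a reference to a function-local static. A minimal sketch, with hypothetical names (texture_ids, Registrar, register_grass):

```cpp
#include <map>
#include <string>

// Instead of a plain global std::map, whose construction order relative to
// globals in other source files is unspecified, wrap it in a function.
std::map<std::string, int>& texture_ids() {
    // A function-local static is constructed the first time control passes
    // through this line, so every caller (even another global's constructor)
    // sees a fully built map. Since C++11 this initialisation is thread-safe.
    static std::map<std::string, int> ids;
    return ids;
}

// A global whose constructor runs before main() and depends on the map.
struct Registrar {
    Registrar(const std::string& name, int id) { texture_ids()[name] = id; }
};
Registrar register_grass("grass", 1);
```

The data is still global; the idiom only moves its construction to a well-defined moment.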

Well, all that may be true, but what about, for example, creating a global std::vector of pointers to objects? I cannot think of any real reason to avoid that from a memory allocation point of view. Certainly, this would mean that we can only have one such vector, but that is a different issue, solvable by creating a vector of vectors, or by placing the vector in a class whose instances are themselves held in a vector.
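A minimal sketch of such a global vector, assuming C++14. It uses std::unique_ptr rather than raw pointers so that ownership is explicit and the objects are destroyed automatically at program exit; the Sprite type and the add_sprite and move_all functions are hypothetical.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Sprite {  // hypothetical object type
    std::string name;
    int x = 0, y = 0;
};

// One global list of heap-allocated objects.
std::vector<std::unique_ptr<Sprite>> g_sprites;

// Adds a sprite and returns its index, which callers can keep as a handle.
std::size_t add_sprite(std::string name, int x, int y) {
    g_sprites.push_back(std::make_unique<Sprite>(Sprite{std::move(name), x, y}));
    return g_sprites.size() - 1;
}

// Any code can walk the whole collection without it being passed in.
void move_all(int dx, int dy) {
    for (auto& s : g_sprites) {
        s->x += dx;
        s->y += dy;
    }
}
```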

Testing and confinement

Although this, again, might seem a valid argument at first, it really depends on how global data is organised. There is nothing to stop multiple instances running if a global array is used, or to stop us making copies of global data for testing. In fact, one could argue that having everything in one place (globally) makes testing easier: it is certainly much easier to serialise the entire state of a running program if that program uses exclusively global data grouped together in a single entity (such as a database-like object).
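As a sketch of that idea, assume all mutable state lives in one copyable struct with a single global instance (GameState, g_state and the helper functions are hypothetical). A clean environment between test runs, or a saved snapshot, is then a plain assignment; full serialisation would build on the same single entity.

```cpp
#include <string>
#include <vector>

// All mutable program state grouped in one value type.
struct GameState {
    int level = 1;
    int lives = 3;
    std::vector<std::string> inventory;
};

GameState g_state;  // the single global instance

void lose_life() { if (g_state.lives > 0) --g_state.lives; }
void pick_up(const std::string& item) { g_state.inventory.push_back(item); }

// Because the state is one copyable object, test set-up is trivial.
GameState snapshot() { return g_state; }
void restore(const GameState& s) { g_state = s; }
void reset() { g_state = GameState{}; }
```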

What does this all mean?

These arguments against using global variables seem weak to me, and yet the idea has an origin founded in good practice. To understand the real problem with globals, it helps to remember that many early languages did not have local variables at all. In such circumstances coding becomes very difficult. Using global variables exclusively meant that even things such as the index variables of a for loop would be global. A for loop using the variable N that called a function which also used and modified N would not work correctly. It is actually hard to convey all the problems that occur if you are forced to use only global variables. In fact, a good exercise might be to try to write something that uses only globals, to fully appreciate why languages need local variables to be effective. And don’t think you can write functions that take arguments: those are local variables. Try making all your functions take (void) and pass data around in global variables. Yes, try to do this, and you will understand why using globals for everything is bad, if you do not already.
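Here is a tiny, deliberately broken illustration of that for-loop problem, written in the all-globals, (void)-functions style the exercise describes (the names n, stars, rows, draw_row and draw_grid are hypothetical). The intent is to draw a 3×3 grid of stars.

```cpp
// Deliberately bad: every loop in the program shares one global index,
// as it would in a language without local variables.
int n;          // the one loop index everybody uses
int stars = 0;  // stars "drawn" so far
int rows = 0;   // rows completed so far

void draw_row(void) {
    for (n = 0; n < 3; ++n) ++stars;  // leaves n == 3 behind
}

void draw_grid(void) {
    for (n = 0; n < 3; ++n) {
        draw_row();  // clobbers the outer loop's n
        ++rows;
    }
    // After the first row n is 3, ++n makes it 4, and the outer loop stops:
    // only one row is drawn instead of three.
}
```

With a local index in each function (for (int i = 0; ...)) the two loops would no longer interfere.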

I have a good understanding of this because I started to learn to program in Basic. Subroutines existed, but in the early versions of the language, arguments had to be passed in global variables.

So what do we have here? I think that in the search for better programming practices there has been a fondness for simple-to-follow rules. One such rule is the idea that global variables are bad, and this has resulted in a very specific way of thinking about programming. However, this does not mean that this pattern is ideal. In fact, it seems that most programmers (including myself) tend to take certain rules of thumb as the ultimate truth, without questioning them.

I put forward the idea that we should abolish the notion that global variables are bad, and replace it with a different, but more effective rule:

Always limit variables to the scope for which they are needed.

This is a very different rule of thumb, one that prevents the bad use of globals but still allows globals to be used freely where appropriate. It also applies to nested scopes: a variable should live in the innermost scope that actually needs it, not merely "somewhere local".
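A small sketch of the rule applied at more than one level, assuming C++17 (for the if statement with an initialiser). The high-score table and its functions are hypothetical: the table is needed throughout the file, so it lives at file scope; the lookup iterator is needed only inside one if statement, so it lives there.

```cpp
#include <map>
#include <string>

// Needed across this whole file, but nowhere else: file scope.
namespace {
std::map<std::string, int> g_high_scores;
}  // namespace

void submit_score(const std::string& player, int score) {
    // Needed only inside this if/else: the iterator's scope ends with it.
    if (auto it = g_high_scores.find(player); it != g_high_scores.end()) {
        if (score > it->second) it->second = score;
    } else {
        g_high_scores.emplace(player, score);
    }
}

int high_score(const std::string& player) {
    auto it = g_high_scores.find(player);
    return it == g_high_scores.end() ? 0 : it->second;
}
```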

If some variable (or object) is global in nature, let it be so. Don’t bother trying to hide it from the global scope… you will just end up passing it around in function arguments, which is inefficient in every way. And if you want to support expansion to multiple instances (for example, rendering to multiple screens, which means you can’t have a single global screen pointer), consider creating a global array or vector of such pointers and referencing them by index or ID.
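A minimal sketch of that multiple-screens suggestion. It stores Screen objects directly in the vector for brevity (a vector of pointers works the same way), and Screen, ScreenId, add_screen and render_frame are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Screen {  // hypothetical stand-in for a display target
    std::string name;
    int frames_drawn = 0;
};

// Global because screens are global in nature; a vector rather than a single
// pointer so that adding a second monitor needs no redesign.
std::vector<Screen> g_screens;

using ScreenId = std::size_t;  // an index into g_screens

ScreenId add_screen(const std::string& name) {
    g_screens.push_back(Screen{name});
    return g_screens.size() - 1;
}

// Callers pass a small ID instead of threading a Screen* through every call.
void render_frame(ScreenId id) {
    ++g_screens[id].frames_drawn;
}
```

One design caveat: plain indices stay valid only while screens are never removed from the vector; if removal is needed, a stable ID that is looked up (for example in a map) is the safer choice.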

Now, what is the bigger picture here? Well, as I develop a new language and also a new way of looking at architecture (at least new to me), I have found the need to embrace globality rather than stay away from it. This is reflected in the architecture of my light engine and, perhaps more importantly, in the evolution of a new language. The principle I am investigating is that of keeping data and functionality separate, with all the data located in a single, central, globally accessible database. Maybe this is a dead end, but something tells me that there’s a future in it. We shall see.

2 Responses to “Are global variables really all that bad?”

  1. Daniel Müßener Says:

    “If some variable (or object) is global in nature, let it be so”

    Finally somebody telling the truth!
    Can’t stand programmers preaching “globals are evil” or, similar, “goto is evil”. As with almost all things you can use it the right or the wrong way, period.

    Best regards,
    Daniel

    p.s.:
    Found your refreshing blog when checking the content of my “Amiga Forever” disk, stumbling over KickOff 2 and your name. Brought back some good memories 🙂

  2. Mr. X Says:

    I’m working on a PHP framework and will be using a global $registry variable to keep all global data (site config values, defined hooks, etc.). In PHP, using a variable directly is about 10 times faster than doing a function call and about 20 times faster than an object’s method call. Also, using a global doesn’t limit me in any way and is just one variable.

    Been hearing all these people preach how globals are bad and X is bad and Y is bad, and they all end up building CMSs or frameworks that are slow and complicated, for the sake of using design patterns just because they exist or using OOP just because it’s cool. PHP is not Java, however, but people keep forgetting that.

    They say globals are bad, but nobody says that overuse or bad use of OOP is bad and makes an app super hard to read.

    It is not the use of globals or a certain feature or design pattern that is bad. What is bad is poor design and implementation. For example, WordPress is not crap because it is procedural but because of how it is built. Also, it is not slow as f**k because it is procedural but because of how it is built. You can take nuts and bolts, an engine and 4 wheels and build a heavy and slow car, or you can build an F1 car. It’s all in the choices you make and what you’re trying to achieve.

    Thanks for the post
