There has been, for a long time, a nagging in the back of my mind about the issue of global variables. We all know that they are bad, don’t we? I am as much indoctrinated into this belief as anyone... but is it actually a valid view of things?
Here is a page describing some reasons why they are considered bad practice, which I quote:
- Non-locality — Source code is easiest to understand when the scope of its individual elements are limited. Global variables can be read or modified by any part of the program, making it difficult to remember or reason about every possible use.
- No Access Control or Constraint Checking — A global variable can be get or set by any part of the program, and any rules regarding its use can be easily broken or forgotten. (In other words, get/set accessors are generally preferable over direct data access, and this is even more so for global data.) By extension, the lack of access control greatly hinders achieving security in situations where you may wish to run untrusted code (such as working with 3rd party plugins).
- Implicit coupling — A program with many global variables often has tight couplings between some of those variables, and couplings between variables and functions. Grouping coupled items into cohesive units usually leads to better programs.
- Concurrency issues — if globals can be accessed by multiple threads of execution, synchronization is necessary (and too-often neglected). When dynamically linking modules with globals, the composed system might not be thread-safe even if the two independent modules tested in dozens of different contexts were safe.
- Namespace pollution — Global names are available everywhere. You may unknowingly end up using a global when you think you are using a local (by misspelling or forgetting to declare the local) or vice versa. Also, if you ever have to link together modules that have the same global variable names, if you are lucky, you will get linking errors. If you are unlucky, the linker will simply treat all uses of the same name as the same object.
- Memory allocation issues — Some environments have memory allocation schemes that make allocation of globals tricky. This is especially true in languages where “constructors” have side-effects other than allocation (because, in that case, you can express unsafe situations where two globals mutually depend on one another). Also, when dynamically linking modules, it can be unclear whether different libraries have their own instances of globals or whether the globals are shared.
- Testing and Confinement – source that utilizes globals is somewhat more difficult to test because one cannot readily set up a ‘clean’ environment between runs. More generally, source that utilizes global services of any sort (e.g. reading and writing files or databases) that aren’t explicitly provided to that source is difficult to test for the same reason. For communicating systems, the ability to test system invariants may require running more than one ‘copy’ of a system simultaneously, which is greatly hindered by any use of shared services – including global memory – that are not provided for sharing as part of the test.
Now those of you who are interested in programming and have been following my thoughts will know that I am both a low-level hacker and a software architect. I pretty much explored the art of data hacking for the first 15 years of my career, but later wanted to also explore ways of making software development more successful and effective in terms of re-usability, maintainability and cross-platform development. This has led me to investigate the use of C++ and explore such issues as object-oriented programming. I am still searching for a better way, and this has led me to question many assumptions and common opinions of software development. I am now trying to develop a new language that solves what I see as the major problems with current languages, and recently this has brought me to seriously question the aversion to globality in software.
Let’s take the above list of objections to global variables one at a time and see what we can make of them.
Non-locality
The argument that code is easier to understand when it is isolated is quite valid and true. A fundamental concept in software design is encapsulation: the idea that we create a black box that has inputs and outputs that work to a specification, and we do not care how it works inside. By using global variables, so the argument goes, we break this encapsulation... we are exposing the inner workings of the black box. However, it is important to consider that one can use global variables in the file scope without exposing such variables to the rest of the code base.
An important point here is that it is not global data that is evil. In fact the non locality of global data can be a major advantage in maintaining simplicity. The danger is that completely global data can be modified by any code without regulation, and this can cause major problems if abused.
Since it is possible to use file-scope global variables and provide functions to access that data, we can conclude that it is erroneous to use the non-locality argument to suggest that we should avoid global variables. It is more correct to say that when using globals, one should be careful not to over-expose the data.
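As a rough illustration of the file-scope idea above, here is a minimal C++ sketch. The file name and every identifier in it (`g_hit_count`, `record_hit`, `hit_count`, `reset_hits`) are made up for the example; the point is only that the variable is global in lifetime but invisible outside its own translation unit.

```cpp
// counter.cpp: a hypothetical translation unit; all names are illustrative.

namespace {
    // Internal linkage: this global lives for the whole run of the program,
    // but no other source file can name it or modify it directly.
    int g_hit_count = 0;
}

// These functions are the only route other files have to the data.
void record_hit() { ++g_hit_count; }
int  hit_count()  { return g_hit_count; }
void reset_hits() { g_hit_count = 0; }
```

Other files would declare and call `record_hit()` and `hit_count()` without ever seeing `g_hit_count` itself, so the black box stays closed even though the data is global.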
No Access Control or Constraint Checking
This issue is actually not so much about global variables, as about regulation of access to data. Interestingly, if one can solve the access control issue on global data, the non locality argument is greatly diminished (if not eliminated).
A further very important point is to note that access control and constraint checking is a problem with all data, not just global data. Again there is a current fashion in some circles right now for suggesting the heuristic “don’t use accessor functions”. It may be that such espousers of the anti-accessor viewpoint really mean “choose when to use accessor functions carefully” rather than “don’t use them at all”; however the way the debate sometimes seems to go it is almost as if this movement is ready to lynch anyone who even thinks about using them.
Be that as it may, there are some very strong reasons for maintaining control of when data is accessed and modified. One very relevant example is to ensure that code logic is aware of when data has been modified. It is easy to forget that sending a message only when data changes (which requires an accessor function, as C++ does not support messages on data modification directly) can be a lot more efficient than polling (that is, constantly checking the data to observe when it is changed).
Because this issue is just as relevant for local data, I do not think it is a valid argument against the use of global variables.
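To make the accessor argument concrete, here is a small sketch of a setter that both checks a constraint and sends a message only when the value really changes. The names (`g_volume`, `set_volume`, `on_volume_changed`) and the 0 to 10 range are assumptions invented for the example, and the same pattern would apply equally to a class member.

```cpp
#include <functional>
#include <utility>
#include <vector>

namespace {
    int g_volume = 5;                                   // file-scope global
    std::vector<std::function<void(int)>> g_listeners;  // hypothetical observers
}

// Register a callback to be told about changes, instead of polling.
void on_volume_changed(std::function<void(int)> listener) {
    g_listeners.push_back(std::move(listener));
}

int volume() { return g_volume; }

// Returns true only if the value was valid and actually changed.
bool set_volume(int v) {
    if (v < 0 || v > 10) return false;  // constraint check (assumed range)
    if (v == g_volume) return false;    // unchanged, so no message is sent
    g_volume = v;
    for (const auto& listener : g_listeners) listener(v);
    return true;
}
```

Code that cares about the volume registers once and is then called exactly when something changes, which is the efficiency gain over polling mentioned above.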
Implicit coupling
Implicit coupling means dependencies between data elements that are not clearly defined, but occur as a result of the functionality of the code. However, this is not a problem with global variable usage; once again it is a problem with the organisation of data. You can create very clean code that uses file-scope global variables and groups related data together. So once again, we can discard this argument.
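A sketch of what grouping coupled data might look like, assuming a hypothetical viewport whose width and height must always change together. `Viewport`, `g_viewport` and the two functions are illustrative names only.

```cpp
namespace {
    // Width and height are coupled, so they live in one cohesive unit
    // rather than as two loose globals that can drift apart.
    struct Viewport {
        int width  = 640;
        int height = 480;
    };

    Viewport g_viewport;  // a single file-scope global
}

// Both fields are updated together, or not at all.
void resize_viewport(int w, int h) {
    if (w <= 0 || h <= 0) return;  // reject sizes that make no sense
    g_viewport.width  = w;
    g_viewport.height = h;
}

double viewport_aspect() {
    return static_cast<double>(g_viewport.width) / g_viewport.height;
}
```

The coupling between the two values is now explicit in the type, and it is still global data.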
Concurrency issues
At first we might think that we find here a valid argument against globals. It might seem that global variables do not generally work well in multi-threaded code, but this is largely because C++ was not designed as a multi-threaded language and, before C++11, had no native multi-threading constructs or protocols. As a result, we tend to group code into objects that can be operated on by different threads. However, this is not a problem with the concept of globality. It is a problem with the management of global data, and the lack of facilities in the language for dealing with this. To get a perspective on how concurrency issues can be solved in a complex multi-threaded environment, think about the architecture of the Internet. This has at its heart the client/server model, where the servers exist in a global database known as the Internet. Yes, the entire Internet is a global database, and has to operate with sometimes thousands of simultaneous connections. If global works in that instance, then how can we say that global data is the problem? Again, I feel that the argument is weak.
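Since C++11 the standard library does provide `std::thread` and `std::mutex`, so managing shared global data can be sketched directly. This is only one possible arrangement, and the names (`g_total`, `add_to_total`, `run_workers`) are hypothetical: the global is reachable only through functions that take the lock.

```cpp
#include <mutex>
#include <thread>
#include <vector>

namespace {
    std::mutex g_total_mutex;  // guards g_total
    long g_total = 0;          // shared global, touched only under the lock
}

void add_to_total(long n) {
    std::lock_guard<std::mutex> lock(g_total_mutex);
    g_total += n;
}

long total() {
    std::lock_guard<std::mutex> lock(g_total_mutex);
    return g_total;
}

// Hypothetical helper: start `threads` workers that each add 1 to the
// shared total `per_thread` times, wait for them, and return the total.
long run_workers(int threads, int per_thread) {
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([per_thread] {
            for (int i = 0; i < per_thread; ++i) add_to_total(1);
        });
    }
    for (auto& worker : pool) worker.join();
    return total();
}
```

The data is as global as ever; what makes it safe is that access is managed, which is the distinction drawn above.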
Namespace pollution
This reason for avoiding global variables is, again, not very strong. There is a generic problem with all languages concerning the naming of things. If we look at any C++ code that uses classes, it is clear that the classes themselves are global and must be named. What do we do if we need to integrate two different code bases or APIs that both have a class with the same name? C++ solves this problem through the addition of namespaces to the language. This is not an entire solution, but it works most of the time. In any case, classes merely shift the problem from the naming of variables to the naming of classes. Is the namespace concern really valid when it comes to it?
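For readers who have not used them, this is roughly how namespaces keep two same-named classes (and same-named globals) apart. The `audio` and `network` code bases here are invented purely for illustration.

```cpp
#include <string>

// Two hypothetical code bases that both chose the names Buffer and g_instances.
namespace audio {
    struct Buffer {
        std::string kind() const { return "audio"; }
    };
    int g_instances = 0;
}

namespace network {
    struct Buffer {
        std::string kind() const { return "network"; }
    };
    int g_instances = 0;
}

// The qualified names make every use unambiguous.
std::string describe_both() {
    audio::Buffer a;
    network::Buffer n;
    ++audio::g_instances;
    ++network::g_instances;
    return a.kind() + "+" + n.kind();
}
```

The same mechanism that rescues clashing class names rescues clashing global variable names, which is why the pollution argument applies no more strongly to variables than to classes.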
Memory allocation issues
Could this be our first strong hit? Perhaps, but really only in the context of language characteristics. The point is not that global data causes memory allocation issues, but that data initialised at start-up through being defined at the global scope has characteristics in C++ that cause complications. One common complication is the order of initialisation: if many objects are initialised by the run-time system before main() is even called, how can we be sure of their order of initialisation? What happens if such objects depend on each other?
Well, all that may be true, but what about, for example, creating a global std::vector of pointers to objects? I cannot think of any real reason to avoid that from a memory allocation point of view. Certainly, this would mean that we can only have one such vector, but that is a different issue, solvable by creating a vector of vectors, or by placing the vector in a class that has multiple instances in a vector.
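One common way to sidestep the initialisation-order question for a global vector like this is to wrap it in a function with a local static, which C++ constructs the first time the function is called. The sketch below assumes a hypothetical `Entity` type and uses `std::unique_ptr` rather than raw pointers; that is my choice for the example, not something the argument depends on.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

struct Entity {           // hypothetical object type
    std::string name;
};

// Globally reachable, but constructed on first use, so any code that
// calls entities() during start-up is guaranteed a constructed vector.
std::vector<std::unique_ptr<Entity>>& entities() {
    static std::vector<std::unique_ptr<Entity>> instance;
    return instance;
}

// Adds an entity and returns its index in the global vector.
std::size_t spawn(const std::string& name) {
    entities().push_back(std::unique_ptr<Entity>(new Entity{name}));
    return entities().size() - 1;
}
```

A plain `std::vector` defined at namespace scope would also work as long as nothing uses it from another global's constructor in a different source file; the function wrapper just removes that caveat.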
Testing and confinement
Although this, again, might seem a valid argument at first, it really does depend on how global data is organised. There is nothing to stop multiple instances running if a global array is used, or to stop us making copies of global data for testing. In fact, one could argue that having everything in one place (globally) makes testing easier. It is certainly much easier to serialise the entire state of a running program if that program uses exclusively global data grouped together in a single entity (such as a database-like object).
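Here is a minimal sketch of the "everything in one place" idea, assuming a hypothetical `GameState` struct that holds all of a program's global data. Because the state is a single plain object, a test can copy it before a run and put the copy back afterwards.

```cpp
// All global data grouped into one entity (names are illustrative).
struct GameState {
    int score = 0;
    int lives = 3;
};

GameState g_state;

void add_score(int points) { g_state.score += points; }
void lose_life()           { if (g_state.lives > 0) --g_state.lives; }

// Copying the whole state is a single assignment, so a test harness
// can capture a clean environment and reinstate it between runs.
GameState snapshot()               { return g_state; }
void restore(const GameState& old) { g_state = old; }
```

A test would call `snapshot()` first, exercise the code, check `g_state`, and then `restore()` the saved copy. Writing the struct to disk instead of to a variable is the serialisation case mentioned above.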
What does this all mean?
These arguments against using global variables seem weak to me, and yet there is an origin to this idea that is founded in good practice. To understand the real problem with globals, it helps to remember that many early languages did not have local variables at all. In such circumstances coding becomes very difficult. The exclusive use of global variables meant that even things such as the index variable of a for loop would be global. It would mean that a for loop using the variable N and calling a function that also used and modified N would not work correctly. It is actually hard to convey all the problems that occur if you are forced to use only global variables. In fact, a good exercise might be to try to write something that only uses globals, to fully appreciate why languages need local variables to be effective. And don’t think you can write functions that take arguments: those are local variables. Try making all your functions take (void) and pass data around in global variables. Yes, try to do this, and you will understand why using globals for everything is bad, if you do not already.
I have a good understanding of this because I started to learn to program in Basic. Subroutines existed, but in the early versions of the language, arguments had to be passed in global variables.
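The loop clash described above is easy to reproduce. In this sketch every variable is global and every function takes (void), in the spirit of the exercise; the names are made up. The outer loop is meant to run three times, but the inner routine reuses the same index.

```cpp
int N = 0;             // the one and only loop index
int g_outer_runs = 0;  // how many times the outer loop body ran
int g_inner_runs = 0;  // how many times the inner loop body ran

// Uses the global N as its loop index and leaves it at 5.
void inner(void) {
    for (N = 0; N < 5; ++N) ++g_inner_runs;
}

// Intended to make three passes, but inner() overwrites N on the
// first pass, so the loop condition fails straight afterwards.
void outer(void) {
    g_outer_runs = 0;
    g_inner_runs = 0;
    for (N = 0; N < 3; ++N) {
        inner();
        ++g_outer_runs;
    }
}
```

After calling `outer()`, `g_outer_runs` is 1 rather than 3. With slightly different bounds (an inner loop that always leaves N below the outer limit) the outer loop would never terminate at all.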
So what do we have here? I think that in the search of better programming practices there has been a fondness for simple to follow rules. One such is the idea that global variables are bad, and this has resulted in a very specific way of thinking about programming. However this does not mean that this pattern is ideal. In fact it seems that most programmers (including myself) tend to take certain rules of thumb as the ultimate truth, without questioning them.
I put forward the idea that we should abolish the idea that global variables are bad, and replace this with a different, but more effective rule:
Always limit variables to the scope for which they are needed.
This is a very different rule of thumb, which prevents the bad use of globals, but still allows globals to be freely used where appropriate. It also covers issues with nested scopes for variables.
If some variable (or object) is global in nature, let it be so. Don’t bother trying to hide it from the global scope... you will just end up passing it around in function arguments, which is inefficient in every way. And if you want to support expansion to multiple instances (rendering to multiple screens, say, which means that you can’t have a single global screen pointer), consider creating a global array or vector of such pointers and referencing them by index or ID.
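The multiple-screens suggestion might look something like the following. `Screen`, `g_screens` and the two functions are hypothetical, and for brevity the vector holds the screens by value rather than by pointer.

```cpp
#include <cstddef>
#include <vector>

struct Screen {        // hypothetical render target
    int width;
    int height;
};

std::vector<Screen> g_screens;  // global by nature: every screen lives here

// Registers a screen and returns its index, which serves as its ID.
std::size_t add_screen(int width, int height) {
    g_screens.push_back(Screen{width, height});
    return g_screens.size() - 1;
}

// Callers pass a small ID instead of threading a Screen pointer through
// every call; at() throws std::out_of_range for an unknown ID.
long pixel_count(std::size_t id) {
    const Screen& s = g_screens.at(id);
    return static_cast<long>(s.width) * s.height;
}
```

Going from one screen to many then means adding an entry to the vector, with no change to the function signatures in between.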
Now what is the bigger picture here? Well, as I develop a new language and also a new way of looking at architecture (at least new to me) I have found the need to embrace globality rather than stay away from it. This is reflected in the architecture of my light engine, and perhaps more importantly, the evolution of a new language. The principle I am investigating is that of keeping data and functionality separate, and to have all the data located in a single central and globally accessible database. Maybe this is a dead end, but something tells me that there’s a future in it. We shall see.