macdevcenter.com
oreilly.comSafari Books Online.Conferences.

advertisement

AddThis Social Bookmark Button

The Double Life of Variables

by Seth Roby
10/07/2003

When Batman went home at the end of a night spent fighting crime, he put on a suit and tie and became Bruce Wayne. When Clark Kent saw a news story getting too hot, a phone booth hid his change into Superman. When you're programming, all the variables you juggle around are doing similar tricks as they present one face to you and a totally different one to the machine.

These secret identities serve a variety of purposes, and they help us to understand how variables work. In this lesson, we'll be writing a little less code than we've done in previous articles, but we'll be taking a detailed look at how variables live and work.

The Ghost in the Machine

Related Reading

Learning Cocoa with Objective-C
By James Duncan Davidson, Apple Computer, Inc.

The most basic duality that exists with variables is how the programmer sees them in a totally different way than the computer does. When you're typing away in Project Builder, your variables are normal words smashed together, like software titles from the 80s. You deal with them on this level, moving them around and passing them back and forth.

When the machine compiles your code, however, it does a little bit of translation. At run time, the computer sees nothing but 1s and 0s, which is all the computer ever sees: a continuous string of binary numbers that it can interpret in various ways.

This back and forth is an important concept to understand in C programming, especially on the Mac's RISC architecture. Almost every variable you work with can be represented in 32 bits of memory: thirty-two 1s and 0s define the data that a simple variable can hold. There are exceptions, like on the new 64-bit G5s and in the 128-bit world of AltiVec; but for the most part, when we're dealing with variables like int and the other types we'll learn later in the lesson, we're going to be dealing with convenient names for blocks of thirty-two 1s and 0s.

Being able to understand that basic idea opens up a vast amount of power that can be used and abused, and we're going to look at a few of the better ways to deal with it in this article.

The Life of a Variable

A variable leads a simple life, full of activity but quite short (measured in nanoseconds, usually). It all begins when the program finds a variable declaration, and a variable is born into the world of the executing program. There are two possible places where the variable might live, but we will venture into that a little later.

This variable is then used in various lines of code, holding values given it by variable assignments along the way. In the course of its life, a variable can hold any number of variables and be used in any number of different ways. This flexibility is built on the precept we just learned: a variable is really just a block of bits, and those bits can hold whatever data the program needs to remember. They can hold enough data to remember an integer from as low as -2,147,483,647 up to 2,147,483,647 (one less than plus or minus 2^31). They can remember one character of writing. They can keep a decimal number with a huge amount of precision and a giant range. They can hold a time accurate to the second in a range of centuries. A few bits is not to be scoffed at.

When a variable is finished with it's work, it does not go into retirement, and it is never mentioned again. Variables simply cease to exist, and the thirty-two bits of data that they held is released, so that some other variable may later use them.

But variables get one benefit people do not; the end is always clearly marked and easy to determine. Every variable is declared inside a block of code, and that block determines its lifespan. When that block closes, the variables declared within it are freed. If the block they are declared in has blocks within it, the variable lives on through those blocks. Contrariwise, variables declared within those blocks cease to exist outside of their blocks of origin. This hierarchical relationship provides a simple pattern to follow, and the variable's lifetime is called its scope.

We can see an example of this in our code we've written so far. In each function's block, we declare variables that hold our data. When each function ends, the variables within are disposed of, and the space they were using is given back to the computer to use. The variables live in the blocks of conditionals and loops we write, but they don't cascade into functions we call, because those aren't sub-blocks, but different sections of code entirely. Every variable we've written has a well-defined lifetime of one function.

But some variables are immortal. These variables are declared outside of blocks, outside of functions. Since they don't have a block to exist in they are called global variables (as opposed to local variables), because they exist in all blocks, everywhere, and they never go out of scope. Although powerful, these kinds of variables are generally frowned upon because they encourage bad program design.

Stack Your Claim

Earlier I mentioned that variables can live in two different places. We're going to examine these two places one at a time, and we're going to start on the more familiar ground, which is called the Stack. Understanding the stack helps us understand the way programs run, and also helps us understand scope a little better.

The Stack is just what it sounds like: a tower of things that starts at the bottom and builds upward as it goes. In our case, the things in the stack are called "Stack Frames" or just "frames". We start with one stack frame at the very bottom, and we build up from there.

Each Stack Frame represents a function. The bottom frame is always the main function, and the frames above it are the other functions that main calls. At any given time, the stack can show you the path your code has taken to get to where it is. The top frame represents the function the code is currently executing, and the frame below it is the function that called the current function, and the frame below that represents the function that called the function that called the current function, and so on all the way down to main, which is the starting point of any C program.

Inside each stack frame is a slew of useful information. It tells the computer what code is currently executing, where to go next, where to go in the case a return statement is found, and a whole lot of other things that are incredible useful to the computer, but not very useful to you most of the time. One of the things that is useful to you is the part of the frame that keeps track of all the variables you're using. So the first place for a variable to live is on the Stack. This is a very nice place to live, in that all the creation and destruction of space is handled for you as Stack Frames are created and destroyed. You seldom have to worry about making space for the variables on the stack. The only problem is that the variables here only live as long as the stack frame does, which is to say the length of the function those variables are declared in. This is often a fine situation, but when you need to store information for longer than a single function, you are instantly out of luck.

Heap on Some More Memory

To address this issue, we turn to the second place to put variables, which is called the Heap. If you think of the Stack as a high-rise apartment building somewhere, variables as tenets and each level building atop the one before it, then the Heap is the suburban sprawl, every citizen finding a space for herself, each lot a different size and locations that can't be readily predictable. For all the simplicity offered by the Stack, the Heap seems positively chaotic, but the reality is that each just obeys its own rules.

When compared to the Stack, the Heap is a simple thing to understand. All the memory that's left over is "in the Heap" (excepting some special cases and some reserve). There is little structure, but in return for this freedom of movement you must create and destroy any boundaries you need. And it is always possible that the heap might simply not have enough space for you.

A Pointer Between Two Worlds

Since the Heap has no definite rules as to where it will create space for you, there must be some way of figuring out where your new space is. And the answer is, simply enough, addressing. When you create new space in the heap to hold your data, you get back an address that tells you where your new space is, so your bits can move in. This address is called a Pointer, and it's really just a hexadecimal number that points to a location in the heap. Since it's really just a number, it can be stored quite nicely into a variable.

Let's see an example by converting our favoriteNumber variable from a stack variable to a heap variable. The first thing we'll do is find the project we've been working on and open it up in Project Builder. In the <main.c> file, we'll start right at the top and work our way down. Under the line:

#include <stdio.h>

insert

#include <libc.h>

This will allow us to use a few functions we didn't have access to before. These lines are still a mystery for now, but we'll explain them soon. Now we'll start working within the main function, where favoriteNumber is declared and used. The first thing we need to do is change how we declare the variable. Instead of


	int favoriteNumber = (3 * 4) / 2;

We're going to break this apart and make it two lines:


    int* favoriteNumber = malloc(sizeof(int));
    *favoriteNumber = (3 * 4) / 2;

Note first that favoriteNumbers type changed. Instead of our familiar int, we're now using int*. The asterisk here is an operator, which is often called the "star operator". You will remember that we also use an asterisk as a sign for multiplication. The positioning of the asterisk changes its meaning. This operator effectively means "this is a pointer". Here it says that favoriteNumber will be not an int but a pointer to an int. And instead of simply going on to say what we're putting in that int, we have to take an extra step and create the space, which is what <malloc> does. This function takes an argument that specifies how much space you need and then returns a pointer to that space. We've passed it the result of another function, <sizeof>, which we pass int, a type. In reality, <sizeof> is a macro, but for now we don't have to care: all we need to know is that it tells us the size of whatever we gave it, in this case an int. So when <malloc> is done, it gives us an address in the heap where we can put an integer. It is important to remember that the data is stored in the heap, while the address of that data is stored in a pointer on the stack.

Our next line looks familiar, except it starts with an asterisk. Again, we're using the star operator, and noting that this variable we're working with is a pointer. If we didn't, the computer would try to put the results of the right hand side of this statement (which evaluates to 6) into the pointer, overriding the value we need in the pointer, which is an address. This way, the computer knows to put the data not in the pointer, but into the place the pointer points to, which is in the Heap. So after this line, our int is living happily in the Heap, storing a value of 6, and our pointer tells us where that data is living.

Let's take a moment to reexamine that. What we've done here is create two variables. The first variable is in the Heap, and we're storing data in it. That's the obvious one. But the second variable is a pointer to the first one, and it exists on the Stack. This variable is the one that's really called favoriteNumber, and it's the one we're working with. It is important to remember that there are now two parts to our simple variable, one of which exists in each world. This kind of division is common is C, but omnipresent in Cocoa. When you start making objects, Cocoa makes them all in the Heap because the Stack isn't big enough to hold them. In Cocoa, you deal with objects through pointers everywhere and are actually forbidden from dealing with them directly.

The rest of our conversion follows a similar vein. Instead of going through line by line, let's just compare end results: when the transition is complete, the code that used to read:

int main (int argc, const char * argv[]) {
    //Computes our favorite number
    int favoriteNumber = (3 * 4) / 2; //is anyone's favorite 
                                      //number not an int?
    favoriteNumber = integerForSeedValue(favoriteNumber + 2);

    /* now let's tell the world
        what our favorite number is! */
    countTo(favoriteNumber);

	return 0;
}

Should now look like this:

int main (int argc, const char * argv[]) {
//Computes our favorite number
    int* favoriteNumber = malloc(sizeof(int));
    *favoriteNumber = (3 * 4) / 2;
    *favoriteNumber = integerForSeedValue(*favoriteNumber + 2);

    /* now let's tell the world
        what our favorite number is! */
    countTo(*favoriteNumber);

    free(favoriteNumber);
    return 0;
}

Note the new asterisks whenever we reference favoriteNumber, except for that new line right before the return.

This is another function provided for dealing with the heap. After you've created some space in the Heap, it's yours until you let go of it. When your program is done using it, you have to explicitly tell the computer that you don't need it anymore or the computer will save it for your future use (or until your program quits, when it knows you won't be needing the memory anymore). The call to <free> simply tells the computer that you had this space, but you're done and the memory can be freed for use by something else later on.

This code should compile and run just fine, and you should see no changes in how the program works. So why did we do all of that?

For this program, it was a bit of overkill. It's a lot of overkill, actually. There's usually no need to store integers in the Heap, unless you're making a whole lot of them. But even in this simpler form, it gives us a little bit more flexibility than we had before, in that we can create and destroy variables as we need, without having to worry about the Stack. It also demonstrates a new variable type, the pointer, which you will use extensively throughout your programming. And it is a pattern that is ubiquitous in Cocoa, so it is a pattern you will need to understand, even though Cocoa makes it much more transparent than it is here.

free(reader)

That gives us a pretty good starting point to understand a lot more about variables, and that's what we'll be examining next lesson. Those new variable types I promised last lesson will finally make an appearance, and we'll examine a few concepts that we'll use to organize our data into more meaningful structures, a sort of precursor to the objects that Cocoa works with. And we'll delve a little bit more into the fun things we can do by looking at those ever-present bits in a few new ways.

Seth Roby graduated in May of 2003 with a double major in English and Computer Science, the Macintosh part of a three-person Macintosh, Linux, and Windows graduating triumvirate.


Return to the Mac DevCenter