The Double Life of Variables
by Seth Roby
10/07/2003
When Batman went home at the end of a night spent fighting crime, he put on a suit and tie and became Bruce Wayne. When Clark Kent saw a news story getting too hot, a phone booth hid his change into Superman. When you're programming, all the variables you juggle around are doing similar tricks as they present one face to you and a totally different one to the machine.
These secret identities serve a variety of purposes, and they help us to understand how variables work. In this lesson, we'll be writing a little less code than we've done in previous articles, but we'll be taking a detailed look at how variables live and work.
The Ghost in the Machine
The most basic duality that exists with variables is that the programmer sees them in a totally different way than the computer does. When you're typing away in Project Builder, your variables are normal words smashed together, like software titles from the 80s. You deal with them on this level, moving them around and passing them back and forth.
When the machine compiles your code, however, it does a little bit of translation. At run time, the computer sees nothing but 1s and 0s, which is all the computer ever sees: a continuous string of binary numbers that it can interpret in various ways.
This back and forth is an important concept to understand in C programming,
especially on the Mac's RISC architecture. Almost every variable you work with can be represented in 32
bits of memory: thirty-two 1s and 0s define
the data that a simple variable can hold. There are exceptions, like
on the new 64-bit G5s and in the 128-bit world of AltiVec; but for the
most part, when we're dealing with variables like int
and the other types we'll learn later in the lesson, we're
going to be dealing with convenient names for blocks of thirty-two 1s
and 0s.
Being able to understand that basic idea opens up a vast amount of power that can be used and abused, and we're going to look at a few of the better ways to deal with it in this article.
The Life of a Variable
A variable leads a simple life, full of activity but quite short (measured in nanoseconds, usually). It all begins when the program finds a variable declaration, and a variable is born into the world of the executing program. There are two possible places where the variable might live, but we will venture into that a little later.
This variable is then used in various lines of code, holding values given it by variable assignments along the way. In the course of its life, a variable can hold any number of values and be used in any number of different ways. This flexibility is built on the precept we just learned: a variable is really just a block of bits, and those bits can hold whatever data the program needs to remember. They can hold an integer from as low as -2,147,483,648 up to 2,147,483,647 (that is, from -2^31 to 2^31 - 1). They can remember one character of writing. They can keep a decimal number with a huge amount of precision and a giant range. They can hold a time accurate to the second in a range of centuries. A few bits are not to be scoffed at.
When a variable is finished with its work, it does not go into retirement, and it is never mentioned again. Variables simply cease to exist, and the thirty-two bits of data that they held are released, so that some other variable may later use them.
But variables get one benefit people do not; the end is always clearly marked and easy to determine. Every variable is declared inside a block of code, and that block determines its lifespan. When that block closes, the variables declared within it are freed. If the block they are declared in has blocks within it, the variable lives on through those blocks. Contrariwise, variables declared within those blocks cease to exist outside of their blocks of origin. This hierarchical relationship provides a simple pattern to follow, and the variable's lifetime is called its scope.
We can see an example of this in the code we've written so far. In each function's block, we declare variables that hold our data. When each function ends, the variables within are disposed of, and the space they were using is given back to the computer to use. The variables live in the blocks of conditionals and loops we write, but they don't cascade into functions we call, because those aren't sub-blocks, but different sections of code entirely. Every variable we've written has a well-defined lifetime of one function.
But some variables are immortal. These variables are declared outside of blocks, outside of functions. Since they don't have a block to exist in, they are called global variables (as opposed to local variables), because they exist in all blocks, everywhere, and they never go out of scope. Although powerful, these kinds of variables are generally frowned upon because they encourage bad program design.
Stack Your Claim
Earlier I mentioned that variables can live in two different places. We're going to examine these two places one at a time, and we're going to start on the more familiar ground, which is called the Stack. Understanding the stack helps us understand the way programs run, and also helps us understand scope a little better.
The Stack is just what it sounds like: a tower of things that starts at the bottom and builds upward as it goes. In our case, the things in the stack are called "Stack Frames" or just "frames". We start with one stack frame at the very bottom, and we build up from there.
Each Stack Frame represents a function.
The bottom frame is always the main function, and the frames
above it are the other functions that main calls.
At any given time, the stack can show you the path your code has taken
to get to where it is. The top frame represents the function the code
is currently executing, and the frame below it is the function that
called the current function, and the frame below that represents the
function that called the function that called the current function,
and so on all the way down to main, which is the starting
point of any C program.
Inside each stack frame is a slew of useful information. It tells the
computer what code is currently executing, where to go next, where to
go in the case a return statement
is found, and a whole lot of other things that are incredibly useful
to the computer, but not very useful to you most of the time. One of
the things that is useful to you is the part of the frame that keeps
track of all the variables you're using. So the first
place for a variable to live is on the Stack. This is a
very nice place to live, in that all the creation and destruction of
space is handled for you as Stack Frames are created and destroyed.
You seldom have to worry about making space for the variables on the
stack. The only problem is that the variables here only live as long
as the stack frame does, which is to say the length of the function
those variables are declared in. This is often a fine situation, but when you need to store information for longer than a single
function, you are instantly out of luck.
Heap on Some More Memory
To address this issue, we turn to the second place to put variables, which is called the Heap. If you think of the Stack as a high-rise apartment building, with variables as tenants and each level building atop the one before it, then the Heap is the suburban sprawl: every citizen finds a space for herself, each lot a different size, in locations that can't be readily predicted. For all the simplicity offered by the Stack, the Heap seems positively chaotic, but the reality is that each just obeys its own rules.
When compared to the Stack, the Heap is a simple thing to understand. All the memory that's left over is "in the Heap" (excepting some special cases and some reserve). There is little structure, but in return for this freedom of movement you must create and destroy any boundaries you need. And it is always possible that the heap might simply not have enough space for you.
A Pointer Between Two Worlds
Since the Heap has no definite rules as to where it will create space for you, there must be some way of figuring out where your new space is. And the answer is, simply enough, addressing. When you create new space in the heap to hold your data, you get back an address that tells you where your new space is, so your bits can move in. This address is called a Pointer, and it's really just a number (usually written in hexadecimal) that points to a location in the heap. Since it's really just a number, it can be stored quite nicely into a variable.
Let's see an example by converting our favoriteNumber
variable from a stack variable to a heap variable. The first thing we'll
do is find the project we've been working on and open it up in
Project Builder. In the main.c file, we'll start right
at the top and work our way down. Under the line:
#include <stdio.h>
insert
#include <libc.h>
This will allow us to use a few functions we didn't have access
to before. These lines are still a mystery for now, but we'll
explain them soon. Now we'll start working within the main
function, where favoriteNumber is declared and used. The
first thing we need to do is change how we declare the variable. Instead
of
int favoriteNumber = (3 * 4) / 2;
We're going to break this apart and make it two lines:
int* favoriteNumber = malloc(sizeof(int));
*favoriteNumber = (3 * 4) / 2;
Note first that favoriteNumber's
type changed. Instead of our familiar int, we're
now using int*. The asterisk here is an operator, which
is often called the "star operator".
You will remember that we also use an asterisk as a sign for multiplication.
The positioning of the asterisk changes its meaning. This operator
effectively means "this is a pointer". Here it says that
favoriteNumber will be not an int but a pointer
to an int. And instead of simply going on to say what we're
putting in that int, we have to take an extra step and
create the space, which is what malloc does. This function takes
an argument that specifies how much
space you need and then returns a pointer to that space. We've passed
it the result of sizeof, which we hand int,
a type. Although sizeof looks like a function, it is really an operator built into the language,
but for now we don't have to care: all we need to know is that
it tells us the size of whatever we gave it, in this case an int.
So when malloc is done, it gives us an address in the heap where
we can put an integer. It is important to remember that the data is
stored in the heap, while the address of that data is stored in a pointer
on the stack.
Our next line looks familiar, except it starts with an asterisk. Again,
we're using the star operator, and noting that this variable we're
working with is a pointer. If we didn't, the computer would try
to put the results of the right hand side of this statement (which evaluates
to 6) into the pointer, overwriting the value we need in the pointer,
which is an address. This way, the computer knows to put the data not
in the pointer, but into the place the pointer points to, which is in
the Heap. So after this line, our int is living happily
in the Heap, storing a value of 6, and our pointer tells us where that
data is living.
Let's take a moment to reexamine that. What we've done here
is create two variables. The first variable is in the Heap, and we're
storing data in it. That's the obvious one. But the second variable
is a pointer to the first one, and it exists on the Stack. This variable
is the one that's really called favoriteNumber, and
it's the one we're working with. It is important to remember
that there are now two parts to our simple variable, one of which exists
in each world. This kind of division is common in C, but omnipresent
in Cocoa. When you start making objects, Cocoa makes them all in the
Heap because the Stack isn't big enough to hold them. In Cocoa,
you deal with objects through pointers everywhere and are actually
forbidden from dealing with them directly.
The rest of our conversion follows a similar vein. Instead of going through line by line, let's just compare end results: when the transition is complete, the code that used to read:
int main (int argc, const char * argv[]) {
    //Computes our favorite number
    int favoriteNumber = (3 * 4) / 2; //is anyone's favorite
                                      //number not an int?
    favoriteNumber = integerForSeedValue(favoriteNumber + 2);
    /* now let's tell the world
       what our favorite number is! */
    countTo(favoriteNumber);
    return 0;
}
Should now look like this:
int main (int argc, const char * argv[]) {
    //Computes our favorite number
    int* favoriteNumber = malloc(sizeof(int));
    *favoriteNumber = (3 * 4) / 2;
    *favoriteNumber = integerForSeedValue(*favoriteNumber + 2);
    /* now let's tell the world
       what our favorite number is! */
    countTo(*favoriteNumber);
    free(favoriteNumber);
    return 0;
}
Note the new asterisks whenever we reference favoriteNumber,
except for that new line right before the return.
This is another function provided for dealing with the heap. After you've
created some space in the Heap, it's yours until you let go of
it. When your program is done using it, you have to explicitly tell
the computer that you don't need it anymore or the computer will
save it for your future use (or until your program quits, when it knows
you won't be needing the memory anymore). The call to free
simply tells the computer that you had this space, but you're
done and the memory can be freed for use by something else later on.
This code should compile and run just fine, and you should see no changes in how the program works. So why did we do all of that?
For this program, it was a bit of overkill. It's a lot of overkill, actually. There's usually no need to store integers in the Heap, unless you're making a whole lot of them. But even in this simpler form, it gives us a little bit more flexibility than we had before, in that we can create and destroy variables as we need, without having to worry about the Stack. It also demonstrates a new variable type, the pointer, which you will use extensively throughout your programming. And it is a pattern that is ubiquitous in Cocoa, so it is a pattern you will need to understand, even though Cocoa makes it much more transparent than it is here.
free(reader)
That gives us a pretty good starting point to understand a lot more about variables, and that's what we'll be examining next lesson. Those new variable types I promised last lesson will finally make an appearance, and we'll examine a few concepts that we'll use to organize our data into more meaningful structures, a sort of precursor to the objects that Cocoa works with. And we'll delve a little bit more into the fun things we can do by looking at those ever-present bits in a few new ways.
Seth Roby graduated in May of 2003 with a double major in English and Computer Science, the Macintosh part of a three-person Macintosh, Linux, and Windows graduating triumvirate.

