Introduction
In this article I discuss pointers, one of the most exciting features of the C and C++ languages. Pointers are also a common source of confusion and the cause of many programming errors, yet almost every nontrivial C program uses them. Before we can appreciate why pointers matter, we need to understand the basic concept of memory in a computer.
Memory and Addresses
In a computer's memory, every byte has an address. Addresses are simply numbers that start at 0 and end at the highest address, which depends on the amount of memory available in the computer, as shown in Figure 1. Storage locations are numbered, just as houses on a street are. For example, with 64 KB of memory the addresses run from 0 to 65535. Note that the address of a variable is not the same as its contents: the address is the memory location where the data is stored. A data item may occupy one or more storage locations, and its address is the address of its first storage location. We use an address whenever we want to read data from memory or store data in it. The address of a data item is called a pointer to that data item.
Figure 1
The Concept of Pointers
Every variable and every function in a C program starts at a particular address. The address of a data item is called a pointer, and a variable that contains the address of another variable or of a function is called a pointer variable. Like a variable of any basic data type, a pointer variable must be declared before it is used. Each pointer variable can point to only one specific type: a basic data type such as int, float or char, or a user-defined data type such as a struct or union.
The Address Operator (&)
To obtain the address of a variable, C provides the unary operator & (ampersand), known as the address operator. When & precedes a variable name, it yields the address of that variable. The operand of & must be something stored in memory, such as a variable or an array element; & cannot be applied to a constant, to an expression such as (x + 1), or to a register variable. (In standard C, & may also be applied to an unsubscripted array name: &arr yields the address of the whole array.)
Let us start from the beginning and see what happens when we create a variable. For example, suppose we declare an integer variable ‘x’ as:
int x = 100; /* int variable */
then the C compiler performs three actions:
(i) reserves memory space to hold an integer value, say at address 65524
(ii) associates the name ‘x’ with this memory location
(iii) stores the value 100 at this memory location
Figure 2 shows these actions.
This figure shows that the variable ‘x’ occupies memory location 65524, where the value 100 is stored. You can easily obtain this address with the address operator: &x gives the address of variable ‘x’.
The following program, Program-1.c, illustrates the use of the ‘&’ operator.
/* Program - 1.c */
#include <stdio.h>
int main(void)
{
    int x = 100;
    /* %p prints a pointer value; it expects an argument of type void * */
    printf("Address : %p contains a value : %d\n", (void *)&x, x);
    return 0;
}
When we execute this program, we get output like the following (the addresses in this article's sample outputs were recorded on a 16-bit system and are written in decimal; they will differ on your machine):
Address : 65524 contains a value : 100
Now look at the printf() statement:
printf("Address : %p contains a value : %d\n", (void *)&x, x);
In this statement we have used the format specifiers %p and %d. The %d specifier prints an int. The %p specifier prints a pointer value and expects an argument of type void *, which is why &x, whose type is int *, is cast to (void *). The exact format %p produces is implementation-defined; most compilers print the address in hexadecimal, so the address above might appear as fff4 (65524 in decimal). In C++, inserting a pointer into std::cout likewise prints it, usually in hexadecimal.
Older texts written for 16-bit compilers often print addresses with %u (unsigned decimal) or %x (unsigned hexadecimal), on the grounds that an address is never negative. That happened to work where a pointer and an unsigned int were both 2 bytes, but in standard C passing a pointer to %u or %x is undefined behavior, and on 64-bit systems it can print a wrong, truncated value. Use %p instead.
The same applies to variables of user-defined types: if s is a struct variable, &s is its address, and it is printed with %p in the same way.
Pointer Variables
A variable that contains the address of another variable is called a pointer variable. Like any ordinary variable, a pointer variable must be declared. It is declared by placing an asterisk (*) before its name. The general form of a pointer declaration is:
datatype *pointervariablename;
Here datatype can be any basic data type such as int, float or char, or a user-defined data type such as a struct or union, and pointervariablename can be any valid C identifier. The pointer variable holds the address of a variable of the specified datatype. You can also place the asterisk next to the type:
datatype* pointervariablename;
Both forms mean the same thing. Note, however, that the * belongs to the name, not to the type: int* p, q; declares p as a pointer to int but q as a plain int.
For example, in the following declaration
int *ptr; /* pointer to int variable */
‘ptr’ is a pointer variable that can store the address of a location where an int is stored. Similarly, a pointer to a float variable is declared as:
float *j;
A pointer variable does not point to anything meaningful until it is explicitly assigned an address. A pointer declared inside a function is not automatically initialized: it holds an indeterminate value, and dereferencing it is undefined behavior. We therefore have to assign it the address of another variable explicitly. For example, in the following statements:
int a, *b;
b = &a;
the address of integer variable ‘a’ is assigned to the pointer variable ‘b’. A pointer variable can also be initialized in its declaration, so the two statements above can be combined as:
int a, *b = &a;
This statement declares ‘a’ as an ordinary variable of type int, declares ‘b’ as a pointer to int, and initializes ‘b’ to the address of ‘a’.
The Indirection Operator (*)
The operator *, when applied to an address (pointer), fetches the value stored at that address. It is called the indirection or dereferencing operator. Applied to a pointer variable, it yields the value contained in the memory location the pointer points to.
For example, if ‘b’ is a pointer to an int, then *b is an int. Consider the following statements:
int a = 10, n, *b;
b = &a;
n = *b; /* retrieve value that int pointer points to */
Here the last two statements
b = &a;
n = *b;
are equivalent to the single statement
n = *(&a);
or
n = a;
From this result, it is clear that the indirection operator (*) undoes the address operator (&): *(&a) is simply a. The following program illustrates this concept.
/* Program - 2.c */
#include <stdio.h>
int main(void)
{
    int a = 10, *b;
    b = &a;
    printf("Address : %p contains a value : %d\n", (void *)&a, a);
    printf("Address : %p contains a value : %d\n", (void *)&a, *(&a));
    printf("Address : %p contains a value : %d\n", (void *)b, a);
    printf("Address : %p contains a value : %d\n", (void *)b, *b);
    return 0;
}
When you run this program, you get output like the following (recorded on a 16-bit system; addresses vary):
Address : 65524 contains a value : 10
Address : 65524 contains a value : 10
Address : 65524 contains a value : 10
Address : 65524 contains a value : 10
One important point to remember is that pointers and integers are not interchangeable. Thus the following assignment is invalid, and a conforming compiler must diagnose it:
int a=10, *b;
b = 65524; /* Illegal */
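An exception to this rule is the constant 0, which can be assigned to any pointer at any time. The integer constant 0 is a null pointer constant: assigning it to a pointer produces a null pointer, which points to no object. The macro NULL, defined in standard headers such as <stddef.h> and <stdio.h>, expands to a null pointer constant. Thus the following statement is valid: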
b = 0; /* Legal */
The void Pointers
If a pointer is declared to point to a specific data type, it cannot hold the address of a variable of another type. For example, we cannot store the address of a float variable in an int pointer variable:
int *b;
float p = 40.75;
b = &p; /* Illegal */
Likewise, a pointer of one data type cannot be assigned to a pointer of another data type without an explicit typecast:
int a=10, *b;
float p = 40.75, *x;
b=&a;
x=&p;
x = b; /* Illegal */
x = (float *)b; /* Legal */
Here the last statement converts an int pointer into a float pointer. The basic principle behind converting one pointer type to another is that pointer ‘b’ can be converted to the type of ‘x’ and back again without change only if the type ‘x’ points to requires less strict or equally strict storage alignment than the type ‘b’ points to; otherwise an addressing exception may occur when the resulting pointer is dereferenced. Moreover, even when the conversion is allowed, reading an int through a float pointer is undefined behavior in standard C. When we need a single pointer type that can hold the address of any kind of object, C provides generic pointers, called void pointers.
A void pointer is declared as:
void *vptr;
Here void is a keyword and vptr is any valid identifier. A void pointer has no data type associated with it, so it can hold the address of a variable of any type. The following code segment illustrates this:
void *vptr;
int a=10, *b;
float p = 40.75, *x;
b=&a;
x=&p;
vptr=&a; /* Legal */
vptr=&p; /* Legal */
vptr = b; /* Legal */
vptr = x; /* Legal */
Thus any object pointer can be converted to a void pointer and back without loss of information. Remember, however, that a pointer to void cannot be dereferenced directly with the indirection operator (*), as other pointers can. The following statements illustrate this.
void *vptr;
int a=10, *b = &a;
vptr = b; /* Legal */
a = *vptr+10; /* Illegal */
A void pointer can be dereferenced only after it is cast to the required data type, as illustrated below:
a = *((int *)vptr)+10; /* Legal */
Here the void pointer is cast by preceding its name with the type name int followed by an asterisk, both enclosed in parentheses. The cast turns it into an int pointer, which can then be dereferenced.
Arithmetical View of Pointers
Although a pointer is not an integer, C allows us to add an integer value to a pointer or subtract one from it. The main difference from ordinary integer arithmetic is that pointer arithmetic is scaled by the size of the data type the pointer points to.
For example, when we add 1 to an int pointer, it advances by the size of an int. On the 16-bit compiler used for the examples in this article an int occupies 2 bytes, so adding 1 to an int pointer actually adds 2 to the address (on most modern systems an int occupies 4 bytes). Suppose an int pointer variable ‘iptr’ contains the address 62400; then after executing the following statement:
iptr = iptr+4;
the value of iptr becomes 62408 (4 × 2 bytes). Similarly, subtracting 2 moves it back by 4 bytes, so
iptr = iptr-2;
results in 62404. In the same fashion, adding 1 to a float pointer advances it by the size of a float, typically 4 bytes, and incrementing or decrementing a char pointer changes it by exactly 1, because sizeof(char) is always 1. In short, in pointer arithmetic every pointer moves in steps of the size of the data type it points to.
Pointer arithmetic must be used with care, and only a few operations are allowed:
1. Adding an integer value to a pointer
2. Subtracting an integer value from a pointer
3. Subtracting one pointer from another of the same data type (the result is the number of elements between them, not the number of bytes)
4. Comparing the values of two pointers of the same data type
Subtracting or comparing two pointers gives a meaningful result only when both point into the same array (or just past its last element).
Pointers and Arrays
You may be surprised to learn how closely arrays and pointers are related. In general, any operation that can be achieved by array subscripting can also be done with pointers. Let us see how. Array elements are always stored in contiguous memory locations, whatever the size of the array, and a pointer, when incremented, always points to the next location of its type. Therefore, if we store the base address of an array in a pointer variable, we can easily access the other array elements. For example, suppose ptr is a pointer variable that contains the base address of the array arr[]:
int *ptr = arr; /* base address of array arr */
or
int *ptr = &arr[0];
Then, on incrementing ptr as:
ptr++;
it points to the second element of the array. Once we have an address, we can easily access the value stored at that address.
The following program shows this concept.
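/* Program - 3.c */
#include <stdio.h>
int main(void)
{
    int arr[5] = {10, 20, 30, 40, 50};
    int i, *ptr;
    ptr = arr;                 /* ptr holds the base address of arr */
    for (i = 0; i < 5; i++)
    {
        printf("Address : %p contains a value : %d\n", (void *)ptr, *ptr);
        ptr++;                 /* move to the next int */
    }
    return 0;
}
The output of this program (recorded on a 16-bit system, where an int occupies 2 bytes) is: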
Address : 64820 contains a value : 10
Address : 64822 contains a value : 20
Address : 64824 contains a value : 30
Address : 64826 contains a value : 40
Address : 64828 contains a value : 50
In this program, an int pointer ptr is declared and assigned the starting address (base address) of the array, and the value of ptr is incremented in each iteration of the loop. Since the name of an array is a synonym for the location of its initial element, the address of the first element can be written as arr, (arr+0) or &arr[0]. The address of the second element can be written as arr+1 or &arr[1]. In general, the address of the (i+1)th array element can be written as (arr+i) or &arr[i]. Applying the indirection operator to such an address yields the contents stored there. The following table summarizes this.
*(arr+0) is equivalent to *(&arr[0]) or arr[0]
*(arr+1) is equivalent to *(&arr[1]) or arr[1]
*(arr+2) is equivalent to *(&arr[2]) or arr[2]
*(arr+3) is equivalent to *(&arr[3]) or arr[3]
…. …. ….
*(arr+i) is equivalent to *(&arr[i]) or arr[i]
In fact, the C language defines arr[i] to mean *(arr+i), and C++ inherits this rule. Since addition is commutative, all the following notations are equivalent:
arr[i]
*(arr+i)
*(i+arr)
i[arr]
The following program shows this fact.
/* Program - 4.c */
#include <stdio.h>
int main(void)
{
    int arr[5] = {10, 20, 30, 40, 50};
    int i;
    for (i = 0; i < 5; i++)
        printf("Address : %p contains a value : %d %d %d %d\n", (void *)(arr + i),
               arr[i], *(arr + i), *(i + arr), i[arr]);
    return 0;
}
When we run this program, we get the following output (recorded on a 16-bit system):
Address : 65516 contains a value : 10 10 10 10
Address : 65518 contains a value : 20 20 20 20
Address : 65520 contains a value : 30 30 30 30
Address : 65522 contains a value : 40 40 40 40
Address : 65524 contains a value : 50 50 50 50
Although array names and pointers are closely related, an array name is not a variable. Thus if arr is an array name and ptr is an int pointer variable assigned as:
ptr = arr;
then the statement
ptr++
is completely valid, but the statement
arr++
is invalid. An array name is not a modifiable lvalue: it always designates the same array, so its value cannot be changed. Writing arr++ would mean trying to change the base address of the array, and the C compiler does not allow this.
Another important difference between arrays and pointers is the way the sizeof operator treats them. If arr is an array name and ptr is an int pointer assigned as:
ptr = arr;
then
sizeof (arr)
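gives the total number of bytes occupied by the array elements (5 * sizeof(int) for an array of 5 ints), whereas
sizeof(ptr)
gives only the size of the pointer itself: 2 bytes on the 16-bit compiler used here, and typically 8 bytes on a modern 64-bit system.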
Pointers and Two Dimensional Arrays
Two-dimensional arrays have two dimensions, namely rows and columns, and each row can be thought of as a one-dimensional array. This fact is very important if we want to access the elements of a two-dimensional array using pointers.
Consider the following two-dimensional array:
int num[4][3] = {{210, 614, 127},
{174, 443, 242},
{161, 820, 667},
{511, 203, 200} };
This array can be thought of as a one-dimensional array of 4 elements, each of which is itself a one-dimensional array of 3 elements. The name of the array refers to its starting address (base address). If the elements are stored in row-major order, as shown in Figure 3, then num[0] refers to the 0th row, num[1] to the 1st row, and so on; in general, num[i] refers to the ith row of the array.
Figure 3
Now let us see how to access the element num[i][j] using pointers. We know that num[i] gives the address of the ith one-dimensional array (row). Therefore num[i]+j points to the jth element in the ith row, and the value at this address can be obtained with the indirection operator:
*( num[i]+j)
But we saw earlier that num[i] is the same as *(num+i). Therefore the expression
*( num[i]+j)
is equivalent to
*(*( num+i)+j)
Thus all the following notations refer to the same element:
num[i][j]
*( num[i]+j)
*(*( num+i)+j)
The following program illustrates these concepts.
/* Program - 5.c */
#include <stdio.h>
int main(void)
{
    int num[4][3] = {
        {210, 614, 127},
        {174, 443, 242},
        {161, 820, 667},
        {511, 203, 200}
    };
    int i, j;
    for (i = 0; i < 4; i++)
    {
        for (j = 0; j < 3; j++)
            printf("Address : %p contains the value : %d %d %d\n", (void *)(num[i] + j),
                   num[i][j], *(num[i] + j), *(*(num + i) + j));
    }
    return 0;
}
Here is the output of this program (recorded on a 16-bit system):
Address : 65502 contains the value : 210 210 210
Address : 65504 contains the value : 614 614 614
Address : 65506 contains the value : 127 127 127
Address : 65508 contains the value : 174 174 174
Address : 65510 contains the value : 443 443 443
Address : 65512 contains the value : 242 242 242
Address : 65514 contains the value : 161 161 161
Address : 65516 contains the value : 820 820 820
Address : 65518 contains the value : 667 667 667
Address : 65520 contains the value : 511 511 511
Address : 65522 contains the value : 203 203 203
Address : 65524 contains the value : 200 200 200
Pointers and Three Dimensional Arrays
Three-dimensional arrays are arrays of two-dimensional arrays. For example, suppose we have a three-dimensional array with dimensions (3, 2, 4):
int num[3][2][4] = {
{
{10, 14, 12, 15},
{17, 43, 22, 18}
},
{
{11, 20, 24, 66},
{50, 44, 23, 20}
},
{
{90, 94, 112, 75},
{67, 31, 32, 28}
}
};
The array name num refers to the starting address (base address) of the three-dimensional array. The array name with a single subscript, num[i], gives the starting address of the ith two-dimensional array. Similarly, num[i][j] gives the starting address of the jth row of the ith two-dimensional array. Finally, (num[i][j]+k) gives the address of the kth element in the jth row of the ith two-dimensional array, and the value at this address can be obtained with the indirection operator:
*( num[i][j]+k)
This expression is equivalent to each of the following expressions:
*( *(num[i]+j)+k)
*( *(*(num+i)+j)+k)
Using these concepts, the following program displays each element of a three dimensional array using pointers.
/* Program - 6.c */
#include <stdio.h>
int main(void)
{
    int i, j, k;
    int num[3][2][4] = {
        {
            {10, 14, 12, 15},
            {17, 43, 22, 18}
        },
        {
            {11, 20, 24, 66},
            {50, 44, 23, 20}
        },
        {
            {90, 94, 112, 75},
            {67, 31, 32, 28}
        }
    };
    for (i = 0; i < 3; i++)
    {
        for (j = 0; j < 2; j++)
        {
            for (k = 0; k < 4; k++)
            {
                /* four equivalent ways to reach num[i][j][k] */
                printf("Address : %p contains value : %d\t %d\t %d\t %d\n",
                       (void *)(num[i][j] + k), num[i][j][k], *(num[i][j] + k),
                       *(*(num[i] + j) + k), *(*(*(num + i) + j) + k));
            }
        }
    }
    return 0;
}
When you run this program, you get the following output (recorded on a 16-bit system):
Address : 65476 contains value : 10 10 10 10
Address : 65478 contains value : 14 14 14 14
Address : 65480 contains value : 12 12 12 12
Address : 65482 contains value : 15 15 15 15
Address : 65484 contains value : 17 17 17 17
Address : 65486 contains value : 43 43 43 43
Address : 65488 contains value : 22 22 22 22
Address : 65490 contains value : 18 18 18 18
Address : 65492 contains value : 11 11 11 11
Address : 65494 contains value : 20 20 20 20
Address : 65496 contains value : 24 24 24 24
Address : 65498 contains value : 66 66 66 66
Address : 65500 contains value : 50 50 50 50
Address : 65502 contains value : 44 44 44 44
Address : 65504 contains value : 23 23 23 23
Address : 65506 contains value : 20 20 20 20
Address : 65508 contains value : 90 90 90 90
Address : 65510 contains value : 94 94 94 94
Address : 65512 contains value : 112 112 112 112
Address : 65514 contains value : 75 75 75 75
Address : 65516 contains value : 67 67 67 67
Address : 65518 contains value : 31 31 31 31
Address : 65520 contains value : 32 32 32 32
Address : 65522 contains value : 28 28 28 28
Array of Pointers
An array of pointers is similar to an array of any other data type. An array of pointers is a collection of addresses. The addresses present in an array of pointers can be addresses of isolated variables or of array elements or any other addresses.
Suppose we have an array of 5 integers:
int arr[5] = {10, 20, 30, 40, 50};
then we can define an array of pointers ‘ptr’ as:
int *ptr[5] = {arr, arr+1, arr+2, arr+3, arr+4};
Here ptr[0] contains the address of the first element of array ‘arr’, ptr[1] the address of the second element, and so on:
ptr[0] = (arr+0);
ptr[1] = (arr+1);
ptr[2] = (arr+2);
ptr[3] = (arr+3);
ptr[4] = (arr+4);
Suppose the first element of the integer array ‘arr’ is stored at address 65516; the successive elements of ‘arr’ are then stored in successive memory locations. Figure 4 shows the memory representation of the array of pointers ptr and the integer array arr.
Figure 4
Remember that an array of pointers can also contain the addresses of isolated variables. The following code segment illustrates this.
int a, b, c, d, e;
int *ptr[5];
ptr[0] = &a;
ptr[1] = &b;
ptr[2] = &c;
ptr[3] = &d;
ptr[4] = &e;
In general, all the rules that apply to an array of a basic data type apply to an array of pointers as well.