Why Use a Disk?
If you have worked with computers and done some programming in other languages, you know the importance of file storage for data. The typical computer system has much less memory than hard disk storage, so your disk drive holds far more data than can fit in your computer's RAM. Disk storage is also nonvolatile: the disk retains its contents when you power off your computer. Also, when your data changes, you (or more important, your users) do not have to edit the program and look for a set of assignment statements. Instead, the users run previously written programs that make changes to the disk data.
Types of Disk File Access
Your programs can access files two ways: through sequential access or random access. Your application determines the method you should choose. The access mode of a file determines how you read, write, change, and delete data from the file. Some of your files can be accessed in both ways, sequentially and randomly, as long as your programs are written properly and the data lends itself to both types of file access.
A sequential file must be accessed in the same order the file was written. This is analogous to cassette tapes: You play music in the same order it was recorded. (You can quickly fast-forward or rewind through songs that you do not want to listen to, but the order of the songs dictates what you do to play the song you want.) It is difficult, and sometimes impossible, to insert data in the middle of a sequential file. How easy is it to insert a new song between two other songs on a tape? The only way to truly add or delete records in the middle of a sequential file is to create a completely new file that combines the old and new data.
It might seem that sequential files are limiting, but it turns out that many applications lend themselves to sequential file processing.
Unlike with sequential files, you can access random access files in any order you want. Think of data in a random access file as you would think of songs on a compact disc or a record; you can go directly to any song you want without having to play or fast-forward through the other songs. If you want to play the first song, the sixth song, and then the fourth song, you can do so. The order of play has nothing to do with the order in which the songs appear on the recording. Random file access sometimes takes more programming but rewards that effort with a more flexible file access method. You'll learn about both file storage methods in this chapter.
Learning Sequential File Concepts
There are three operations you can perform on sequential disk files. You can
• Create disk files
• Add to disk files
• Read from disk files
Your application determines what you need to do. If you are creating a disk file for the first time, you must create the file and write the initial data to it. Suppose that you wanted to create a customer data file. You would create a new file and write your current customers to that file. The customer data might originally be in arrays or arrays of structures, pointed to with pointers, or typed into regular variables by the user.
Over time, as your customer base grows, you can add new customers to the file. When you add to the end of a file, you append to that file. As your customers enter your store, you would read their information from the customer data file.
Customer disk processing brings up one disadvantage of sequential files, however. Suppose that a customer moves and wants you to change his or her address in your files. Sequential access files do not lend themselves well to changing data stored in them. It is also difficult to remove information from sequential files. Random files will provide a much easier approach to changing and removing data. The primary approach to changing or removing data from a sequential access file is to create a new one from the old one with the updated data.
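The copy-to-a-new-file approach can be sketched as follows. This is only an illustration under some assumptions: the function name replaceLine, the one-record-per-line layout, and the 256-character line limit are made up for the example and are not part of the chapter.

```c
#include <stdio.h>
#include <string.h>

/* Copies oldName to newName line by line, writing replacement in place of
   every line equal to target. Returns the number of lines replaced, or -1
   if either file cannot be opened. Lines longer than the buffer are not
   handled in this sketch. */
int replaceLine(const char *oldName, const char *newName,
                const char *target, const char *replacement)
{
    char line[256];
    int replaced = 0;
    FILE *in = fopen(oldName, "r");
    FILE *out;

    if (in == NULL)
        return -1;
    out = fopen(newName, "w");
    if (out == NULL) {
        fclose(in);
        return -1;
    }
    while (fgets(line, sizeof line, in) != NULL) {
        if (strcmp(line, target) == 0) {
            fputs(replacement, out);   /* write the updated record */
            replaced++;
        } else {
            fputs(line, out);          /* copy the record unchanged */
        }
    }
    fclose(in);
    fclose(out);
    return replaced;
}
```

Because fgets() keeps the newline, both target and replacement should end with \n. Passing an empty string as the replacement removes the matching records instead of changing them.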
Opening and Closing Sequential Files
Before you can create, write to, or read from a disk file, you must open the file. This is analogous to opening a file cabinet before working with a file stored in the cabinet. As soon as you are done with a cabinet's file, you close the cabinet door. You must also close a disk file when you finish with it.
When you open a disk file, you need to tell C only the filename and what you want to do (write to, add to, or read from). C and your operating system work together to make sure that the disk is ready, and they create an entry in your file directory (if you are creating a file) for the filename.
When you close a file, C writes any remaining data to the file, releases the file from the program, and updates the file directory to reflect the file's new size.
To open a file, you must call the fopen() function (for "file open"). To close a file, call the fclose() function. Here is the format of these two function calls:
filePtr = fopen(fileName, access);
and
fclose(filePtr);
The filePtr is a special type of pointer that points only to files, not to data variables. You must define a file pointer with FILE *, a definition in the stdio.h header file. The examples that follow show you how to define a file pointer.
Your operating system handles the exact location of your data in the disk file. You do not want to worry about the exact track and sector number of your data on the disk. Therefore, you let the filePtr point to the data you are reading and writing. Your program only has to generically manage the filePtr while C and the operating system take care of locating the actual physical data.
The fileName is a string (or a character pointer that points to a string) containing a valid filename for your computer. If you are using a PC or a UNIX-based computer, the fileName can contain a complete disk and directory pathname. If you are using a mainframe, you must use the complete dataset name in the fileName string. Generally, you can specify the filename in uppercase or lowercase letters, as long as your operating system does not have a preference.
Sometimes you see programs that contain a t or a b in the access mode, such as "rt" or "wb+". The t means text file and is the default mode; each of the access modes listed in Table 24.1 is equivalent to using t after the access mode letter ("rt" is identical to "r", and so on). A text file is an ASCII file, compatible with most other programming languages and applications. Text files do not always contain text, in the word processing sense of the word. Any data you need to store can go in a text file. Programs that read ASCII files can read data you create as C text files. The b in the access mode means binary mode.
If you open a file for writing (using access modes of "w", "wt", "wb", or "w+"), C creates the file. If a file by that name already exists, C overwrites the old file with no warning. When opening files, you must be careful that you do not overwrite existing data you want to save.
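One way to guard against an accidental overwrite is to try opening the file for reading first. The sketch below assumes a hypothetical helper named openNewFile; it is only one possible approach, and it does not protect against another program creating the file between the two fopen() calls.

```c
#include <stdio.h>

/* Opens fileName for writing only if no file by that name can be opened
   for reading. Returns NULL if the file already exists or cannot be
   created. */
FILE *openNewFile(const char *fileName)
{
    FILE *probe = fopen(fileName, "r");

    if (probe != NULL) {      /* the file exists: refuse to overwrite it */
        fclose(probe);
        return NULL;
    }
    return fopen(fileName, "w");
}
```

The caller treats a NULL return as a signal to ask the user for a different filename or for permission to replace the old file.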
If an error occurs during the opening of a file, C does not return a valid file pointer. Instead, C returns a file pointer equal to the value NULL. NULL is defined in stdio.h. For example, if you open a file for output, but use a disk name that is invalid, C cannot open the file and will make the file pointer point to NULL. Always check the file pointer when writing disk file programs to ensure that the file opened properly.
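The open, check, and close sequence described above might look like the following minimal sketch. The function name canOpen is hypothetical and exists only to keep the example small.

```c
#include <stdio.h>

/* Tries to open fileName in the given access mode and closes it again.
   Returns 1 if the open succeeded, 0 if fopen() returned NULL. */
int canOpen(const char *fileName, const char *access)
{
    FILE *filePtr = fopen(fileName, access);   /* define and set the file pointer */

    if (filePtr == NULL) {                     /* always check before using it */
        fprintf(stderr, "Could not open %s\n", fileName);
        return 0;
    }
    fclose(filePtr);                           /* release the file when done */
    return 1;
}
```

In a real program, the reading or writing would go between the NULL check and the call to fclose().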
Writing to a File
Any input or output function that requires a device performs input and output with files. You have seen most of these already. The most common file I/O functions are
getc() and putc()
fprintf()
fgets() and fputs()
There are a few more, but the most common I/O function left that you have not seen is the fscanf() function. fscanf() is to scanf() as fprintf() is to printf(). The only difference between fscanf() and scanf() is its first parameter. The first parameter to fscanf() must be a file pointer (or any C device, such as stdin and stdaux).
The following function reads three integers from a file pointed to by filePtr:
fscanf(filePtr, "%d %d %d", &num1, &num2, &num3);
As with scanf(), you do not have to specify the & before array variable names. The following fscanf() reads a string from the disk file:
fscanf(filePtr, "%s", name);
The fscanf() function is usually less risky to use than scanf(). scanf() gets input from the user, and the user does not always enter data in the format that scanf() expects. When you get data from a disk file, however, you can be more certain about the format because you probably wrote the program that created the file in the first place. Errors still can creep into a data file, and you might be wrong about the file's format when using fscanf(), so it is still worth checking the value fscanf() returns, but generally, input from a file you created is more predictable than input from the keyboard.
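Because a data file can still contain surprises, it helps to check how many fields fscanf() actually read. The following sketch assumes a made-up record layout of a name followed by three integers; the function name readRecord and the 20-character name buffer are illustrative choices, not something the chapter defines.

```c
#include <stdio.h>

/* Reads one record of the form "name num1 num2 num3" from filePtr.
   Returns 1 if all four fields were read, 0 otherwise.
   The name buffer must hold at least 20 characters. */
int readRecord(FILE *filePtr, char *name, int *num1, int *num2, int *num3)
{
    /* %19s stops after 19 characters so that name cannot overflow */
    int fields = fscanf(filePtr, "%19s %d %d %d", name, num1, num2, num3);

    return fields == 4;
}
```

fscanf() returns the number of items it stored (or EOF at the end of the file), so comparing that count with the number of fields you expected catches both a damaged record and the end of the data.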
There is always more than one way to write data to a disk file. Most of the time, more than one function will work. For instance, if you write many names to a file, both fputs() and fprintf() will work. You also can write the names using putc(). You should use whichever function you are most comfortable with for the data being written. If you want a newline character (\n) at the end of each line in your file, the fprintf() and fputs() probably are easier than putc(), but all three will do the job.
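To compare the three functions side by side, here is a small sketch that writes one name with each of them. The function name writeNames and the sample names are invented for the example.

```c
#include <stdio.h>

/* Writes three names to fileName, one per line, each with a different
   output function. Returns 1 on success, 0 if the file cannot be opened. */
int writeNames(const char *fileName)
{
    const char *third = "Carol";
    const char *p;
    FILE *filePtr = fopen(fileName, "w");

    if (filePtr == NULL)
        return 0;

    fputs("Alice\n", filePtr);             /* a whole string at once */
    fprintf(filePtr, "%s\n", "Bob");       /* formatted output */
    for (p = third; *p != '\0'; p++)       /* one character at a time */
        putc(*p, filePtr);
    putc('\n', filePtr);                   /* putc() needs the newline written separately */

    fclose(filePtr);
    return 1;
}
```

All three names end up in the file in the same form, one per line, which is why the choice usually comes down to convenience.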
Writing to a Printer
The fopen() and other output functions were not designed to just write to files. They were designed to write to any device, including files, the screen, and the printer. If you need to write data to a printer, you can treat the printer as if it were a file. Under MS-DOS, for example, you can open a FILE pointer using the device name LPT1 (the MS-DOS name for the first parallel printer port) and write to the printer with the same functions you use for disk files.
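A minimal sketch of writing to a printer through a file pointer follows. The function name printReport and the two report lines are made up for illustration; the device name is passed in as a parameter so that the same code can write to an ordinary disk file on systems that have no LPT1 device.

```c
#include <stdio.h>

/* Writes a short report to the named device or file.
   Returns 1 on success, 0 if the device cannot be opened. */
int printReport(const char *deviceName)
{
    FILE *prnt = fopen(deviceName, "w");

    if (prnt == NULL)
        return 0;
    fprintf(prnt, "Printer line 1\n");
    fprintf(prnt, "Printer line 2\n");
    fclose(prnt);
    return 1;
}
```

Under MS-DOS you would call printReport("LPT1"). On an operating system that does not treat LPT1 as a device name, the same call simply creates a disk file with that name.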
Adding to a File
You can easily add data to an existing file or create new files by opening the file in append access mode. Data files on the disk rarely are static; they grow almost daily due to (with luck!) increased business. Being able to add to data already on the disk is very useful indeed.
Files you open for append access (using "a", "at", "ab", "a+b", and "ab+") do not have to exist. If the file exists, C appends data to the end of the file when you write the data. If the file does not exist, C creates the file (as is done when you open a file for write access).
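Append access can be sketched with a small helper such as the one below. The name appendLine is hypothetical, and the one-item-per-line layout is an assumption made for the example.

```c
#include <stdio.h>

/* Adds text as a new line at the end of fileName, creating the file
   if it does not exist. Returns 1 on success, 0 on failure. */
int appendLine(const char *fileName, const char *text)
{
    FILE *filePtr = fopen(fileName, "a");   /* "a" never erases existing data */

    if (filePtr == NULL)
        return 0;
    fprintf(filePtr, "%s\n", text);
    fclose(filePtr);
    return 1;
}
```

Calling the function twice with the same filename leaves both lines in the file, the first call having created it if necessary.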
Reading from a File
As soon as the data is in a file, you must be able to read that data. You must open the file in a read access mode. There are several ways to read data. You can read character data a character at a time or a string at a time. The choice depends on the format of the data. If you stored numbers using fprintf(), you might want to use a mirror-image fscanf() to read the data.
Files you open for read access (using "r", "rt", and "rb") must exist already, or C gives you an error. You cannot read a file that does not exist. fopen() returns NULL if the file does not exist when you open it for read access.
Another event happens when reading files. Eventually, you read all the data. Subsequent reading fails because there is no more data to read. C provides a solution to the end-of-file occurrence. If you attempt to read from a file after you have read all of its data, the input function signals the end of the file: getc() and fscanf() return the value EOF, defined in stdio.h, and fgets() returns NULL. To find the end-of-file condition, be sure to check for these values when performing input from files.
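A typical read loop keeps going until the input function reports the end of the file. The sketch below uses getc(); the function name countChars is invented for the example.

```c
#include <stdio.h>

/* Counts the characters in fileName by reading until getc() returns EOF.
   Returns -1 if the file cannot be opened. */
long countChars(const char *fileName)
{
    int ch;                 /* int, not char, so that EOF can be told apart from data */
    long count = 0;
    FILE *filePtr = fopen(fileName, "r");

    if (filePtr == NULL)
        return -1;
    while ((ch = getc(filePtr)) != EOF)
        count++;
    fclose(filePtr);
    return count;
}
```

Storing the result of getc() in an int matters: EOF is a value outside the range of valid characters, and a char variable might not be able to hold it.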
Random File Records
Random files exemplify the power of data processing with C. Sequential file processing is slow unless you read the entire file into arrays and process them in memory. Random files provide you a way to read individual pieces of data from a file in any order needed and process them one at a time.
Generally, you read and write file records. A record to a file is analogous to a C structure. A record is a collection of one or more data values (called fields) that you read and write to disk. Generally, you store data in structures and write the structures to disk, where they are called records. When you read a record from disk, you generally read that record into a structure variable and process it with your program.
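To make the structure-to-record idea concrete, here is a sketch that writes an array of structures to disk and reads it back. The text above does not name the functions involved; this example assumes the standard library functions fwrite() and fread(), and the customer structure, its fields, and both function names are hypothetical.

```c
#include <stdio.h>

struct customer {            /* hypothetical record layout */
    char name[20];
    int  id;
};

/* Writes count records to a new binary file. Returns the number written. */
size_t saveCustomers(const char *fileName,
                     const struct customer *recs, size_t count)
{
    size_t written;
    FILE *filePtr = fopen(fileName, "wb");

    if (filePtr == NULL)
        return 0;
    written = fwrite(recs, sizeof recs[0], count, filePtr);
    fclose(filePtr);
    return written;
}

/* Reads up to max records back into recs. Returns the number read. */
size_t loadCustomers(const char *fileName,
                     struct customer *recs, size_t max)
{
    size_t got;
    FILE *filePtr = fopen(fileName, "rb");

    if (filePtr == NULL)
        return 0;
    got = fread(recs, sizeof recs[0], max, filePtr);
    fclose(filePtr);
    return got;
}
```

Each structure becomes one fixed-size record in the file, which is what later makes it possible to calculate where any given record begins.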
Unlike some other programming languages, not all C-read disk data has to be stored in record format. Typically, you write a stream of characters to a disk file and access that data either sequentially or randomly by reading it into variables and structures.
The idea behind randomly accessing data in a file is simple. Consider the data files of a large credit card organization. When you make a purchase, the store calls the credit card company to get an authorization. Millions of names are in the credit card company's files. The credit card company cannot quickly read, one by one, every record on the disk that comes before yours. Sequential files do not lend themselves to quick access. In many situations, looking up individual records in a data file with sequential access is not feasible.
The credit card companies must use a random file access so that their computers can go directly to your record, just as you go directly to a song on a compact disc or a record album. The functions you use are different from the sequential functions, but the power that results from learning the added functions is worth the effort.
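Going directly to a record can be sketched as follows. The text has not yet named the random access functions, so this example assumes the standard library function fseek() together with fread(); the account structure and the function name readAccount are made up for illustration, and the file is assumed to hold fixed-size records written in binary mode.

```c
#include <stdio.h>

struct account {             /* hypothetical fixed-size record */
    int    number;
    double balance;
};

/* Reads record number recNum (counting from 0) directly, without reading
   the records that come before it. Returns 1 on success, 0 on failure. */
int readAccount(FILE *filePtr, long recNum, struct account *out)
{
    long offset = recNum * (long)sizeof(struct account);

    if (fseek(filePtr, offset, SEEK_SET) != 0)   /* jump to the record */
        return 0;
    return fread(out, sizeof *out, 1, filePtr) == 1;
}
```

Because every record has the same size, the position of record n is simply n times the record size, so the sixth, first, and fourth records can be read in that order with three calls.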