Last time, I discussed how to create a form using HTML code. I mentioned at that time
that a server-side program is required to read the data submitted by the user from the form. In
this article I will discuss how to create a server-side program for reading form data. Let me say
at this point that this is a complex topic. I will only be able to skim the surface. I have a couple
of references at the end of the article for people who want to learn the details.
CGI Programs
When form data is submitted from a Web page, a CGI (Common Gateway Interface) program is required. A CGI program is one that the Web server starts in response to a request from a Web page, usually one containing a form. Before a CGI program executes, the Web server creates a special processing environment in which the program operates. Such things as HTTP (Hypertext Transfer Protocol) headers are parsed into environment variables that your CGI program can use. Other information is also made available. The program can use this information in whatever way it chooses. It should then output some information. The output includes some HTTP response headers and possibly some HTML code. Some of this is done for you by the server. I will only discuss what needs to be done by the program itself.
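As a rough illustration of that processing environment, the sketch below reports three environment variables that Web servers commonly set for CGI programs and then sends a header followed by a plain-text body. The subroutine name describe_request is made up for this example, and the variables will normally be unset when the script is run outside a Web server.

```perl
use strict;
use warnings;

# Report a few of the environment variables a Web server typically sets
# before starting a CGI program. Outside a server they are usually unset,
# so a placeholder is shown instead.
sub describe_request {
    my @names  = ('REQUEST_METHOD', 'QUERY_STRING', 'CONTENT_LENGTH');
    my $report = '';
    foreach my $name (@names) {
        my $value = defined $ENV{$name} ? $ENV{$name} : '(not set)';
        $report .= "$name = $value\n";
    }
    return $report;
}

# Output: one response header, a blank line, then the body.
print "Content-type: text/plain\n\n";
print describe_request();
```

Plain text is used here only to keep the example short; a real program would more likely send HTML.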
Many different programming languages can be used for CGI programs. A shell script, a C program, or a C++ program would work fine. However, the language of choice for this type of server-side program is Perl. Never heard of it? Well, I hadn't either just a few months ago. After I investigated it, I found out why it is so popular. Perl programs are stored as plain ASCII text files and are checked for syntax errors before they are executed. Perl makes use of associative arrays, which are very useful for storing form data. Also, Perl is free. I use a Perl program for reading the data from the HTML form discussed in the last installment.
For security reasons, some Internet Service Providers (ISPs) require that all CGI
programs reside in a root directory named /cgi-bin. Others do not have this requirement. You
need to find out what your ISP requires before creating a CGI program. My ISP allows CGI
programs to reside anywhere on their server as long as the program name has a .cgi extension. I
put all of my CGI programs in a subdirectory named cgi-bin in my personal Web page directory.
GET or POST
I mentioned last time that a form can submit data to a CGI program using the GET
method or the POST method. The GET method creates an environment variable named
QUERY_STRING containing a specially encoded string of characters with the form data in it
and then executes the CGI program. The program can read this environment variable and decode
it. The POST method executes the program first and then feeds the encoded string via its
standard input (STDIN), which should be familiar to all you C programmers out there. Again,
the program can read the string and decode it. The disadvantage of the GET method is that the
server may have a maximum length for environment variables. If the form allows a lot of data to
be entered, it could overflow the buffer. The POST method does not have this disadvantage and
is thus the preferred method of submitting data.
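The difference between the two methods can be sketched in a few lines of Perl. This is only an illustration: the subroutine name read_raw_form_data is invented here, and it relies on the REQUEST_METHOD and CONTENT_LENGTH environment variables that Web servers normally set for CGI programs. The filehandle is passed in as a parameter so the routine can be tried outside a server; a real CGI program would pass \*STDIN.

```perl
use strict;
use warnings;

# Return the still-encoded form data for the current request.
# $fh is the handle to read POST data from (STDIN in a real CGI program).
sub read_raw_form_data {
    my ($fh) = @_;
    my $method = $ENV{'REQUEST_METHOD'} || 'GET';

    if ($method eq 'POST') {
        # POST: the server says how many bytes to expect, and the
        # string itself arrives on standard input.
        my $length = $ENV{'CONTENT_LENGTH'} || 0;
        my $data   = '';
        read($fh, $data, $length) if $length > 0;
        return $data;
    }

    # GET: the server has already placed the string in the environment.
    return defined $ENV{'QUERY_STRING'} ? $ENV{'QUERY_STRING'} : '';
}
```

In a CGI program the call would look like my $raw = read_raw_form_data(\*STDIN);. The result is the encoded string described in the next section.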
URI Encoding
When form data is submitted to a CGI program, either using the GET method or the
POST method, it is first encoded into one long string. A URI (Uniform Resource Identifier;
URLs and URNs are kinds of URI) encoding scheme is used for this task. The rules for this type of
encoding are:
1. Create name/value pairs for each input field on the form. The name is supplied by the NAME attribute of a control on the form (see last installment for details). The value is the data supplied by the user for that control, a default value, or an empty value.
2. Names and values are separated by equal signs (=).
3. Name/value pairs are separated from each other by ampersands (&).
4. Spaces are converted to plus signs (+).
5. Reserved characters (there are many) and special characters (such as %) must be encoded by
converting them to their hexadecimal values in the form %HH, where each H is a hexadecimal digit.
If you look at the HTML code in the last installment, you will see that all of the controls have a
NAME attribute associated with them. The first four controls are named LASTNAME,
FIRSTNAME, CITY, and STATE. If my name is Joe Van Dyke, Jr. and I live in New York, NY,
then the first part of the URI encoded string for this form would be:
LASTNAME=Van+Dyke%2C+Jr%2E&FIRSTNAME=Joe&CITY=New+York&STATE=NY
Notice the plus signs where spaces should go, the equal signs separating the names and values, and the ampersands separating the name/value pairs. Two punctuation characters, a comma and a period, are converted to their hexadecimal values, %2C and %2E. This string, with the additional name/value pairs not shown, will be assigned to the environment variable QUERY_STRING if the GET method is used by the form, or the string will be passed to the CGI program's standard input if the POST method is used.
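To make the rules concrete, here is a minimal Perl sketch of an encoder. The browser normally does this work, so the code is purely illustrative, and the subroutine names encode_component and encode_form are invented for it. It also makes a simplifying assumption: every character other than a letter, digit, or space is converted to %HH, which encodes more characters than strictly necessary but reproduces the example string above.

```perl
use strict;
use warnings;

# Encode one name or value: every character other than a letter, digit,
# or space becomes %HH, then spaces become plus signs.
sub encode_component {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9 ])/sprintf("%%%02X", ord($1))/ge;
    $text =~ s/ /+/g;
    return $text;
}

# Join name/value pairs, given as a flat list so their order is kept.
sub encode_form {
    my @fields = @_;
    my @pairs;
    while (@fields) {
        my $name  = shift @fields;
        my $value = shift @fields;
        push @pairs, encode_component($name) . '=' . encode_component($value);
    }
    return join('&', @pairs);
}

print encode_form(
    'LASTNAME'  => 'Van Dyke, Jr.',
    'FIRSTNAME' => 'Joe',
    'CITY'      => 'New York',
    'STATE'     => 'NY',
), "\n";
# prints LASTNAME=Van+Dyke%2C+Jr%2E&FIRSTNAME=Joe&CITY=New+York&STATE=NY
```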
Of course, once the CGI program receives the string, it must be decoded before the data is
useful. Fortunately, there is a free routine available that automates this task. It will be discussed
in the next section.
Finally, Reading the Form Data
Listing 1 shows a Perl program I wrote entitled survey.cgi. It reads in the URI encoded string discussed above, decodes it, and then produces some HTML code to be sent back to the user's browser. Let's see how it works.
The first line of the program is: #! /usr/local/bin/perl. All of you UNIX buffs are already overly familiar with the meaning of this line. It tells the shell that the following code should be executed by the Perl interpreter. By making this the first line, the program can be executed by just typing the name of the program, survey.cgi, rather than typing "perl survey.cgi". Some UNIX systems do not support this feature, so check it out first.
The next line is: require("cgi-lib.pl");. This statement tells the Perl interpreter that an
external library of routines, named cgi-lib.pl, will be required by this program. The cgi-lib.pl
library is a nifty (a little 60's lingo) library of Perl subroutines for performing some common CGI
tasks. The library was written by Steven Brenner and is freely distributable. The library may or
may not be available on your ISP's server. It was not on mine. Therefore, I put a copy of it in
my personal cgi-bin subdirectory. Make sure you put the proper path in the require command.
Parsing the URI Encoded String
The cgi-lib.pl subroutine that my program uses is ReadParse. It is used in the third line of
survey.cgi. In order to give you a taste of Perl, I show the ReadParse routine in its entirety in
Listing 2. I cannot describe it in detail, but here is a synopsis. A local variable named "in" is
created and will be used to store data from the URI encoded string. The final decoded data will
be passed back to the calling program using the variable passed to ReadParse (in this case, input).
ReadParse determines whether the GET or POST method is being used to pass the URI encoded
string. If the GET method is being used, $in (a scalar variable) is set equal to the environment
variable QUERY_STRING. If the POST method is being used, $in is read from STDIN. In this
case, the length of the input is determined from the environment variable CONTENT_LENGTH.
At this point, let's use the URI encoded string discussed earlier as our example string and see
what happens to it as it is processed by ReadParse.
1. At this point,
$in = LASTNAME=Van+Dyke%2C+Jr%2E&FIRSTNAME=Joe&CITY=New+York&STATE=NY
2. The split function divides the scalar variable, $in, into an array, @in, of name/value pairs. Now,
$in[0] = LASTNAME=Van+Dyke%2C+Jr%2E
$in[1] = FIRSTNAME=Joe
$in[2] = CITY=New+York
$in[3] = STATE=NY
3. A loop is executed in which $i takes each value from 0 through one less than the number of elements in @in. That upper bound, the index of the last element, is represented by $#in. Within the loop, several operations are performed on each element of the @in array. First, plus signs are converted to spaces. Now,
$in[0] = LASTNAME=Van Dyke%2C Jr%2E
$in[1] = FIRSTNAME=Joe
$in[2] = CITY=New York
$in[3] = STATE=NY
4. The name=value strings are split into two scalar variables $key and $val, where the former will equal the name and the latter will equal the value. Now,
On 1st pass:
$key = LASTNAME
$val = Van Dyke%2C Jr%2E
On 2nd pass:
$key = FIRSTNAME
$val = Joe
On 3rd pass:
$key = CITY
$val = New York
On 4th pass:
$key = STATE
$val = NY
5. The hexadecimal numbers in the $key and $val strings are converted to the characters they represent. Now,
On 1st pass:
$key = LASTNAME
$val = Van Dyke, Jr.
On 2nd pass:
$key = FIRSTNAME
$val = Joe
On 3rd pass:
$key = CITY
$val = New York
On 4th pass:
$key = STATE
$val = NY
6. Finally, an associative array, %in, is created where the indexes of the array are the $key strings and the values of the array elements are the $val strings. Multiple values for the same name, such as from a multi-select list box control, are separated by a null character, written \0. This is what will be passed back to the input variable in the survey.cgi program. Now,
$in{'LASTNAME'} = Van Dyke, Jr.
$in{'FIRSTNAME'} = Joe
$in{'CITY'} = New York
$in{'STATE'} = NY
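The six steps above can be sketched compactly in Perl. This is not the ReadParse routine from cgi-lib.pl, only a simplified stand-in written for illustration; the name parse_form_string is invented, and the routine takes the encoded string as an argument rather than reading it from the environment or from STDIN.

```perl
use strict;
use warnings;

# Decode a URI encoded string into an associative array of name => value.
# Repeated names are joined with "\0", as described in step 6.
sub parse_form_string {
    my ($encoded) = @_;
    my %form;
    foreach my $pair (split(/&/, $encoded)) {     # step 2: name/value pairs
        next if $pair eq '';
        $pair =~ s/\+/ /g;                        # step 3: plus signs to spaces
        my ($key, $val) = split(/=/, $pair, 2);   # step 4: name and value
        $val = '' unless defined $val;
        foreach ($key, $val) {                    # step 5: %HH to characters
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
        }
        if (defined $form{$key}) {                # step 6: build the array
            $form{$key} .= "\0" . $val;
        } else {
            $form{$key} = $val;
        }
    }
    return %form;
}

my %in = parse_form_string(
    'LASTNAME=Van+Dyke%2C+Jr%2E&FIRSTNAME=Joe&CITY=New+York&STATE=NY');
foreach my $name ('LASTNAME', 'FIRSTNAME', 'CITY', 'STATE') {
    print "$name = $in{$name}\n";
}
```

Run from the command line, this prints the same four name/value pairs shown in step 6.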
The final construct, an associative array, may be new to you. It uses strings to index an array rather than a numeric value. This construct is not unique to Perl; other languages use it. However, it is rather rare. It is easy to see how useful it is for handling form data. You can easily obtain the value from a control by indexing the array with the name of the control.
As you have noticed, Perl can be very powerful, performing complex tasks with simple
commands. However, it can be rather cryptic. I do not have the space to delve deeper into Perl
programming. Whole books have been written about this topic. I list two that I own at the end
of this article.
Cleaning Up the Form Data
Now that the form data has been parsed into an associative array, it is time to use it for something. It is generally a good idea to regurgitate the data entered by the user so she can check it for errors. If the information is correct, it can be passed on to another CGI program for final handling. If the data is incorrect, the survey page needs to be redisplayed so the user can re-enter the data. The purpose of survey.cgi is to regurgitate.
After parsing the data with ReadParse, survey.cgi next separates the multiple values for the INTERESTS control by substituting a comma and a space for each occurrence of \0. This procedure is not necessary for the other controls as they do not accept multiple inputs.
Check box controls that are checked by the user pass a value of "on" to the CGI program.
Unchecked boxes do not send a value at all. These values are no good for displaying to the user.
Therefore, the $input{'SENDINFO'} array element is set equal to "Yes" if the box is checked
and to "No" otherwise.
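Both clean-up steps can be sketched as follows. The control names INTERESTS and SENDINFO come from the survey form; the subroutine name clean_up and the sample values are made up for this illustration, and the code is not taken from Listing 1.

```perl
use strict;
use warnings;

# Tidy parsed form data for display.
sub clean_up {
    my (%input) = @_;

    # Multiple selections arrive joined by "\0"; show them as a list.
    $input{'INTERESTS'} =~ s/\0/, /g if defined $input{'INTERESTS'};

    # A checked box sends "on"; an unchecked box sends nothing at all.
    if (defined $input{'SENDINFO'} && $input{'SENDINFO'} eq 'on') {
        $input{'SENDINFO'} = 'Yes';
    } else {
        $input{'SENDINFO'} = 'No';
    }
    return %input;
}

my %input = clean_up('INTERESTS' => "Perl\0CGI\0HTML", 'SENDINFO' => 'on');
print "Interests: $input{'INTERESTS'}\n";   # prints Interests: Perl, CGI, HTML
print "Send info: $input{'SENDINFO'}\n";    # prints Send info: Yes
```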
Creating an On-the-Fly Web Page
A Web browser expects to receive an HTTP response header before any other
information so it can know how to handle what follows. There are many different response
headers. One of the most common ones is the Content-type response header. This header
informs the browser what type of data is about to be passed to it. The print command can be
used in a Perl CGI program to pass information back to the browser. Therefore, the command
print "Content-type: text/html\n\n";
sends a message to the browser that some HTML code is about to be sent. The rest of survey.cgi is devoted to sending the HTML code. You C programmers will recognize the \n as a newline character. The response headers must be followed by a blank line, which is why the string ends with two newline characters; without it, an error will occur.
The next line,
print <<"ending_tag";
tells Perl that all of the following lines of text, up to the label ending_tag, should be printed as though each line had the print command in front of it. The double quotes around the tag name specify that variable names in the text should be replaced with their current values before printing the line. (Note: If single quotes are used, as in print <<'ending_tag';, then no substitutions will occur.)
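The effect of the two quoting styles is easy to see in a small sketch. The table row and the sample value for CITY are made up for this illustration, and the text is stored in variables before printing only so the two results can be compared.

```perl
use strict;
use warnings;

my %input = ('CITY' => 'New York');   # hypothetical sample value

# Double-quoted tag: variables are replaced with their values.
my $interpolated = <<"ending_tag";
<TD>City</TD><TD>$input{'CITY'}</TD>
ending_tag

# Single-quoted tag: the text is taken literally.
my $literal = <<'ending_tag';
<TD>City</TD><TD>$input{'CITY'}</TD>
ending_tag

print "Content-type: text/plain\n\n";
print $interpolated;   # the row with New York filled in
print $literal;        # the row with $input{'CITY'} left as typed
```

Note that the closing label must appear on a line by itself, spelled exactly as it was in the print statement.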
You will recognize the lines to be printed as standard HTML code. Basically, a table with two columns is created. The first column contains a description of the information provided by the user. The second column contains the information itself. This format makes it easy for the user to verify whether the information he provided is correct.
Near the top of the HTML code is a form. The ACTION attribute for this form is set to execute yet another CGI program named handlesurvey.cgi. This form is unusual in that it does not have any displayable controls for entering information! However, it does contain 11 controls of type hidden. These controls have names that are the same as the controls on the form discussed in the last installment. The values of these hidden controls are equal to the values parsed from the previous form. (Remember that the $input{} associative array elements will have their actual values substituted before the lines are printed.) Hidden controls are a way of passing data through an intermediate form without the data actually being displayed on the form. The current form also has two submit buttons, one with a value of "Information is Correct" and the other with a value of "Information is Incorrect". The CGI program handlesurvey.cgi will determine which button was selected and take the appropriate action. The handlesurvey.cgi program will be discussed in the next installment.
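A form of this kind might be generated along the following lines. This is a sketch rather than the code from Listing 1: the subroutine names hidden_controls and escape_html, the POST method, and the button name VERIFY are all assumptions made for the example, and only two of the hidden controls are shown. The escaping step is also an addition; it keeps a value that contains a double quote or an angle bracket from breaking the generated HTML.

```perl
use strict;
use warnings;

# Replace the characters that would otherwise end the attribute value
# or be read as markup.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Build one hidden control per name, in the order given.
sub hidden_controls {
    my ($input, @names) = @_;
    my $html = '';
    foreach my $name (@names) {
        my $value = defined $input->{$name} ? $input->{$name} : '';
        $html .= '<INPUT TYPE="hidden" NAME="' . $name
               . '" VALUE="' . escape_html($value) . "\">\n";
    }
    return $html;
}

my %input = ('LASTNAME' => 'Van Dyke, Jr.', 'STATE' => 'NY');
print "<FORM METHOD=\"POST\" ACTION=\"handlesurvey.cgi\">\n";
print hidden_controls(\%input, 'LASTNAME', 'STATE');
print "<INPUT TYPE=\"submit\" NAME=\"VERIFY\" VALUE=\"Information is Correct\">\n";
print "<INPUT TYPE=\"submit\" NAME=\"VERIFY\" VALUE=\"Information is Incorrect\">\n";
print "</FORM>\n";
```

Giving both submit buttons the same name is one way to let the receiving program tell which was pressed, since only the pressed button's value is sent.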
Figure 1 shows the survey form filled out with information about myself. Figure 2
shows the verification page that is created on the fly by the survey.cgi program.
What Have We Accomplished?
We have covered a lot of territory in a short amount of time. I'm sure some of you are totally confused at this point, as I would be if I were hit cold with this article. This is unfortunate. It would have been better to examine how to program in Perl in greater detail before presenting this information. However, this was simply not possible. Perl is a substantive language with many special variables, operators, etc. It would require an entire series of articles on Perl programming alone. Therefore, I am taking a surface-level approach, presenting only an overview of how to write server-side CGI programs using Perl to make use of form data. If you want to learn all the details, you need to purchase some good books about Perl and/or CGI programming and start digging in.
As far as the CGI program in Listing 1 goes, it did the following:
1. Parsed the URI encoded string created from the user's input data and created an associative array containing this information.
2. Cleaned up some of the input data such as separating multiple values with a comma and presenting check box data in a more readable way.
3. Sent an HTTP response header back to the browser to inform it that HTML code was on the way.
4. Sent HTML code on-the-fly. The code created hidden controls for passing information
on to another CGI program. A table was used to display the user's input. Two submit
buttons were created for the user to verify that the data is correct or that it is not.
Until Next Time
Next time I will discuss the final step in handling the form data. Until then, get some
CGI and Perl books and start reading. Perhaps then some of the stuff I discussed in this article
will make some sense. Below are two books I own that I have found useful. Remember to visit
my Web site, http://fly.hiwaay.net/~rcfinch, and fill out my survey form.
For learning Perl:
Wall, Larry and Schwartz, Randal L.; Programming Perl; O'Reilly & Associates, Inc.; 1991.
For learning CGI programming using Perl:
Herrmann, Eric; Teach Yourself CGI Programming with Perl in a Week; Sams Net; 1996.