Input with GET method

This page shows how to obtain and parse user input sent with the GET method.

Programming Issues

With the HTTP GET method, user input is passed to the CGI script via the QUERY_STRING environment variable.
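As a minimal sketch of this first step, a Perl script can read the variable from the built-in %ENV hash. The helper name raw_query_string and the sample query string are made up for illustration; in a real request the web server sets QUERY_STRING itself.

```perl
use strict;
use warnings;

# Return the raw, still URL-encoded query string, or an empty
# string when the server did not set QUERY_STRING.
sub raw_query_string {
    return defined $ENV{QUERY_STRING} ? $ENV{QUERY_STRING} : '';
}

# For demonstration only: simulate what the web server would set
# for a request ending in ?word=hello+world&fruit=apple
$ENV{QUERY_STRING} = 'word=hello+world&fruit=apple';
print raw_query_string(), "\n";
```

Returning an empty string rather than undef keeps later string operations free of "uninitialized value" warnings.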

User input is URL encoded, which entails the following:

Input is passed in the form of name/value pairs. Each pair is separated with an & character, and each name is separated from its value with an = character;
Spaces are not allowed in URLs, so are replaced by the + character;
Certain characters need to be encoded. These characters include:
- Non-ASCII characters;
- ASCII control characters;
- "Reserved characters", including:
  - The dollar sign ($);
  - The ampersand character (&);
  - The plus character (+);
  - Commas (,);
  - Forward slashes (/);
  - Colons (:);
  - Semi-colons (;);
  - The equals character (=);
  - The question mark (?); and
  - The "at" character (@).
- "Unsafe characters" should be encoded. These include, amongst others, spaces, quotation marks, less-than and greater-than characters, the pound character (#) and the percent character.
All these characters are encoded as a percent character followed by the two-digit hexadecimal representation of the character's ISO-Latin code point. For instance, spaces are replaced by %20, less-than symbols by %3C, and the percent character itself by %25. Note that a space may be represented by the + character as well as by %20, which is why you need to encode an actual + character (as %2B) if you want to include it.
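To make the encoding rules concrete, here is a rough sketch of how a single value could be URL encoded in Perl. The function name url_encode is hypothetical, and the sketch assumes the text holds single-byte characters (for example ISO-Latin-1). It encodes every character other than ASCII letters, digits and a few punctuation marks, which is more than strictly required but covers all the categories listed above.

```perl
use strict;
use warnings;

# Percent-encode one value for use in a query string. Everything
# except ASCII letters, digits, - _ . ~ and the space is replaced
# by %XX; spaces are then turned into + characters.
# Assumption: $text contains bytes, not wide characters.
sub url_encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-_.~ ])/sprintf('%%%02X', ord($1))/ge;
    $text =~ tr/ /+/;    # spaces become + in query strings
    return $text;
}

print url_encode('a < b & 100% + more'), "\n";
# prints: a+%3C+b+%26+100%25+%2B+more
```

Note how the literal + in the input becomes %2B, so that it cannot be mistaken for an encoded space.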

There are therefore three stages to getting the user input into a usable form:

1. Retrieve the input from the QUERY_STRING environment variable;
2. Extract the name/value pairs and store them in an appropriate data structure; and
3. Decode the input.

These stages are not necessarily distinct, as we will see.

Here is how our CGI script accomplishes the task:

1. Use Perl's split function to extract name/value pairs from the QUERY_STRING environment variable into a list;
2. Replace + characters with spaces. We can do this before separating the pairs into a name and a value.
3. Use Perl's split function again to split the name/value pairs into a name and a value;
4. Decode the URL encoded special characters. Note that we must split the name/value pairs into a name and a value before we do this, as one of the URL encoded special characters may very well be an equals sign, and if we decoded this first we may end up with more than two fields in our pair!
5. Replace any &, <, > and " characters with their HTML character entity references so that they display properly and to avoid any sneaky SSI attacks. Note that we only need to do this if we are going to directly output the input.
6. Assign the name and its associated value to an associative array, for easy recall.

Aside from outputting a table with our name/value pairs, that's all there is to it!
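The steps above can be sketched roughly as follows. This is an illustrative reconstruction, not the script's actual source: the names parse_query and html_escape are hypothetical, the split limit of 2 is an extra safeguard, and the HTML escaping is applied at output time rather than before storing the value.

```perl
use strict;
use warnings;

# Parse a raw query string into a hash of decoded names and values.
sub parse_query {
    my ($query) = @_;
    my %input;
    foreach my $pair (split /&/, $query) {
        $pair =~ tr/+/ /;                            # + means space
        my ($name, $value) = split /=/, $pair, 2;    # split before decoding
        next unless defined $name && length $name;   # skip empty pairs
        $value = '' unless defined $value;           # name with no value
        foreach ($name, $value) {
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;     # decode %XX sequences
        }
        $input{$name} = $value;
    }
    return %input;
}

# Replace the characters that are special in HTML with entity
# references before writing a value into a page.
sub html_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must come first
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

my %input = parse_query('word=a%3Db&note=%3Cb%3Ehi+there');
print html_escape($input{word}), "\n";    # prints: a=b
print html_escape($input{note}), "\n";    # prints: &lt;b&gt;hi there
```

The first pair shows why decoding comes after the split: the encoded equals sign in a%3Db ends up inside the value instead of creating a third field.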

Note that when you submit data from a form, your browser automatically creates an appropriate URL, complete with query string, using the name attribute of your form controls and either the value attribute or the value entered by the user.

For our purposes, we also exit the script if the GET method is not used, or if the QUERY_STRING environment variable is empty. Since the whole point of this script is to demonstrate parsing the query string, which is only passed with the GET method, we are at perfect liberty to reject any other form of input.
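A check of this kind might look like the sketch below. The function name request_problem and the error messages are invented for illustration; REQUEST_METHOD is the standard CGI environment variable holding the HTTP method. The sample hash stands in for %ENV so that the sketch can be run outside a web server.

```perl
use strict;
use warnings;

# Return an error message if the request is unusable, or undef if
# it is a GET request with a non-empty query string.
sub request_problem {
    my ($env) = @_;
    my $method = defined $env->{REQUEST_METHOD} ? $env->{REQUEST_METHOD} : '';
    my $query  = defined $env->{QUERY_STRING}   ? $env->{QUERY_STRING}   : '';
    return 'This script only accepts the GET method.' unless $method eq 'GET';
    return 'No query string was supplied.' unless length $query;
    return;
}

# Demonstration with made-up values; a real CGI script would pass
# \%ENV instead and call exit after printing the message.
my %sample = (REQUEST_METHOD => 'POST', QUERY_STRING => '');
my $problem = request_problem(\%sample);
print "Content-type: text/plain\n\n$problem\n" if defined $problem;
```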

As a final note, when you run the script you'll see that Perl doesn't keep associative arrays sorted by name; the entries are stored in an order determined by Perl's internal hashing, which is what makes lookups efficient. Thus, although you may supply name/value pairs in the order name1, name2 and name3, they won't necessarily be output in that order. Of course, if we wanted to sort them in any particular way, we could do so easily.
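For instance, sorting the names alphabetically before output takes one call to Perl's built-in sort. The hash contents here are sample data standing in for the parsed input.

```perl
use strict;
use warnings;

# Sample data standing in for the parsed name/value pairs.
my %input = (name3 => 'c', name1 => 'a', name2 => 'b');

# keys returns the names in no useful order, so sort them first.
my @names = sort keys %input;
foreach my $name (@names) {
    print "$name = $input{$name}\n";
}
```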

Usage

You can call the script directly, using URLs such as the following (which include some badly formed URLs — marked with * — as examples):

You can also call it from a form, such as the one below:

Type your favourite word:
Choose a fruit:
Which is best? Guns, Beer, Planes
Choose all the languages you know: C, Perl, JavaScript, Cobol
Type something interesting: The square of the longest side on a right-angled triangle is equal to the sum of the squares of the other two sides.

Note that we end up with multiple values in the "langsknown" variable because we have used the same name attribute for each of the check boxes. In the CGI script, we have used "---" as the separator between multiple values, but in practice we would probably use \0 in case the user actually entered "---". Using \0 also allows us to treat langsknown as a Perl list, by splitting on \0, if we want to do that.
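The \0 approach could be sketched like this. The helper name add_value is hypothetical; it appends each new value for an existing name with a NUL character in between, and split later recovers the individual values.

```perl
use strict;
use warnings;

# Store a value, appending to any existing value for the same name
# with a NUL character (\0) in between.
sub add_value {
    my ($input, $name, $value) = @_;
    if (exists $input->{$name}) {
        $input->{$name} .= "\0" . $value;
    } else {
        $input->{$name} = $value;
    }
    return;
}

my %input;
add_value(\%input, 'langsknown', $_) for ('C', 'Perl', 'JavaScript');

# Recover the individual values as a Perl list.
my @langs = split /\0/, $input{langsknown};
print join(', ', @langs), "\n";    # prints: C, Perl, JavaScript
```

A decoded value could still contain a NUL if the request included %00, so a stricter script might remove such characters from values before storing them.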

Also note that the POST method is more appropriate than GET for some forms. There are two reasons for this:

Although RFC 2616, which describes the HTTP/1.1 protocol, does not mandate a maximum length for URLs, it does not require servers to support unbounded length URLs either. It does say that:
Servers MUST be able to handle the URI of any resource they serve, and SHOULD be able to handle URIs of unbounded length if they provide GET-based forms that could generate such URIs.
but your average commercial web hosting company has no idea whether its customers are implementing GET-based forms on their web sites, so you cannot depend on it doing this. Also, most user agents implement a URL-length limit. At the time of writing, for example, Microsoft Internet Explorer puts a limit of 2,048 characters on URLs. This includes the URL of the script, so the number of characters free for actual form data is less than this. Unless you want to alienate over 50% of your users, use the POST method for any form that could possibly accept more data than this.
It has been stated, by the inventor of HTTP, that it is a principle of the protocol that:
In HTTP, GET must not have side effects … In HTTP, anything which does not have side-effects should use GET.
In other words, if your form has side effects, such as adding a user to a mailing list, or updating a database, then it should use the POST method. This is because "the implication is that the GET operation in HTTP is an operation which is expected to repeatably return the same result" and because "a user can never be held accountable to anything as a result of doing a GET". This is a philosophical, rather than a legal, axiom, but it is still a good idea to follow it. If the purpose of a form is to send some data to the server in order to change state, then use POST. If the purpose of a form is not to do this (e.g. a search form on a search engine) then GET is appropriate. The World Wide Web is a better place for everyone if everything works as it is expected to work.

Source and Downloads

View the Perl source code
