Input with GET method
This page shows how to obtain and parse user input from the GET method.
Programming Issues
With the HTTP GET method, user input is passed to the CGI script via the QUERY_STRING environment variable.
User input is URL encoded, which entails the following:
- Input is passed in the form of name/value pairs. Pairs are separated from each other with an & character, and each name is separated from its value with an = character;
- Spaces are not allowed in URLs, so are replaced by the + character;
- Certain characters need to be encoded. These characters include:
  - Non-ASCII characters;
  - ASCII control characters;
  - "Reserved characters", including:
    - The dollar sign ($);
    - The ampersand character (&);
    - The plus character (+);
    - Commas (,);
    - Forward slashes (/);
    - Colons (:);
    - Semi-colons (;);
    - The equals character (=);
    - The question mark (?); and
    - The "at" character (@).
  - "Unsafe characters" should be encoded. These include, amongst others, spaces, quotation marks, less-than and greater-than characters, the pound character (#) and the percent character.
Each encoded character is replaced by a percent sign followed by its two-digit hexadecimal character code. For example, spaces are replaced by %20, less-than symbols are replaced by %3C, and the percent character itself is replaced by %25. Note that spaces may be represented by the + character as well as by %20, which is why you need to encode the actual + character if you want to include it.
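To make the encoding rules above concrete, here is a minimal sketch of an encoder for a single name or value. The url_encode name is our own invention for illustration, and we assume the input is a string of bytes (plain ASCII, or text that has already been encoded as UTF-8). It is deliberately conservative: it encodes everything except letters, digits and a few harmless punctuation characters.

```perl
use strict;
use warnings;

# Hypothetical helper: encode one name or value for use in a query string.
# Assumes $text holds bytes (ASCII, or already UTF-8 encoded).
sub url_encode {
    my ($text) = @_;
    # Replace every byte that is not a letter, digit, '-', '_', '.' or a
    # space with '%' followed by its two-digit hexadecimal code.
    $text =~ s/([^A-Za-z0-9\-_. ])/sprintf('%%%02X', ord($1))/ge;
    # Spaces become '+', the shorthand used in query strings.
    $text =~ tr/ /+/;
    return $text;
}

print url_encode('a+b = <c> & 100%'), "\n";
# prints a%2Bb+%3D+%3Cc%3E+%26+100%25
```

Note that the literal + in the input comes out as %2B, while each space comes out as +, so the two can be told apart when the string is decoded again.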
There are therefore three stages to getting the user input into a usable form:
- Retrieve the input from the QUERY_STRING environment variable;
- Extract the name/value pairs and store them in an appropriate data structure; and
- Decode the input.
These stages are not necessarily distinct, as we will see.
Here is how our CGI script accomplishes the task:
- Use Perl's split function to extract name/value pairs from the QUERY_STRING environment variable into a list;
- Replace + characters with spaces. We can do this before separating the pairs into a name and a value;
- Use Perl's split function again to split each name/value pair into a name and a value;
- Decode the URL encoded special characters. Note that we must split the name/value pairs into a name and a value before we do this, as one of the URL encoded special characters may very well be an equals sign, and if we decoded this first we may end up with more than two fields in our pair!
- Replace any &, <, > and " characters with their HTML character entity references so that they display properly and to avoid any sneaky SSI attacks. Note that we only need to do this if we are going to directly output the input;
- Assign the name and its associated value to an associative array, for easy recall.
Aside from outputting a table with our name/value pairs, that's all there is to it!
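The steps above can be sketched roughly as follows. This is not the script's actual source, just one way of following the same sequence. The parse_query name is hypothetical, and two choices are assumptions of ours: we split each pair on the first = only (so anything after it is treated as part of the value), and we skip pairs that have no name. Repeated names simply overwrite each other here.

```perl
use strict;
use warnings;

# Hypothetical helper: parse a query string into a hash of name => value.
sub parse_query {
    my ($query) = @_;
    my %input;
    # Split the query string into name/value pairs.
    foreach my $pair (split /&/, $query) {
        # Turn '+' back into spaces before separating name and value.
        $pair =~ tr/+/ /;
        # Split on the first '=' only (a limit of 2 fields).
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;   # skip nameless pairs
        $value = '' unless defined $value;
        # Decode %XX sequences, but only now that the pair has been split.
        for ($name, $value) {
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
        }
        # Escape HTML special characters ('&' first, so that the entities
        # we add are not escaped a second time).
        for ($name, $value) {
            s/&/&amp;/g;
            s/</&lt;/g;
            s/>/&gt;/g;
            s/"/&quot;/g;
        }
        $input{$name} = $value;
    }
    return %input;
}

my %input = parse_query('name1=value1%3Dsomethingelse&msg=a+%3C+b');
print "$input{name1}\n";   # prints value1=somethingelse
print "$input{msg}\n";     # prints a &lt; b
```

Because the %3D is decoded only after the pair has been split, the equals sign ends up inside the value rather than creating a third field.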
Note that when submitting data from a form, your browser will automatically create an appropriate URL, complete with query string, using the name attribute of your form controls and either the value attribute or the value entered by the user.
For our purposes, we also exit the script if the GET method is not used, or if the QUERY_STRING
environment variable is empty. Since the whole point of this script is to demonstrate parsing the query string,
which is only passed with the GET method, we are at perfect liberty to reject any other form of input.
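A check along these lines might look like the sketch below. The get_query_string helper is a hypothetical name of ours. REQUEST_METHOD and QUERY_STRING are the standard CGI environment variables. The helper takes the environment as a hash reference so that it can be tried out without a web server.

```perl
use strict;
use warnings;

# Hypothetical helper: return the query string, or nothing (undef in scalar
# context) if the request is not a GET or carries no query string.
sub get_query_string {
    my ($env) = @_;    # hash reference, normally \%ENV
    my $method = $env->{REQUEST_METHOD};
    my $query  = $env->{QUERY_STRING};
    return unless defined $method && $method eq 'GET';
    return unless defined $query && length $query;
    return $query;
}

my $query = get_query_string(\%ENV);
if (defined $query) {
    print "Query string: $query\n";
} else {
    # In a real CGI script we would output an error page and exit here.
    print "No usable GET input\n";
}
```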
As a final note, when you run the script you'll see that Perl doesn't keep associative arrays sorted by name; entries are stored in whatever order suits Perl's internal hashing, which is what makes lookups efficient. Thus, although you may see name/value pairs in the order name1, name2 and name3, they won't necessarily be output in that order. Of course, if we wanted to sort them in any particular way, we could do so easily.
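For instance, sorting by name only takes a call to sort on the keys. The names_in_order helper and the sample data below are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the names of a hash in alphabetical order.
sub names_in_order {
    my ($hash) = @_;
    my @names = sort keys %$hash;
    return @names;
}

# Sample data, deliberately listed out of order.
my %input = (name3 => 'value3', name1 => 'value1', name2 => 'value2');

# Walk the hash in sorted order rather than in Perl's internal order.
foreach my $name (names_in_order(\%input)) {
    print "$name = $input{$name}\n";
}
```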
Usage
You can call the script directly, using URLs such as the following (which include some badly formed URLs, marked with *, as examples):
- http://www.paulgriffiths.net/cgi-bin/parsequery.pl?name1=value1&name2=value2
- http://www.paulgriffiths.net/cgi-bin/parsequery.pl?name1=value1&name2=value2&name3=value3
- http://www.paulgriffiths.net/cgi-bin/parsequery.pl?name1= *
- http://www.paulgriffiths.net/cgi-bin/parsequery.pl?name1=&name2=value2 *
- http://www.paulgriffiths.net/cgi-bin/parsequery.pl?&name2=value2 *
- http://www.paulgriffiths.net/cgi-bin/parsequery.pl?&=name2=value2 *
- http://www.paulgriffiths.net/cgi-bin/parsequery.pl?name1=value1=somethingelse *
- http://www.paulgriffiths.net/cgi-bin/parsequery.pl?name1=value1%3Dsomethingelse
You can also call it from a form, such as the one below:
Note that we end up with multiple values in the "langsknown" variable because we have used the same name attribute for each of the check boxes. In the CGI script, we have used "---" as the separator between multiple values, but in practice we would probably use \0 in case the user actually entered "---". Using \0 also allows us to treat langsknown as a Perl list if we want to do that.
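Here is a minimal sketch of the \0 approach, assuming a hypothetical add_value helper that is called once for each decoded name/value pair. The language names are example values of ours, not necessarily the ones offered by the form.

```perl
use strict;
use warnings;

# Hypothetical helper: store a value, joining repeated names with "\0".
sub add_value {
    my ($input, $name, $value) = @_;
    if (exists $input->{$name}) {
        $input->{$name} .= "\0" . $value;
    } else {
        $input->{$name} = $value;
    }
    return;
}

my %input;
# Example values only: three check boxes sharing the name "langsknown".
add_value(\%input, 'langsknown', $_) for ('Perl', 'C', 'Java');

# Splitting on "\0" recovers the individual values as a Perl list.
# The limit of -1 keeps any empty values at the end of the list.
my @langs = split /\0/, $input{langsknown}, -1;
print scalar(@langs), " languages: @langs\n";
# prints 3 languages: Perl C Java
```

Since a NUL character cannot be typed into an ordinary form field, it is much less likely than "---" to clash with something the user actually entered.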
Also note that the POST method is more appropriate than GET for some forms. There are two reasons for this:
- Although RFC 2616, which describes the HTTP/1.1 protocol, does not mandate a maximum length for URLs, it does not require servers to support unbounded length URLs either. It does say that:
  "Servers MUST be able to handle the URI of any resource they serve, and SHOULD be able to handle URIs of unbounded length if they provide GET-based forms that could generate such URIs."
  but your average commercial web hosting company has no idea whether its customers are implementing GET-based forms on their web sites, so you cannot depend on them doing this. Also, most user agents implement a URL-length limit. At the time of writing, for example, Microsoft Internet Explorer puts a limit of 2,048 characters on URLs. This includes the URL of the script, so the number of characters free for actual form data is less than this. Unless you want to alienate over 50% of your users, if you use forms which could possibly accept more data than this, use the POST method.
- It has been stated, by its inventor, that it is a principle of the HTTP protocol that:
  "In HTTP, GET must not have side effects … In HTTP, anything which does not have side-effects should use GET."
  In other words, if your form has side effects, such as adding a user to a mailing list, or updating a database, then it should use the POST method. This is because "the implication is that the GET operation in HTTP is an operation which is expected to repeatably return the same result" and because "a user can never be held accountable to anything as a result of doing a GET". This is a philosophical, rather than a legal, axiom, but it is still a good idea to follow it. If the purpose of a form is to send some data to the server in order to change state, then use POST. If the purpose of a form is not to do this (e.g. a search form on a search engine) then GET is appropriate. The World Wide Web is a better place for everyone if everything works as it is expected to work.