A common task in web application development is validating data entered by a user. This is
usually done to ensure that the data entered matches the types expected by an underlying
database. It is also good security practice to limit the data that a web-based form will
accept. While it is common in Web 2.0 applications to use client-side code such as JavaScript to
validate form fields, this should not be relied on, for two reasons. Some users disable
JavaScript, and you want to provide a working interface for them. From a security perspective,
keep in mind that it is possible to post data to a form handler without ever loading the form and
its client-side code.
If you develop using a PHP framework such as CakePHP, there is a built-in validation system that
makes validating form data trivial. There are also a number of third-party scripts and libraries
that provide handy classes for validating form data. These libraries can be great time savers,
but for various reasons you may need to write your own form validation routines.
While this type of defensive coding will help many users breeze through your forms, there will still be
those times when a user enters incorrect data or an incorrect format. The best way to handle this is
to provide a useful error message. Some forms will display all the validation errors at the top of
the form. While this helps users see what errors occurred, it is far more helpful to display the error
message next to the field that caused the error. This allows a user to scan down a form and fix all
their errors in a single pass. To accomplish this, I use an array called $form_errors. I key the array
using the name of the form field that caused the error. I also order my validation rules in order of
priority. This is because I generally only want to display one error per field or because failing one
validation may make other rules invalid. For example, if a required field is left blank, you want to
display that the field is required. It is not necessary to show errors regarding format or data type at
this stage. I display errors in the label of the form field between <strong> tags. I can then use CSS
to style my errors, most often by making them red. Here’s a snippet of the code to display an error:
<label for="id_first_name">First Name:
<br><strong>
<?php
// Show the error for this field, if one was recorded during validation.
if (isset($form_errors['first_name'])) {
    echo $form_errors['first_name'];
}
?>
</strong></label>
<input type="text" name="first_name" id="id_first_name" maxlength="40" />
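The following is a minimal sketch of how $form_errors might be filled on the server before the form is redisplayed. The helper name collect_first_name_errors, the specific rules and the message wording are all assumptions made for illustration. The rules are checked in priority order, so only the first failure is recorded for the field.

```php
<?php
// Hypothetical helper: checks rules in priority order and records only the
// first failure, keyed by the name of the form field.
function collect_first_name_errors(array $post, array $form_errors = array()) {
    $value = (isset($post['first_name']) && is_string($post['first_name']))
        ? trim($post['first_name'])
        : '';

    if ($value === '') {
        // Highest priority: a blank required field hides any format errors.
        $form_errors['first_name'] = 'First name is required.';
    } elseif (strlen($value) > 40) {
        // Mirrors the maxlength="40" attribute, which a client can bypass.
        $form_errors['first_name'] = 'First name must be 40 characters or fewer.';
    } elseif (!preg_match("/^[A-Za-z' -]+$/", $value)) {
        // Illustrative rule only; real names may need a wider character set.
        $form_errors['first_name'] = 'First name contains unexpected characters.';
    }

    return $form_errors;
}

// Example call with a blank submission.
$form_errors = collect_first_name_errors(array('first_name' => ''));
```

In a real form handler you would most likely pass $_POST instead of the literal array used here.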
If you do need to display multiple errors, you can make $form_errors a multidimensional array. Then,
for each form field, iterate over its list of errors and output them as an unordered list.
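As a rough sketch of the multiple-error approach, assume $form_errors maps each field name to an array of messages. The helper name render_field_errors, the "errors" CSS class and the sample messages are hypothetical.

```php
<?php
// Hypothetical helper: renders every error for one field as an unordered list.
// $form_errors is assumed to map field names to arrays of messages.
function render_field_errors(array $form_errors, $field) {
    if (empty($form_errors[$field])) {
        return '';
    }
    $html = '<ul class="errors">';
    foreach ($form_errors[$field] as $message) {
        // Escape each message in case it echoes back user input.
        $html .= '<li>' . htmlspecialchars($message, ENT_QUOTES, 'UTF-8') . '</li>';
    }
    return $html . '</ul>';
}

// Sample data for illustration only.
$form_errors = array(
    'password' => array('Must be at least 8 characters.', 'Must contain a digit.'),
);
$password_errors_html = render_field_errors($form_errors, 'password');
```

You could echo the returned string inside the field's label in place of the single message shown earlier.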
This function simply makes sure that a value was set for the field and that it is not an empty field.
This second step is important because an HTML form will return an element for a text input even if
the text field is empty. You can use this function to test whether a required field is present and then
set an appropriate error if the function returns false.
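The function described above does not appear in this text. A minimal sketch consistent with the description might look like the following, where the name validate_required and its parameters are assumptions. Trimming whitespace before the empty check is also an addition, not something the description states.

```php
<?php
// Hypothetical reconstruction; the name and signature are assumptions.
function validate_required(array $data, $field) {
    // A text input is submitted even when left blank, so checking isset()
    // alone is not enough. Comparing against '' (rather than using empty())
    // keeps a value of "0" valid.
    return isset($data[$field])
        && is_string($data[$field])
        && trim($data[$field]) !== '';
}
```

A caller could then set $form_errors['first_name'] whenever validate_required($_POST, 'first_name') returns false.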
This function defines the general formats we’ll accept and places them in an array. Each format
uses # where a number should be. We then use the preg_replace function which uses a regular
expression to make replacements. In our function we’re telling it to replace any digit as indicated by
[0-9] with a # character. The trim function removes any leading or trailing whitespace. The result is that the
user’s input will look a lot like one of our formats with #’s in place of numbers. The final step is to
see if the modified phone number is contained in our $formats array. If it is, the user entered 10
digits in an appropriate format. Otherwise, the phone number fails validation. This technique can be
adapted to other data fields where formatting is variable but we know for certain that only digits will
be used. It’s also better if the number of digits is defined. As an example, social security numbers
and EINs used by corporations contain the same number of digits but differ only in where the
dashes are placed. You could use a $formats array like:
$formats = array('#########', '###-##-####', '##-###-####');
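The phone number function described above does not appear in this text. The sketch below follows the same steps (trim, replace digits with #, look up the result in a list of formats). The function names matches_digit_format and validate_phone, and the particular phone formats listed, are assumptions.

```php
<?php
// Hypothetical generic helper: reduces the input to its "shape" by replacing
// every digit with #, then checks that shape against the accepted formats.
function matches_digit_format($value, array $formats) {
    $shape = preg_replace('/[0-9]/', '#', trim($value));
    return in_array($shape, $formats, true);
}

// Hypothetical phone validator; the accepted formats are illustrative.
function validate_phone($value) {
    return matches_digit_format($value, array(
        '###-###-####',
        '(###) ###-####',
        '(###)###-####',
        '###.###.####',
        '##########',
    ));
}

// The same helper applied to SSN and EIN style input.
function validate_ssn_or_ein($value) {
    return matches_digit_format($value, array('#########', '###-##-####', '##-###-####'));
}
```

Because any non-digit character survives the replacement unchanged, input containing letters can never match a format made only of #, dashes, dots, spaces and parentheses.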
Validating Dates
There are a number of techniques for validating dates. The most common is to split the user’s input
into the component month, day, year and then use PHP’s checkdate function to see if it is a valid
date. This works better than a simple regular expression since it will catch errors that a regular
expression might miss such as a date like 11/31/2010. Here’s a sample function:
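function validate_date($value) {
    // First try MM/DD/YYYY; $parts receives the month, day and year.
    if (preg_match("/^([0-9]{2})\/([0-9]{2})\/([0-9]{4})$/", $value, $parts)) {
        if (checkdate((int) $parts[1], (int) $parts[2], (int) $parts[3])) {
            return true;
        }
    } else {
        // Otherwise try the same pattern with "-" as the separator.
        if (preg_match("/^([0-9]{2})-([0-9]{2})-([0-9]{4})$/", $value, $parts)) {
            if (checkdate((int) $parts[1], (int) $parts[2], (int) $parts[3])) {
                return true;
            }
        }
    }
    return false;
}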
In this function, we’re using a regular expression to split the user input into its component parts. Let’s
look at this expression for a minute. The ^ symbol anchors the match to the beginning of the string.
[0-9]{2} means we’re looking for 2 digits. The \/ escapes the forward slash, which would otherwise be
read as the delimiter that ends the pattern. Then [0-9]{2} means 2 more digits, followed by another
escaped forward slash \/ and finally 4 digits: [0-9]{4}. The $ anchors the match to the end of the string.
The parentheses capture the matched elements into an array called $parts. The month, day and year from
$parts are then passed to checkdate to see if they form a valid date. Notice that if this regular
expression fails we try again with the same regular expression but using “-” as a separator. This allows
our users to enter dates as 09/21/2010 or 09-21-2010. Because each pattern requires exactly two digits
for the month and day, an input such as 9/21/2010 is rejected. We could also accept one-digit months
and days, or a two-digit year, to make our system more flexible as to the input it will accept.
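One way to loosen the pattern is sketched below. The function name validate_date_flexible is hypothetical, and treating two-digit years as 2000 through 2099 is an assumption you would adjust to your own data.

```php
<?php
// Hypothetical variant: accepts a 1- or 2-digit month and day, "/" or "-" as
// the separator, and a 2- or 4-digit year.
function validate_date_flexible($value) {
    // # is used as the delimiter so "/" needs no escaping. \2 is a
    // backreference that forces the second separator to match the first.
    // The D modifier makes $ match only at the very end of the string.
    $pattern = '#^([0-9]{1,2})([/-])([0-9]{1,2})\2([0-9]{2}|[0-9]{4})$#D';
    if (!preg_match($pattern, $value, $parts)) {
        return false;
    }
    $year = (int) $parts[4];
    if (strlen($parts[4]) === 2) {
        // Assumption: two-digit years fall in 2000-2099.
        $year += 2000;
    }
    // $parts[1] is the month, $parts[3] is the day.
    return checkdate((int) $parts[1], (int) $parts[3], $year);
}
```

Mixed separators such as 9/21-2010 are rejected by the backreference, and impossible dates such as 11/31/2010 are still caught by checkdate.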
While this will handle most date fields, sometimes we want the user to input a date but we
may not be storing it as a date or may not be able to validate it as a full date. For instance, I
recently coded an employment application that required a month/year date for the start and end
dates of a candidate’s employment history. There are two approaches that you can use when
validating these types of dates. The first is to use a regular expression and constrain what can be
accepted. Instead of the pattern [0-9]{2} for the month you would use something like [0-1][0-9].
This pattern only allows a 0 or 1 for the first digit and any digit for the second. While
this will be “good enough” for many purposes, it does allow invalid input such as 00 or 19 for the
month. A second method is to split the month and year as we did in the prior example and set the
day part to 1. Then you can use the checkdate function to make sure that it is a valid date.
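The second method could be sketched as follows, assuming input in the form MM/YYYY. The function name validate_month_year is hypothetical.

```php
<?php
// Hypothetical helper for MM/YYYY input such as "09/2010".
function validate_month_year($value) {
    // The D modifier makes $ match only at the very end of the string.
    if (!preg_match('#^([0-9]{2})/([0-9]{4})$#D', $value, $parts)) {
        return false;
    }
    // Day 1 exists in every month, so checkdate() effectively verifies
    // that the month is between 1 and 12 and the year is in range.
    return checkdate((int) $parts[1], 1, (int) $parts[2]);
}
```

Unlike the [0-1][0-9] pattern, this rejects both 00 and 19 as months.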
Validating Time
I mentioned that I recently coded a job application for a client. On this project, they had a section that
an applicant used to indicate their availability to work. They wanted a simple time format such as
07:00a or 12:45p. This validation was easily solved with a regular expression:
function validate_time($value) {
    // The trailing $ ensures that nothing follows the final "a" or "p".
    if (!preg_match("/^[0-1][0-9]:[0-9]{2}[ap]$/", $value)) {
        return false;
    } else {
        return true;
    }
}
Again, the ^ represents the start of the string. [0-1] indicates that the first digit must be a 0 or 1. [0-9]
indicates that the second digit can be any digit. After the : we’re looking for 2 digits and finally [ap]
means the final character should be an “a” or “p” only. While this function met the needs of my client,
it should be noted that it suffers from the same issue as the earlier month/year validation by regular
expression. It’s still possible for someone to enter invalid data such as 17:00a or 07:75a. For more
accurate validation, PHP’s strtotime function can be used. You’ll need to adjust your regular expression
to ensure that your user enters a time in one of the valid formats used by strtotime. Check the PHP
documentation for more information on strtotime and valid formats.
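If you would rather stay with a regular expression than switch to strtotime, a tighter pattern closes the gaps mentioned above. This is a sketch of that alternative, not the strtotime approach, and the function name validate_time_strict is hypothetical.

```php
<?php
// Hypothetical stricter variant: hours 01-12, minutes 00-59, then "a" or "p".
function validate_time_strict($value) {
    // (0[1-9]|1[0-2]) limits the hour; [0-5][0-9] limits the minutes.
    // The D modifier makes $ match only at the very end of the string.
    return preg_match('/^(0[1-9]|1[0-2]):[0-5][0-9][ap]$/D', $value) === 1;
}
```

This keeps the 07:00a and 12:45p format the client asked for while rejecting values such as 17:00a.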
This will allow only lowercase letters, numbers, dashes, underscores and periods in the first part of
the email. While this is the most popular form of validation for an email address, it will prevent some
valid email addresses from validating. Linux Journal ran an excellent article on this issue. In the
article, they discuss RFC 3696, which addresses valid emails being rejected by popular validation methods.
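The email function described above does not appear in this text. The first function below is a guess at a pattern in the same spirit, and the second shows PHP’s built-in filter_var as a more permissive alternative. Both function names, and the exact pattern, are assumptions.

```php
<?php
// Hypothetical restrictive check: lowercase letters, digits, dashes,
// underscores and periods before the @, then a simple domain.
function validate_email_strict($value) {
    // The D modifier makes $ match only at the very end of the string.
    return preg_match('/^[a-z0-9._-]+@[a-z0-9.-]+\.[a-z]{2,}$/D', $value) === 1;
}

// More permissive alternative using PHP's built-in email filter.
function validate_email_builtin($value) {
    return filter_var($value, FILTER_VALIDATE_EMAIL) !== false;
}
```

An address such as jane+news@example.com illustrates the problem: the restrictive pattern rejects it because of the plus sign, while filter_var accepts it.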
While there are many good-quality validation libraries available for PHP, developing your own library
of validation routines allows you to customize validation to fit your unique needs. This often leads to
better user experiences by accepting more formats and styles. Validation is an important step in
guiding users on what kind of data your web application is expecting. It is also a key step in
securing your application by narrowing the types of data a user can submit to your application.
While validation can increase security, sanitizing data being sent to a database is still a crucial
step.