
Data Validation & Research
Agenda
Validation
Data validation definitions & concepts
Data Validation in Excel
Research
Limitations
Validation
Checking the reasonableness of the data before it
is processed
GIGO: garbage in, garbage out.
Processing rubbish data gives rubbish information
◦ E.g. if all the foremen in the factory just made up the
numbers on the workers' time cards, working out the pay
cheques would be totally pointless
Validation is ensuring inputted data is of the
right type (e.g. numeric) and within
reasonable limits (e.g. ages between 1 and 130)
Databases and spreadsheets can have validation
rules built into data fields to reject impossible
entries
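
A minimal Python sketch of such a built-in rule, using the age limits mentioned above (1 to 130). The function name `validate_age` and the message wording are illustrative assumptions, not part of any particular database or spreadsheet product.

```python
def validate_age(raw: str) -> tuple[bool, str]:
    """Return (is_valid, message) for an age typed in as text."""
    text = raw.strip()
    # Existence check: essential data must not be missing.
    if not text:
        return False, "age is required"
    # Type check: the entry must be a whole number, not letters.
    try:
        age = int(text)
    except ValueError:
        return False, "age must be a whole number"
    # Range check: reject values outside reasonable limits.
    if not 1 <= age <= 130:
        return False, "age must be between 1 and 130"
    return True, ""


for entry in ["23", "", "abc", "250"]:
    print(repr(entry), validate_age(entry))
```

Note that "23" passes even if the person is really 24: the rule checks reasonableness, not accuracy.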
Validation can include
Existence:
is some essential data simply missing, such as a name?
Reasonableness: does it seem logical?
◦ Checking hours are within reasonable limits (e.g. anything over say 50
hours a week may be questioned)
Consistency: Checking for inconsistencies in surveys (e.g. a
person claims to be unemployed but earns $80,000 a year.) Some
surveys ask similar questions in different parts of the survey to
check whether people are lying when they answer.
Type check: e.g. have letters been entered instead of digits?
Format check: e.g. an ID must be three letters followed by 4
digits. Is date entered as dd/mm/yy?
Range check: is a date in August between 1 and 31?
◦ Sending data back to its source for confirmation before it is entered into a
system (e.g. people joining a club might be sent back the data they put on
their application forms: they have to confirm it is correct before the data is
entered into the club database)
Using a check digit to validate a credit card number
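
As an illustration of a check digit test, the sketch below implements the Luhn algorithm, which is the check digit scheme commonly used for payment card numbers. The function name `luhn_is_valid` is an assumption made for this example, and passing the check only shows the number is well formed, not that the card exists.

```python
def luhn_is_valid(number: str) -> bool:
    """Check a card-style number against its Luhn check digit."""
    cleaned = number.replace(" ", "").replace("-", "")
    # Type check first: only plain digits are acceptable.
    if len(cleaned) < 2 or not (cleaned.isascii() and cleaned.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for position, ch in enumerate(reversed(cleaned)):
        digit = int(ch)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_is_valid("4539 1488 0343 6467"))
print(luhn_is_valid("4539 1488 0343 6468"))
```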

Note: Validation does not check that data are accurate (e.g.
that when Fred says he's 23, he actually is 23) but it can highlight
suspicious entries
Data can be valid, but inaccurate!
Process to ensure the quality of data by checking they have been
entered correctly

A set of rules you can apply to form fields to restrict the type of
information site visitors enter into forms
◦ For example, you can set rules so that only letters, and not numbers, can be entered

A process used to determine if data are incomplete or
unreasonable. The process may include format checks,
completeness checks, check key tests, reasonableness checks, and
limit checks

A systematic effort to review data to identify any outliers or errors
and thereby cause deletion or flagging of suspect values to assure
the validity of the data to the user.

A term used to describe the process of evaluating data once it has
been entered into a software program. Using a set of rules, which
may contain a range of acceptable values, the evaluation results in
either the entry being accepted or rejected.
Methods for enforcing valid data entry
Verification: the process of entering data twice
and comparing the two entries to find differences.
◦ E.g. entering a new password twice to ensure it has been
entered accurately
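
A small sketch of double-entry verification in Python. The helper name `find_differences` is hypothetical; it simply reports the character positions at which the two entries disagree, so an empty result means the entries match.

```python
from itertools import zip_longest


def find_differences(first: str, second: str) -> list[int]:
    """Return the positions at which two typed entries differ."""
    return [
        index
        for index, (a, b) in enumerate(zip_longest(first, second))
        if a != b  # a missing character (None) also counts as a difference
    ]


print(find_differences("secret1", "secret1"))
print(find_differences("secret1", "secert1"))
```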

Limited lists (value lists): controls that help enforce
validation rules
◦ E.g. where the user must choose data from a list of
options. This is ideal for ensuring they enter only
legitimate values in a form that can be understood

Calendar controls are good for preventing the
entry of invalid dates.
Note that limited lists are NOT a validation measure - they do
not check for invalid data; they prevent invalid data being
entered in the first place
BE CAREFUL ABOUT...
Validation only involves checking the reasonableness 
(not the accuracy) of input data - usually checks of existence,
type and range
◦ e.g. if a student in year 7 says he's 13, the data is valid (unlike an
age of 99 or -3), but not necessarily accurate - he could be 12

Validation is not the same as testing. Testing ensures
the output is accurate

Efficient processing means manipulating data in a way that
does not waste time, money or effort

Effective validation is that which works well and leads to
valuable, reliable output.

Remember: validate input data, and test
output information (e.g. the answers produced by
calculations, the readability of printed text)
DATA VALIDATION DEFINITIONS AND
CONCEPTS
 Making sure that all data (whether user input variables, read
from file or read from a database) are valid for their intended
data types and stay valid throughout the application that is
driving this data
 User Interactivity Screens And Forms
◦ Human error always ends up as the prime suspect for invalid data

 File Manipulation Routines


◦ Entails all file-related operations, such as reading from the file or writing to the
file

◦ Database driven applications or even just a game screen for the settings

 Import and Export Routines


◦ Plan to have your application able to save the same data in different file
formats for other applications
Types of validation
Depend on the degree of importance (as far as the data
available to the application is concerned) and on the actual type
of the data as well
 Electronic validation

 Manual validation

 Field Level Validation


 Form Level Validation
 Data Saving Validation
 Search Criteria Validation
Electronic validation
Performed by the RDBMS, spreadsheet, program etc
◦ range checking in database (e.g. age between 5 and 25)
◦ existence checking (is box filled in?) [may be referred to as 'null testing']
◦ spell checking (treat electronic spellchecking with great suspicion - treat it
more like a "typing error checker"!)
◦ validation rules in databases, spreadsheets (rejects or queries dubious
input)
◦ spreadsheet formulas to check values in other cells
  (e.g. =IF(AND(A10<>"M",A10<>"F"),"Gender must be M or F",""))

Note

◦ Using drop-down limited lists to prevent erroneous data entry is not really
a validation technique which checks for invalid data : it is a tool
to prevent invalid data being entered
◦ Do not use electronic validation for data that must be checked
with human judgment (e.g. people's surnames, dates of events). For that
you need manual validation...
Manual validation
Proof-reading a document to see if it makes sense

Checking the placement of graphics and text on a page/screen

Items that require human common sense to tell if they are valid or not

Checking if an image is too blurry or obscure

Checking color combinations for readability and attractiveness

Checking spelling of words or names that are not in a dictionary

Checking that data is complete
◦ (e.g. that "Printing" is included in a set of instructions)
Field Level Validation
The typical scenario is a screen presenting a list of values that the
user needs to enter

Usually, once the data is entered, it will be saved to a data file
or a database

There is more than one thing you can do at the field level
(for each value to be entered) to avoid errors caused
by human interaction

Not all forms need field level validation

Form Level Validation
Same scenario as above, but no field needs to be entered
before any other field on the form

The user enters the information and validation is done
once, for the whole form, usually before the
information is saved

Client information screens and other general information
inquiry screens can do fine with a simple form level
validation

Checks that mandatory fields have been provided by the user and that
the rest of the information is of the expected type, or empty
(if any field can be considered optional)
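
A hedged sketch of form level validation in Python: the whole form is checked once, just before saving. The function `validate_form` and the field names in the example are hypothetical and only illustrate the mandatory/optional rule described above.

```python
def validate_form(
    form: dict[str, str],
    mandatory: set[str],
    numeric: set[str],
) -> dict[str, str]:
    """Check a whole form at once and return {field: error message}."""
    errors: dict[str, str] = {}
    # Mandatory fields must be provided by the user.
    for name in mandatory:
        if not form.get(name, "").strip():
            errors[name] = "required"
    # Other fields must be of the expected type, or empty if optional.
    for name in numeric:
        value = form.get(name, "").strip()
        if value and name not in errors:
            try:
                float(value)
            except ValueError:
                errors[name] = "must be a number"
    return errors


client = {"name": "Ann", "phone": "", "income": "abc"}
print(validate_form(client, mandatory={"name"}, numeric={"income"}))
```

An empty result means the form can be saved; otherwise the screen can highlight every offending field in one pass.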


Data Saving Validation

Performed in the routine that does the actual
saving of the information to the file or database record

Used in option screens or multiple data entry forms that all need
to be filled in before the record is physically saved
◦ For example, an option screen typically has tabs offering more than one
page of information that can be set by the user

The user can go to any and all of these tabs and change the
values that need changing; the data gets saved
once the user presses a "Save" or "Ok" button of some sort


Search Criteria Validation
This type of form (a search form) can often do without data
validation

Where it is used, think of the time saved: make sure that results were
actually returned and that those results are, to a certain degree, relevant to
what the user is looking for

In many cases this might not be important, but in other cases
this type of validation would be
well sought after and most definitely appreciated by the
users of your application
REMOVING/MINIMIZING HUMAN ERRORS

"prevention is the best policy"

The best place to prevent human error is of
course at the data entry screen level

Different techniques

◦ Range Validation

◦ Lookup Validation

◦ Masked Input Validation


Range Validation

Applies to numeric values or even dates

Makes sure that a value entered is within a range of specific values
◦ Note that this could apply to characters as well. For example, in a
questionnaire application that offers multiple choice questions, it
wouldn't be of much use to accept Z or any other letter if the only
choices are A, B, C, D or E

If there is a reason to have a minimum or maximum value, of any
data type, then a range validation routine becomes
mandatory
◦ Another example: a field that expects a numeric entry
could be coded to accept only numeric keys, the decimal point and the
minus sign, and to reject the rest of the keys
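
The same range test works for numbers, dates and characters, which can be sketched in Python as follows. The helper names `in_range` and `is_valid_choice` are illustrative assumptions; the A to E choices come from the questionnaire example above.

```python
from datetime import date


def in_range(value, minimum, maximum) -> bool:
    """Range check for any comparable type (numbers, dates, characters)."""
    return minimum <= value <= maximum


def is_valid_choice(answer: str) -> bool:
    """Accept a single letter A to E for a multiple choice question."""
    return len(answer) == 1 and in_range(answer.upper(), "A", "E")


print(in_range(17, 5, 25))                                     # numeric range
print(in_range(date(2016, 8, 32 - 1), date(2016, 8, 1), date(2016, 8, 31)))
print(is_valid_choice("c"), is_valid_choice("Z"))
```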


Lookup Validation

The value entered needs to be compared to a list of possible
values.
◦ A good example that relates to our Mortgage calculation program
is the number of compounding periods per year
◦ Most financial institutions would need something like 1, 2, 3, 4, 6,
12, 24, 26, 52 and 365 (or 366 for leap years)

Take the value entered and compare it against that list of values to
report whether it's a valid entry or not

Perhaps a drop-down list offering the choices would be
good: this way the user can only select a valid value and
therefore no validation per se is needed for this field

Anytime you can, integrate something that leaves the user (so
to speak) no choice but to enter valid data
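
A minimal lookup validation sketch in Python, using the compounding periods listed above. The constant and function names are hypothetical, and accepting 366 unconditionally (rather than only in leap years) is a simplifying assumption.

```python
# Allowed compounding periods per year, taken from the list above.
# Assumption: 366 is always accepted instead of only in leap years.
VALID_COMPOUNDING_PERIODS = frozenset({1, 2, 3, 4, 6, 12, 24, 26, 52, 365, 366})


def is_valid_compounding_period(raw: str) -> bool:
    """Compare the entered value against the list of possible values."""
    try:
        periods = int(raw.strip())
    except ValueError:
        return False  # not even a whole number
    return periods in VALID_COMPOUNDING_PERIODS


for entry in ["12", "5", "monthly"]:
    print(entry, is_valid_compounding_period(entry))
```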
Masked Input Validation
For example, a telephone number, a zip code, a social
insurance number and a UPC code all have one thing in common:
a fixed format

Masked or filtered input is the art of only allowing valid characters
to be entered

Typically, masked input is present to give the user an indication of
the type of information that is required, by guiding them through the
process of entering the value in a field

Visual aids could also be used. For example, if a date is expected in
a given field, perhaps a little popup calendar to let the user
pick the date visually would be good, since it would
then be impossible to select an invalid date from a calendar. This
is but an example

Entering the expected data properly will help your application work
effectively
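
One way to sketch a mask check in Python is shown below. The mask convention (`9` for a digit, `A` for a letter, anything else taken literally) is an assumption made for this example and does not follow any specific toolkit's mask syntax.

```python
def matches_mask(value: str, mask: str) -> bool:
    """Check a value against a mask: 9 = digit, A = letter, else literal."""
    if len(value) != len(mask):
        return False
    for ch, slot in zip(value, mask):
        if slot == "9":
            if not (ch.isascii() and ch.isdigit()):
                return False
        elif slot == "A":
            if not (ch.isascii() and ch.isalpha()):
                return False
        elif ch != slot:  # literal characters such as "(", "-" or a space
            return False
    return True


print(matches_mask("(555) 123-4567", "(999) 999-9999"))  # telephone number
print(matches_mask("ABC1234", "AAA9999"))                # 3 letters, 4 digits
print(matches_mask("5551234567", "(999) 999-9999"))
```

A real masked input control would apply the same rule key by key, rejecting each invalid character as it is typed rather than checking the finished value.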
Data Validation in Excel
Reject Invalid Dates
Budget Limit
Prevent Duplicate Entries
Product Codes
Drop-down List
Dependent Drop-down Lists
Basic Steps
To make sure that users enter certain
values into a cell
Data Validation Example
Create Data Validation Rule
Input Message
Error Alert
Data Validation Result

Note: to remove data validation from a cell, select
the cell, on the Data tab, in the Data Tools group,
click Data Validation, and then click Clear All. You
can use Excel's Go To Special feature to quickly
select all cells with data validation
Data Validation Example
Restrict users to entering a whole
number between 0 and 10

Select cell C2

On the Data tab, in the Data Tools
group, click Data Validation
Create Data Validation Rule
On the Settings tab
Input Message
Appears when the user selects the
cell and tells the user what to enter
Error Alert
If users ignore the input message and
enter a number that is not valid, you can
show them an error alert
Data Validation Result
Select cell C2
Try to enter a number higher than
10

Result
Reject Invalid Dates
Data validation to reject invalid
dates
On the Data tab, in the Data Tools
group, click Data Validation
Outside a Date Range
 In the Allow list, click Date.
 In the Data list, click between.
 Enter the Start date and End date shown below and click
OK

 Enter the date 5/19/2016 into cell A2


Example
 Explanation: all dates between 5/20/2016 and
today's date + 5 days are allowed. All dates
outside this date range are rejected
 Result. Excel shows an error alert
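
The rule explained above (from 5/20/2016 up to today's date + 5 days) can be sketched in Python as follows. This is only an equivalent check written outside Excel; the function name `is_allowed_date` and the optional `today` parameter are assumptions added so the example can be run with a fixed date.

```python
from datetime import date, timedelta
from typing import Optional

START_DATE = date(2016, 5, 20)  # 5/20/2016 in month/day/year form


def is_allowed_date(candidate: date, today: Optional[date] = None) -> bool:
    """Allow dates from START_DATE up to five days after today."""
    if today is None:
        today = date.today()
    return START_DATE <= candidate <= today + timedelta(days=5)


# With "today" fixed at 1 June 2016, 5/19/2016 is rejected.
print(is_allowed_date(date(2016, 5, 19), today=date(2016, 6, 1)))
print(is_allowed_date(date(2016, 6, 6), today=date(2016, 6, 1)))
```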
Sundays and Saturdays
 In the Allow list, click Custom.
 In the Formula box, enter the formula shown below and
click OK
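
The formula itself is not reproduced here. Assuming the rule is meant to reject dates that fall on a Saturday or Sunday, an equivalent check can be sketched in Python; the function name `is_weekday` is hypothetical.

```python
from datetime import date


def is_weekday(candidate: date) -> bool:
    """Accept Monday to Friday; reject Saturdays and Sundays."""
    # date.weekday() numbers Monday as 0, so Saturday is 5 and Sunday is 6.
    return candidate.weekday() < 5


print(is_weekday(date(2016, 5, 20)))  # a Friday
print(is_weekday(date(2016, 5, 21)))  # a Saturday
```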
Budget Limit
 To prevent users from exceeding a budget limit
 Select the range B2:B8
 On the Data tab, in the Data Tools group, click Data Validation
 In the Allow list, click Custom.
 In the Formula box, enter the formula shown below and click OK

 Result. Excel shows an error alert. You cannot exceed your budget limit of $100.
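
A sketch of the same budget rule in Python, assuming the rule is that the total of all entries in the range must not exceed the $100 limit. The names `BUDGET_LIMIT` and `within_budget` are illustrative.

```python
BUDGET_LIMIT = 100  # dollars, as in the example above


def within_budget(amounts: list[float], limit: float = BUDGET_LIMIT) -> bool:
    """Accept the entries only if their total stays within the limit."""
    return sum(amounts) <= limit


print(within_budget([20, 30, 50]))
print(within_budget([20, 30, 51]))
```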
Prevent Duplicate Entries
 To prevent users from entering duplicate values
 Select the range A2:A20

Result: Excel shows an error alert. You've already entered that invoice number
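
Duplicate prevention can be sketched in Python with a set of the values already entered. The function `add_invoice` is hypothetical, and treating invoice numbers as case-insensitive after trimming spaces is an assumption of this example.

```python
def add_invoice(existing: set[str], invoice_number: str) -> bool:
    """Record a new invoice number; return False if it is a duplicate."""
    # Assumption: numbers are compared ignoring case and outer spaces.
    key = invoice_number.strip().upper()
    if key in existing:
        return False  # already entered, so reject the entry
    existing.add(key)
    return True


entered: set[str] = set()
for number in ["INV-001", "INV-002", "inv-001"]:
    print(number, add_invoice(entered, number))
```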
Product Codes
 To prevent users from entering 
incorrect product codes
 Select the range A2:A7
 Enter an incorrect product code

 Result. Excel shows an error alert


Drop-down List
 On the second sheet, type the items you want to appear in the drop-down list

 On the first sheet, select cell B1

 You can also type the items directly into the text box, without using Sheet2. This gives the
exact same result.
Dependent Drop-down Lists
 The user selects Pizza from a drop-down list

 As a result, a second drop-down list contains the Pizza items

 To create these dependent drop-down lists, execute the following steps
 1. On the second sheet, create the following named ranges
Research - Scope and benefits
 Scope
 Recorded factual material commonly retained by and accepted in the
scientific community as necessary to validate research findings

 Benefits
◦ Reinforcing open scientific inquiry

◦ Stimulating new approaches to data collection and methods of analysis

◦ Increasing awareness of research in related areas, leading to more
opportunities for collaboration

◦ Allowing re-use of data for research not foreseen by the initial
investigators: this increases the efficiency of use of public funding by
avoiding unnecessary duplication of data collection

◦ Permitting the creation of more highly powered data analysis by
combining data from multiple sources

◦ Facilitating education of new researchers and the wider public


Research process
 Planned sequence that consists of the following
six Steps

1. Developing a statement of the research question

2. Developing a statement of the research hypothesis

3. Defining the instrument (questionnaire, unobtrusive measures)

4. Gathering the data

5. Analyzing the data

6. Drawing conclusions regarding the hypothesis


Basic Approaches to Research

Non-experimental Research
Experimental Research
Non-experimental Research
 Non-manipulative, correlational or observational
research

 A naturally occurring variable is a variable that is not
manipulated or controlled by the researcher (it is measured as it
normally exists)

 Response variable
 Outcome variable or criterion variable
 Predicted from one or more predictor variables
 Focus of a study because it is mentioned in the statement of the research
problem

 Predictor variable
 Variable used to predict values of the response
 The researcher may even believe that the predictor variable has a causal effect on the
response
 Predictor variable is also known as the independent variable
Experimental Research
Three Major characteristics
◦ Subjects are randomly assigned to experimental conditions
◦ The researcher manipulates an independent predictor variable
◦ Subjects in different experimental conditions are treated similarly with
regard to all variables except the independent variable

 Independent variable
 Variable whose values (or levels) the experimenter selects to
determine what effect this independent variable has on the
dependent variable
 Experimental counterpart to a predictor variable

 Dependent variable
 Subject’s behavior assessed to reflect the effects of the
independent variable
 Experimental counterpart to a response variable
Limitation of Research
Some researchers may hide
the real information
A sample cannot always
represent the whole population
Time and money were
constraints while conducting the
research
