
POORNIMA COLLEGE OF ENGINEERING, JAIPUR

A MAJOR SEMINAR ON
Data Validation System For A Relational Database

Guided By: Ms. Swati Jain (HOD, IT Dept.)

Submitted To: Ms. Shazia Haque, Mr. Manish Prajapati

Submitted By: Ajay Kumar (IT/09/18)

DEPARTMENT OF INFORMATION TECHNOLOGY, POORNIMA COLLEGE OF ENGINEERING, JAIPUR

SEMINAR OUTLINE

Introduction
About the base paper
Need for data validation
Methods of data validation
Data validation techniques
Relational database
Relational model
RDBMS
Base and derived relations
Relational operators
Normalization
Conclusion
References

AUTHORS
BAAH Barida, Computer Science Department, University of Port Harcourt, Port Harcourt, Nigeria (baridakara1@yahoo.com)

Kabari, Ledisi Giok* (Member, IEEE), Computer Science Department, Rivers State Polytechnic, Bori, Nigeria (ledisigiokkabari@yahoo.com)

JOURNAL

The International Journal of Advanced Research in Computer Science (IJARCS) focuses on theories, methods and applications in computer science and related fields. It is an international scientific journal that aims to contribute to ongoing scientific research and training, and so to promote research in computer science. Its scope includes computer engineering, computer networks, biometrics and bioinformatics, database management systems, artificial intelligence, software engineering and more.

INTRODUCTION

Data validation is the process of ensuring that a program operates on clean, correct and useful data. The simplest form of data validation verifies that the characters provided come from a valid set. Missing or incorrect data validation can lead to data corruption or security vulnerabilities.

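A validation process involves two distinct steps: (a) a validation check, which tests the data against the validation rules, and (b) a post-check action, which accepts the data or rejects it with an error message.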
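The two steps can be sketched in Python as follows. This is a minimal illustration, not the base paper's method: the function names, the `VALID_NAME_CHARS` set and the error text are hypothetical. The check step applies the simplest rule from the introduction (every character must come from a valid set); the post-check action then accepts the value or rejects it with a message for the user.

```python
import string
from dataclasses import dataclass

# Hypothetical valid character set for a person's name field.
VALID_NAME_CHARS = set(string.ascii_letters + " .'-")


@dataclass
class Outcome:
    accepted: bool
    message: str


def validation_check(value: str) -> bool:
    """Step (a): the value is non-empty and every character is in the valid set."""
    return value != "" and all(ch in VALID_NAME_CHARS for ch in value)


def post_check_action(field: str, value: str) -> Outcome:
    """Step (b): accept the value, or reject it with an error message for the user."""
    if validation_check(value):
        return Outcome(True, f"{field}: accepted")
    return Outcome(False, f"Error: {field} contains characters that are not allowed")


print(post_check_action("name", "Ajay Kumar"))
print(post_check_action("name", "Ajay<script>"))
```

Keeping the check separate from the action lets the same rule drive different responses, for example an on-screen error during data entry and a log entry during a batch load.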

NEED FOR DATA VALIDATION


Data validation is needed to avoid system failures and to check the validity and consistency of the data before the data set is used.

METHODS OF DATA VALIDATION

Character check: ensures that only the expected characters are present in a field.
Batch totals: detect missing records. The numeric fields of all records in a batch are added together, and the total is compared with the total expected for that batch.
Check digits: used for numeric data. An extra digit, calculated from the other digits of the number, is appended to its end. When the number is entered, the computer repeats the calculation and compares the result with the check digit.
Consistency checks: ensure that the data in one field corresponds to the data in related fields.
Control totals: a total is computed over one or more numeric columns that are present in almost every record of a table.
Cross-system consistency checks: compare data held in different systems to confirm that it is consistent.
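As one concrete check-digit scheme, the sketch below uses the Luhn (mod 10) algorithm, which is widely used for card and identification numbers; the section itself does not name a scheme, so this choice is an assumption. `luhn_check_digit` computes the digit appended when the number is issued, and `is_valid` repeats the calculation when the number is entered.

```python
def luhn_check_digit(payload: str) -> int:
    """Compute the Luhn (mod 10) check digit for a string of decimal digits."""
    total = 0
    # Walk from the rightmost payload digit. It will sit next to the check digit,
    # so it is the first digit to be doubled.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return (10 - total % 10) % 10


def is_valid(number: str) -> bool:
    """Entry-time check: recompute the check digit and compare it with the last digit."""
    if len(number) < 2 or not number.isascii() or not number.isdigit():
        return False
    return luhn_check_digit(number[:-1]) == int(number[-1])


print(luhn_check_digit("7992739871"))  # 3
print(is_valid("79927398713"))         # True
print(is_valid("79927398813"))         # False: one payload digit was mistyped
```

A single mistyped digit always changes the computed check digit, so the entry is caught at input time.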


Data type checks: check the data type of the input; if it is not the expected type, an error message is displayed to the user.
File existence check: checks whether a file with the specified name exists.
Format check: ensures that the data is in a specified format; for example, dates must be in DD/MM/YYYY format. Regular expressions can be used for this check.
Hash totals: like a batch total, computed over one or more numeric fields that appear in the tuples of a relation, but over fields whose sum has no meaning of its own (such as ID numbers) and serves only as a check.
Limit check: unlike a range check, the data is checked against only one limit, either an upper limit or a lower limit.
Logic check: ensures that the input value does not create a logical error.
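A format check for the DD/MM/YYYY example might look like the following sketch (the function name is hypothetical). The regular expression enforces the layout; `datetime.strptime` then rejects strings that have the right shape but name an impossible date, such as 31/02/2023, which the regular expression alone would let through.

```python
import re
from datetime import datetime

# Layout only: two digits, slash, two digits, slash, four digits.
DATE_PATTERN = re.compile(r"[0-9]{2}/[0-9]{2}/[0-9]{4}")


def format_check(value: str) -> bool:
    """Return True if value is a real calendar date written as DD/MM/YYYY."""
    if not DATE_PATTERN.fullmatch(value):
        return False
    try:
        datetime.strptime(value, "%d/%m/%Y")  # rejects day/month combinations that do not exist
    except ValueError:
        return False
    return True


for candidate in ["25/12/2023", "2023-12-25", "31/02/2023", "5/1/2023"]:
    print(candidate, format_check(candidate))
```

The regular expression runs first because `strptime` on its own would also accept single-digit days and months such as 5/1/2023.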


Presence check: ensures that important data has not been left out.
Range check: ensures that the entered data lies within a specified range.
Referential integrity: in a relational database, two tables are linked through a primary key and a foreign key. For foreign key validation, every referencing tuple must refer to a valid tuple in the referenced table.
Spelling and grammar check: looks for spelling and grammar errors.
Uniqueness check: ensures that values that must be unique are not repeated. It can be applied to fields such as an address or a mobile number.
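In a relational database, several of these checks can be declared as constraints so that the DBMS itself rejects bad rows. The sketch below uses Python's built-in `sqlite3` module; the `department` and `student` tables, their columns and the age range are hypothetical. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database whose constraints perform the checks."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
    conn.execute(
        "CREATE TABLE department ("
        " dept_id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL UNIQUE)"
    )
    conn.execute(
        "CREATE TABLE student ("
        " roll_no TEXT PRIMARY KEY,"
        " name TEXT NOT NULL,"                          # presence check
        " mobile TEXT UNIQUE,"                          # uniqueness check
        " age INTEGER CHECK (age BETWEEN 16 AND 60),"   # range check
        " dept_id INTEGER NOT NULL REFERENCES department (dept_id))"  # referential integrity
    )
    conn.execute("INSERT INTO department VALUES (1, 'Information Technology')")
    conn.commit()
    return conn


def try_insert(conn: sqlite3.Connection, row: tuple) -> str:
    """Insert one student row and report whether the DBMS accepted it."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("INSERT INTO student VALUES (?, ?, ?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError as exc:  # raised for NOT NULL, UNIQUE, CHECK and FK failures
        return f"rejected ({exc})"


conn = make_db()
print(try_insert(conn, ("R001", "Asha", "9000000001", 21, 1)))   # accepted
print(try_insert(conn, ("R002", None, "9000000002", 22, 1)))     # presence check fails
print(try_insert(conn, ("R003", "Ravi", "9000000001", 23, 1)))   # duplicate mobile number
print(try_insert(conn, ("R004", "Meena", "9000000004", 75, 1)))  # age out of range
print(try_insert(conn, ("R005", "Kiran", "9000000005", 20, 9)))  # no department 9
```

Declaring the rules in the schema means every application that writes to the database is checked the same way, in addition to any checks done on the client side.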

DATA VALIDATION TECHNIQUES


1. Accept Known Good

Also known as whitelist or positive validation. The data must be one of a set of tightly constrained, known good values, and any entered data that does not match should be rejected. The data should be length-checked, range-checked if it is a numeric value, and checked for correct syntax or grammar.
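A whitelist check for a small form can be sketched as below. The field names, the `ALLOWED_BRANCHES` set and both patterns are hypothetical. Each field is compared against what is known to be good: a fixed set of values, a syntax rule that also limits length, and a numeric range.

```python
import re

# Hypothetical known-good values and rules for three form fields.
ALLOWED_BRANCHES = {"IT", "CS", "EC", "ME"}
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{3,16}")  # syntax and length in one rule
SEMESTER_PATTERN = re.compile(r"[0-9]{1,2}")


def accept_known_good(branch: str, username: str, semester: str) -> list[str]:
    """Return a list of error messages; an empty list means every field is accepted."""
    errors = []
    if branch not in ALLOWED_BRANCHES:
        errors.append("branch: not one of the approved values")
    if not USERNAME_PATTERN.fullmatch(username):
        errors.append("username: 3 to 16 letters, digits or underscores required")
    # Range check runs only after the pattern has confirmed the value is numeric.
    if not SEMESTER_PATTERN.fullmatch(semester) or not 1 <= int(semester) <= 8:
        errors.append("semester: must be a whole number from 1 to 8")
    return errors


print(accept_known_good("IT", "ajay_k", "5"))  # []
print(accept_known_good("XX", "a!", "9"))      # three error messages
```

Anything not explicitly allowed is rejected, so new kinds of malicious input do not need to be anticipated.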

2. Reject Known Bad

Also known as negative or blacklist validation. The Reject Known Bad strategy is dangerous because the set of known bad data has to be maintained by hand and can never be complete: any bad input that is not yet on the list is accepted. This strategy is usually implemented with regular expressions, and every expression has to be run over every field, which is why it is slow as well as insecure.
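The weakness of this strategy shows up even in a tiny example. The sketch below is hypothetical: `BLACKLIST` holds three regular expressions for well-known attack strings, and every pattern is searched in every field. It rejects the listed attacks but accepts an HTML injection that uses an `<img>` tag, simply because no pattern was written for it.

```python
import re

# Hypothetical list of known-bad patterns; it has to be maintained by hand.
BLACKLIST = [
    re.compile(r"<\s*script", re.IGNORECASE),         # script tags
    re.compile(r"'\s*or\s+1\s*=\s*1", re.IGNORECASE),  # classic SQL injection
    re.compile(r";\s*drop\s+table", re.IGNORECASE),    # destructive SQL
]


def reject_known_bad(value: str) -> bool:
    """Return True (accept) unless the value matches a blacklisted pattern."""
    # Every pattern is searched in every field, which is what makes this slow.
    return not any(pattern.search(value) for pattern in BLACKLIST)


for field in ["Ajay Kumar", "<SCRIPT>alert(1)</script>", "x' OR 1=1 --",
              "<img src=x onerror=alert(1)>"]:
    print(f"{field!r}: {'accepted' if reject_known_bad(field) else 'rejected'}")
```

The last input is accepted even though it is an attack, which is the gap the section describes.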

3. Sanitize

Rather than accepting or rejecting the entered data, this strategy converts it into an acceptable format.
Sanitize with Whitelist: any characters that are not part of an approved list are removed, encoded or replaced.
Sanitize with Blacklist: characters are eliminated or translated (for example, into HTML entities, or by removing quotes) in an effort to make the input "safe". Because most fields have a particular grammar, it is simpler, faster and more secure to validate against a single correct positive test than to maintain complex and slow sanitization routines for all current and future attacks.
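Both sanitizing variants can be sketched with the Python standard library. The approved character set for a phone-number field is a hypothetical example. For the blacklist variant the sketch uses `html.escape`, which translates `&`, `<`, `>` and, with `quote=True`, both quote characters into HTML entities.

```python
import html
import string

# Hypothetical approved list for a phone-number field.
PHONE_CHARS = frozenset(string.digits + "+")


def sanitize_whitelist(value: str, allowed: frozenset = PHONE_CHARS) -> str:
    """Keep only characters on the approved list; everything else is removed."""
    return "".join(ch for ch in value if ch in allowed)


def sanitize_blacklist(value: str) -> str:
    """Translate HTML-special characters into entities so the text displays literally."""
    return html.escape(value, quote=True)


print(sanitize_whitelist("+91 (98) 765-43210"))  # +919876543210
print(sanitize_blacklist('<b>"Hi"</b>'))         # &lt;b&gt;&quot;Hi&quot;&lt;/b&gt;
```

The whitelist version cannot be surprised by a new character, while the blacklist version is only as safe as the set of characters it knows to translate.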

RELATIONAL DATABASE

A method for structuring data as sets of records, or tuples, so that the relations between different entities and attributes can be used for data access and transformation. The user perceives the database as a collection of two-dimensional tables, and each table contains one or more columns that define its attributes.

RDBMS

A database system made up of files whose data elements are held in two-dimensional arrays (rows and columns). A relational database management system can recombine data elements to form different relations, which makes data usage highly flexible.

RELATIONAL MODEL

In this model, data is stored in tables, and each table contains one column per field. Applications access data by specifying queries, which use operations such as select, project and join. The relational model has the following components: a collection of objects, or relations; a set of operations that act on the relations; and data integrity rules for accuracy and consistency.


BASE AND DERIVED RELATION

Base relations: relations that store data. In implementations they are called tables.

Derived relations: relations that are derived from the base relations. Operators can also be applied to derived relations. In implementations they are called views or queries.
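The distinction can be seen in SQL. The sketch below uses Python's built-in `sqlite3` module with a hypothetical `student` table: the table is a base relation that stores rows, while the view `it_students` is a derived relation defined by a query. Querying the view applies further operators to it, and the view reflects later changes to the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Base relation: a table that actually stores the rows.
conn.execute("CREATE TABLE student (roll_no TEXT PRIMARY KEY, name TEXT, branch TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("R001", "Asha", "IT"), ("R002", "Ravi", "CS"), ("R003", "Meena", "IT")],
)

# Derived relation: a view defined by a query over the base relation; it stores no rows itself.
conn.execute(
    "CREATE VIEW it_students AS SELECT roll_no, name FROM student WHERE branch = 'IT'"
)

# Operators can be applied to the derived relation just as to a table.
it_names = conn.execute("SELECT name FROM it_students ORDER BY name").fetchall()
print(it_names)  # [('Asha',), ('Meena',)]

# The view is evaluated from the base relation, so new base rows appear in it.
conn.execute("INSERT INTO student VALUES ('R004', 'Kiran', 'IT')")
print(conn.execute("SELECT COUNT(*) FROM it_students").fetchone()[0])  # 3
```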

RELATIONAL OPERATORS

Queries against a relational database, and the derived relations in it, are expressed in relational calculus or relational algebra. Eight operators are found in relational theory: SELECT, PROJECT, JOIN, INTERSECT, UNION, DIFFERENCE, PRODUCT and DIVIDE.

OPERATOR: SELECT

Needs a single table as its operand. It can list all row values, or only those row values that match a specified criterion.

OPERATOR: PROJECT

Uses a single table as its operand. Yields all values for the selected attributes.
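SELECT and PROJECT can be sketched over a table held as a list of Python dicts. This is an illustration, not a DBMS implementation; the `select` and `project` names and the sample `students` data are hypothetical. SELECT keeps whole rows, while PROJECT keeps chosen columns and, as relational algebra requires, drops duplicate result rows.

```python
from typing import Callable

Row = dict[str, object]


def select(rows: list[Row], predicate: Callable[[Row], bool] = lambda row: True) -> list[Row]:
    """SELECT: keep the whole rows that satisfy the predicate (by default, every row)."""
    return [row for row in rows if predicate(row)]


def project(rows: list[Row], attrs: list[str]) -> list[tuple]:
    """PROJECT: keep only the named attributes, dropping duplicate result rows."""
    seen: set[tuple] = set()
    result: list[tuple] = []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:  # first occurrence wins, so input order is kept
            seen.add(values)
            result.append(values)
    return result


students = [
    {"roll_no": "R001", "name": "Asha", "branch": "IT"},
    {"roll_no": "R002", "name": "Ravi", "branch": "CS"},
    {"roll_no": "R003", "name": "Meena", "branch": "IT"},
]
print(select(students, lambda row: row["branch"] == "IT"))  # the Asha and Meena rows
print(project(students, ["branch"]))                         # [('IT',), ('CS',)]
```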

OPERATOR: UNION

Needs two tables as its operands. Combines all rows from the two tables, excluding duplicate rows. The operand tables must be UNION compatible with each other, that is, they must have the same number of columns, with matching domains.

OPERATOR: INTERSECT

Needs two tables as its operands. Yields only the rows that appear in both tables. The operand tables must be UNION compatible with each other.

OPERATOR: DIFFERENCE

Needs two tables as its operands. Yields all rows in one table that are not found in the other table; that is, it subtracts one table from the other. The operand tables must be UNION compatible.
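UNION, INTERSECT and DIFFERENCE map directly onto Python set operations once each table is held as a set of row tuples, which also guarantees that duplicate rows cannot exist. In the sketch below, the `Relation` type and the two club tables are hypothetical, and, as a simplifying assumption, two relations count as UNION compatible only when their headings are identical (stricter than the textbook rule of matching degree and domains).

```python
from typing import NamedTuple


class Relation(NamedTuple):
    heading: tuple[str, ...]  # attribute names, in order
    rows: frozenset           # set semantics: duplicate rows cannot exist


def require_union_compatible(r: Relation, s: Relation) -> None:
    # Simplifying assumption: identical headings count as UNION compatible.
    if r.heading != s.heading:
        raise ValueError(f"not UNION compatible: {r.heading} vs {s.heading}")


def union(r: Relation, s: Relation) -> Relation:
    require_union_compatible(r, s)
    return Relation(r.heading, r.rows | s.rows)  # duplicates collapse automatically


def intersect(r: Relation, s: Relation) -> Relation:
    require_union_compatible(r, s)
    return Relation(r.heading, r.rows & s.rows)  # rows present in both


def difference(r: Relation, s: Relation) -> Relation:
    require_union_compatible(r, s)
    return Relation(r.heading, r.rows - s.rows)  # rows of r not found in s


it_club = Relation(("name",), frozenset({("Asha",), ("Ravi",)}))
music_club = Relation(("name",), frozenset({("Ravi",), ("Meena",)}))
print(sorted(union(it_club, music_club).rows))       # [('Asha',), ('Meena',), ('Ravi',)]
print(sorted(intersect(it_club, music_club).rows))   # [('Ravi',)]
print(sorted(difference(it_club, music_club).rows))  # [('Asha',)]
```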

OPERATOR: PRODUCT

Needs two tables as its operands. Yields all possible pairs of rows from the two tables; the result is also known as the Cartesian product.
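PRODUCT can be sketched with `itertools.product`. The `product` function and the two sample tables are hypothetical, and the sketch assumes that the two headings share no attribute names (clashing names would have to be renamed first). With 2 and 3 rows the result has 2 × 3 = 6 rows.

```python
from itertools import product as cartesian_pairs


def product(r_heading: tuple, r_rows: list[tuple],
            s_heading: tuple, s_rows: list[tuple]) -> tuple[tuple, list[tuple]]:
    """PRODUCT: pair every row of r with every row of s and concatenate them."""
    if set(r_heading) & set(s_heading):
        raise ValueError("rename clashing attributes before taking the product")
    heading = tuple(r_heading) + tuple(s_heading)
    rows = [a + b for a, b in cartesian_pairs(r_rows, s_rows)]
    return heading, rows


heading, rows = product(("name",), [("Asha",), ("Ravi",)],
                        ("course",), [("DBMS",), ("OS",), ("CN",)])
print(heading)    # ('name', 'course')
print(len(rows))  # 6
print(rows[0])    # ('Asha', 'DBMS')
```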

OPERATOR: DIVIDE

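DIVIDE requires one single-column table and one two-column table. It yields the values in the two-column table's other column that are paired with every value of the single-column table.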
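DIVIDE is easiest to see with the classic "which students take every required course?" query. In the hypothetical sketch below, `enrolled` plays the two-column table of (student, course) pairs and `required` the single-column table of courses; the result holds the students who are paired with every required course.

```python
def divide(pairs: set[tuple[str, str]], divisor: set[str]) -> set[str]:
    """Return each x such that (x, y) is in pairs for every y in divisor."""
    candidates = {x for x, _ in pairs}
    return {x for x in candidates if all((x, y) in pairs for y in divisor)}


enrolled = {
    ("Asha", "DBMS"), ("Asha", "OS"),
    ("Ravi", "DBMS"),
    ("Meena", "DBMS"), ("Meena", "OS"), ("Meena", "CN"),
}
required = {"DBMS", "OS"}
print(sorted(divide(enrolled, required)))  # ['Asha', 'Meena']
```

Ravi is excluded because he is not paired with OS; Meena's extra course does not matter.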

OPERATOR: JOIN

Allows us to combine information from two tables. Uses two tables that have a common attribute as its operands. Because JOIN can link independent tables through their common attributes, data can be stored in separate tables with as little redundancy as possible.
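A JOIN on a common attribute can be sketched as a hash join: index one table by the shared attribute, then look up each row of the other. The `join` function and the `students` and `departments` tables are hypothetical. Conceptually, the result equals a PRODUCT followed by a SELECT on equal values of the common attribute, with the duplicate column dropped; the index simply avoids building every pair.

```python
from collections import defaultdict


def join(r: list[dict], s: list[dict], on: str) -> list[dict]:
    """Join r and s on the common attribute `on` (other attribute names assumed distinct)."""
    index = defaultdict(list)
    for s_row in s:  # build: group s rows by their join value
        index[s_row[on]].append(s_row)
    result = []
    for r_row in r:  # probe: find the matching s rows for each r row
        for s_row in index.get(r_row[on], []):
            result.append({**r_row, **s_row})  # `on` appears once in the merged row
    return result


students = [
    {"roll_no": "R001", "name": "Asha", "dept_id": 1},
    {"roll_no": "R002", "name": "Ravi", "dept_id": 2},
    {"roll_no": "R003", "name": "Meena", "dept_id": 1},
]
departments = [{"dept_id": 1, "dept_name": "IT"}, {"dept_id": 2, "dept_name": "CS"}]

for row in join(students, departments, on="dept_id"):
    print(row["name"], row["dept_name"])
```

The department name is stored once in `departments` rather than in every student row, and JOIN brings it back only when a query needs it.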

NORMALIZATION

Normalization is the process of splitting tables that contain redundant information into two or more tables. The goal of normalization is to reduce, or even eliminate, data redundancy.

1st Normal Form (1NF): there are no duplicated rows in the table; each cell is single-valued (i.e., there are no repeating groups or arrays); and the entries in a column (attribute, field) are of the same kind.


2nd Normal Form (2NF): a table is in 2NF if it is in 1NF and every non-key attribute depends on the whole key, not on just part of it.

3rd Normal Form (3NF): a table is in 3NF if it is in 2NF and it has no transitive dependencies.

Boyce-Codd Normal Form (BCNF): a table is in BCNF if it is in 3NF and every determinant is a candidate key.
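The BCNF condition can be tested mechanically. The sketch below computes the attribute closure of each determinant; a determinant whose closure covers the whole table is a superkey, which is the usual formal test behind "every determinant is a candidate key". Only the listed dependencies are checked. The enrolment table, its attributes and the `fd` helper are hypothetical; the table is a classic case that satisfies 3NF but not BCNF.

```python
FD = tuple[frozenset[str], frozenset[str]]  # (determinant, dependent attributes)


def fd(lhs: str, rhs: str) -> FD:
    """Hypothetical helper: fd("a b", "c") builds the dependency {a, b} -> {c}."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs: frozenset[str], fds: list[FD]) -> frozenset[str]:
    """All attributes functionally determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def bcnf_violations(schema: frozenset[str], fds: list[FD]) -> list[FD]:
    """Non-trivial FDs whose determinant does not determine the whole table."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not schema <= closure(lhs, fds)]


# Each instructor teaches one course, and a student takes a given course
# from exactly one instructor.
schema = frozenset({"roll_no", "course", "instructor"})
fds = [fd("roll_no course", "instructor"), fd("instructor", "course")]
for lhs, rhs in bcnf_violations(schema, fds):
    print(sorted(lhs), "->", sorted(rhs), "violates BCNF")
```

The violation is `instructor -> course`: `instructor` determines `course` but is not a candidate key, so the table would be split into (instructor, course) and (roll_no, instructor).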

CONCLUSION

Data validation concerns the client side, or end user: it ensures that only clean, correct and useful data are accepted, while data that are not useful to the relational database system are rejected, with an error message that alerts the user or client while they are entering data into the database system.

REFERENCES

M. Arkady, Data Quality Assessment, Technics Publications, LLC, 2007.
D. Scott and R. Sharp, "Specifying and enforcing application-level web security policies," IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 4, 2003.
E. F. Codd, "Derivability, Redundancy, and Consistency of Relations Stored in Large Data Banks," IBM Research Report, 1969.
E. F. Codd, The Relational Model for Database Management, Addison-Wesley Publishing Company, 1990.

QUERIES
