Data quality

Definitions
Data are of high quality "if they are fit for their intended uses in operations, decision making and planning" (J. M. Juran). Alternatively, data are deemed of high quality if they correctly represent the real-world construct to which they refer. Furthermore, as data volume increases, the question of internal consistency within data becomes paramount, regardless of fitness for use for any external purpose; for example, a person's age and birth date may conflict within different parts of a database.

Other published definitions include:
1. The state of completeness, validity, consistency, timeliness and accuracy that makes data appropriate for a specific use. (Government of British Columbia)
2. The totality of features and characteristics of data that bears on their ability to satisfy a given purpose. (Glossary of Quality Assurance Terms)
3. Complete, standards based, consistent, accurate and time stamped. (GS1)
4. The processes and technologies involved in ensuring the conformance of data values to business requirements and acceptance criteria.
5. The degree of excellence exhibited by the data in relation to the portrayal of the actual scenario.
6. The sum of the degrees of excellence for factors related to data. (Glossary of data quality terms, published by the IAIDQ)

These views can often be in disagreement, even about the same set of data used for the same purpose. This article discusses the concept as it relates to business data processing, although other kinds of data have quality issues as well.
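The internal-consistency concern raised in the definitions — an age field disagreeing with a birth date elsewhere in the database — is easy to mechanize. A minimal sketch in Python; the field names, record layout, and one-year tolerance are illustrative assumptions, not part of any particular product:

```python
from datetime import date

def age_consistent(record, today=None):
    """Check that a stored age agrees with the stored birth date.

    A discrepancy of one year is tolerated, since the age field may not
    have been updated since the person's last birthday.
    """
    today = today or date.today()
    born = record["birth_date"]
    # Derive age from the birth date, subtracting 1 if the birthday
    # has not yet occurred this year.
    derived = today.year - born.year - ((today.month, today.day) < (born.month, born.day))
    return abs(record["age"] - derived) <= 1

# A record whose age field was never updated after a birthday:
rec = {"age": 41, "birth_date": date(1980, 3, 14)}
print(age_consistent(rec, today=date(2022, 6, 1)))  # True: 41 vs derived 42 is within tolerance
```

In a real system such a rule would run as part of a data audit, emitting the offending record IDs rather than a boolean.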
Overview
There are a number of theoretical frameworks for understanding data quality. One highly theoretical approach analyzes the ontological nature of information systems to define data quality rigorously (Wand and Wang, 1996). Another framework seeks to integrate the product perspective (conformance to specifications) and the service perspective (meeting consumers' expectations) (Kahn et al., 2002). A third is based in semiotics, evaluating the quality of the form, meaning and use of the data (Price and Shanks, 2004). A systems-theoretical approach influenced by American pragmatism expands the definition of data quality to include information quality, and emphasizes the inclusiveness of the fundamental dimensions of accuracy and precision on the basis of the theory of science (Ivanov, 1972).

A considerable amount of data quality research involves investigating and describing various categories of desirable attributes (or dimensions) of data. These lists commonly include accuracy, correctness, currency, completeness and relevance. Nearly 200 such terms have been identified, and there is little agreement on their nature (are they concepts, goals or criteria?), their definitions or their measures (Wang et al., 1993). Software engineers may recognize this as a problem similar to "ilities".

MIT has a Total Data Quality Management program, led by Professor Richard Wang, which produces a large number of publications and hosts a significant international conference in this field (International Conference on Information Quality, ICIQ). In addition, the International Association for Information and Data Quality (IAIDQ) was established in 2004 to provide a focal point for professionals and researchers in this field. There are also several well-known authors and self-styled experts, with Larry English perhaps the most popular guru.
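Several of the dimensions these lists name — completeness and validity in particular — are commonly operationalized as simple ratios over a data set. A hedged sketch; the records, field names, and format rules below are invented for illustration:

```python
import re

records = [
    {"name": "Ada Lovelace", "email": "ada@example.org", "zip": "10115"},
    {"name": "", "email": "not-an-email", "zip": "10115"},
    {"name": "Alan Turing", "email": None, "zip": "ABCDE"},
]

def completeness(records, field):
    """Share of records where the field is present and non-empty."""
    return sum(1 for r in records if r.get(field)) / len(records)

def validity(records, field, pattern):
    """Share of the non-empty values that match a format rule."""
    values = [r[field] for r in records if r.get(field)]
    return sum(1 for v in values if re.fullmatch(pattern, v)) / len(values)

print(round(completeness(records, "email"), 2))      # 0.67: one email is missing
print(round(validity(records, "zip", r"\d{5}"), 2))  # 0.67: 'ABCDE' fails the rule
```

Note that the two metrics answer different questions — validity is computed only over the values that are present — which is one reason the literature keeps the dimensions distinct.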
In practice, data quality is a concern for professionals involved with a wide range of information systems, ranging from data warehousing and business intelligence to customer relationship management and supply chain management. One industry study estimated the total cost of data quality problems to the US economy at over US$600 billion per annum (Eckerson, 2002). Incorrect data – which includes invalid and outdated information – can originate from different data sources, through data entry, or through data migration and conversion projects. In 2002, the USPS and PricewaterhouseCoopers released a report stating that 23.6 percent of all U.S. mail sent is incorrectly addressed. One reason contact data becomes stale so quickly in the average database is that more than 45 million Americans change their address every year.

In fact, the problem is such a concern that companies are beginning to set up data governance teams whose sole role in the corporation is to be responsible for data quality. In some organizations, this data governance function has been established as part of a larger Regulatory Compliance function – a recognition of the importance of data and information quality to organizations.

Problems with data quality don't arise only from incorrect data; inconsistent data is a problem as well. Eliminating data shadow systems and centralizing data in a warehouse is one of the initiatives a company can take to ensure data consistency.

Companies with an emphasis on marketing often focus their quality efforts on name and address information, but data quality is recognized as an important property of all types of data. For companies with significant research efforts, data quality can also include developing protocols for research methods, reducing measurement error, bounds checking of the data, cross tabulation, modeling and outlier detection, verifying data integrity, and so on.
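Bounds checking and outlier detection, as mentioned for research data, can be as simple as a legal-range rule plus a z-score screen. A sketch with illustrative thresholds — a real protocol would choose the range and cutoff per variable:

```python
import statistics

def out_of_bounds(values, low, high):
    """Flag values outside a domain-defined legal range."""
    return [v for v in values if not (low <= v <= high)]

def zscore_outliers(values, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    stdev = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / stdev > threshold]

ages = [34, 29, 41, 38, 35, 290, 31]       # 290 is a likely data-entry error
print(out_of_bounds(ages, 0, 130))          # [290]
print(zscore_outliers(ages, threshold=2.0)) # [290]
```

The range rule catches impossible values; the z-score screen additionally surfaces values that are legal but improbable relative to the rest of the batch.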
History
Before the rise of the inexpensive server, massive mainframe computers were used to maintain name and address data so that mail could be properly routed to its destination. The mainframes used business rules to correct common misspellings and typographical errors in name and address data, as well as to track customers who had moved, died, divorced, married, gone to prison, or experienced other life-changing events. Government agencies began to make postal data available to a few service companies to cross-reference customer data with the National Change of Address registry (NCOA). This technology saved large companies millions of dollars compared to manually correcting customer data. Large companies also saved on postage, as bills and direct marketing materials made their way to the intended customer more accurately. Initially sold as a service, data quality moved inside the walls of corporations as low-cost and powerful server technology became available.
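The business rules described above were largely table-driven substitutions: look up a known variant, emit its canonical form. A toy Python sketch of the idea for address standardization — the rule table is invented for illustration, and real postal rule sets are far larger:

```python
# Rule table mapping common variants to a canonical postal token.
# Illustrative only; products ship rule sets covering thousands of variants.
RULES = {
    "street": "ST", "str.": "ST", "st.": "ST",
    "avenue": "AVE", "ave.": "AVE",
    "boulevard": "BLVD", "blvd.": "BLVD",
}

def standardize_address(line):
    """Normalize case and replace known variants with canonical tokens."""
    tokens = line.strip().split()
    fixed = [RULES.get(t.lower(), t.upper()) for t in tokens]
    return " ".join(fixed)

print(standardize_address("123 Main street"))  # 123 MAIN ST
print(standardize_address("45 Fifth avenue"))  # 45 FIFTH AVE
```

Standardizing before matching matters: two records that disagree only in abbreviation style become byte-identical, so the later matching step has less work to do.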
The market is going some way to providing data quality assurance. A number of vendors make tools for analysing and repairing poor-quality data in situ, service providers can clean the data on a contract basis, and consultants can advise on fixing processes or systems to avoid data quality problems in the first place. Most data quality tools offer a series of functions for improving data, which may include some or all of the following:
1. Data profiling – initially assessing the data to understand its quality challenges
2. Data standardization – a business rules engine that ensures that data conforms to quality rules
3. Geocoding – for name and address data; corrects data to US and worldwide postal standards
4. Matching or linking – a way to compare data so that similar, but slightly different, records can be aligned. Matching may use "fuzzy logic" to find duplicates in the data. It often recognizes that 'Bob' and 'Robert' may be the same individual. It might be able to manage 'householding', for example finding links between husband and wife at the same address. Finally, it often can build a 'best of breed' record, taking the best components from multiple data sources and building a single super-record.
5. Monitoring – keeping track of data quality over time and reporting variations in the quality of data. Software can also auto-correct the variations based on pre-defined business rules.
6. Batch and real time – once the data is initially cleansed (batch), companies often want to build the processes into enterprise applications to keep it clean.
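The matching step above — recognizing that 'Bob' and 'Robert' may be the same individual — typically combines a nickname table with a string-similarity ("fuzzy") measure. A sketch using only Python's standard library; the nickname table and the 0.8 cutoff are illustrative assumptions, not a vendor's actual algorithm:

```python
from difflib import SequenceMatcher

# Small illustrative nickname table; real products ship much larger ones.
NICKNAMES = {"bob": "robert", "rob": "robert", "liz": "elizabeth", "bill": "william"}

def canonical(name):
    """Lower-case the name and expand it if it is a known nickname."""
    n = name.strip().lower()
    return NICKNAMES.get(n, n)

def same_person(name_a, name_b, cutoff=0.8):
    """Heuristic match: canonicalize nicknames, then compare similarity."""
    a, b = canonical(name_a), canonical(name_b)
    return a == b or SequenceMatcher(None, a, b).ratio() >= cutoff

print(same_person("Bob", "Robert"))    # True, via the nickname table
print(same_person("Jon", "John"))      # True, via string similarity
print(same_person("Alice", "Robert"))  # False
```

Production matchers would compare several fields at once (name, address, birth date) and weight the evidence, but the two-stage shape — normalize, then score similarity — is the same.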
While name and address data has a clear standard as defined by local postal authorities, other types of data have few recognized standards. There is a movement in the industry today to standardize certain non-address data as well; the non-profit group GS1 is among the groups spearheading this movement. Principles of data quality can be applied to supply chain data, transactional data, and nearly every other category of data found in the enterprise. For example, making supply chain data conform to a certain standard has value to an organization by: 1) avoiding overstocking of similar but slightly different stock; 2) improving the understanding of vendor purchases to negotiate volume discounts; and 3) avoiding logistics costs in stocking and shipping parts across a large organization.

ISO 8000 is the international standard for data quality.

References
[1] http://www.gs1.org/gdsn/dqf
[2] http://www.information-management.com/issues/20060801/1060128-1.html
[3] http://www.directionsmag.com/article.php?article_id=509
[4] http://ribbs.usps.gov/move_update/documents/tech_guides/PUB363.pdf
[5] http://www.gov.bc.ca/other/daf/IRM_Glossary.pdf
[6] http://www.hanford.gov/dqo/glossaries/Glossary_of_Quality_Assurance_Terms1.pdf
[7] http://iaidq.org/main/glossary.shtml
[8] http://iaidq.org/

Further reading
•Eckerson, W. (2002) "Data Warehousing Special Report: Data quality and the bottom line". Article (http://www.adtmag.com/article.asp?id=6321)
•Ivanov, K. (1972) "Quality-control of information: On the concept of accuracy of information in data banks and in management information systems". Doctoral dissertation, The Royal Institute of Technology and The University of Stockholm. Article (http://www.informatik.umu.se/~kivanov/diss-avh.html)
•Kahn, B., Strong, D. and Wang, R. (2002) "Information Quality Benchmarks: Product and Service Performance," Communications of the ACM, April 2002, pp. 184–192. Article (http://mitiq.mit.edu/Documents/Publications/TDQMpub/2002/IQ Benchmarks.pdf)
•Price, R. and Shanks, G. (2004) "A Semiotic Information Quality Framework," Proc. IFIP International Conference on Decision Support Systems (DSS2004): Decision Support in an Uncertain and Complex World, Prato. Article (http://vishnu.sims.monash.edu.au/dss2004/proceedings/pdf/65_Price_Shanks.pdf)
•Redman, T. (2004) "Data: An Unfolding Quality Disaster". Article (http://www.dmreview.com/article_sub.cfm?articleId=1007211)
•Wand, Y. and Wang, R. (1996) "Anchoring Data Quality Dimensions in Ontological Foundations," Communications of the ACM, November 1996, pp. 86–95. Article (http://web.mit.edu/tdqm/www/tdqmpub/WandWangCACMNov96.pdf)
•Wang, R., Kon, H. and Madnick, S. (1993) "Data Quality Requirements Analysis and Modelling," Ninth International Conference of Data Engineering, Vienna, Austria. Article (http://web.mit.edu/tdqm/www/tdqmpub/IEEEDEApr93.pdf)
•Fournel, Michel (2007) Accroitre la qualité et la valeur des données de vos clients [Increasing the quality and value of your customer data], éditions Publibook. ISBN 978-2748338478.
•Daniel, F., Casati, F., Palpanas, T., Chayka, O. and Cappiello, C. (2008) "Enabling Better Decisions through Quality-aware Reports," International Conference on Information Quality (ICIQ), MIT. Article (http://dit.unitn.it/~themis/publications/iciq08.pdf)
This book discusses data quality as it relates to business data processing, although other kinds of data have quality issues as well.
This book is your ultimate resource for Data Quality. Here you will find the most up-to-date information, analysis, background and everything you need to know.
In easy-to-read chapters, with extensive references and links, it gets you to know all there is to know about Data Quality right away, covering: Data quality, Bit rot, Cleansing and Conforming Data, Data auditing, Data cleansing, Data corruption, Data integrity, Data profiling, Data quality assessment, Data quality assurance, Data Quality Firewall, Data truncation, Data validation, Data verification, Database integrity, Database preservation, DataCleaner, Declarative Referential Integrity, Digital continuity, Digital preservation, Dirty data, Entity integrity, Information quality, Link rot, One-for-one checking, Referential integrity, Soft error, Two pass verification, Validation rule, Abstraction (computer science), ADO.NET, ADO.NET data provider, WCF Data Services, Age-Based Content Rating System, Aggregate (Data Warehouse), Data archaeology, Archive site, Association rule learning, Atomicity (database systems), Australian National Data Service, Automated Tiered Storage, Automatic data processing, Automatic data processing equipment, BBC Archives, Bitmap index, British Oceanographic Data Centre, Business intelligence, Business Intelligence Project Planning, Change data capture, Chunked transfer encoding, Client-side persistent data, Clone (database), Cognos Reportnet, Commit (data management), Commitment ordering, The History of Commitment Ordering, Comparison of ADO and ADO.NET, Comparison of OLAP Servers, Comparison of structured storage software, Computer-aided software engineering, Concurrency control, Conference on Innovative Data Systems Research, Consumer Relationship System, Content Engineering, Content format, Content inventory, Content management, Content Migration, Content re-appropriation, Content repository, Control break, Control flow diagram, Copyright, Core Data, Core data integration, Customer data management, DAMA, Dashboard (business), Data, Data access, Data aggregator, Data architect, Data architecture, Data bank, Data binding, Data center,
Data classification (data management), Data conditioning, Data custodian, Data deduplication, Data dictionary, Data Domain (corporation), Data exchange, Data extraction, Data field, Data flow diagram, Data governance, Data independence, Data integration, Data library, Data maintenance, Data management, Data management plan, Data mapping, Data migration, Data processing system, Data proliferation, Data recovery, Data Reference Model, Data retention software, Data room, Data security, Data set (IBM mainframe), Data steward, Data storage device, Data Stream Management System, Data Transformation Services, Data Validation and Reconciliation, Data virtualization, Data visualization, Data warehouse, Database administration and automation...and much more.
This book explains in depth the real drivers and workings of Data Quality. It reduces the risk of your technology, time, and resource investment decisions by enabling you to compare your understanding of Data Quality with the objectivity of experienced professionals.