Seminar Report
Submitted in partial fulfillment of the requirements of the degree of
Bachelor of Engineering
in Computer Engineering
by Abhishek Aggarwal
Roll No: 37
Under the guidance of Mrs. R. G. Mehta
Examiners
Guide
ACKNOWLEDGEMENT
I would like to acknowledge the contribution of certain distinguished people, without whose support and guidance this seminar would not have been concluded. I take this opportunity to express my sincere thanks and deep sense of gratitude to my seminar guide, Mrs. R. G. Mehta, for her guidance and moral support during the preparation of this seminar report. Finally, I would like to thank my family and friends for their constant support and help in each and every aspect of my seminar preparation.
Abhishek Aggarwal
INTRODUCTION OF DATA MINING
DEFINITION OF DATA MINING
FOUNDATION OF DATA MINING
NEED OF DATA MINING
ARCHITECTURE OF DATA MINING
RELATION BETWEEN DATA MINING AND DATA WAREHOUSE
WORKING OF DATA MINING
DATA MINING PROCESS
DATA MINING TECHNIQUES
BUSINESS OF DATA MINING
DATA MINING CAN BRING POINT TO POINT ACCURACY IN SALE
BENEFITS AND APPLICATION OF DATA MINING
KEY SUCCESS FACTOR OF DATA MINING PROJECTS
DATA MINING PROJECT METHODOLOGY
DATA MINING TOOLS AND TECHNOLOGY
DATA MINING: A THREAT TO PRIVACY
A POSSIBLE SCENARIO OF DATA MINING IN FUTURE
CONCLUSION
ABSTRACT
Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by retrospective tools typical of decision support systems. Data mining tools can answer business questions that traditionally were too time consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations. Most companies already collect and refine massive quantities of data. Data mining techniques can be implemented rapidly on existing software and hardware platforms to enhance the value of existing information resources, and can be integrated with new products and systems as they are brought on-line. When implemented on high performance client/server or parallel processing computers, data mining tools can analyze massive databases to deliver answers to questions such as, "Which clients are most likely to respond to my next promotional mailing, and why?" This report provides an introduction to the basic technologies of data mining. Examples of profitable applications illustrate its relevance to today's business environment, and a basic description shows how data warehouse architectures can evolve to deliver the value of data mining to end users.
... proactive information delivery. Data mining is ready for application in the business community because it is supported by three technologies that are now sufficiently mature:
Commercial databases are growing at unprecedented rates. A recent META Group survey of data warehouse projects found that 19% of respondents are beyond the 50 gigabyte level, while 59% expect to be there by second quarter of 1996. In some industries, such as retail, these numbers can be much larger. The accompanying need for improved computational engines can now be met in a cost-effective manner with parallel multiprocessor computer technology. Data mining algorithms embody techniques that have existed for at least 10 years, but have only recently been implemented as mature, reliable, understandable tools that consistently outperform older statistical methods.
In the evolution from business data to business information, each new step has built upon the previous one. For example, dynamic data access is critical for drill-through in data navigation applications, and the ability to store large databases is critical to data mining. The core components of data mining technology have been under development for decades, in research areas such as statistics, artificial intelligence, and machine learning. Today, the maturity of these techniques, coupled with high-performance relational database engines and broad data integration efforts, makes these technologies practical for current data warehousing
To be applied to best effect, data mining techniques must be fully integrated with a data warehouse as well as with flexible interactive business analysis tools. Many data mining tools currently operate outside of the warehouse, requiring extra steps for extracting, importing, and analyzing the data. Furthermore, when new insights require operational implementation, integration with the warehouse simplifies the application of results from data mining. The resulting analytic data warehouse can be applied to improve business processes throughout the organization, in areas such as promotional campaign management, fraud detection, new product rollout, and so on. Figure 1 illustrates an architecture for advanced analysis in a large data warehouse.
The ideal starting point is a data warehouse containing a combination of internal data tracking all customer contact coupled with external market data about competitor activity. Background information on potential customers also provides an excellent basis for prospecting. This warehouse can be implemented in a variety of relational database systems (Sybase, Oracle, Redbrick, and so on) and should be optimized for flexible and fast data access.
An OLAP (On-Line Analytical Processing) server enables a more sophisticated end-user business model to be applied when navigating the data warehouse. The multidimensional structures allow users to analyze the data as they want to view their business, summarizing by product line, region, and other key perspectives. The Data Mining Server must be integrated with the data warehouse and the OLAP server to embed ROI-focused business analysis directly into this infrastructure. An advanced, process-centric metadata template defines the data mining objectives for specific business issues like campaign management, prospecting, and promotion optimization. Integration with the data warehouse enables operational decisions to be directly implemented and tracked. As the warehouse grows with new decisions and results, the organization can continually mine the best practices and apply them to future decisions.
This design represents a fundamental shift from conventional decision support systems. Rather than simply delivering data to the end user through query and reporting software, the Advanced Analysis Server applies users' business models directly to the warehouse and returns a proactive analysis of the most relevant information. These results enhance the metadata in the OLAP Server by providing a dynamic metadata layer that represents a distilled view of the data. Reporting, visualization, and other analysis tools can then be applied to plan future actions and confirm the impact of those plans.
Data warehouses store the information used in the data mining process. They provide a consolidated source for data from numerous other databases. Data warehouses are systems intended for storing massive amounts of data in a central location, where access, reporting and analysis tools can be used to interpret the data. Since a data warehouse has consistent data definitions and includes the contents of many databases, it can be used to support decision-making and planning, as well as data mining tools.
... randomly go out and mail coupons to the general population, just as you could randomly sail the seas looking for sunken treasure. In neither case would you achieve the results you desired, and of course you have the opportunity to do much better than random: you could use your business experience stored in your database to build a model. As the marketing director you have access to a lot of information about all of your customers: their age, sex, credit history and long distance calling usage. The good news is that you also have a lot of information about your prospective customers: their age, sex, credit history etc. Your problem is that you don't know the long distance calling usage of these prospects (since they are most likely now customers of your competition). You'd like to concentrate on those prospects that have large amounts of long distance usage. You can accomplish this by building a model.
The goal in prospecting is to make some calculated guesses about the missing information, based on the model that we build. For instance, a simple model for a telecommunications company might be:
98% of my customers who make more than $60,000/year spend more than $80/month on long distance
This model could then be applied to the prospect data to try to tell something about the proprietary information that this telecommunications company does not currently have access to. With this model in hand, new customers can be selectively targeted. Test marketing is an excellent source of data for this kind of modeling. Mining the results of a test market representing a broad but relatively small sample of prospects can provide a foundation for identifying good prospects in the overall market.
If someone told you that he had a model that could predict customer usage, how would you know if he really had a good model? The first thing you might try would be to ask him to apply his model to your customer base, where you already know the answer. With data mining, the best way to accomplish this is by setting aside some of your data in a vault to isolate it from the mining process. Once the mining is complete, the results can be tested against the data held in the vault to confirm the model's validity. If the model works, its observations should hold for the vaulted data.
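The vault idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the customer records are invented, the thresholds are taken from the example rule, and the helper names `split_vault` and `rule_accuracy` are hypothetical.

```python
import random

def split_vault(records, vault_fraction=0.3, seed=0):
    """Shuffle the records and set aside a fraction as the 'vault' (holdout set)."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_vault = int(len(shuffled) * vault_fraction)
    return shuffled[n_vault:], shuffled[:n_vault]  # (mining set, vault)

def rule_accuracy(records, income_threshold=60_000, spend_threshold=80):
    """Share of high-income customers who are also heavy long distance users."""
    high_income = [r for r in records if r["income"] > income_threshold]
    if not high_income:
        return None  # the rule cannot be checked on this sample
    hits = sum(1 for r in high_income if r["ld_spend"] > spend_threshold)
    return hits / len(high_income)

# Hypothetical data: (annual income in $, monthly long distance spend in $)
customers = [{"income": inc, "ld_spend": spend} for inc, spend in [
    (75_000, 120), (82_000, 95), (64_000, 85), (91_000, 60), (70_000, 110),
    (45_000, 30), (38_000, 20), (52_000, 90), (66_000, 100), (58_000, 40),
]]

mining_set, vault = split_vault(customers)
print("rule holds on mining set:", rule_accuracy(mining_set))
print("rule holds on vault:     ", rule_accuracy(vault))
```

If the share measured on the vault is close to the share measured on the mining set, the rule is more likely to reflect a real pattern than an accident of the sample.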
From a process-oriented view, there are three classes of data mining activity: discovery, predictive modeling and forensic analysis, as shown in the figure below.
Discovery is the process of looking in a database to find hidden patterns without a predetermined idea or hypothesis about what the patterns may be. In other words, the program takes the initiative in finding what the interesting patterns are, without the user thinking of the relevant questions first.
In predictive modeling, patterns discovered from the database are used to predict the future. Predictive modeling thus allows the user to submit records with some unknown field values, and the system will guess the unknown values based on previous patterns discovered from the database. While discovery finds patterns in data, predictive modeling applies the patterns to guess values for new data items.
Forensic analysis is the process of applying the extracted patterns to find anomalous or unusual data elements. To discover the unusual, we first find what is the norm, and then we detect those items that deviate from the usual within a given threshold. Discovery helps us find "usual knowledge," but forensic analysis looks for unusual and specific cases.
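A minimal sketch of the "find the norm, then flag deviations" idea follows. It assumes the norm is the mean and the deviation is measured in standard deviations; the report does not prescribe a particular measure, and the function name and transaction amounts are hypothetical.

```python
from statistics import mean, stdev

def find_anomalies(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    norm = mean(values)
    spread = stdev(values)
    if spread == 0:
        return []  # every value equals the norm, so nothing is unusual
    return [v for v in values if abs(v - norm) / spread > threshold]

# Hypothetical daily transaction amounts for one account
amounts = [42, 38, 45, 40, 39, 41, 43, 37, 44, 400]
print(find_anomalies(amounts))
```

A single extreme value also inflates the standard deviation, so practical systems often prefer more robust norms such as the median; the sketch keeps the simplest form.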
Data Mining has three major components: Clustering or Classification, Association Rules and Sequence Analysis.
Classification
The clustering techniques analyze a set of data and generate a set of grouping rules that can be used to classify future data. The mining tool automatically identifies the clusters by studying the pattern in the training data. Once the clusters are generated, classification can be used to identify the particular cluster to which an input belongs. For example, one may classify diseases and provide the symptoms which describe each class or subclass.
Association
An association rule is a rule that implies certain association relationships among a set of objects in a database. In this process we discover a set of association rules at multiple levels of abstraction from the relevant set(s) of data in a database. For example, one may discover a set of symptoms often occurring together with certain kinds of diseases and further study the reasons behind them.
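One common way to quantify an association rule is through its support and confidence. The report does not define these measures, so the sketch below is an assumption about how such a rule might be evaluated; the patient records and function names are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base

# Hypothetical patient records: each is a set of symptoms and diagnoses
records = [
    {"fever", "cough", "flu"},
    {"fever", "cough", "flu"},
    {"fever", "rash", "measles"},
    {"cough", "cold"},
    {"fever", "cough", "cold"},
]
print(support(records, {"fever", "cough"}))
print(confidence(records, {"fever", "cough"}, {"flu"}))
```

Here the rule "fever and cough implies flu" covers three of the five records and holds in two of those three.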
Sequential Analysis
In sequential analysis, we seek to discover patterns that occur in sequence. This deals with data that appear in separate transactions (as opposed to data that appear in the same transaction, as in the case of association). For example, a shopper buys item A in the first week of the month and then buys item B in the second week.
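The difference from association can be made concrete with a short sketch that counts customers who bought one item and, in a later transaction, another. The purchase histories and the function name `sequence_support` are hypothetical.

```python
def sequence_support(histories, first, then):
    """Fraction of customers who bought `first` and, in a later transaction, `then`."""
    count = 0
    for visits in histories.values():  # each customer's visits are ordered by time
        for i, basket in enumerate(visits):
            if first in basket and any(then in later for later in visits[i + 1:]):
                count += 1
                break
    return count / len(histories)

# Hypothetical purchase histories: customer -> list of weekly baskets
histories = {
    "c1": [{"A"}, {"B"}],
    "c2": [{"A", "C"}, {"C"}, {"B"}],
    "c3": [{"B"}, {"A"}],
    "c4": [{"A", "B"}],
}
print(sequence_support(histories, "A", "B"))
```

Customer c4 buys both items in the same transaction, which an association rule would count but a sequential pattern does not.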
Neural Nets and Decision Trees
For any given problem, the nature of the data will affect the techniques you choose. Consequently, you'll need a variety of tools and technologies to find the best possible model. Classification models are among the most common, so the more popular ways of building them are explained here. Classifications typically involve at least one of two workhorse statistical techniques: logistic regression (a generalization of linear regression) and discriminant analysis. However, as data mining becomes more common, neural nets and decision trees are also getting more consideration. Although complex in their own way, these methods require less statistical sophistication on the part of the user. Neural nets use many parameters (the nodes in the hidden layer) to build a model that takes and combines a set of inputs to predict a continuous or categorical variable.
The value from each hidden node is a function of the weighted sum of the values from all the preceding nodes that feed into it. The process of building a model involves finding the connection weights that produce the most accurate results by "training" the neural net with data. The most common training method is back-propagation, in which the output result is compared with known correct values. After each comparison, the weights are adjusted and a new result computed. After enough passes through the training data, the neural net typically becomes a very good predictor.
Decision trees represent a series of rules to lead to a class or value. For example, you may wish to classify loan applicants as good or bad credit risks. The figure below shows a simple decision tree that solves this problem. Armed with this tree and a loan application, a loan officer could determine whether an applicant is a good or bad credit risk. An individual with "Income > $40,000" and "High Debt" would be classified as a "Bad Risk," whereas an individual with "Income < $40,000" and "Job > 5 Years" would be classified as a "Good Risk."
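A decision tree of this kind is just a set of nested rules. The sketch below encodes the two paths named in the text (income above $40,000 with high debt is a bad risk; income below $40,000 with more than five years in the job is a good risk). The other two leaves are assumptions, since the figure itself is not available, and the function name is hypothetical.

```python
def classify_applicant(income, high_debt, years_in_job):
    """Walk a small hand-written decision tree and return 'Good Risk' or 'Bad Risk'."""
    if income > 40_000:
        # Higher-income branch: the deciding question is the debt level
        return "Bad Risk" if high_debt else "Good Risk"  # "Good Risk" leaf is assumed
    # Lower-income branch: the deciding question is job tenure
    return "Good Risk" if years_in_job > 5 else "Bad Risk"  # "Bad Risk" leaf is assumed

print(classify_applicant(income=55_000, high_debt=True, years_in_job=2))
print(classify_applicant(income=32_000, high_debt=False, years_in_job=8))
```

Because each prediction follows one readable path of questions, a loan officer can see exactly why an applicant was classified a certain way.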
Decision trees have become very popular because they are reasonably accurate and, unlike neural nets, easy to understand. Decision trees also take less time to build than neural nets. Neural nets and decision trees can also be used to perform regressions, and some types of neural nets can even perform clustering.
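For comparison, the way a neural net combines its inputs can be sketched as a forward pass through one hidden layer. The sigmoid squashing function, the network size and the weights are all assumptions chosen for illustration; training by back-propagation, which would adjust the weights, is not shown.

```python
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def node_output(inputs, weights, bias):
    """Value of one node: a function of the weighted sum of the values feeding into it."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def forward(inputs, hidden_layer, output_node):
    """Feed the inputs through one hidden layer and a single output node."""
    hidden = [node_output(inputs, w, b) for w, b in hidden_layer]
    w_out, b_out = output_node
    return node_output(hidden, w_out, b_out)

# Hypothetical, hand-picked weights; training would adjust these values
hidden_layer = [([0.5, -0.4], 0.1), ([-0.3, 0.8], 0.0)]
output_node = ([1.2, -0.7], 0.05)
score = forward([1.0, 0.5], hidden_layer, output_node)
print(round(score, 3))
```

Unlike the decision tree, nothing in the weights explains the result in business terms, which is the interpretability gap the text describes.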
Two popular types of applications that leverage companies' investments in data warehousing are data mining and campaign management software. Data mining enables companies to identify trends within the data warehouse (such as "families with teenagers are likely to have two phone lines," in the case of a telephone company's data). Campaign management software enables them to leverage these trends via highly targeted and automated direct marketing campaigns (such as a telemarketing campaign intended to sell second phone lines to families with teenagers).
Data mining and campaign management have been successfully deployed by hundreds of Fortune 1000 companies around the world, with impressive results. But recent advances in technology have enabled companies to couple these technologies more tightly, with the following benefits: increased speed with which they can plan and execute marketing campaigns; increased accuracy and response rates of campaigns; and higher overall marketing return on investment.
Data mining automates the detection of patterns in a database and helps marketing professionals improve their understanding of customer behavior, and then predict behavior. For example, a pattern might indicate that married males with children are twice as likely to drive a particular sports car as married males with no children. A marketing manager for an auto manufacturer might find this somewhat surprising pattern quite valuable. The data mining process can model virtually any customer activity. The key is to find patterns relevant to current business problems. Typical patterns that data mining uncovers include which customers are most likely to drop a service, which are likely to purchase merchandise or services, and which are most likely to respond to a particular offer.
The data mining process results in the creation of a model. A model embodies the discovered patterns and can be used to make predictions for records for which the true behavior is unknown. These predictions, usually called scores, are numerical values that are assigned to each record in the database and indicate the likelihood that the customer will exhibit a particular behavior. These numerical values are used to select the most appropriate prospects for a targeted marketing campaign.
Campaign management and data mining, when closely integrated, are potent tools. Campaign management software enables companies to deliver timely, pertinent, and coordinated offers to customers and prospects, and also manages and monitors customer communications across all channels. In addition, it automates and integrates the planning, execution, assessment and refinement of possibly tens to hundreds of highly segmented campaigns running monthly, weekly, daily or intermittently. Dynamic scoring avoids manual integration of scores with the database and eliminates the need to score an entire database. Instead, dynamic scoring marks only relevant customer subsets and only when needed. This shrinks marketing cycle times and ensures fresh, up-to-date results. Once a model is in the campaign management system, the user can start to build marketing campaigns based upon it simply by choosing it from a menu of options.
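Scoring and dynamic scoring can be illustrated with a short sketch: a model assigns a likelihood to each record, and only the segment a campaign needs is scored at the moment it is needed. The model, the customer records and the function names are hypothetical.

```python
def score_segment(customers, model, segment_filter):
    """Score only the customers matching `segment_filter`, at the time of the call."""
    return {c["id"]: model(c) for c in customers if segment_filter(c)}

def top_prospects(scores, n):
    """Return the ids of the n highest-scoring customers."""
    return sorted(scores, key=scores.get, reverse=True)[:n]

def second_line_model(customer):
    """Hypothetical model: likelihood of buying a second phone line."""
    return 0.8 if customer["teenagers"] > 0 else 0.1

customers = [
    {"id": 1, "region": "east", "teenagers": 2},
    {"id": 2, "region": "east", "teenagers": 0},
    {"id": 3, "region": "west", "teenagers": 1},
]

# Score only the eastern region, which is all this campaign targets
scores = score_segment(customers, second_line_model, lambda c: c["region"] == "east")
print(top_prospects(scores, 1))
```

The western customer is never scored, which is the saving dynamic scoring offers over scoring the whole database in advance.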
Any company that is creating or has created a data warehouse should be considering the use of integrated data mining and campaign management applications, which unlock the data and put it to use. By discovering customer behavior patterns and then acting upon them quickly, companies can stave off competition and increase customer retention, cross-selling and up-selling, all of which ultimately contribute to higher overall revenues.
A company can use its customer data to identify the best prospects for a new product or service among its current customers. The attributes of those customers who are most likely to choose a new product can be determined by using a test mailing and analyzing the characteristics of those who respond to create a model. The model can then be applied to the full database, and those customers who fit the profile would be included in a targeted mailing campaign. Both this approach and the one above reduce marketing costs by focusing on those customers who are most profitable to the company.
In customer profiling, characteristics of good customers are identified with the goals of predicting who will become one and helping marketers target new prospects. Data mining can find patterns in a customer database that can be applied to a prospect database so that customer acquisition can be appropriately targeted. For example, by identifying good candidates for mail offers or catalogs, direct-mail marketers can reduce expenses and increase their sales. Targeting specific promotions to existing and potential customers offers similar benefits.
Another common use of data mining in many organizations is to help manage customer relationships. By determining characteristics of customers who are likely to leave for a competitor, a company can take action to retain those customers, because doing so is usually far less expensive than acquiring a new customer.
Reducing fraud
Data mining can be used to identify patterns of fraudulent credit card usage, and to find behavior patterns of risky customers or those who are most likely to exhibit patterns of fraudulent behavior. One example of this would be to establish credit risk by looking at the ratio of debt to income; another would be using data mining to determine if a particular transaction is out of the normal range of a person's activity and flagging that transaction for verification. Fraud detection is of great interest to telecommunications firms, credit-card companies, insurance companies, stock exchanges, and government agencies. The aggregate total for fraud losses is enormous. But with data mining, these companies can identify potentially fraudulent transactions and contain the damage.
Monitoring the Internet
Internet advertising companies, as well as other web-based organizations, use "cookies" to collect data about those viewing their web sites. These cookies are used to create profiles of users in order to better target advertising. The information collected via the user's IP address reveals the user's geographic location and the sites that the user views. Advertisers may also be able to determine the user's company name, and the type and size of the organization. This can be combined with personal information requested on the web page, if the user chooses to fill in registration forms or logs on to use the site.
Market Analysis
Market-basket analysis helps retailers understand which products are purchased together or by an individual over time. With data mining, retailers can determine which products to stock in which stores, and even how to place them within a store. Data mining can also help assess the effectiveness of promotions and coupons. Financial companies use data mining to determine market and industry characteristics as well as predict individual company and stock performance. Another interesting application is in the medical field: data mining can help predict the effectiveness of surgical procedures, diagnostic tests, medications, service management, and process control.
Others
A pharmaceutical company can analyze its recent sales force activity and its results to improve targeting of high-value physicians and determine which marketing activities will have the greatest impact in the next few months. The data needs to include competitor market activity as well as information about the local health care systems. The results can be distributed to the sales force via a wide-area network that enables the representatives to review the recommendations from the perspective of the key attributes in the decision process. The ongoing, dynamic analysis of the data warehouse allows best practices from throughout the organization to be applied in specific sales situations.
A credit card company can leverage its vast warehouse of customer transaction data to identify customers most likely to be interested in a new credit product. Using a small test mailing, the attributes of customers with an affinity for the product can be identified. Recent projects have indicated more than a 20-fold decrease in costs for targeted mailing campaigns over conventional approaches.
A diversified transportation company with a large direct sales force can apply data mining to identify the best prospects for its services. Using data mining to analyze its own customer experience, this company can build a unique segmentation identifying the attributes of high-value prospects. Applying this segmentation to a general business database such as those provided by Dun & Bradstreet can yield a prioritized list of prospects by region.
A large consumer package goods company can apply data mining to improve its sales process to retailers. Data from consumer panels, shipments, and competitor activity can be applied to understand the reasons for brand and store switching. Through this analysis, the manufacturer can select promotional strategies that best reach its target customer segments.
Each of these examples has a clear common ground. They leverage the knowledge about customers implicit in a data warehouse to reduce costs and improve the value of customer relationships. These organizations can now focus their efforts on the most important (profitable) customers and prospects, and design targeted marketing strategies to best reach them.
To see this clearly, take as an example a direct mailing campaign, from creation to delivery. More than one third of the campaign's success is accounted for by the quality of the selected target prospects. The remaining two thirds is related to the creative (brochure, pictures, colors, etc.) and product features.
Generating a good target list can increase response and purchase rates, increasing corporate revenue. Data mining is the ideal way to develop selection models. As an example, think of a statistical model based on an Artificial Neural Network that has been trained to predict the likelihood (or score) that a given prospect will purchase the product. Here three attributes are used to make the prediction: the customer's age, gender, and salary. Purchase rates will increase since only prospects with high scores are targeted.
Very high positive impacts on the purchase rate have been observed in various data mining projects. An increase of 70%, as given in the example above, is absolutely realistic. Even higher values have been measured. To show the effect data mining has on your business, consider the following example. You can compute the customer lifetime value (CLV) of a given customer portfolio in a CRM context. As an approximation we assume the whole lifetime has a duration of three years. We can start by defining the business scenario before introducing the supporting data mining processes:
When you apply data mining processes you can expect the following results:
• Reduced churn rate: Data mining attrition models reveal which customers are the most profitable. Profitable customers can be targeted with retention programs for continued loyalty, reducing a company's churn rate.
• Increased turnover per customer: Better cross-selling and customer acquisition result from predictive models that target customers with the highest likelihood to purchase a given product. These models also help spot not only likely purchasers but also good purchasers (those who not only buy but buy a lot).
• Reduced costs: Better targeting helps reduce marketing budgets and increases returns.
The next table makes some realistic assumptions about increasing retention rate by 6.25%, increasing customer turnover by 5%, and reducing cost by 7%. The resulting gains in the customer lifetime value are significant:
The cost of a data mining project would be easily covered by the projected ROI. This shows the impact data mining has when introduced in a company. It generates sustained growth and substantial return on investment.
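Since the tables are not reproduced here, the effect of the three changes on customer lifetime value can be illustrated with a small calculation. This is a sketch under stated assumptions: the baseline figures are invented, the quoted percentages are treated as relative changes, the lifetime is three years as in the text, and no discounting is applied. The function name `lifetime_value` is hypothetical.

```python
def lifetime_value(turnover, cost, retention, years=3):
    """Undiscounted value of one customer: the yearly margin weighted by the
    chance that the customer is still with the company in each year."""
    margin = turnover - cost
    return sum(margin * retention ** year for year in range(years))

# Hypothetical baseline per customer and year
base = {"turnover": 1000.0, "cost": 800.0, "retention": 0.80}

# Relative changes quoted in the text
improved = {
    "turnover": base["turnover"] * 1.05,      # +5% turnover
    "cost": base["cost"] * 0.93,              # -7% cost
    "retention": base["retention"] * 1.0625,  # +6.25% retention
}

clv_before = lifetime_value(**base)
clv_after = lifetime_value(**improved)
gain = clv_after / clv_before - 1
print(f"before: {clv_before:.2f}, after: {clv_after:.2f}, gain: {gain:.1%}")
```

With these made-up inputs the lifetime value rises by roughly 61%, because small improvements in turnover and cost both act on a thin margin and the higher retention compounds over the three years.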
Here you can clearly see the financial impact. An increase in purchase rate of 70% with respect to the traditional selection can result in an increase in profit of almost 300%.
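The reason a moderate rise in purchase rate can multiply profit is that mailing costs stay fixed while every extra purchase adds margin. The following sketch uses invented campaign figures, chosen only to show the mechanism; they are not the report's data, and the function name is hypothetical.

```python
def campaign_profit(mailings, cost_per_mailing, purchase_rate, margin_per_purchase):
    """Profit of a mailing: margin earned on purchases minus the cost of sending."""
    purchases = mailings * purchase_rate
    return purchases * margin_per_purchase - mailings * cost_per_mailing

# Hypothetical campaign figures
mailings = 100_000
cost_per_mailing = 0.95
margin_per_purchase = 125.0
baseline_rate = 0.01

before = campaign_profit(mailings, cost_per_mailing, baseline_rate, margin_per_purchase)
after = campaign_profit(mailings, cost_per_mailing, baseline_rate * 1.70, margin_per_purchase)
increase = after / before - 1
print(f"profit before: {before:.0f}, after: {after:.0f}, increase: {increase:.0%}")
```

In this example the mailing costs consume most of the baseline revenue, so a 70% higher purchase rate lifts profit from 30,000 to 117,500, an increase of a little over 290%.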
... patterns in business behavior. To this end you need a powerful workbench for creating new insight.
• Data quality: The importance of data quality is often underestimated. A thorough assessment of data quality is essential for successful data mining solutions. The common rule "garbage in, garbage out" is true in such an environment. Analytic processes and tools need to be able to handle data quality issues.
The Canadian Information Processing Society (CIPS) includes in the definition of personal information "the individual's telephone number which is generally publicly available, as well as sensitive information about [the] individual's age, sex, sexual orientation, medical, criminal, and education history, or financial and welfare transactions. Personal information would also include biometric information, such as blood type, fingerprints and genetic makeup."
Way of collecting personal information
Consumers produce data in the process of conducting their daily business: using bank machines, paying with debit and credit cards, using loyalty cards, borrowing money, writing cheques, renting videos and cars, making phone calls, sending e-mail and browsing the Web. Businesses encourage the transition from print to electronic transactions by making tasks more convenient, and by providing discounts and bonuses in exchange for personal information. These electronic transactions may be collected by large organizations in their data warehouses, and may be subject to data mining. The conflict with privacy occurs when the collection of data takes place without the knowledge or consent of the individual, when the information is used in ways that the individual is not aware of, or when the information is disclosed to others without the express consent of the individual.
Consumers can do the following to ensure their privacy is maintained:
• Ask to see a business's privacy or confidentiality policy. Assess it against your expectations of how you want your personal information to be handled. If the policy does not meet your expectations, contact the business and inform it of your expectations. If no policy exists, inform the business that you expect respectful and fair handling of your personal information.
• Give only the minimum amount of personal information needed to complete a transaction. If you are in doubt about the relevance of any information that is requested, ask why it is needed, and ask that all the uses of the requested information be identified.
Security in Data Mining
There are a number of issues relating to data mining that need to be addressed by those who collect information.
Data Quality
Data must be relevant for the purpose for which it is to be used. It should be accurate, complete and up-to-date. Factors that affect this are accuracy of input and the steps taken to ensure that data is clean. Clean data refers to the data's age and accuracy.
Purpose Specification
The reason that data is being collected should be specified when the data is collected. Subsequent use of the data should be limited to those purposes or other purposes that are compatible with those specified.
Use Limitation
Data should not be used for purposes other than those specified unless the individual gives consent or the use is required by law. Data collected should be relevant to and sufficient for the purpose, but not excessive. Data mining conflicts with this by taking data collected for one purpose and using it for another, secondary, purpose. A means of requesting permission to perform data mining must be developed.
Openness
The business or organization must be open about developments, practices, and policies relating to personal data. It should be possible for an individual to determine the nature and existence of the data held about him/herself, the purpose of the data's use, and the identity of the data controller who is accountable for the data's accuracy. Customers must be informed that their data is being used for data mining purposes.
Individual Participation
An individual should have the right to confirm whether the data controller has information about him/herself and find out what that information is. This should be done in a reasonable length of time, at reasonable or no charge, in a reasonable manner, and in a form that is intelligible to the individual. If the request is denied, the individual should be given the reason, and should be able to challenge the denial. He/she should be able to challenge the data and have it erased, corrected, completed or amended as necessary.
... will likely be the ones with the foresight to develop a strong relationship with the mainstream database industry.
Conclusion
Database marketing software applications will have a tremendous impact on how business is done in the future. Although the core data mining technology is here today, developers need to take what already exists and turn it into something that business users can work with. The successful database marketing applications will combine data mining technology with a thorough understanding of business problems and present the results in a way that the user can understand. At that point the knowledge contained in a database will be understood by people who can turn what is known into what can be done.
Comprehensive data warehouses that integrate operational data with customer, supplier, and market information have resulted in an explosion of information. Competition requires timely and sophisticated analysis on an integrated view of the data. However, there is a growing gap between more powerful storage and retrieval systems and the users' ability to effectively analyze and act on the information they contain. Both relational and OLAP technologies have tremendous capabilities for navigating massive data warehouses, but brute force navigation of data is not enough. A new technological leap is needed to structure and prioritize information for specific end-user problems. Data mining tools can make this leap. Quantifiable business benefits have been proven through the integration of data mining with current information systems, and new products are on the horizon that will bring this integration to an even wider audience of users.
Data mining offers great promise in helping organizations uncover hidden patterns in their data. However, data mining tools must be guided by users who understand the business, the data, and the general nature of the analytical methods involved. Realistic expectations can yield rewarding results across a wide range of applications, from improving revenues to reducing costs. Building models is only one step in knowledge discovery. It's vital to collect and prepare the data properly and to check models against the real world. The "best" model is often found after building models of several different types and by trying out various technologies or algorithms.
The data mining area is still relatively young, and tools that support the whole of the data mining process in an easy-to-use fashion are rare. One of the most important issues facing researchers is the use of techniques against very large data sets. All the mining techniques are based on Artificial Intelligence, where they are generally executed against small sets of data that can fit in memory. In data mining applications, however, these techniques must be applied to data held in very large databases. Approaches to this include the use of parallelism and the development of new database-oriented techniques. Much work is still required before data mining can be successfully applied to large data sets. Only then will the true potential of data mining be realized.