
An
Industrial Training Report
on
Pattern Recognition and their Techniques
(Principal Component Analysis)
in partial fulfillment of
Bachelor of Technology
in
Computer Science and Engineering

Submitted To: Ms. Neha Singh (Assistant Professor)
Submitted By: Shyam Pandey (Roll No. 1622910083)

Department of Computer Science and Engineering


Vidya College of Engineering
Meerut
Session 2016-2020
Declaration

I hereby declare that the Industrial Training Report "Pattern Recognition and their Techniques (Principal Component Analysis)" is my own work and effort and that it has not been submitted anywhere for any award. The text embodied in this report has not been submitted to any other university or institute for the award of any degree or diploma.

Date: Sep 10, 2019

Student Signature

Shyam Pandey

Abstract

During this industrial training I first studied pattern recognition and its techniques, and then developed a pattern-recognition model using Principal Component Analysis (PCA), a machine-learning algorithm. I implemented PCA in the Python language using various Python libraries.
Principal Component Analysis is a technique used to explain the variance-covariance structure of a set of variables through linear combinations. It is often used as a dimensionality-reduction technique.
Chapter 1 introduces pattern recognition and its techniques, and then Principal Component Analysis.
Chapter 2 presents the history of previous work related to this project.
Chapter 3 presents the technology and methodology used in this project.
Chapter 4 presents other techniques of dimensionality reduction.
Finally, I conclude the project and discuss its future scope.

Acknowledgments
A summer internship is a golden opportunity for learning and self-development. I consider myself very lucky and honoured to have had so many wonderful people lead me through the completion of this summer internship.
I give my special thanks to Dr. Rajiv Chechi (Director, VCE) and Upendra Mittal (DRDO Scientist); without their guidance this internship work could not have been completed successfully. I am indebted to them for their continuous help and support.
I warmly thank all my friends for their loving cooperation, positive criticism, excellent advice, continuous support and consideration during the preparation of this project report.
I am thankful to the management and administration of Vidya Knowledge Park for providing the necessary resources.
Last but not least, I thank my parents and God Almighty.

Shyam Pandey

CONTENTS

Abstract
Acknowledgements
List of Tables
List of Figures
Chapter 1: INTRODUCTION
1.1 Pattern Recognition and their Techniques
1.1.1 Structural Method
1.1.2 Features based method
1.2 Principal Component Analysis(PCA)
1.2.1 Algorithm of PCA
1.2.2 Goals of PCA
Chapter 2: LITERATURE REVIEW
Chapter 3: TECHNOLOGY/METHODOLOGY AT A GLANCE
3.1 Technology used
3.1.1 Software Requirement
3.1.2 Hardware Requirement
3.2 Working /Methodology
3.3 Screenshots
Chapter 4: OTHER TECHNIQUE FOR DIMENSIONALITY REDUCTION
Chapter 5: CONCLUSION AND FUTURE SCOPE

REFERENCES

List of Tables

Table 3.1  Chemical Data Set

List of Figures

Fig. 1.1  Statistical Pattern Recognition
Fig. 3.1  Code of PCA Implementation
Fig. 3.2  Code of PCA Implementation
Fig. 3.3  Code of PCA Implementation
Fig. 3.4  Final output after analysing the chemical data
Fig. 4.1  Comparison with other techniques

CHAPTER 1
INTRODUCTION
Recognizing objects and the surrounding environment is a trivial task for human beings, but implementing it artificially is a very complex one. Pattern recognition provides solutions to problems ranging from speech recognition and face recognition to the classification of handwritten characters and medical diagnosis. Its application areas include bio-informatics, document classification, image analysis, data mining, industrial automation, biometric recognition, remote sensing, handwritten-text analysis, medical diagnosis, speech recognition, GIS and many more. What all these applications have in common is that features must be extracted and then analyzed for recognition and classification. Three processes take place in a pattern recognition task. The first step is data acquisition: converting data from one form (speech, characters, pictures, etc.) into another form acceptable to the computing device for further processing. Data acquisition is generally performed by sensors, digitizing machines and scanners. The second step is data analysis: after acquisition, learning about the data takes place and information is collected about the different events and pattern classes present in the data. The third step is classification. Its purpose is to decide the category of new data on the basis of the knowledge gained during data analysis. The data set presented to a pattern recognition system is divided into two sets: a training set and a testing set.[1]
The performance of pattern recognition techniques is influenced mainly by three elements: (i) the amount of data, (ii) the technology (method) used, and (iii) the designer and the user. The challenging job in pattern recognition is to develop systems capable of handling massive amounts of data.
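The acquisition-analysis-classification pipeline and the training/testing split described above can be sketched briefly in Python. This is a minimal illustration, assuming scikit-learn and its bundled Iris data as a stand-in for an acquired data set; the report itself does not name these:

```python
# A minimal sketch of the pattern recognition pipeline described above,
# using scikit-learn (an assumed library choice, not the report's code).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Data acquisition stand-in: a ready-made labelled data set.
X, y = load_iris(return_X_y=True)

# Divide the data set into a training set and a testing set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Data analysis: learn the pattern classes from the training set.
clf = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# Classification: decide the category of new (test) data.
accuracy = clf.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Any classifier could stand in for the nearest-neighbour model here; the point is the three-stage structure and the train/test division.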
Methods or Techniques of Pattern Recognition:
1. Structural Method: Structural methods are useful in situations where different classes of entities can be distinguished from each other by structural information. This model is used in application areas such as textured images, shape analysis of contours and image interpretation, where patterns have a definite structure.
E.g., in character recognition, different letters of the alphabet are structurally different from each other.[4]

2. Feature-based Method or Statistical Method: In the feature-based method a set of measurements is made on each real entity (or pattern), and from this measurement set a set of features is extracted which together characterize the class of pattern to which the given pattern belongs.
The traditional approach to feature-space recognition is the statistical approach. Features are chosen in such a way that different patterns occupy non-overlapping regions of feature space.

Fig.1.1 Statistical Pattern Recognition


Many classical statistical pattern recognition techniques, such as factor analysis, principal component analysis, cluster analysis and multidimensional scaling, have been used successfully.
Among these statistical recognition techniques, I studied Principal Component Analysis and its implementation.

Principal Component Analysis: Principal component analysis (PCA) is a commonly used technique, often applied to reduce sensor data to a form that can be processed more easily by more sophisticated classification techniques. It has been applied successfully to many different cases using different sensors and different chemical species. PCA involves a mathematical procedure that transforms a number of (possibly) correlated variables into a (smaller) number of uncorrelated variables called principal components. The objectives of principal component analysis are to discover or reduce the dimensionality of the data set and to identify new, meaningful underlying variables.[3]
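As a brief illustration of PCA as a dimensionality-reduction step, the following sketch uses scikit-learn on synthetic correlated data (both the library and the data are my assumptions, not the report's own code):

```python
# PCA reducing four correlated variables to two uncorrelated
# principal components, using scikit-learn (an assumed choice).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# 50 samples of 4 variables built from only 2 latent sources,
# so the 4 columns are strongly correlated (rank 2).
base = rng.normal(size=(50, 2))
X = np.hstack([base, base @ rng.normal(size=(2, 2))])

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)   # 50 x 2 matrix of principal components
print(X_reduced.shape)
print(pca.explained_variance_ratio_.sum())  # near 1.0 for this rank-2 data
```

Because the four columns carry only two independent sources of variation, two principal components retain essentially all of the variance.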

Algorithm of PCA:
1. Find the mean vector.
2. Assemble all the data samples in a mean-adjusted matrix.
3. Create the covariance matrix.
4. Compute the eigenvectors and eigenvalues.
5. Compute the basis vectors.
6. Represent each sample as a linear combination of the basis vectors.
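The six steps above can be sketched from scratch with NumPy. This is an illustrative sketch on a small made-up data matrix; the variable names are my own:

```python
# The six PCA steps, implemented directly with NumPy.
import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9],
              [1.9, 2.2], [3.1, 3.0], [2.3, 2.7]])

mean_vec = X.mean(axis=0)               # 1. mean vector
X_adj = X - mean_vec                    # 2. mean-adjusted matrix
cov = np.cov(X_adj, rowvar=False)       # 3. covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # 4. eigenvalues and eigenvectors

order = np.argsort(eigvals)[::-1]       # sort by decreasing eigenvalue
basis = eigvecs[:, order]               # 5. basis vectors (principal axes)

# 6. each sample expressed as a linear combination of the basis vectors
scores = X_adj @ basis
print(scores.shape)
```

Since the basis is orthonormal, `scores @ basis.T + mean_vec` reconstructs the original samples exactly, which is a quick way to check the implementation.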

Uses of PCA:
1. It is used to find inter-relations between variables in the data.
2. It is used to interpret and visualize data.
3. As the number of variables decreases, further analysis becomes simpler.
4. It is often used to visualize genetic distance and relatedness between populations.
5. It is basically performed on a square symmetric matrix, which can be a pure sums-of-squares-and-cross-products matrix, a covariance matrix or a correlation matrix. A correlation matrix is used if the individual variances differ greatly.

Goals of PCA
The goals of PCA are to:
1. extract the most important information from the data table;
2. compress the size of the data set by keeping only this important information;
3. simplify the description of the data set;
4. analyze the structure of the observations and the variables;
5. compress the data, by reducing the number of dimensions, without much loss of information; and
6. support applications such as image compression.

CHAPTER 2
LITERATURE REVIEW
The origins of statistical techniques are often difficult to trace. Preisendorfer and Mobley (1988) noted that Beltrami (1873) and Jordan (1874) independently derived the singular value decomposition (SVD) in a form that underlies PCA. Fisher and Mackenzie (1923) used the SVD in the context of a two-way analysis of an agricultural trial. However, it is generally accepted that the earliest descriptions of the technique now known as PCA were given by Pearson (1901) and Hotelling (1933). Hotelling's paper is in two parts. The first, most important, part, together with Pearson's paper, is among the collection of papers edited by Bryant and Atchley (1975). The two papers adopted different approaches, with the standard algebraic derivation being close to that introduced by Hotelling (1933). Pearson (1901), on the other hand, was concerned with finding lines and planes that best fit a set of points in p-dimensional space. His comments regarding computation, given over 50 years before the widespread availability of computers, are interesting: he states that his methods 'can be easily applied to numerical problems,' and he considers the calculations feasible. In the 32 years between Pearson's and Hotelling's papers, very little relevant material seems to have been published, although Rao (1964) indicates that Frisch (1929) adopted a similar approach to that of Pearson. Also, a footnote in Hotelling (1933) suggests that Thurstone (1931) was working along similar lines to Hotelling, but the cited paper, which is also in Bryant and Atchley (1975), is concerned with factor analysis rather than PCA. Hotelling's approach defined PCA as rather different in character from factor analysis. His motivation is that there may be a smaller 'fundamental set of independent variables which determine the values' of the original p variables. He notes that such variables have been called 'factors' in the psychological literature, but introduces the alternative term 'components' to avoid confusion with other uses of the word 'factor' in mathematics. Hotelling chooses his 'components' so as to maximize their successive contributions to the total of the variances of the original variables, and calls the components derived in this way the 'principal components.' The analysis that finds such components is then christened the 'method of principal components.' Hotelling's derivation of PCs uses Lagrange multipliers and ends up with an eigenvalue/eigenvector problem. A further paper by Hotelling (1936) gave an accelerated version of the power method for finding PCs; in the same year, Girshick (1936) provided some alternative derivations of PCs and introduced the idea that sample PCs were maximum likelihood estimates of underlying population PCs. Girshick (1939) investigated the asymptotic sampling distributions of the coefficients and variances of PCs, but there appears to have been only a small amount of work on the development of different applications of PCA during the 25 years immediately following the publication of Hotelling's paper. Since then, however, an explosion of new applications and further theoretical developments has occurred. This expansion reflects the general growth of the statistical literature, but, as PCA requires considerable computing power, the expansion of its use coincided with the widespread introduction of electronic computers. To this list of important papers the book by Preisendorfer and Mobley (1988) should be added. Although it is relatively unknown outside the disciplines of meteorology and oceanography and is not an easy read, it rivals Rao (1964) in its range of novel ideas relating to PCA, some of which have yet to be fully explored.[2]

CHAPTER 3
TECHNOLOGY / METHODOLOGY AT A GLANCE
3.1 TECHNOLOGY USED
- Statistical Method

3.1.1 SOFTWARE REQUIREMENTS
Operating System : Windows 10
Language : Python
Text Editor : Jupyter Notebook
Distribution : Anaconda
3.1.2 HARDWARE REQUIREMENTS
Processor : Intel i3, 2.0 GHz
RAM : 2 GB
Hard Disk Drive : 200 GB

3.2 WORKING / METHODOLOGY


For the implementation of Principal Component Analysis I use real data, i.e. chemical data with four properties per sample. I used the following steps to perform this process:
Step 1: First standardize the given data. For standardization the Z-score formula is used:
Z = (Value - Mean) / Standard deviation

Step 2: Calculate the covariance matrix of the given data.
The aim of this step is to understand how the variables of the input data set vary from the mean with respect to each other, in other words to see whether there is any relationship between them. Sometimes variables are so highly correlated that they contain redundant information; in order to identify these correlations, we compute the covariance matrix.
Step 3: Compute the eigenvectors and eigenvalues.
Step 4: Calculate the feature vector.
Step 5: Recast the data along the principal-component axes:
Final data set = Feature vector * standardized original data set
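The five steps above can be sketched with NumPy on a few rows drawn from the chemical data in Table 3.1. This is an illustrative sketch, not the exact code shown in the screenshots; the variable names are my own:

```python
# Steps 1-5 of the methodology, sketched with NumPy on a subset
# of the chemical data (volume, viscosity, frequency, density).
import numpy as np

X = np.array([[225.2,  29.2, 69.2, 424.4],
              [ 72.5,  57.2, 52.7, 560.0],
              [385.4, 161.6, 84.8, 857.0],
              [114.0,  29.0, 54.6,  31.3],
              [ 46.3,  47.6, 77.6,  34.4]])

# Step 1: standardize each variable with the Z-score,
# Z = (value - mean) / standard deviation.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: covariance matrix of the standardized data.
cov = np.cov(Z, rowvar=False)

# Step 3: eigenvectors and eigenvalues of the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(cov)

# Step 4: feature vector = eigenvectors of the largest components.
order = np.argsort(eigvals)[::-1]
feature_vector = eigvecs[:, order[:2]]   # keep the two largest components

# Step 5: recast the data along the principal-component axes.
final = Z @ feature_vector
print(final.shape)
```

Keeping two of the four components reduces each sample to two coordinates while preserving most of the variance in the standardized data.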
List of Chemical Data Set:
Name of Chemical           Volume   Viscosity  Frequency  Density
Diphenylaminechlorarsine   225.2     29.2       69.2       424.4
Diphenylaminechlorarsine    72.5     57.2       52.7       560
Diphenylaminechlorarsine   385.4    161.6       84.8       857
Diphenylaminechlorarsine   607.7    437.3      175.6       144.3
Diesel                     114       29         54.6        31.3
Diesel                      46.3     47.6       77.6        34.4
Diesel                      31        9         38.6        46
Diesel                     254       79.8       73.3       160.3
Alkane                     319.2     66         79.6       388.3
Alkane                     189.2     26        130.5       204
Alkane                     332.7     76.5      159.7       365
Alkane                     586      350.4      758.7      1687
Ethanol                    684      399.7      461.7       870
Ethanol                    809.3     68        475.7       577.7
Ethanol                    772      209.3      401.7       715
Ethanol                    918.3     77.2      494.2      1038.5
Kerosene                   753.25   363.7      377        1075.5
Kerosene                   474.5    302        297.8       705
Kerosene                   552.5    213        249.75      808.5
Kerosene                   563.75   266        282         744.5
Tantalum                    78.75   100.5      206.5       717.25
Tantalum                   174.75   104.75      45.5       620.75
Tantalum                   226.25   117.25      53.4       717
Tantalum                   279.25   186.5       31.25      732
Table 3.1 Chemical Data Set
Screenshots of code and output:

Fig. 3.1 Code of PCA Implementation

Fig. 3.2 Code of PCA Implementation

Fig. 3.3 Code of PCA Implementation

Fig. 3.4 Final output after analysing the chemical data

CHAPTER 4
OTHER TECHNIQUE FOR DIMENSIONALITY REDUCTION
Here I compare PCA with another dimensionality-reduction technique, LDA (Linear Discriminant Analysis).
Linear Discriminant Analysis is a dimensionality-reduction technique used as a preprocessing step in machine-learning and pattern-classification applications. The main goal of dimensionality-reduction techniques is to reduce the number of dimensions by removing redundant and dependent features, transforming the features from a higher-dimensional space to a space with lower dimensions.

Fig. 4.1 Comparison with other techniques


Linear Discriminant Analysis is a supervised technique which takes class labels into consideration. This category of dimensionality reduction is used in biometrics, bioinformatics and chemistry.
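The contrast with PCA can be sketched briefly: unlike PCA, LDA is fitted with the class labels. This is a minimal illustration, assuming scikit-learn and its bundled Iris data (neither is named in the report):

```python
# LDA as a supervised dimensionality-reduction step,
# using scikit-learn (an assumed library choice).
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)   # 4 features, 3 labelled classes

# Unlike PCA, LDA takes the class labels y into account when fitting.
lda = LinearDiscriminantAnalysis(n_components=2)
X_reduced = lda.fit_transform(X, y)
print(X_reduced.shape)
```

Note that LDA can produce at most (number of classes - 1) discriminant axes, whereas PCA can keep up to as many components as there are features.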

CHAPTER 5
CONCLUSION AND FUTURE SCOPE
PCA in its standard form is a widely used and adaptive descriptive data-analysis tool. Adaptations of PCA have been proposed, among others, for binary data, ordinal data, compositional data, discrete data, symbolic data and data with special structure, such as time series or data sets with common covariance matrices. PCA and PCA-related approaches have also played an important direct role in other statistical methods, such as linear regression (with principal component regression) and even simultaneous clustering of both individuals and variables. Methods such as correspondence analysis, canonical correlation analysis and linear discriminant analysis may be only loosely connected to PCA. The literature on PCA is vast and spans many disciplines; space constraints mean that it has been explored only superficially here. New adaptations and methodological results, as well as applications, are still appearing.

Future Scope of PCA:
Pattern recognition is a wide area of research with an endless number of possibilities. It is still under active research, becoming more futuristic and intelligent, with a great positive effect on human life. Some of the many research options are listed below:
- Computer Vision
- Natural Language Processing
- Game Playing
- Search Engines
- Robotics
- Medical Diagnosis

REFERENCES
[1] Amin Fazel and Shantanu Chakrabartty, "An Overview of Statistical Pattern Recognition Techniques for Speaker Verification," IEEE Circuits and Systems Magazine, pp. 61-81, 2nd quarter 2011.
[2] I. T. Jolliffe, Principal Component Analysis, 2nd ed., Springer, 2002.
[3] Yuehui Sun and Minghui, "DT-CWT Feature Based Classification Using Orthogonal Neighborhood Preserving Projections for Face Recognition," vol. 1, pp. 719-724, Nov. 2006.
[4] R. O. Duda, P. E. Hart and D. G. Stork, Pattern Classification, John Wiley & Sons, 2000.

