
Principal Components Analysis: Basic Ideas

Richard Brereton
31 March 2000

PCA is probably the most widespread multivariate statistical technique, and because of the importance of multivariate measurements in chemometrics, it is regarded by many as the technique that most significantly changed the chemist's view of data analysis.

History
There are numerous claims to the first use of PCA in the literature. Probably the most famous early paper was by Pearson in 1901 [1]. However, the fundamental ideas are based on approaches well known to physicists and mathematicians for much longer, namely those of eigenvector analysis. In fact, some school mathematics syllabuses teach ideas about matrices which are relevant to modern chemistry. An early description of the method in physics was by Cauchy in 1829 [2]. It has been claimed that the earliest non-specific reference to PCA in the chemical literature is 1878 [3], although the author of the paper almost certainly did not realise the potential, and was dealing mainly with a simple problem of linear calibration.

It is generally accepted that the revolution in the use of multivariate methods took place in psychometrics in the 1930s and 1940s, of which Hotelling's paper is regarded as a classic [4]. An excellent recent review of the area with a historical perspective, available in the chemical literature, has been published by the Emeritus Professor of Psychology from the University of Washington, Paul Horst [5]. Psychometrics is well understood by most students of psychology, and one important area involves relating answers in tests to underlying factors, for example, verbal and numerical ability as illustrated in Figure 1. PCA relates a data matrix consisting of these answers to a number of psychological "factors". In certain areas of statistics, ideas of factor analysis and PCA are intertwined, but in chemistry the two approaches have different meanings.

Figure 1

Natural scientists of all disciplines (biologists, geologists and chemists) have caught on to these approaches over the past few decades. Within the chemical community the first major applications of PCA were reported in the 1970s, and form the foundation of many modern chemometric methods.

Multivariate data matrices


A key idea is that most chemical measurements are inherently multivariate. This means that more than one measurement can be made on a single sample. An obvious example is spectroscopy: we can record a spectrum at hundreds of wavelengths on a single sample. Conventional approaches are univariate, in which only one wavelength (or measurement) is used per sample, but this misses much information. Another common area is quantitative structure-property-activity relationships, in which many physical measurements are available on a number of candidate compounds (bond lengths, dipole moments, bond angles etc.). Can we predict, statistically, the biological activity of a compound? Can this assist in pharmaceutical drug development? There are several pieces of information available. PCA is one of several multivariate methods that allow us to explore patterns in these data, similar to exploring patterns in psychometric data. Which compounds behave similarly? Which people belong to a similar group? How can this behaviour be predicted from available information? As an example, Figure 2 represents a chromatogram in which a number of compounds are detected with different elution times, at the same time as their spectra (such as a UV or mass spectrum) are recorded. Coupled chromatography, such as diode array high performance chromatography or liquid chromatography mass spectrometry, is increasingly common in modern laboratories, and represents a rich source of multivariate data. The chromatogram can be regarded as a data matrix.
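
To make the idea of a chromatogram as a data matrix concrete, the following is a minimal Python sketch, not taken from the article. It assumes NumPy, Gaussian peak shapes, and entirely made-up peak positions and widths; the names `gaussian`, `profiles` and `spectra` are hypothetical. Rows correspond to elution times and columns to wavelengths.

```python
import numpy as np

def gaussian(axis, centre, width):
    """Gaussian peak of unit height evaluated on the given axis."""
    return np.exp(-0.5 * ((axis - centre) / width) ** 2)

# Hypothetical axes: 30 points in elution time, 40 wavelengths.
times = np.arange(30)
wavelengths = np.arange(40)

# Two made-up compounds with partially overlapping elution profiles
# and different (also made-up) spectra.
profiles = np.column_stack([gaussian(times, 12, 2.5),
                            gaussian(times, 16, 2.5)])
spectra = np.column_stack([gaussian(wavelengths, 10, 5.0),
                           gaussian(wavelengths, 25, 6.0)])

# Two-way data matrix: each row is the spectrum recorded at one elution time.
X = profiles @ spectra.T

# A univariate approach would keep a single wavelength (one column of X)
# and discard the remaining 39 columns.
single_wavelength_trace = X[:, 10]

print(X.shape)                        # rows = elution times, columns = wavelengths
print(single_wavelength_trace.shape)  # one value per elution time
```

In this noise-free toy case the matrix is built from two compounds, so it contains only two independent sources of variation, however many wavelengths are recorded.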

Figure 2

Figure 3

What do we want to find out about the data? Knowing how many compounds are in the chromatogram would be useful information. Partially overlapping peaks and minor impurities are the bugbears of modern chromatography. What are the spectra of these compounds? Figure 3 (above) represents some embedded peaks. Can we reliably determine these spectra? Finally, what are the quantities of each component? Some of this information could undoubtedly be obtained by better chromatography, but there is a limit, especially with modern trends towards recording more and more data, more and more rapidly. And in many cases the identities and amounts of unknowns may not be available in advance. PCA is one tool from multivariate statistics that can help sort out these data.

Aims of PCA
The aim of PCA is to determine underlying information from multivariate raw data. There are two principal needs in chemistry. The first, in the case of the example from coupled chromatography, is to extract information from the two-way chromatogram.

The number of significant PCs is ideally equal to the number of significant components. If there are three components in the mixture, then we expect that there are only three PCs. Each PC is characterised by two pieces of information: the scores, which, in the case of chromatography, relate to the elution profiles, and the loadings, which relate to the spectra. In the next article we will look in more detail at how this information is obtained. However, the ultimate information has a physical meaning to chemists.
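
The link between the number of components and the number of significant PCs can be illustrated with a small simulation. This is a sketch under stated assumptions, not the method of the article: it uses NumPy's singular value decomposition on a data matrix that is not mean-centred, with three made-up Gaussian compounds and a small amount of artificial noise. All peak positions, widths and the noise level are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian(axis, centre, width):
    """Gaussian peak of unit height evaluated on the given axis."""
    return np.exp(-0.5 * ((axis - centre) / width) ** 2)

times = np.arange(40)
wavelengths = np.arange(50)

# Three hypothetical compounds (assumed peak positions and widths).
profiles = np.column_stack([gaussian(times, c, 3.0) for c in (12, 18, 25)])
spectra = np.column_stack([gaussian(wavelengths, c, 6.0) for c in (10, 25, 40)])

# Simulated two-way chromatogram plus a little random noise.
X = profiles @ spectra.T + rng.normal(scale=1e-3, size=(40, 50))

# PCA without mean-centring, computed via SVD: X = U diag(s) Vt.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

n_pcs = 3
scores = U[:, :n_pcs] * s[:n_pcs]   # one column per PC, one row per elution time
loadings = Vt[:n_pcs, :]            # one row per PC, one column per wavelength

# Relative error when X is rebuilt from the first three PCs only.
X_hat = scores @ loadings
residual = np.linalg.norm(X - X_hat) / np.linalg.norm(X)

print(s[:5])      # the first three singular values stand well clear of the rest
print(residual)
```

Note that the scores and loadings obtained this way are abstract linear combinations of the true elution profiles and spectra rather than the profiles and spectra themselves, which is why the text says only that they "relate to" them.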

The second need is simply to obtain patterns. Figure 4 represents the result of performing PCA on a series of chromatographic measurements on a number of different compounds using eight different commercial columns. The dimensions of the data matrix are chromatographic columns and results of various tests (e.g. elution times, peak widths and peak asymmetries), rather than elution times and spectra or people and answers to psychological tests. The aim is to show which columns behave in a similar fashion. The picture suggests that the three Inertsil columns behave very similarly whereas Kromasil C-18 and Supelco ABZ+ behave in a diametrically different manner. This could be important, for example, in the determination of which columns are best for separating basic compounds, which for amino acids and which for neutral compounds. The resultant picture is a principal component plot, and later articles will outline a number of different ways of obtaining and interpreting such pictures.
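
As a rough illustration of how such a pattern might be obtained, the sketch below computes the coordinates of a principal component plot for a small table of objects (rows) by test results (columns). The numbers and the labels `A1` to `D` are invented for illustration and are not the data behind Figure 4; standardising each test before PCA is an assumption made here because the tests are on different scales.

```python
import numpy as np

# Hypothetical table: 6 chromatographic columns (rows) x 4 test results.
# Rows A1, A2, A3 are deliberately made similar to one another.
names = ["A1", "A2", "A3", "B", "C", "D"]
X = np.array([
    [5.1, 0.82, 1.10, 2.3],
    [5.0, 0.80, 1.12, 2.4],
    [5.2, 0.83, 1.08, 2.2],
    [7.9, 1.40, 1.60, 4.1],
    [3.0, 0.50, 2.10, 1.0],
    [6.5, 1.10, 0.90, 3.5],
])

# Standardise each test (column) to zero mean and unit standard deviation.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# PCA via SVD of the standardised matrix.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
scores = U * s
pc_scores = scores[:, :2]            # coordinates in the PC1/PC2 plot

# Fraction of the total variance carried by each PC.
explained = s**2 / np.sum(s**2)

# Pairwise distances in the PC plot: small distance = similar behaviour.
diff = pc_scores[:, None, :] - pc_scores[None, :, :]
distances = np.sqrt((diff ** 2).sum(axis=-1))

for name, (pc1, pc2) in zip(names, pc_scores):
    print(f"{name}: PC1 = {pc1:.2f}, PC2 = {pc2:.2f}")
```

Plotting `pc_scores[:, 0]` against `pc_scores[:, 1]` with any plotting library gives the kind of picture described above: objects that behave similarly fall close together, and dissimilar ones fall far apart.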

Figure 4

PCA has a fundamental and important role in many areas of chemometrics. Later articles will concentrate on different aspects in detail.

References
1. Pearson K. (1901). On lines and planes of closest fit to systems of points in space. Phil. Mag. (6), 2, 559-572.
2. Cauchy A.L. (1829). Oeuvres, IX (2), 172-175.
3. Adcock R.J. (1878). A problem in least squares. The Analyst, 5, 53-54.
4. Hotelling H. (1933). Analysis of a complex of statistical variables into principal components. J. Educ. Psychol., 24, 417-441, 498-520.
5. Horst P. (1992). Sixty years with latent variables and still more to come. Chemometrics and Intelligent Laboratory Systems, 14, 5-21.
