layout: true <div class="my-footer"><span>https://swampthingecology.org</span></div> <!-- this adds the link footer to all slides, depends on my-footer class in css--> <!-- used https://arm.rbind.io/slides/xaringan.html to help build this presentation ---> <!-- ---> --- name: xaringan-title class: left, middle background-image: url("./resources/DSCN7227.jpg") background-size: cover # Multivariate Statistics Tips and Tricks ### .fancy[<font color="white">Intro to PCA</font>] <!--.large[<font color="white">Paul Julian, PhD | November 12,2020</font>]--> .large[<font color="white">Paul Julian, PhD | April 02, 2020</font>] <!-- this ends up being the title slide since seal = FALSE--> --- class: middle, inverse <img src="./_gifs/thankyou.gif" width="75%" style="display: block; margin: auto;" /> - Thanks to Dr Zarah Pattison and the Modelling, Evidence and Policy Research group for the invitation. - Today's discussion will be loosely based on a recent [blog](https://swampthingecology.org/blog/pca-basics-in-rstats/) post on Principal Component Analysis. .footnote[ [1] https://swampthingecology.org/blog/pca-basics-in-rstats/ ] --- name: intro class: left ### About Me .left-column[ <img src="./resources/6milecypressNorth.jpg" width="110%" style="display: block; margin: auto;" /> ] .right-column[ - Not a statistician (just play one on TV) - Wetland Biogeochemist - PhD in Soil and Water Science (University of Florida) - Likes long walks on the beach...or a wetland... ### .fancy[Find me at...] 
[<svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 512 512"><path d="M459.37 151.716c.325 4.548.325 9.097.325 13.645 0 138.72-105.583 298.558-298.558 298.558-59.452 0-114.68-17.219-161.137-47.106 8.447.974 16.568 1.299 25.34 1.299 49.055 0 94.213-16.568 130.274-44.832-46.132-.975-84.792-31.188-98.112-72.772 6.498.974 12.995 1.624 19.818 1.624 9.421 0 18.843-1.3 27.614-3.573-48.081-9.747-84.143-51.98-84.143-102.985v-1.299c13.969 7.797 30.214 12.67 47.431 13.319-28.264-18.843-46.781-51.005-46.781-87.391 0-19.492 5.197-37.36 14.294-52.954 51.655 63.675 129.3 105.258 216.365 109.807-1.624-7.797-2.599-15.918-2.599-24.04 0-57.828 46.782-104.934 104.934-104.934 30.213 0 57.502 12.67 76.67 33.137 23.715-4.548 46.456-13.32 66.599-25.34-7.798 24.366-24.366 44.833-46.132 57.827 21.117-2.273 41.584-8.122 60.426-16.243-14.292 20.791-32.161 39.308-52.628 54.253z"/></svg> @swampthingpaul](http://twitter.com/swampthingpaul) [<svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 496 512"><path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 
1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"/></svg> @swampthingpaul](http://github.com/swampthingpaul) [<svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 496 512"><path d="M131.5 217.5L55.1 100.1c47.6-59.2 119-91.8 192-92.1 42.3-.3 85.5 10.5 124.8 33.2 43.4 25.2 76.4 61.4 97.4 103L264 133.4c-58.1-3.4-113.4 29.3-132.5 84.1zm32.9 38.5c0 46.2 37.4 83.6 83.6 83.6s83.6-37.4 83.6-83.6-37.4-83.6-83.6-83.6-83.6 37.3-83.6 83.6zm314.9-89.2L339.6 174c37.9 44.3 38.5 108.2 6.6 157.2L234.1 503.6c46.5 2.5 94.4-7.7 137.8-32.9 107.4-62 150.9-192 107.4-303.9zM133.7 303.6L40.4 120.1C14.9 159.1 0 205.9 0 256c0 124 90.8 226.7 209.5 244.9l63.7-124.8c-57.6 10.8-113.2-20.8-139.5-72.5z"/></svg> swampthingecology.org](https://swampthingecology.org) [<svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 512 512"><path d="M502.3 190.8c3.9-3.1 9.7-.2 9.7 4.7V400c0 26.5-21.5 48-48 48H48c-26.5 0-48-21.5-48-48V195.6c0-5 5.7-7.8 9.7-4.7 22.4 17.4 52.1 39.5 154.1 113.6 21.1 15.4 56.7 47.8 92.2 47.6 35.7.3 72-32.8 92.3-47.6 102-74.1 131.6-96.3 154-113.7zM256 320c23.2.4 56.6-29.2 73.4-41.4 132.7-96.3 142.8-104.7 173.4-128.7 5.8-4.5 9.2-11.5 9.2-18.9v-19c0-26.5-21.5-48-48-48H48C21.5 64 0 85.5 0 112v19c0 7.4 3.4 14.3 9.2 18.9 30.6 23.9 40.7 32.4 173.4 128.7 16.8 12.2 50.2 41.8 73.4 41.4z"/></svg> pauljulianphd@gmail.com](mailto:pauljulianphd@gmail.com) ] --- name: ordinations class:left ## Ordination Analysis A family of statistical analyses used to order multivariate data. 
--

Some common analyses include:

--

- **Principal Component Analysis (PCA)**

--

- Correspondence analysis (CA) and its derivatives
  - detrended CA
  - canonical CA

--

- Redundancy Analysis (RDA)

--

- Non-Metric Multidimensional Scaling (NMDS)

--

- Bray-Curtis Ordination

<!--https://www.sciencedirect.com/science/article/pii/S0065250408601683--->

---
name: PCA
class: left, middle

## Principal Component Analysis

.pull-left[

*"A rose by any other name ..."*

<img src="https://ichef.bbci.co.uk/images/ic/640x360/p07ffzqb.jpg" width="80%" style="display: block; margin: auto;" />

]

--

.pull-right[

I have heard PCA called many names:

- ***unsupervised feature extraction***
- ***dimensionality reduction***
- statistical hand waving
- mass plotting
- magic

]

---
name: PCA2

<img src="./_gifs/maths.gif" width="65%" style="display: block; margin: auto;" />

* Rooted in linear algebra, it's the simplest of the true eigenvector-based multivariate analyses.

--

* Creates weighted linear combinations of the original variables to capture as much variance in the dataset as possible whilst eliminating correlations/redundancies.

--

* Reveals the internal structure of the data in a way that best explains the variance in the data.

---
name: PCA3
class: left

## Principal Component Analysis

Typically when talking about PCA you hear terms like *loadings*, *eigenvectors* and *eigenvalues*.

--

- **Eigenvectors** are unit-scaled loadings. Mathematically, they are the coefficients (weights) of the linear combination of the original variables that defines each factor. Conceptually, each one gives the direction of a new axis in the multivariate space.

--

- **Eigenvalues** measure the variation in the total sample accounted for by each factor. Computationally, a factor’s eigenvalue is the sum of its squared factor loadings across all the variables. The ratio of eigenvalues is the ratio of explanatory importance of the factors with respect to the variables (remember this for later).
--

- **Factor Loadings** are the correlations between the original variables and the factors. Analogous to Pearson’s r, a squared factor loading is the proportion of variance in that variable explained by the factor.

---
name: PCA4
class: left

## Principal Component Analysis

Imagine a multivariate dataset...let's say lake water quality data

```
##   Station.ID              LAKE   Date.EST Alk Chla    SRP    TP   TN  DIN
## 1        A03 East Tohopekaliga 2005-05-17  17  4.0 0.0015 0.024 0.71 0.04
## 2        A03 East Tohopekaliga 2005-06-21  22  4.7 0.0015 0.024 0.68 0.03
## 3        A03 East Tohopekaliga 2005-07-19  16  5.1 0.0015 0.020 0.63 0.02
## 4        A03 East Tohopekaliga 2005-08-16  17  3.0 0.0015 0.021 0.55 0.03
```

--

- or any other dataset with several different variables

--

PCA is a way to reduce the dimensionality of the data and determine what *statistically* matters.

--

It's more than a data winnowing technique; it also shows similarity (or difference) between groups and relationships between variables.

---
name: PCA5
class: left

## Principal Component Analysis

--

### *Disadvantages*

.pull-left[

- PCA is data hungry

<img src="./_gifs/cookiemonster.gif" width="75%" style="display: block; margin: auto;" />

]

--

.pull-right[

- Falls victim to the curse of dimensionality.

  - As dimensionality increases, effectiveness of the data decreases.

  - As dimensions are added to a data set, the distance between points increases in the multivariate space.

<img src="https://pvsmt99345.i.lithium.com/t5/image/serverpage/image-id/57591i4BEAC37E774C8C3B/image-size/large?v=1.0&px=999" width="80%" style="display: block; margin: auto;" />

<!-- https://community.alteryx.com/t5/Data-Science-Blog/Tidying-up-with-PCA-An-Introduction-to-Principal-Components/ba-p/382557 --->

]

---
name: PCA6
class:left

## PCA Assumptions

--

- **Multiple Variables:** An obvious assumption: a multivariate analysis needs multiple variables. PCA is meant for continuous variables, but ordinal variables are frequently used.

--

- **Sample Adequacy:** Size matters!!
A general rule of thumb has been a minimum of 150 cases (i.e., rows), or 5 to 10 cases per variable.

--

- **Linearity:** It is assumed that the relationships between variables are linear. This assumption is rooted in the fact that PCA is based on Pearson correlation coefficients, and therefore the assumptions of Pearson’s correlation also apply. In practice, this assumption is often relaxed.

--

- **Outliers:** Outliers can have a disproportionate influence on the resulting component computation. Since principal components are estimated by essentially re-scaling the data while retaining the variance, outliers could skew the estimate of each component within a PCA.

---
name: PCA7
class: left

## PCA Assumptions

.pull-left[

- One more thing about outliers.

- Another way to visualize how PCA is performed is that it uses rotation of the original axes to derive new axes, which maximize the variance in the data set.

]

--

.pull-right[

In 2D this looks like this:

<img src="Julian_MEP_PCA_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" />

]

---
name: R-time
class: left

## R you ready?

PCA can be done through a variety of `R` packages. Each has its own nuances...

--

- `prcomp()` and `princomp()` are from the base `stats` package. The quickest, easiest and most stable method.

--

- `PCA()` in the `FactoMineR` package.

- `dudi.pca()` in the `ade4` package.

- `acp()` in the `amap` package.

- `rda()` in the `vegan` package. More on this later.

--

Personally, I only have experience working with `prcomp`, `princomp` and `rda`. In the following examples we will be using `rda()`, but this can be adapted to the other analyses.

- `rda()` performs redundancy analysis.
Normally RDA is used for *“constrained ordination”*, but without predictors, functionally RDA == PCA.

---
name: R-time2
class: left

* `R` packages used today include
  - `AnalystHelper`
  - `reshape`
  - `vegan`
  - `REdaS`

```r
library(AnalystHelper)
library(reshape)
library(vegan)
library(REdaS)
```

--

* For demonstration purposes we will use the dataset I introduced earlier [`.../data/lake_data.csv`](https://github.com/SwampThingPaul/PCA_Workshop/tree/ba3ca18155abe558e65d32ac2872d0ad27a33fad/data).
  - `dat<-read.csv(".../data/lake_data.csv")`

* If you are playing the home game you can follow along with [`.../PCA_rawcode.R`](https://github.com/SwampThingPaul/PCA_Workshop/blob/ec110772ab7131682c3906d353405a3f38e9c7b1/PCA_rawcode.R).

.footnote[
[1] https://github.com/SwampThingPaul/AnalystHelper

[2] https://github.com/SwampThingPaul/PCA_Workshop
]

???

- Here we will use a dataset I put together quickly.
- If we have time, anyone who is willing can explore their own data with the group.
- Or ask questions about their data and PCA type analyses.

---
name: R-time3
class: left

Currently the data is in rows; we need it in columns (data massaging).
```r
# Cross tabulate the data based on parameter name
dat.xtab <- cast(dat,Station.ID+LAKE+Date.EST~param,value="HalfMDL",mean)

# Cleaning up/calculating parameters
dat.xtab$TN <- with(dat.xtab,TN_Combine(NOx,TKN,TN))
dat.xtab$DIN <- with(dat.xtab, NOx+NH4)

# More cleaning of the dataset
vars <- c("Alk","Cl","Chla","DO","pH","SRP","TP","TN","DIN")
dat.xtab <- dat.xtab[,c("Station.ID","LAKE","Date.EST",vars)]

head(dat.xtab[,c("Station.ID","LAKE",vars)],4L)
```

```
##   Station.ID              LAKE Alk   Cl Chla  DO  pH    SRP    TP   TN  DIN
## 1        A03 East Tohopekaliga  17 19.7  4.0 7.9 6.1 0.0015 0.024 0.71 0.04
## 2        A03 East Tohopekaliga  22 15.4  4.7 6.9 6.4 0.0015 0.024 0.68 0.03
## 3        A03 East Tohopekaliga  16 15.1  5.1 7.1 NaN 0.0015 0.020 0.63 0.02
## 4        A03 East Tohopekaliga  17 14.0  3.0 6.9 6.3 0.0015 0.021 0.55 0.03
```

---
name: R-time4
class: left

* `NA` values are a no-go in PCA...some more cleaning.

* How many `NA`s do we have?

```r
# How many rows of data do we have?
nrow(dat.xtab)
```

```
## [1] 725
```

```r
# How many rows are complete (no NAs)?
nrow(na.omit(dat.xtab))
```

```
## [1] 515
```

That is 210 rows removed due to incomplete data.

- You could narrow the parameters you want to look at to avoid excessive data culling.

```r
dat.xtab <- na.omit(dat.xtab)
```

???

- 725 rows of data
- 515 rows with NAs omitted
- 210 rows removed due to incomplete data. (could narrow the parameters you want to look at to avoid excessive data culling.)

---
name: R-time5
class: left

Let's take a quick look at the data...

--

<div class="figure" style="text-align: center">
<img src="Julian_MEP_PCA_files/figure-html/unnamed-chunk-12-1.png" alt="Scatterplot of all data for the example `dat.xtab` dataset." />
<p class="caption">Scatterplot of all data for the example `dat.xtab` dataset.</p>
</div>

---
name: R-time6
class: left

* Let's check the measure of sampling adequacy.

--

* Some have suggested performing a sampling adequacy analysis such as the Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy.
  - However, KMO is less a measure of sample size adequacy than a measure of the suitability of the data for factor analysis, which leads to the next point.

--

```r
KMOS(dat.xtab[,vars])
```

```
## 
## Kaiser-Meyer-Olkin Statistics
## 
## Call: KMOS(x = dat.xtab[, vars])
## 
## Measures of Sampling Adequacy (MSA):
##       Alk        Cl      Chla        DO        pH       SRP        TP        TN 
## 0.7274872 0.7238120 0.5096832 0.3118529 0.6392602 0.7777460 0.7524428 0.6106997 
##       DIN 
## 0.7459682 
## 
## KMO-Criterion: 0.6972786
```

Based on the KMO analysis, the KMO-Criterion of the dataset is 0.7, well above the suggested 0.5 threshold.

---
name: R-time7
class: left

* Let's do another check of the data using Bartlett's Test of Sphericity.

--

  - Bartlett's Test of Sphericity `\(\neq\)` Bartlett’s Test for Equality of Variances

--

* The Test of Sphericity tests whether the data come from a multivariate normal distribution with zero covariances.
  - It compares the observed correlation matrix to the identity matrix

$$
`\begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix}`
$$

--

```r
# Bartlett's Test Of Sphericity
bart_spher(dat.xtab[,vars])
```

```
## 	Bartlett's Test of Sphericity
## 
## Call: bart_spher(x = dat.xtab[, vars])
## 
##      X2 = 4616.865
##      df = 36
## p-value < 2.22e-16
```

* The correlation matrix is significantly different from an identity matrix ($H_0$ : all off-diagonal correlations are zero), so the data are suitable for PCA.

---
name: R-time8
class: left, inverse

### Now that the data is cleaned and checked ...

--

### <center> .fancy[... let's do some PCA!! ]</center>

<img src="./_gifs/party.gif" width="100%" style="display: block; margin: auto;" />

---
name: R-time9
class: left

PCA is pretty straightforward.

```r
dat.xtab.pca <- rda(dat.xtab[,vars],scale = TRUE)
```

--

- Using `rda(..., scale = TRUE)` without predictors functionally produces a PCA on scaled (standardized) variables. You can also compare the result with `princomp()`.

--

Now let's see the importance and variance explained by each component by extracting some important information.
* The quickest way is to use `summary(dat.xtab.pca)$cont`.

```
## $importance
## Importance of components:
##                          PC1    PC2    PC3    PC4     PC5     PC6     PC7
## Eigenvalue            4.2963 1.8811 1.3700 0.7110 0.34429 0.18096 0.12522
## Proportion Explained  0.4774 0.2090 0.1522 0.0790 0.03825 0.02011 0.01391
## Cumulative Proportion 0.4774 0.6864 0.8386 0.9176 0.95586 0.97597 0.98988
##                            PC8      PC9
## Eigenvalue            0.058289 0.032800
## Proportion Explained  0.006477 0.003644
## Cumulative Proportion 0.996356 1.000000
```

---
name: R-time10
class: left

To understand what all this means let's extract the information ourselves.

```r
# Extract eigenvalues (see definition above)
eig <- dat.xtab.pca$CA$eig

# Percent of variance explained by each component
variance <- eig*100/sum(eig)

# The cumulative percent variance (the last value should be 100)
cumvar <- cumsum(variance)

# Combine all the data into one data.frame
eig.pca <- data.frame(eig = eig, variance = variance,cumvariance = cumvar)
```

--

* To double check, compare `summary(dat.xtab.pca)$cont` with `eig.pca` ... they should agree (`eig.pca` reports percentages rather than proportions).

--

* What do the component eigenvalue and percent variance mean?

--

* More importantly, what do they tell us about our data?

--

This information tells us how much variance is explained by each component. It also helps identify which components should be used moving forward.

---
name: R-time11
class: left

<img src="./_gifs/guidelines.gif" width="80%" style="display: block; margin: auto;" />

Generally, there are two rules of thumb:

--

1. Pick components with eigenvalues of at least 1.
  - This is called the Kaiser rule.

--

2. The selected components should be able to describe at least 80% of the variance.

- If you look at `eig.pca` you'll see that based on these criteria components 1, 2 and 3 are the components to focus on, as they are enough to describe the data.

--

A scree plot displays these data and shows how much variation each component captures from the data.
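
The two rules above can be sketched in a few lines of base `R`. This is only a minimal illustration, assuming the eigenvalues are typed in by hand from the `summary(dat.xtab.pca)$cont` output shown earlier so that the chunk runs without the lake data; the names `kaiser.keep` and `n.80` are made up for this example and are not part of the workshop code.

```r
# Eigenvalues copied by hand from the summary output shown earlier
# (an assumption made so this chunk is self-contained; with the real
# object you would use dat.xtab.pca$CA$eig instead)
eig <- c(PC1 = 4.2963, PC2 = 1.8811, PC3 = 1.3700, PC4 = 0.7110,
         PC5 = 0.34429, PC6 = 0.18096, PC7 = 0.12522,
         PC8 = 0.058289, PC9 = 0.032800)

# Percent variance and cumulative percent variance, as in eig.pca
variance <- eig * 100 / sum(eig)
eig.pca <- data.frame(eig = eig, variance = variance,
                      cumvariance = cumsum(variance))

# Rule 1 (Kaiser rule): keep components with an eigenvalue of at least 1
kaiser.keep <- names(eig)[eig >= 1]

# Rule 2: number of components needed to reach 80% cumulative variance
n.80 <- unname(which(eig.pca$cumvariance >= 80)[1])

# A bare-bones scree plot with the Kaiser threshold as a dashed line
plot(seq_along(eig), eig.pca$eig, type = "b", xaxt = "n",
     xlab = "Principal component", ylab = "Eigenvalue")
axis(1, at = seq_along(eig), labels = names(eig))
abline(h = 1, lty = 2)
```

Printing `kaiser.keep` and `n.80` shows which components each rule retains, and the dashed line in the plot marks where the Kaiser rule cuts off.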
---
name: R-time12
class: left

## Scree plots

<div class="figure" style="text-align: center">
<img src="Julian_MEP_PCA_files/figure-html/unnamed-chunk-20-1.png" alt="Left: Scree plot of eigenvalues for each principal component with the Kaiser threshold identified. Right: Scree plot of the variance and cumulative variance for each principal component." />
<p class="caption">Left: Scree plot of eigenvalues for each principal component with the Kaiser threshold identified. Right: Scree plot of the variance and cumulative variance for each principal component.</p>
</div>

???

`eig.pca$eig` plots the components (left plot)

`eig.pca$variance` and `eig.pca$cumvariance` plot the variance (right plot)

---
name: R-time13
class: left

## Biplot

Now that we know which components are important, let's put together our biplot and extract components.

To extract the components and specific loadings we can use the `scores()` function in the `vegan` package.

--

- It is a generic function to extract scores from `vegan` ordination objects such as RDA, CCA, etc.

- This function also seems to work with the `prcomp` and `princomp` PCA functions in the `stats` package.

--

```r
scrs <- scores(dat.xtab.pca,display=c("sites","species"),choices=c(1,2,3))
```

--

- `scrs` is a list of two items, species and sites. Species correspond to the columns of the data and sites correspond to the rows.

- Use `choices` to extract the components you want; in this case we want the first three components.

Now we can plot the scores.

---
name: R-time14
class: left

## Biplot

<div class="figure" style="text-align: center">
<img src="Julian_MEP_PCA_files/figure-html/unnamed-chunk-22-1.png" alt="PCA biplot of two component comparisons from the `dat.xtab.pca` analysis."
/>
<p class="caption">PCA biplot of two component comparisons from the `dat.xtab.pca` analysis.</p>
</div>

---
name: R-time15
class: left

<div class="figure" style="text-align: center">
<img src="Julian_MEP_PCA_files/figure-html/unnamed-chunk-23-1.png" alt="PCA biplot of two component comparisons from the `dat.xtab.pca` analysis with rescaled loadings." />
<p class="caption">PCA biplot of two component comparisons from the `dat.xtab.pca` analysis with rescaled loadings.</p>
</div>

--

Typically when you see a PCA biplot, you also see an arrow for each variable. These are commonly called loadings and can be interpreted as:

* When two vectors are close, forming a small angle, the variables are typically positively correlated.

* If two vectors are at an angle of 90° they are typically not correlated.

* If two vectors are at a large angle, say in the vicinity of 180°, they are typically negatively correlated.

---
name: R-time16
class: left

You can take this one step further by showing how each lake falls in the ordination space by joining the `sites` scores to the original data frame. This is also how you use the derived components for further analysis.

--

```r
dat.xtab <- cbind(dat.xtab,scrs$sites)

head(dat.xtab,3L)
```

```
##   Station.ID              LAKE   Date.EST Alk   Cl Chla  DO  pH    SRP    TP
## 1        A03 East Tohopekaliga 2005-05-17  17 19.7  4.0 7.9 6.1 0.0015 0.024
## 2        A03 East Tohopekaliga 2005-06-21  22 15.4  4.7 6.9 6.4 0.0015 0.024
## 4        A03 East Tohopekaliga 2005-08-16  17 14.0  3.0 6.9 6.3 0.0015 0.021
##     TN  DIN        PC1        PC2        PC3
## 1 0.71 0.04 -0.3901117 -0.2240239 -0.5666993
## 2 0.68 0.03 -0.3912797 -0.2083258 -0.6284024
## 4 0.55 0.03 -0.4290627 -0.2486860 -0.6599207
```

---
name: R-time17
class: left

<div class="figure" style="text-align: center">
<img src="Julian_MEP_PCA_files/figure-html/unnamed-chunk-25-1.png" alt="PCA biplot of two component comparisons from the `dat.xtab.pca` analysis with rescaled loadings and Lakes identified."
/> <p class="caption">PCA biplot of two component comparisons from the `data.xtab.pca` analysis with rescaled loadings and Lakes identified.</p> </div> * You can extract a lot of great information from these plots and the underlying component data but immediately we see how the different lakes are group and how differently the lakes are loaded with respect to the different variables. --- name: end-slide class: end-slide, bottom, left, inverse background-image: url("./resources/DSCN8093.jpg") background-size: cover [<svg style="height:0.8em;top:.04em;position:relative;fill:white;" viewBox="0 0 512 512"><path d="M502.3 190.8c3.9-3.1 9.7-.2 9.7 4.7V400c0 26.5-21.5 48-48 48H48c-26.5 0-48-21.5-48-48V195.6c0-5 5.7-7.8 9.7-4.7 22.4 17.4 52.1 39.5 154.1 113.6 21.1 15.4 56.7 47.8 92.2 47.6 35.7.3 72-32.8 92.3-47.6 102-74.1 131.6-96.3 154-113.7zM256 320c23.2.4 56.6-29.2 73.4-41.4 132.7-96.3 142.8-104.7 173.4-128.7 5.8-4.5 9.2-11.5 9.2-18.9v-19c0-26.5-21.5-48-48-48H48C21.5 64 0 85.5 0 112v19c0 7.4 3.4 14.3 9.2 18.9 30.6 23.9 40.7 32.4 173.4 128.7 16.8 12.2 50.2 41.8 73.4 41.4z"/></svg> pauljulianphd@gmail.com](mailto:pauljulianphd@gmail.com) [<svg style="height:0.8em;top:.04em;position:relative;fill:white;" viewBox="0 0 496 512"><path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 
25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"/></svg> github.com/SwampThingPaul/PCA_Workshop](http://github.com/SwampThingPaul/PCA_Workshop)