At my job, I was tasked with learning about PCA. I read a few articles and got a rough idea of what PCA is and what it is used for, but I thought doing a project would be much more fun!
What is Principal Component Analysis?
Basically, PCA is a dimensionality-reduction method: it transforms a data set with many variables into a smaller set of new variables that still contains most of the information in the original. Reducing dimensions always costs some accuracy. The key to a good PCA is to give up only a little accuracy in exchange for a much simpler data set.
The idea behind doing PCA is that smaller data sets are easier to explore and faster for machine learning algorithms to process.
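To make the trade-off concrete, here is a minimal sketch using NumPy on made-up data. The toy data set, the variable names, and the choice to keep two components are all assumptions for illustration. It shows three columns being reduced to two, and measures how much information is lost along the way.

```python
import numpy as np

# Hypothetical toy data: 200 samples, 3 features, where the third feature
# is almost a linear combination of the first two (so it is mostly redundant).
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
third = base[:, 0] + base[:, 1] + 0.05 * rng.normal(size=200)
X = np.column_stack([base, third])

# Center each feature, then take the SVD of the centered data.
# The rows of Vt are the principal directions.
feature_means = X.mean(axis=0)
X_centered = X - feature_means
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Share of the total variance carried by each principal component.
explained_ratio = S**2 / np.sum(S**2)

# Keep the first two components: 3 columns become 2.
k = 2
X_reduced = X_centered @ Vt[:k].T

# Map back to 3 dimensions to measure what was lost.
X_restored = X_reduced @ Vt[:k] + feature_means
reconstruction_error = np.mean((X - X_restored) ** 2)

print("reduced shape:", X_reduced.shape)
print("explained variance ratio:", explained_ratio.round(4))
print("mean squared reconstruction error:", reconstruction_error)
```

Because the third column is nearly redundant here, the first two components should carry almost all of the variance and the reconstruction error should be tiny. Real data is rarely this tidy, so the loss is usually larger.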
What I am going to share with you is how I performed Principal Component Analysis on the Breast Cancer data set from scikit-learn.
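As a starting point, a sketch of that workflow might look like the following. It assumes scikit-learn is installed. Standardizing the features first and keeping two components are illustrative choices on my part, not the only way to do it.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load the Breast Cancer data set that ships with scikit-learn.
data = load_breast_cancer()
X, y = data.data, data.target  # X has 569 samples and 30 features

# Assumption: standardize first, because PCA is sensitive to feature scale.
X_scaled = StandardScaler().fit_transform(X)

# Assumption: keep 2 components so the result is easy to plot.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

print(X.shape, "->", X_pca.shape)
print("explained variance ratio:", pca.explained_variance_ratio_)
```

The `explained_variance_ratio_` attribute reports how much of the original variance each kept component retains, which is a quick way to judge how much was given up by dropping from 30 columns to 2.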