Description
This exercise, from Dr. Liang's Intro to Machine Learning with Applications class, implements a Gaussian Mixture Model (GMM) and Principal Component Analysis (PCA) on a face image dataset. The primary goal is to build a GMM that generates new face images efficiently, using PCA to reduce the dimensionality of the input images first. This document walks through each step of the process.
Data Sources
The face image dataset used in this task is fetched through scikit-learn's fetch_lfw_people function. The dataset consists of grayscale images, which serve as the training data for generating new face images with GMM and PCA.
Methodology
Loading and Visualizing the Dataset
The face image dataset is loaded, and basic statistics about the dataset, such as the number of samples (N) and input dimensions (M), are examined. The first 100 images are visualized to provide an overview of the dataset.
PCA Dimension Reduction
PCA is applied to the images to reduce the dimensionality from the original pixel space to 120 principal components. The cumulative explained variance ratio is examined to assess the effectiveness of the dimensionality reduction.
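The loading and PCA steps described above can be sketched roughly as follows. This is a minimal illustration, not the original notebook code: the helper names load_faces and reduce_with_pca are hypothetical, the min_faces_per_person value is an assumed setting, and PCA is used with scikit-learn's defaults apart from the 120 components named in the text.

```python
import numpy as np
from sklearn.decomposition import PCA


def load_faces(min_faces_per_person=60):
    """Fetch the LFW faces (downloads the data on first call).

    The min_faces_per_person value is an assumption, not taken from the source.
    Returns the flattened images (N x M) and the per-image (height, width).
    """
    from sklearn.datasets import fetch_lfw_people

    faces = fetch_lfw_people(min_faces_per_person=min_faces_per_person)
    return faces.data, faces.images.shape[1:]


def reduce_with_pca(X, n_components=120, seed=0):
    """Project X onto its leading principal components.

    Returns the fitted PCA, the reduced data, and the cumulative
    explained variance ratio (one value per retained component).
    """
    pca = PCA(n_components=n_components, random_state=seed)
    Z = pca.fit_transform(X)
    cumulative = np.cumsum(pca.explained_variance_ratio_)
    return pca, Z, cumulative
```

Calling reduce_with_pca(X) on the array returned by load_faces() gives the 120-dimensional representation; the last entry of the cumulative array is the fraction of total variance that the retained components capture.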
GMM Model Selection
Multiple GMM models with varying numbers of components are fitted to the transformed data. The Akaike Information Criterion (AIC) is used to determine the optimal number of components. The curve of AIC against the number of components is analyzed to find the minimum AIC value and the corresponding number of components (n_components).
GMM Training and Image Generation
The GMM is then fitted to the transformed data with the selected optimal number of components. New samples are generated in the lower-dimensional space, and these samples are mapped back to the original pixel space using the inverse PCA transform. Finally, the new face images are visualized.
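The model selection and generation steps can be sketched as below. This is an illustrative outline under stated assumptions rather than the original code: select_gmm_by_aic and generate_faces are hypothetical helper names, the candidate component counts are left to the caller, and the "full" covariance type is an assumed choice.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def select_gmm_by_aic(Z, candidates, seed=0):
    """Fit one GMM per candidate size; return the lowest-AIC model and all AICs."""
    models = [
        GaussianMixture(
            n_components=k, covariance_type="full", random_state=seed
        ).fit(Z)
        for k in candidates
    ]
    aics = [model.aic(Z) for model in models]
    best = models[int(np.argmin(aics))]
    return best, aics


def generate_faces(gmm, pca, n_new=100):
    """Sample points in PCA space and map them back to pixel space."""
    Z_new, _ = gmm.sample(n_new)  # sample() also returns component labels
    return pca.inverse_transform(Z_new)
```

Each row returned by generate_faces is a flattened image; reshaping a row to the dataset's (height, width) makes it displayable, for example with matplotlib's imshow and a gray colormap.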
Extended Analysis with More Data
The entire process is repeated with a larger subset of face images, demonstrating the scalability of the approach. The documentation emphasizes code interpretation and understanding.
Conclusion
- PCA Dimension Reduction: PCA effectively reduces the dimensionality of face images, capturing the most significant features.
- GMM Model Selection: AIC is used to select the optimal number of components for the GMM, balancing model complexity and goodness of fit.
- GMM Training and Image Generation: GMM successfully generates new face images in a computationally efficient manner, showcasing the power of combining PCA and GMM.
GMMs, while powerful, can be computationally expensive to fit on high-dimensional data. Using PCA to reduce the dimensionality of the input first is an effective way to speed up both training and generation. The combined approach trades a small loss of image detail for a large gain in efficiency.
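To make the cost argument concrete, the short sketch below counts the free parameters of a full-covariance GMM before and after PCA. The 2914-pixel input size is an assumption (62 x 47 images, which is what fetch_lfw_people returns with its default settings); the source does not state the image size, and gmm_param_count is a hypothetical helper.

```python
def gmm_param_count(n_components, n_features):
    """Free parameters of a GMM with one full covariance matrix per component."""
    means = n_components * n_features
    # Each symmetric covariance matrix has d * (d + 1) / 2 free entries.
    covariances = n_components * n_features * (n_features + 1) // 2
    weights = n_components - 1  # mixing weights sum to 1
    return means + covariances + weights


# Assumed sizes: 2914 raw pixels (62 x 47) versus 120 principal components.
for d in (2914, 120):
    print(d, gmm_param_count(n_components=1, n_features=d))
```

The covariance term grows quadratically with the input dimension, which is why fitting in the 120-dimensional PCA space is so much cheaper. The same parameter count is what AIC penalizes when comparing models with different numbers of components.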