GMM + PCA on Faces

Description

This exercise, from the Intro to Machine Learning with Applications class by Dr. Liang, implements a Gaussian Mixture Model (GMM) and Principal Component Analysis (PCA) on a face image dataset. The primary goal is to build a GMM that generates new face images efficiently, using PCA to reduce the dimensionality of the input images first. This document walks through each step of the process.

Data Sources

This task uses the Labeled Faces in the Wild (LFW) face dataset, loaded with scikit-learn's fetch_lfw_people function. The images are grayscale, and the goal is to generate new face images from them using GMM and PCA.

Methodology

Loading and Visualizing the Dataset

The face image dataset is loaded, and basic statistics about the dataset, such as the number of samples (N) and input dimensions (M), are examined. The first 100 images are visualized to provide an overview of the dataset.
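The loading and plotting step could look roughly like the sketch below. The helper names `load_faces` and `plot_face_grid` are hypothetical, and `min_faces_per_person=60` is an illustrative assumption rather than a value stated in the exercise.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_lfw_people


def load_faces(min_faces_per_person=60):
    """Return the LFW faces as a scikit-learn Bunch (downloads on first call).

    min_faces_per_person=60 is an assumed filter, not taken from the exercise.
    """
    return fetch_lfw_people(min_faces_per_person=min_faces_per_person)


def plot_face_grid(images, n_rows=10, n_cols=10):
    """Plot the first n_rows * n_cols grayscale images in a grid and return the figure."""
    fig, axes = plt.subplots(
        n_rows, n_cols, figsize=(n_cols, n_rows), squeeze=False,
        subplot_kw={"xticks": [], "yticks": []},
    )
    # zip stops at the shorter sequence, so extra images are simply ignored.
    for ax, image in zip(axes.ravel(), images):
        ax.imshow(image, cmap="gray")
    return fig
```

With this sketch, `faces = load_faces()` gives `faces.data` with shape (N, M) and `faces.images` with shape (N, height, width), so `N, M = faces.data.shape` reads off the two statistics, and `plot_face_grid(faces.images)` followed by `plt.show()` displays the first 100 faces.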

Plot the first 100 faces to view the data

PCA Dimension Reduction

PCA is applied to the images to reduce the dimensionality from the original pixel space to 120 principal components. The cumulative explained variance ratio is examined to check how much of the original variance those 120 components retain.
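A minimal sketch of this step, assuming the images are already flattened into an array of shape (N, M). The function name `reduce_with_pca` and the fixed `random_state` are assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA


def reduce_with_pca(X, n_components=120, random_state=0):
    """Fit PCA on X (n_samples, n_pixels).

    Returns the fitted PCA, the reduced data, and the cumulative
    explained variance ratio (one value per retained component).
    """
    pca = PCA(n_components=n_components, random_state=random_state)
    X_reduced = pca.fit_transform(X)
    cumulative_variance = np.cumsum(pca.explained_variance_ratio_)
    return pca, X_reduced, cumulative_variance
```

The last entry of `cumulative_variance` is the fraction of total variance kept by all retained components, and plotting the whole array against the component index shows how quickly that fraction saturates.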

GMM Model Selection

Multiple GMMs with varying numbers of components are fitted to the PCA-transformed data. The Akaike Information Criterion (AIC) is used to determine the optimal number of components: the AIC is plotted against the number of components, and the value of n_components at the minimum AIC is selected.
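For reference, AIC = 2k - 2 ln(L), where k is the number of free parameters and L is the maximized likelihood, so lower values are better. The selection loop might be sketched as follows. The function name `select_n_components_by_aic`, the full covariance type, and the candidate list are assumptions, since the exercise does not state which values were tried.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def select_n_components_by_aic(X, candidates, random_state=0):
    """Fit one GMM per candidate component count.

    Returns (best_n, aic_values), where best_n is the candidate with the
    lowest AIC and aic_values is aligned with `candidates`.
    """
    aic_values = []
    for n in candidates:
        gmm = GaussianMixture(n_components=n, covariance_type="full",
                              random_state=random_state)
        gmm.fit(X)
        aic_values.append(gmm.aic(X))
    best_n = candidates[int(np.argmin(aic_values))]
    return best_n, aic_values
```

Plotting `aic_values` against `candidates` gives the AIC curve described above.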

GMM Training and Image Generation

The GMM is then fitted to the transformed data with the selected number of components. New samples are drawn in the lower-dimensional space and mapped back to the original pixel space using the inverse PCA transform. Finally, the new face images are visualized.
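The generation step could be sketched as below, assuming a PCA that has already been fitted and the component count chosen by AIC. The function name `generate_images` and its parameters are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture


def generate_images(pca: PCA, X_reduced, n_components, n_new, image_shape,
                    random_state=0):
    """Fit a GMM in PCA space, sample from it, and map the samples back to pixels."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="full",
                          random_state=random_state)
    gmm.fit(X_reduced)
    samples, _ = gmm.sample(n_new)            # shape (n_new, n_pca_components)
    pixels = pca.inverse_transform(samples)   # shape (n_new, n_pixels)
    return np.reshape(pixels, (n_new,) + tuple(image_shape))
```

The returned array holds one image per row in the original height and width, so it can be passed straight to an image grid plot.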

Apply the PCA transformation, fit GMMs to the transformed data, and select the best model based on AIC

Plot images back to original space (PCA inverse)

Extended Analysis with More Data

The entire process (PCA reduction, AIC-based model selection, GMM fitting, and image generation) is repeated with a larger subset of face images to show how the approach scales with more data. The emphasis throughout is on interpreting and understanding the code.
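One way to make the repetition convenient is to wrap the whole pipeline in a single function and call it on subsets of different sizes. This is only a sketch: `run_pipeline` is a hypothetical helper, and the subset sizes and candidate counts used in the exercise are not stated.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture


def run_pipeline(X, n_pca, candidates, n_new, random_state=0):
    """PCA, AIC-based GMM selection, sampling, and inverse PCA on one data subset.

    Returns (best_n_components, new_samples_in_pixel_space).
    """
    pca = PCA(n_components=n_pca, random_state=random_state)
    X_reduced = pca.fit_transform(X)
    models = [
        GaussianMixture(n_components=n, random_state=random_state).fit(X_reduced)
        for n in candidates
    ]
    best = min(models, key=lambda m: m.aic(X_reduced))
    samples, _ = best.sample(n_new)
    return best.n_components, pca.inverse_transform(samples)
```

Calling it once on a small slice such as `X[:n_small]` and once on the full `X` makes it easy to compare the selected component count and the generated faces across dataset sizes.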

Using more data, apply the PCA transformation, fit GMMs to the transformed data, and select the best model based on AIC

Plot images back to original space (PCA inverse) after using more data

Conclusion

PCA Dimension Reduction: PCA effectively reduces the dimensionality of face images, capturing the most significant features.

GMM Model Selection: AIC is used to select the optimal number of components for the GMM, balancing model complexity and goodness of fit.

GMM Training and Image Generation: GMM successfully generates new face images in a computationally efficient manner, showcasing the power of combining PCA and GMM.

A GMM, while powerful, can be computationally expensive to fit on raw pixels: with full covariance matrices (scikit-learn's default), the number of parameters per component grows quadratically with the input dimension. Reducing the dimensionality with PCA first is an effective way to speed up both training and generation. The combined approach offers a balance between efficiency and model quality.

Please do not steal my work. It took uncountable cups of coffee and sleepless nights. Thank you.