Analyzing Housing Data and Image Compression with K-Means Clustering

Date
Sep 9, 2023
Tags
Python
Data Science
Machine Learning
Jupyter Notebook
scikit-learn

Description

In this project for Dr. Liang's Intro to Machine Learning with Applications class, the primary task is to apply k-means clustering to analyze housing data, with a side task of image compression. K-means clustering is an unsupervised machine learning algorithm that groups similar instances based on selected features.
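For readers new to the algorithm, the core of k-means (Lloyd's algorithm) fits in a few lines of NumPy. It alternates between assigning each point to its nearest center and moving each center to the mean of its assigned points. This is a minimal illustration, not the code used in the project (which relies on scikit-learn). The function name `kmeans` and its parameters are hypothetical.

```python
import numpy as np


def _assign(X, centers):
    # Index of the nearest center for every row of X (squared Euclidean distance).
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Minimal Lloyd's-algorithm k-means; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Initialize the centers with k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest center.
        labels = _assign(X, centers)
        # Update step: move each center to the mean of its points
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    # Final assignment so the labels match the returned centers.
    return centers, _assign(X, centers)


# Usage: two well-separated groups of 2-D points.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
centers, labels = kmeans(X, k=2)
print(labels)
print(centers)
```

scikit-learn's `KMeans` adds k-means++ initialization and several restarts (`n_init`), which makes it less sensitive to an unlucky starting point than this sketch.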

Framework

  1. Problem definition
  2. Data
  3. Evaluation
  4. Features
  5. Modelling
  6. Experimentation

Project Hypothesis / Research Question

Where in California do individuals with high and low incomes predominantly live, and how can an analysis of housing data help identify and understand these distinct income-based clusters?

Data Sources

Features

  • longitude
  • latitude
  • housing_median_age
  • total_rooms
  • total_bedrooms
  • population
  • households
  • median_income
  • median_house_value
  • ocean_proximity

Progress & Findings

Inspecting the data shows missing values in the total_bedrooms feature. These missing values are not handled because total_bedrooms is not used in this project. The first figure below shows histograms of all numeric features, giving the distribution of each one. The figure to the right plots each housing location, with population shown by the size of a point and median income by its color. Together, these figures give a general sense of the data at hand.
[Figure: Histograms of all numeric features]
[Figure: Dot map of the data points by location in California, highlighting housing areas with larger populations]
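A missing-value check like the one described above might look as follows in pandas. The DataFrame is a tiny hypothetical stand-in with a few of the housing column names, and the subset of columns treated as "used" is an assumption based on the steps described on this page.

```python
import numpy as np
import pandas as pd

# Hypothetical rows using some of the housing column names; in the project the data
# would be loaded from the dataset file instead (e.g. with pd.read_csv).
housing = pd.DataFrame({
    "longitude": [-122.2, -121.9, -118.3],
    "latitude": [37.9, 37.4, 34.1],
    "total_bedrooms": [130.0, np.nan, 190.0],
    "median_income": [8.0, 5.5, 2.5],
    "median_house_value": [450000.0, 300000.0, 150000.0],
})

# Count missing values per column and show only the columns that have any.
missing = housing.isna().sum()
print(missing[missing > 0])

# Assumed subset of columns used by the clustering steps; total_bedrooms is left out,
# so its missing values never reach the model.
used = housing[["longitude", "latitude", "median_income", "median_house_value"]]
print(used.isna().sum().sum())
```

With matplotlib installed, `housing.hist(bins=50)` draws one histogram per numeric column, similar to the first figure.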
The figures below show the spatial clusters that K-means clustering finds when the data points are grouped by location.
[Figures: K-means clusters of the data points by location]
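Clustering by location alone amounts to running k-means on the longitude and latitude columns. The sketch below uses scikit-learn's `KMeans` on a handful of hypothetical coordinates. `n_clusters=2` only suits this toy data; the number of spatial clusters used in the project is not stated here.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical (longitude, latitude) pairs standing in for the housing rows:
# one tight group near the Bay Area and one near Los Angeles.
coords = np.array([
    [-122.4, 37.8], [-122.3, 37.9], [-122.2, 37.7],
    [-118.3, 34.1], [-118.2, 34.0], [-118.4, 34.0],
])

# k-means++ initialization with 10 restarts; random_state makes the run repeatable.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(coords)
print(labels)
print(kmeans.cluster_centers_)
```

On the real data, a scatter plot of longitude against latitude colored by `labels` gives a cluster map of the locations.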
Applying K-means clustering again, this time to location together with income level, reveals six distinct clusters that show where people live according to their income.
[Figure: Six K-means clusters of the data points by location and income level]
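One way to produce six income-based clusters is to run `KMeans(n_clusters=6)` on the location and income columns together. This sketch makes two assumptions. The first is the feature set (longitude, latitude, median_income). The second is the `StandardScaler` step, which puts coordinates and income on comparable scales; the page does not say whether the original analysis scaled its features. The input is random stand-in data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Random stand-in data (hypothetical): columns are longitude, latitude, median_income.
rng = np.random.default_rng(0)
n = 300
X = np.column_stack([
    rng.uniform(-124.0, -114.0, n),  # longitude
    rng.uniform(32.5, 42.0, n),      # latitude
    rng.uniform(0.5, 15.0, n),       # median_income
])

# Assumed setup: standardize each feature, then split the rows into six clusters.
model = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=6, n_init=10, random_state=0),
)
labels = model.fit_predict(X)

# Map the cluster centers back to the original units for interpretation.
scaler = model.named_steps["standardscaler"]
kmeans = model.named_steps["kmeans"]
centers = scaler.inverse_transform(kmeans.cluster_centers_)
print(centers.round(2))
```

Reading the centers in original units (degrees and income) makes it easier to label each cluster as a higher- or lower-income area.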
[Figure: Histograms of median_income in each cluster]
[Figure: Histograms of median_house_value in each cluster]
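The per-cluster histograms come from grouping rows on their cluster label. This sketch uses a hypothetical six-row DataFrame. It reports each cluster's median values and bins median_income with shared bin edges, so the clusters' histograms are directly comparable.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: each has a cluster label (e.g. from KMeans) and the two plotted columns.
df = pd.DataFrame({
    "cluster": [0, 0, 0, 1, 1, 1],
    "median_income": [8.0, 7.5, 9.0, 2.0, 2.5, 3.0],
    "median_house_value": [450000.0, 420000.0, 500000.0, 150000.0, 170000.0, 160000.0],
})

# Per-cluster medians give a quick numeric summary of what the histograms show.
summary = df.groupby("cluster")[["median_income", "median_house_value"]].median()
print(summary)

# Shared bin edges across clusters keep the per-cluster histograms comparable.
edges = np.histogram_bin_edges(df["median_income"], bins=4)
income_hist = {
    cluster: np.histogram(group["median_income"], bins=edges)[0]
    for cluster, group in df.groupby("cluster")
}
print(income_hist)
```

With matplotlib available, passing the same `edges` to `plt.hist` for each group draws the per-cluster histograms on a common axis.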

Side-task: Image Compression

Steps

Step-0: read an image from a BMP file
Step-1: prepare the data matrix X
Step-2: perform k-means on the data matrix X
Step-3: compress the image using the cluster centers
  1. Make a copy of the data matrix X and call it Y.
  2. Modify Y so that every data point is replaced by its corresponding cluster center.
  3. Convert the data matrix Y back to an image Ic.
Step-4: visualize the compressed image Ic
Step-5: save the compressed image to a BMP file
[Figure: Image compression result]
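The compression steps above map fairly directly onto NumPy and scikit-learn. This sketch assumes an RGB image and uses Pillow for BMP input and output. The function names `compress_image` and `compress_bmp` and the default `k=16` are hypothetical choices, not taken from the project.

```python
import numpy as np
from sklearn.cluster import KMeans


def compress_image(img, k=16, seed=0):
    """Quantize an H x W x 3 image to k colors with k-means (Steps 1-3)."""
    h, w, c = img.shape
    # Step-1: data matrix X with one row per pixel and one column per color channel.
    X = img.reshape(-1, c).astype(float)
    # Step-2: run k-means on the pixel colors.
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
    # Step-3: Y is a new array in which every pixel is replaced by its cluster center.
    Y = km.cluster_centers_[km.labels_]
    # Convert Y back to an image Ic with the original shape and 8-bit pixels.
    Ic = np.clip(np.rint(Y), 0, 255).astype(np.uint8).reshape(h, w, c)
    return Ic


def compress_bmp(in_path, out_path, k=16):
    """Step-0 and Step-5: read a BMP, compress it, and save the result as a BMP."""
    # Assumes Pillow is installed; imported here so compress_image works without it.
    from PIL import Image
    img = np.asarray(Image.open(in_path).convert("RGB"))
    Image.fromarray(compress_image(img, k)).save(out_path)  # format inferred from extension


# Usage on a small random "image" (a stand-in for a real BMP file).
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)
Ic = compress_image(img, k=4)
print(len(np.unique(Ic.reshape(-1, 3), axis=0)))
```

Step-4 (visualization) can be done with matplotlib's `plt.imshow(Ic)`. Larger k keeps more of the original colors at the cost of a longer k-means run.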

Conclusion

The K-means analysis of the housing data has revealed connections between people's incomes and where they live. One surprising finding is that some low-income individuals live in expensive houses, which points to possible economic inequality in certain neighborhoods. At the other end, the clusters of well-off individuals show where higher-income residents tend to live. Beyond mapping these living patterns, the project shows how housing data can shed light on the complex relationship between income and living arrangements across different areas of California.

👋🏻 Let’s chat!

If you are interested in working with me, have a project in mind, or just want to say hello, please don't hesitate to contact me.

Find me here 👇🏻

Please do not steal my work. It took countless cups of coffee and sleepless nights. Thank you.