Gradebook

Date
Oct 5, 2022
Tags
Data Analysis
Python
Jupyter Notebook

Project Hypothesis / Research Question

Can data from multiple sources be combined with Pandas to calculate students' grades accurately? (Pandas library practice)

Data Sources

Progress & Findings

  • Inconsistencies can be found within the data
    • Students’ names are represented differently in each table
    • Email addresses are formatted inconsistently; some do not follow the first.last@univ.edu pattern
    • The data is sorted differently in each table
    • Missing data is present across the different tables
  • Clean and structure the data for analysis
    • Handle missing data
    • Each student’s data is contained in a single observation (row) of the data table, so the number of observations equals the number of students in the class
    • The features required for the data analysis are homework score, quiz score, exam score, name, and UUID
    • Calculations and the final letter grade are to be stored in separate features
  • Plot the grade distribution
    • The vertical axis shows the density of the grades in a particular bin
    • The peak density occurs at approximately 0.78
    • From the plot, the density estimate and the normal distribution both appear to match the data well
[Figure: grade distribution plot]
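The cleaning steps listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the three small tables, their column names, the "Last, First" name format, and the choice to treat missing scores as zero are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical roster: names stored as "Last, First", emails in mixed case.
roster = pd.DataFrame({
    "uuid": ["u1", "u2", "u3"],
    "name": ["Doe, Jane", "Smith, John", "Lee, Ann"],
    "email": ["Jane.Doe@Univ.edu", "john.smith@univ.edu", " ANN.LEE@UNIV.EDU"],
})

# Hypothetical homework table: sorted differently, one score missing.
homework = pd.DataFrame({
    "email": ["ann.lee@univ.edu", "jane.doe@univ.edu", "john.smith@univ.edu"],
    "homework_score": [0.90, 0.80, None],
})

# Hypothetical quiz table: one student has no row at all.
quizzes = pd.DataFrame({
    "email": ["john.smith@univ.edu", "jane.doe@univ.edu"],
    "quiz_score": [0.70, 0.85],
})

# Normalize the join key so the same student matches across tables.
roster["email"] = roster["email"].str.strip().str.lower()

# Rewrite "Last, First" as "First Last" so names are represented one way.
roster["name"] = roster["name"].str.replace(
    r"^(\w+), (\w+)$", r"\2 \1", regex=True
)

# Left-merge onto the roster: one row per student, regardless of sort order.
gradebook = (
    roster
    .merge(homework, on="email", how="left")
    .merge(quizzes, on="email", how="left")
)

# Assumption: a missing score counts as zero.
score_cols = ["homework_score", "quiz_score"]
gradebook[score_cols] = gradebook[score_cols].fillna(0)

print(gradebook)
```

Merging onto the roster with `how="left"` keeps exactly one observation per student, which is the property described in the list above. Whether a missing score should become zero or be handled some other way depends on the grading policy.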

Conclusion & Future Work

  • Through this simple exercise, I learned to use Pandas for tasks such as:
    • Loading data
    • Cleaning data
    • Merging data
    • Calculating with DataFrames and Series
    • Mapping values
    • Plotting using Pandas and Matplotlib
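As a small illustration of the "calculating" and "mapping" steps, the sketch below computes a weighted final score and maps it to a letter grade. The sample scores, the weights (20% homework, 30% quizzes, 50% exams), and the letter cutoffs are assumptions chosen for the example, not the values used in the project.

```python
import pandas as pd

# Hypothetical per-student scores, already scaled to the range 0..1.
scores = pd.DataFrame(
    {
        "homework_score": [0.95, 0.80, 0.60],
        "quiz_score": [0.90, 0.70, 0.50],
        "exam_score": [0.92, 0.75, 0.55],
    },
    index=pd.Index(["u1", "u2", "u3"], name="uuid"),
)

# Assumed weights; they sum to 1.
weights = pd.Series(
    {"homework_score": 0.2, "quiz_score": 0.3, "exam_score": 0.5}
)

# DataFrame * Series aligns the Series index with the column names,
# so each column is scaled by its own weight before the row-wise sum.
scores["final_score"] = (scores[list(weights.index)] * weights).sum(axis=1)


def to_letter(score: float) -> str:
    """Map a 0..1 score to a letter grade using assumed cutoffs."""
    if score >= 0.9:
        return "A"
    if score >= 0.8:
        return "B"
    if score >= 0.7:
        return "C"
    if score >= 0.6:
        return "D"
    return "F"


# Store the letter grade in its own feature, separate from the calculation.
scores["letter_grade"] = scores["final_score"].map(to_letter)

print(scores)
```

With Matplotlib installed, `scores["final_score"].plot.hist(density=True)` would draw a density-scaled histogram of the final scores, similar in spirit to the grade distribution plot described earlier.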
 
