Purpose
The purpose of this project is to conduct exploratory data analysis (EDA) on a movie dataset available on Kaggle and to create simple data visualizations.
Data Sources
Progress & Findings
Features
- id: The unique ID of the movie.
- title: The official title of the movie.
- tagline: The tagline of the movie.
- release_date: The theatrical release date of the movie.
- genres: Genres associated with the movie.
- belongs_to_collection: The movie series/franchise (collection) the film belongs to.
- original_language: The language in which the movie was originally shot.
- budget_musd: The budget of the movie in millions of dollars.
- revenue_musd: The total revenue of the movie in millions of dollars.
- production_companies: Production companies involved with the making of the movie.
- production_countries: Countries where the movie was shot/produced.
- vote_count: The number of votes by users, as counted by TMDB.
- vote_average: The average rating of the movie.
- popularity: The popularity score assigned by TMDB.
- runtime: The runtime of the movie in minutes.
- overview: A brief blurb of the movie.
- spoken_languages: Spoken languages in the film.
- poster_path: The URL of the poster image.
- cast: The main actors appearing in the movie.
- cast_size: The number of actors appearing in the movie.
- director: The director of the movie.
- crew_size: The size of the film crew (including the director, excluding actors).
Data Import and First Inspection
The project began by importing the movies dataset from the "movies_complete.csv" file. A first inspection showed the features listed above, including the movie ID, title, release date, genres, budget, revenue, production details, and ratings. The dataset is rich in information, which sets the stage for an in-depth analysis.
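A minimal sketch of the import and first inspection, assuming the analysis is done with pandas. `load_movies` is a hypothetical helper. Because the real CSV is not bundled here, a tiny in-memory CSV with a few of the dataset's column names (hypothetical rows) stands in for it; with the real file you would call `load_movies("movies_complete.csv")`.

```python
import io

import pandas as pd


def load_movies(path_or_buffer):
    """Load the movies table, parsing release_date as a datetime column."""
    return pd.read_csv(path_or_buffer, parse_dates=["release_date"])


# Hypothetical stand-in rows using column names from the feature list
sample_csv = io.StringIO(
    "id,title,release_date,budget_musd,revenue_musd\n"
    "1,Movie A,2001-05-04,10.0,50.0\n"
    "2,Movie B,2003-11-21,,12.5\n"
)

movies = load_movies(sample_csv)
movies.info()              # dtypes and non-null counts per column
print(movies.head())       # first rows
print(movies.describe())   # summary statistics for numeric columns
```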
The Best and the Worst Movies
Highest Revenue
After filtering the dataset, the movies with the highest revenue were identified, giving insight into the most financially successful films.
Highest Budget
The movies with the highest budgets were identified in the same way, shedding light on the scale of investment in filmmaking.
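Both rankings, and their "worst" counterparts, follow the same pattern: drop missing values in one column, sort, and keep the top rows. The sketch below assumes pandas. `top_n` is a hypothetical helper and the four movies are toy data.

```python
import pandas as pd


def top_n(df, column, n=5, ascending=False):
    """Return the n rows with the largest (or, if ascending, smallest)
    values in `column`, ignoring rows where that column is missing."""
    ranked = df.dropna(subset=[column]).sort_values(column, ascending=ascending)
    return ranked.head(n)[["title", column]]


# Hypothetical toy data (values in million dollars)
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "budget_musd": [200.0, 50.0, None, 120.0],
    "revenue_musd": [900.0, 60.0, 300.0, None],
})

print(top_n(movies, "revenue_musd", n=2))                  # highest revenue
print(top_n(movies, "budget_musd", n=2))                   # highest budget
print(top_n(movies, "revenue_musd", n=2, ascending=True))  # lowest revenue
```

Dropping missing values first keeps movies with unknown figures from showing up among the "worst" results.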
Highest Profit
Profit was calculated for each movie as revenue minus budget, and the movies with the highest profits were identified. This offers a perspective on financial success in the movie industry.
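A minimal sketch of the profit calculation, assuming pandas. The `profit_musd` column name and the toy rows are hypothetical.

```python
import pandas as pd

# Hypothetical toy data (values in million dollars)
movies = pd.DataFrame({
    "title": ["A", "B", "C"],
    "budget_musd": [200.0, 10.0, 50.0],
    "revenue_musd": [900.0, 100.0, None],
})

# Profit = revenue - budget; a missing budget or revenue yields NaN
movies["profit_musd"] = movies["revenue_musd"] - movies["budget_musd"]

# Keep only movies with a known profit, then take the two largest
top_profit = movies.dropna(subset=["profit_musd"]).nlargest(2, "profit_musd")
print(top_profit[["title", "profit_musd"]])
```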
Return on Investment (ROI)
The ROI was computed by dividing revenue by budget, restricted to movies with budgets of at least 10 million dollars. This metric shows how efficiently a budget was turned into revenue.
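The ROI step, sketched with pandas (an assumption) on hypothetical toy data. Movie C shows why the 10-million-dollar threshold matters: its tiny budget would otherwise put it at the top of the ranking.

```python
import pandas as pd

# Hypothetical toy data (values in million dollars)
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "budget_musd": [200.0, 10.0, 2.0, 25.0],
    "revenue_musd": [900.0, 100.0, 80.0, 20.0],
})

# ROI as revenue divided by budget (a zero budget would give inf here)
movies["roi"] = movies["revenue_musd"] / movies["budget_musd"]

# Keep budgets >= 10 million dollars; this also drops zero or missing budgets
eligible = movies[movies["budget_musd"] >= 10]

print(eligible.nlargest(3, "roi")[["title", "budget_musd", "roi"]])   # best ROI
print(eligible.nsmallest(3, "roi")[["title", "budget_musd", "roi"]])  # worst ROI
```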
Highest Number of Votes and Ratings
The movies with the most user engagement were identified by vote count and average rating, helping us understand audience preferences.
Popularity
The movies with the highest popularity scores, as assigned by TMDB, were also identified. This metric reflects a movie's overall appeal on TMDB.
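The engagement rankings (vote count, rating, and popularity) can be sketched with pandas (assumed) on hypothetical toy data. The `MIN_VOTES` threshold is a hypothetical addition and is not taken from the analysis above. It keeps a movie rated 10.0 by only three voters from topping the rating list.

```python
import pandas as pd

# Hypothetical toy data
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "vote_count": [12000, 3, 800, 450],
    "vote_average": [8.1, 10.0, 7.9, 6.2],
    "popularity": [45.2, 0.5, 12.3, 30.1],
})

most_votes = movies.nlargest(2, "vote_count")
most_popular = movies.nlargest(2, "popularity")

# Hypothetical threshold: only rank ratings backed by enough votes
MIN_VOTES = 100
best_rated = movies[movies["vote_count"] >= MIN_VOTES].nlargest(2, "vote_average")

print(most_votes[["title", "vote_count"]])
print(most_popular[["title", "popularity"]])
print(best_rated[["title", "vote_average"]])
```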
Most Common Words in Movie Titles and Taglines
A text analysis identified the most common words in movie titles and taglines, providing insight into recurring themes and elements of movie marketing.
- Title
- Tagline
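One way to find common words is a plain frequency count with `collections.Counter`. This is a swapped-in technique; the original analysis may have used a word-cloud library instead. The sketch assumes pandas Series as input. The stop-word list, the `most_common_words` helper, and the sample titles and taglines are all hypothetical.

```python
import re
from collections import Counter

import pandas as pd

# Hypothetical, deliberately small stop-word list
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is"}


def most_common_words(texts, n=10):
    """Count lowercase words across a Series of strings,
    skipping missing values and stop words."""
    counter = Counter()
    for text in texts.dropna():
        words = re.findall(r"[a-z']+", text.lower())
        counter.update(w for w in words if w not in STOP_WORDS)
    return counter.most_common(n)


# Hypothetical toy titles and taglines
titles = pd.Series(["The Love Bug", "Love Actually", "Night of the Living Dead", None])
taglines = pd.Series(["One night changes everything.", "Love is in the air", None])

print(most_common_words(titles, 3))
print(most_common_words(taglines, 3))
```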
Are Franchises More Successful?
Franchise Identification
An additional feature was created to flag whether a movie belongs to a franchise, so that standalone movies can be distinguished from those that are part of a collection.
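A minimal sketch of such a flag, assuming pandas and assuming a movie counts as a franchise film whenever its `belongs_to_collection` value is present. The `franchise` column name and the toy rows are hypothetical.

```python
import pandas as pd

# Hypothetical toy data
movies = pd.DataFrame({
    "title": ["A", "B", "C"],
    "belongs_to_collection": ["Saga Collection", None, "Saga Collection"],
})

# True when the movie has a collection entry, False for standalone films
movies["franchise"] = movies["belongs_to_collection"].notna()
print(movies)
```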
Franchise vs. Standalone Movies
Franchises and standalone movies were compared in terms of mean revenue, median ROI, mean budget, mean popularity, and mean rating, to address whether franchise movies tend to be more successful on these measures.
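The comparison can be sketched as a single grouped aggregation, assuming pandas (named aggregation in `agg`). The flag is assumed to come from `belongs_to_collection`. The toy data and the output column names are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: two franchise movies and two standalone movies
movies = pd.DataFrame({
    "belongs_to_collection": ["X", "X", None, None],
    "budget_musd": [100.0, 150.0, 20.0, 40.0],
    "revenue_musd": [500.0, 450.0, 60.0, 40.0],
    "popularity": [20.0, 30.0, 5.0, 7.0],
    "vote_average": [7.0, 6.0, 6.5, 7.5],
})
movies["franchise"] = movies["belongs_to_collection"].notna()
movies["roi"] = movies["revenue_musd"] / movies["budget_musd"]

# One row per group (False = standalone, True = franchise)
comparison = movies.groupby("franchise").agg(
    mean_revenue=("revenue_musd", "mean"),
    median_roi=("roi", "median"),
    mean_budget=("budget_musd", "mean"),
    mean_popularity=("popularity", "mean"),
    mean_rating=("vote_average", "mean"),
)
print(comparison)
```

ROI uses the median rather than the mean because a few extreme ratios can dominate an average.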
Most Successful Franchises
The most successful franchises were identified based on the number of movies, total and mean budget, total and mean revenue, and mean rating. This section highlights the dominant franchises in the movie industry.
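A sketch of the per-franchise summary, assuming pandas and grouping directly on `belongs_to_collection`. The collection names and values are hypothetical. `groupby` drops missing keys by default, so standalone movies fall out of this table automatically.

```python
import pandas as pd

# Hypothetical toy data (values in million dollars)
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "belongs_to_collection": ["Saga Collection", "Saga Collection", "Trilogy Collection", None],
    "budget_musd": [100.0, 150.0, 50.0, 20.0],
    "revenue_musd": [500.0, 450.0, 300.0, 60.0],
    "vote_average": [7.0, 6.0, 8.0, 6.5],
})

franchises = movies.groupby("belongs_to_collection").agg(
    n_movies=("title", "count"),
    total_budget=("budget_musd", "sum"),
    mean_budget=("budget_musd", "mean"),
    total_revenue=("revenue_musd", "sum"),
    mean_revenue=("revenue_musd", "mean"),
    mean_rating=("vote_average", "mean"),
)

# Rank franchises by any of the columns, e.g. total revenue
ranked = franchises.sort_values("total_revenue", ascending=False)
print(ranked)
```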
Most Successful Directors
The most successful directors were identified in terms of the total number of movies directed, total revenue generated, and mean rating received, showing how directors relate to movie success.
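The director rankings follow the same grouped-aggregation pattern, sketched here with pandas (assumed) on hypothetical toy data. The sum skips missing revenue values, so a director's total covers only movies with known revenue.

```python
import pandas as pd

# Hypothetical toy data
movies = pd.DataFrame({
    "title": ["A", "B", "C"],
    "director": ["X", "X", "Y"],
    "revenue_musd": [500.0, None, 200.0],
    "vote_average": [7.0, 8.0, 6.0],
})

directors = movies.groupby("director").agg(
    n_movies=("title", "count"),             # movies directed
    total_revenue=("revenue_musd", "sum"),   # NaN revenues are skipped
    mean_rating=("vote_average", "mean"),
)

print(directors.nlargest(5, "n_movies"))
print(directors.nlargest(5, "total_revenue"))
print(directors.nlargest(5, "mean_rating"))
```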
Most Successful Actors
The most successful actors were identified based on their appearances in successful movies.
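Because a movie lists several actors, the `cast` column has to be split into one row per actor before grouping. The sketch assumes pandas and assumes that names in `cast` are separated by "|". Check the actual separator in the file before relying on this. The toy rows are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; assumes "|" separates actor names
movies = pd.DataFrame({
    "title": ["A", "B", "C"],
    "cast": ["Ann|Bob", "Bob|Cid", None],
    "revenue_musd": [100.0, 300.0, 50.0],
    "vote_average": [7.0, 8.0, 6.0],
})

# One row per (movie, actor); movies without cast data are dropped
actors = movies.assign(actor=movies["cast"].str.split("|")).explode("actor")
actors = actors.dropna(subset=["actor"])

actor_stats = actors.groupby("actor").agg(
    n_movies=("title", "count"),
    total_revenue=("revenue_musd", "sum"),
    mean_rating=("vote_average", "mean"),
)
print(actor_stats.sort_values("total_revenue", ascending=False))
```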
Most Popular Genres
The most popular genres were determined by analyzing genre frequencies and their trends over the years, showing how audience preferences have evolved over time.
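Genre frequencies and their trend over time can be sketched with pandas (assumed). As with the cast, this assumes genres are stored as a "|"-separated string, and the toy rows are hypothetical. The index is reset after `explode` so that `pd.crosstab` receives Series with unique, aligned indices.

```python
import pandas as pd

# Hypothetical toy data; assumes "|" separates genres
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "genres": ["Action|Drama", "Drama", "Comedy|Drama", "Action"],
    "release_date": pd.to_datetime(["1999-05-01", "1999-08-12", "2005-02-10", "2005-11-30"]),
})

# One row per (movie, genre)
genres = (
    movies.assign(genre=movies["genres"].str.split("|"))
    .explode("genre")
    .reset_index(drop=True)
)

# Overall frequency of each genre, most common first
genre_counts = genres["genre"].value_counts()

# Genre counts per release year (rows: years, columns: genres)
by_year = pd.crosstab(genres["release_date"].dt.year, genres["genre"])

print(genre_counts)
print(by_year)
```

The year-by-genre table can be passed straight to a line or area plot to show how genre shares shift over time.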
Conclusion
In conclusion, the Movie Data Analysis project worked systematically through a series of analyses and produced useful insights into the world of movies. Together, the findings give a broad view of factors associated with movie success, including financial performance, audience engagement, and the influence of franchises, directors, actors, and genres. These insights can inform business decisions, marketing strategies, and creative direction in the film industry.
Future Work
- Exploratory Data Analysis (EDA): Go deeper into the EDA by using a wider range of visualization techniques and applying statistical tests, in order to uncover patterns, trends, and correlations in more detail.
- Data Presentation and Visualization: Improve the clarity and visual appeal of the visualizations with tools such as Seaborn, and use Plotly or Tableau for more interactive and impactful charts.