
Movie Prediction

Exploratory data analysis (EDA) on a movie dataset available on Kaggle

  • python
  • viz
  • da


Link to GitHub

Purpose

Data Sources

Kaggle movie data

Progress & Findings

Features
  • id: The ID of the movie (unique identifier).
  • title: The official title of the movie.
  • tagline: The tagline of the movie.
  • release_date: The theatrical release date of the movie.
  • genres: Genres associated with the movie.
  • belongs_to_collection: The movie series/franchise the film belongs to, if any.
  • original_language: The language in which the movie was originally shot.
  • budget_musd: The budget of the movie in million dollars.
  • revenue_musd: The total revenue of the movie in million dollars.
  • production_companies: Production companies involved in making the movie.
  • production_countries: Countries where the movie was shot/produced.
  • vote_count: The number of user votes, as counted by TMDB.
  • vote_average: The average rating of the movie.
  • popularity: The popularity score assigned by TMDB.
  • runtime: The runtime of the movie in minutes.
  • overview: A brief blurb of the movie.
  • spoken_languages: Languages spoken in the film.
  • poster_path: The URL of the poster image.
  • cast: Main actors appearing in the movie.
  • cast_size: Number of actors appearing in the movie.
  • director: Director of the movie.
  • crew_size: Size of the film crew (incl. director, excl. actors).

Data import and first inspection

The project starts by importing the movie dataset from the “movies_complete.csv” file. A first inspection shows the available features, including movie ID, title, release date, genres, budget, revenue, production details, and ratings, as described in the feature list above.

Figure: output of the first data inspection
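A minimal sketch of this import-and-inspect step, assuming pandas is used; the helper names `load_movies` and `first_inspection` are hypothetical, and the tiny inline sample stands in for the real Kaggle file. It parses `release_date` as a date and summarises each column's dtype and missing values.

```python
import io

import pandas as pd


def load_movies(source) -> pd.DataFrame:
    """Read the movie table; `source` may be a path like "movies_complete.csv" or a file-like object."""
    return pd.read_csv(source, parse_dates=["release_date"])


def first_inspection(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: its dtype, the count of missing values, and the percentage missing."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "pct_missing": (df.isna().mean() * 100).round(1),
    })


# Made-up two-row sample standing in for the real file (illustrative values only).
sample = io.StringIO(
    "id,title,release_date,budget_musd,revenue_musd\n"
    "1,Movie A,1995-10-30,10.0,25.0\n"
    "2,Movie B,1995-12-15,,\n"
)
movies = load_movies(sample)
print(first_inspection(movies))
```

With the real data, `load_movies("movies_complete.csv")` would replace the inline sample.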

The Best and the Worst Movies

Highest Revenue

After filtering the dataset, the movies with the highest revenue are identified, showing which films were the most financially successful.

Figure: movies with the highest revenue
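A best/worst ranking like this can be sketched with pandas, assuming the column names from the feature list; `top_movies` is a hypothetical helper and the rows below are made up, so this is an illustration rather than the project's own code.

```python
import pandas as pd


def top_movies(df: pd.DataFrame, by: str, n: int = 5, largest: bool = True) -> pd.DataFrame:
    """Return the n movies with the largest (or smallest) value in column `by`."""
    ranked = df.dropna(subset=[by])  # ignore movies where the metric is unknown
    picked = ranked.nlargest(n, by) if largest else ranked.nsmallest(n, by)
    return picked[["title", by]]


# Made-up rows for illustration only.
movies = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "revenue_musd": [100.0, None, 50.0, 300.0],
})
print(top_movies(movies, "revenue_musd", n=2))                 # best: highest revenue
print(top_movies(movies, "revenue_musd", n=1, largest=False))  # worst: lowest revenue
```

Passing `budget_musd` or `popularity` as `by` gives the corresponding budget and popularity rankings with the same helper.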

Highest Budget

Similarly, the movies with the highest budgets are identified, showing how large the investments in filmmaking can be.

Figure: movies with the highest budgets

Highest Profit

Profit is calculated for each movie as revenue minus budget, and the movies with the highest profits are identified. This gives a view of financial success that accounts for production cost.

Figure: movies with the highest profits
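A minimal pandas sketch of the profit calculation, assuming profit is simply `revenue_musd - budget_musd`; `add_profit` is a hypothetical helper and the data are made up.

```python
import pandas as pd


def add_profit(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with profit_musd = revenue_musd - budget_musd.

    Profit is NaN whenever either input is missing.
    """
    out = df.copy()
    out["profit_musd"] = out["revenue_musd"] - out["budget_musd"]
    return out


movies = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "budget_musd": [50.0, 10.0, None],
    "revenue_musd": [80.0, 60.0, 500.0],
})
with_profit = add_profit(movies)
# Drop unknown profits before ranking so they cannot appear in the top list.
top = with_profit.dropna(subset=["profit_musd"]).nlargest(2, "profit_musd")
print(top[["title", "profit_musd"]])
```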

Return on Investment (ROI)

ROI is determined by dividing revenue by budget, restricted to movies with a budget of at least 10 million dollars; this excludes low-budget films whose small denominators would otherwise produce extreme ratios. ROI shows how much revenue each dollar of budget returned.

Figure: movies with the highest ROI
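The ROI ranking with the 10-million-dollar budget floor might look like this in pandas; `roi_ranking` is a hypothetical name and the sample rows are invented to show that a tiny budget is excluded even when its ratio is large.

```python
import pandas as pd


def roi_ranking(df: pd.DataFrame, min_budget_musd: float = 10.0, n: int = 5) -> pd.DataFrame:
    """Rank movies by ROI = revenue / budget, considering only budgets >= min_budget_musd."""
    eligible = df[df["budget_musd"] >= min_budget_musd].copy()  # NaN budgets compare False
    eligible["roi"] = eligible["revenue_musd"] / eligible["budget_musd"]
    eligible = eligible.dropna(subset=["roi"])  # drop movies with unknown revenue
    return eligible.nlargest(n, "roi")[["title", "budget_musd", "revenue_musd", "roi"]]


movies = pd.DataFrame({
    "title": ["Tiny", "Movie A", "Movie B", "Movie C"],
    "budget_musd": [5.0, 10.0, 20.0, 50.0],
    "revenue_musd": [100.0, 50.0, 40.0, None],
})
print(roi_ranking(movies, n=3))
```

Here "Tiny" would have an ROI of 20 but falls below the budget floor, so "Movie A" (ROI 5) ranks first.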

Highest Number of Votes and Ratings

The movies with the most votes (a measure of user engagement) and the highest average ratings are identified, giving a view of audience preferences.

Figure: movies with the most votes

Figure: highest-rated movies
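A pandas sketch of both rankings follows. Ratings based on a handful of votes are unreliable, so `best_rated` applies a minimum vote count; the 50-vote default and both helper names are hypothetical choices, not values taken from the project, and the rows are made up.

```python
import pandas as pd


def most_voted(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Movies with the highest vote_count (user engagement)."""
    return df.nlargest(n, "vote_count")[["title", "vote_count"]]


def best_rated(df: pd.DataFrame, min_votes: int = 50, n: int = 5) -> pd.DataFrame:
    """Movies with the highest vote_average among those with at least min_votes votes.

    The default of 50 votes is a hypothetical cut-off, not a value from the project.
    """
    eligible = df[df["vote_count"] >= min_votes]
    return eligible.nlargest(n, "vote_average")[["title", "vote_average", "vote_count"]]


movies = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "vote_count": [1000, 10, 500],
    "vote_average": [7.0, 9.5, 8.0],
})
print(most_voted(movies, n=2))
print(best_rated(movies, n=2))
```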

Popularity

The movies with the highest TMDB popularity scores are also identified. This metric reflects how much attention a movie receives on TMDB.

Figure: most popular movies

Most Common Words in Movie Titles and Taglines

Text analysis is used to identify the most common words in movie titles and taglines, pointing to recurring themes and elements in movie marketing.

Title

Figure: most common words in movie titles

Tagline

Figure: most common words in movie taglines
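Word frequencies like these can be computed with plain pandas string methods. This is a minimal sketch, not necessarily how the figures were produced; `common_words` and the deliberately small `STOP_WORDS` set are hypothetical, and a real analysis would use a fuller stop-word list.

```python
import pandas as pd

# Hypothetical, deliberately small stop-word list for illustration.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "it"}


def common_words(texts: pd.Series, n: int = 10) -> pd.Series:
    """Count the n most frequent lowercase words in a text column, ignoring stop words."""
    words = (
        texts.dropna()
        .str.lower()
        .str.findall(r"[a-z']+")  # list of words per title/tagline
        .explode()                # one row per word
        .dropna()                 # empty lists explode to NaN
    )
    words = words[~words.isin(STOP_WORDS)]
    return words.value_counts().head(n)


titles = pd.Series(["The Love Story", "Love Actually", "A Christmas Story", None, "Love"])
print(common_words(titles))
```

The same function applies to the `tagline` column.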

Are Franchises More Successful?

Franchise Identification

An extra feature is created to flag whether a movie belongs to a franchise, separating standalone movies from those that are part of a collection.

Franchise vs. Standalone Movies

Franchise and standalone movies are compared on mean revenue, median ROI, mean budget, mean popularity, and mean rating. This addresses whether franchise movies tend to perform better on these measures.

Figure: franchise vs. standalone movies
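A pandas sketch of the franchise flag and comparison. It assumes, as an illustration, that a movie counts as a franchise movie whenever `belongs_to_collection` is filled in; `franchise_comparison` is a hypothetical name and the rows are made up. Zero budgets are treated as unknown so they cannot produce infinite ROI values.

```python
import pandas as pd


def franchise_comparison(df: pd.DataFrame) -> pd.DataFrame:
    """Compare franchise and standalone movies on the summary measures listed above."""
    out = df.copy()
    # Assumption: a non-missing collection means the movie is part of a franchise.
    out["group"] = out["belongs_to_collection"].notna().map(
        {True: "franchise", False: "standalone"}
    )
    budget = out["budget_musd"].where(out["budget_musd"] > 0)  # treat 0 budgets as unknown
    out["roi"] = out["revenue_musd"] / budget
    return out.groupby("group").agg(
        mean_revenue=("revenue_musd", "mean"),
        median_roi=("roi", "median"),
        mean_budget=("budget_musd", "mean"),
        mean_popularity=("popularity", "mean"),
        mean_rating=("vote_average", "mean"),
    )


movies = pd.DataFrame({
    "belongs_to_collection": ["Collection X", None, "Collection X", None],
    "revenue_musd": [100.0, 20.0, 300.0, 40.0],
    "budget_musd": [10.0, 10.0, 100.0, 20.0],
    "popularity": [10.0, 2.0, 30.0, 4.0],
    "vote_average": [7.0, 6.0, 8.0, 5.0],
})
print(franchise_comparison(movies))
```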

Most Successful Franchises

The most successful movie franchises are identified by number of movies, total and mean budget, total and mean revenue, and mean rating, highlighting the dominant franchises in the movie industry.
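A per-franchise summary like this can be sketched with a pandas group-by, assuming franchises are identified by the `belongs_to_collection` value; `franchise_ranking` is a hypothetical helper and the data are invented.

```python
import pandas as pd


def franchise_ranking(df: pd.DataFrame, sort_by: str = "revenue_total_musd", n: int = 5) -> pd.DataFrame:
    """Aggregate movies per collection and return the top n collections by `sort_by`."""
    # groupby drops rows whose collection is missing, i.e. standalone movies.
    stats = df.groupby("belongs_to_collection").agg(
        n_movies=("title", "count"),
        budget_total_musd=("budget_musd", "sum"),
        budget_mean_musd=("budget_musd", "mean"),
        revenue_total_musd=("revenue_musd", "sum"),
        revenue_mean_musd=("revenue_musd", "mean"),
        rating_mean=("vote_average", "mean"),
    )
    return stats.nlargest(n, sort_by)


movies = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "belongs_to_collection": ["Collection X", "Collection X", "Collection Y", None],
    "budget_musd": [10.0, 20.0, 100.0, 5.0],
    "revenue_musd": [100.0, 200.0, 500.0, 1000.0],
    "vote_average": [7.0, 8.0, 6.0, 9.0],
})
print(franchise_ranking(movies, n=2))
```

Changing `sort_by` (for example to `"n_movies"` or `"rating_mean"`) ranks franchises by the other measures.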

Most Successful Directors

The most successful directors are identified by the number of movies directed, total revenue generated, and mean rating received. This ranking shows which directors are behind the most numerous, most lucrative, and best-received films.

Figure: most successful directors
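The three director rankings can be derived from one aggregated table. This pandas sketch uses a hypothetical helper `director_stats` with a hypothetical `min_movies` filter (so a single well-rated film does not top the rating list); the rows are made up.

```python
import pandas as pd


def director_stats(df: pd.DataFrame, min_movies: int = 1) -> pd.DataFrame:
    """Per director: movie count, total revenue, and mean rating.

    Note: groupby sum treats missing revenue as 0.
    """
    stats = df.groupby("director").agg(
        n_movies=("title", "count"),
        revenue_total_musd=("revenue_musd", "sum"),
        rating_mean=("vote_average", "mean"),
    )
    return stats[stats["n_movies"] >= min_movies]


movies = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "director": ["Director X", "Director X", "Director Y", "Director Z"],
    "revenue_musd": [100.0, 200.0, 50.0, None],
    "vote_average": [7.0, 8.0, 9.0, 6.0],
})
stats = director_stats(movies)
print(stats.nlargest(3, "n_movies"))
print(stats.nlargest(3, "revenue_total_musd"))
print(stats.nlargest(3, "rating_mean"))
```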

Most Successful Actors

The dataset is analyzed to identify the most successful actors based on their involvement in successful movies.

Figure: most successful actors
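Because a movie has several actors, the `cast` column has to be split into one row per actor before aggregating. This pandas sketch assumes the cast names are separated by `|`, which is an assumption about the file format rather than something stated above; `actor_stats` is a hypothetical helper and the data are made up.

```python
import pandas as pd


def actor_stats(df: pd.DataFrame, sep: str = "|") -> pd.DataFrame:
    """Per actor: number of movies, total revenue, and mean rating of those movies."""
    exploded = (
        df.assign(actor=df["cast"].str.split(sep))  # assumed "Name|Name|..." format
        .explode("actor")                           # one row per (movie, actor)
        .dropna(subset=["actor"])                   # movies with no cast listed
    )
    return exploded.groupby("actor").agg(
        n_movies=("title", "count"),
        revenue_total_musd=("revenue_musd", "sum"),
        rating_mean=("vote_average", "mean"),
    )


movies = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "cast": ["Ann|Bob", "Bob", None, "Ann|Cid"],
    "revenue_musd": [100.0, 50.0, 10.0, 30.0],
    "vote_average": [7.0, 8.0, 6.0, 9.0],
})
print(actor_stats(movies).sort_values("revenue_total_musd", ascending=False))
```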

Most Popular Genres

The most popular movie genres are determined by analyzing genre frequencies and how they change over the years. This shows which genres are most common and how their share has shifted over time, a rough proxy for changing audience preferences.

Figure: most popular genres
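Genre frequencies and their trend over the years can be sketched in pandas as below. It assumes genres are stored as a `|`-separated string and that `release_date` has already been parsed as a date; both helper names are hypothetical and the rows are invented. `pd.crosstab` with `normalize="index"` turns yearly counts into each genre's share of that year's genre tags.

```python
import pandas as pd


def genre_counts(df: pd.DataFrame, sep: str = "|") -> pd.Series:
    """Overall number of movies tagged with each genre."""
    return df["genres"].str.split(sep).explode().dropna().value_counts()


def genre_share_by_year(df: pd.DataFrame, sep: str = "|") -> pd.DataFrame:
    """Rows are release years, columns are genres, values are shares within each year."""
    exploded = (
        df.assign(genre=df["genres"].str.split(sep), year=df["release_date"].dt.year)
        .explode("genre")
        .dropna(subset=["genre"])
        .reset_index(drop=True)  # unique index after exploding
    )
    return pd.crosstab(exploded["year"], exploded["genre"], normalize="index")


movies = pd.DataFrame({
    "genres": ["Drama|Comedy", "Drama", "Action", "Comedy|Drama"],
    "release_date": pd.to_datetime(["2000-01-01", "2000-06-01", "2001-01-01", "2001-05-01"]),
})
print(genre_counts(movies))
print(genre_share_by_year(movies))
```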

Conclusion

The project works systematically through a series of data analysis tasks and offers a broad view of the factors associated with movie success: financial performance, audience engagement, and the influence of franchises, directors, actors, and genres. These insights can inform business decisions, marketing strategies, and creative direction in the film industry.

Future Work

  • Exploratory Data Analysis (EDA): Go deeper into EDA with a wider range of visualization techniques and statistical tests to uncover hidden insights, and explore patterns, trends, and correlations in more detail.
  • Data Presentation and Visualization: Improve the visual appeal and clarity of my visualizations with tools such as Seaborn for statistical plots and Plotly or Tableau for interactive dashboards.