#TomHanksTop5? A data science take!

With a heavy assist from my good data sci pal Justin Kiggins (@neuromusic), I threw together some data on the Twitter hashtag #TomHanksTop5. The hashtag is pretty simple: what are your top 5 movies with Tom Hanks as an actor?

My personal 5 were:

But how did that stack up to the "objective" top 5 Tom Hanks movies?

Obviously there are no correct answers, but I thought it would be interesting to look at the data behind Tom Hanks' acting filmography.

I mined my initial data from Metacritic. This already presents a few problems: 1) Metacritic is owned by CBS, so there's a potential conflict of interest if Tom Hanks was in a movie supported by CBS Studios, or if a CBS Studios film came out the same weekend as a non-CBS Tom Hanks film; 2) Metacritic aggregates its scores using a proprietary algorithm that weights some critics' opinions more heavily than others; 3) not every film Tom Hanks acted in is listed on Metacritic (The 'Burbs?? Come on, CBS!). Metacritic does, however, generate a score from 0 to 100 for each film in its database. Ideally, this is close to the average film critic score for any given movie.

Alongside each Metacritic score is a user score. I mined these data as well, to see how well the internet reviewers of Metacritic compared against aggregated film critics.

Lastly, Justin Kiggins mined all the data from the #TomHanksTop5 hashtag (over 1600 tweets; over 8100 rankings) for me. I munged together a script that took each ranking, assigned it a score, and spit the result back out on a 100-point scale, just like Metacritic's movie scores.
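For the curious, here is a minimal sketch of what a scoring script like that could look like. The exact weighting is an assumption on my part for illustration: a #1 pick earns 5 points, a #5 pick earns 1 point, and every film's total is rescaled so the most popular film lands at 100. The function name `score_rankings` and the two example tweets are made up, not the real data.

```python
from collections import defaultdict


def score_rankings(tweets, list_size=5):
    """Turn ranked lists (best film first) into scores on a 0-100 scale.

    Assumed weighting, for illustration only: position 1 earns `list_size`
    points, position 2 earns one fewer, and so on down to 1 point.
    """
    points = defaultdict(int)
    for ranking in tweets:
        for position, title in enumerate(ranking[:list_size]):
            points[title] += list_size - position
    if not points:
        return {}
    # Rescale so the film with the most points sits at exactly 100.
    top = max(points.values())
    return {title: 100 * total / top for title, total in points.items()}


# Toy input: two hypothetical tweets, each listing only three films.
example_tweets = [
    ["Big", "Forrest Gump", "Cast Away"],
    ["Forrest Gump", "Apollo 13", "Big"],
]
scores = score_rankings(example_tweets)
for title, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{title}: {score:.1f}")
```

Rescaling against the top film is only one choice; dividing by the maximum possible points (5 times the number of tweets) would also give a 100-point scale, but it would squash most films toward the bottom.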


For the most part, Twitter, Metacritic users, and Metacritic's aggregate score correlated pretty well! Perhaps the point that sticks out the most is how closely everyone agreed that Toy Story 3 was simply top of the charts. Some larger discrepancies are interesting -- Cloud Atlas was seemingly loved by Metacritic users, but critics and apparently Twitter users did not rate it as highly. Mind you, all the films here were ranked in the top 5 at least once. Just because a film sits lower doesn't necessarily mean it isn't a favorite -- Road to Perdition, Connie and Carla, and The Terminal were all ranked lower by critics, but both Twitter and Metacritic users rated them much higher on their respective scales.


I can average together all three streams of data for each film, take the difference between the highest and lowest averages, and divide that range by 3 to give three equal-width bins: Tier 1 (the "best" Tom Hanks films), Tier 2 ("okay" Tom Hanks films), and Tier 3 ("ehh" Tom Hanks films). Something that is shocking to some and no surprise whatsoever to others is how high the ratings are for all three Toy Story films. Out of the eight "Tom Hanks must-see films," three are Toy Story.
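The binning is easy to sketch in code. This is an illustration under a few assumptions rather than my exact script: all three scores are already on the same 0-100 scale (Metacritic user scores would need rescaling first), only films present in all three sources are kept, and the function name `assign_tiers` plus the "Film A" through "Film D" numbers are placeholders, not real scores.

```python
def assign_tiers(critic, user, twitter):
    """Average three score dicts (title -> 0-100) and bin into three tiers.

    The range between the lowest and highest average is split into three
    equal-width bins: tier 1 is the top third, tier 3 the bottom third.
    """
    # Assumption: only films scored by all three sources are compared.
    titles = set(critic) & set(user) & set(twitter)
    if not titles:
        return {}
    means = {t: (critic[t] + user[t] + twitter[t]) / 3 for t in titles}
    low, high = min(means.values()), max(means.values())
    width = (high - low) / 3
    result = {}
    for title, mean in means.items():
        if width == 0 or mean >= high - width:
            tier = 1
        elif mean >= high - 2 * width:
            tier = 2
        else:
            tier = 3
        result[title] = (mean, tier)
    return result


# Placeholder scores, not real data.
critic = {"Film A": 92, "Film B": 55, "Film C": 20, "Film D": 80}
user = {"Film A": 88, "Film B": 65, "Film C": 40, "Film D": 70}
twitter = {"Film A": 90, "Film B": 60, "Film C": 30, "Film D": 75}

tiers = assign_tiers(critic, user, twitter)
for title in sorted(tiers):
    mean, tier = tiers[title]
    print(f"{title}: mean {mean:.1f}, tier {tier}")
```

Because the bins are equal in width rather than equal in count, the tiers can end up with very different numbers of films in them.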

There is a fun debate about Toy Story and how to consider the series in this ranking. A friend of mine said you should look at all three as one unit -- it's the same character across all three, just in different scenarios. That is not the case for Forrest Gump when compared to Saving Private Ryan, for example. Other friends took a different approach -- voice acting lacks the physical portrayal of a character and should be judged differently. How do you compare Woody talking to Buzz Lightyear against Tom Hanks in Cast Away talking to a Wilson volleyball?

Lastly, Renee (who runs @BecomingDataSci) raised an important factor: are we weighing the best movies Tom Hanks was in or the best acting Tom Hanks did? A movie like Saving Private Ryan, or even any of the Toy Story films, has such a stellar cast that sometimes Tom Hanks blends in with all the other characters to simply tell an amazing story. In other cases, like Charlie Wilson's War, Tom Hanks was the lead actor, but the supporting actor (in this case, Philip Seymour Hoffman) was praised for his acting in the film and nominated for supporting actor awards while Tom Hanks was not nominated for acting awards.

For simplicity, here are a few data science rankings of Tom Hanks' best films.

If you treat each Toy Story as its own standalone film:

  1. Toy Story 3
  2. Toy Story 2
  3. Saving Private Ryan
  4. Toy Story 1
  5. Everything is Copy

If you treat the Toy Story series as one piece of work:

  1. Toy Story series
  2. Saving Private Ryan
  3. Everything is Copy
  4. Forrest Gump
  5. Road to Perdition

If you take out Toy Story completely, as well as movies Tom Hanks was not pivotal in:

  1. Saving Private Ryan
  2. Forrest Gump
  3. Road to Perdition
  4. Captain Phillips
  5. Big


Visualizations made with gramm

