Movies By Decade Uber PySpark Interview Question

You are working as a data analyst at Netflix, and you have a DataFrame containing information about movies. The DataFrame has a column called release_date, which stores the date each movie was released. Your task is to count how many movies were released in each decade.

At the end, return a DataFrame with two columns:

  • decade: the release decade, identified by its first year (for example 1990, 2000, 2010)
  • count: the number of movies released during that decade

Sort the final output by decade and then by count, both in ascending order.

DataFrame:

Column Name    Description                          Type
title          The title of the movie               object
release_date   The release date of the movie        datetime64[ns]
genres         The genres of the movie              object
budget         The budget of the movie              int64
revenue        The revenue generated by the movie   int64
popularity     The popularity score of the movie    float64

Example Input:

title                    release_date  genres                                     budget     revenue     popularity
Minions                  2015-06-17    Family Animation Adventure Comedy          74000000   1156730962  875.581305
Interstellar             2014-11-05    Adventure Drama Science Fiction            165000000  675120017   724.247784
Deadpool                 2016-02-09    Action Adventure Comedy                    58000000   783112979   514.5699559999998
Guardians of the Galaxy  2014-07-30    Action Science Fiction Adventure           170000000  773328629   481.098624
Mad Max: Fury Road       2015-05-13    Action Adventure Science Fiction Thriller  150000000  378858340   434.278564
Avatar                   2009-12-10    Action Adventure Fantasy Science Fiction   237000000  2787965087  150.437577
Fight Club               1999-10-15    Drama                                      63000000   100853753   146.75739099999996

Example Output:

decade  count
1990    1
2000    1
2010    5
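The expected output can be checked by hand with the same floor(year / 10) * 10 rule. This plain-Python sketch (not PySpark, and not part of the original question) applies it to the seven example release dates and tallies the decades with collections.Counter: the five 2014 to 2016 releases fall in 2010, Avatar (2009) in 2000, and Fight Club (1999) in 1990.

```python
from collections import Counter
from datetime import date

# Release dates copied from the example input, in row order.
release_dates = [
    "2015-06-17", "2014-11-05", "2016-02-09", "2014-07-30",
    "2015-05-13", "2009-12-10", "1999-10-15",
]

# Bucket each year into its decade with integer division: 2015 // 10 * 10 == 2010.
decade_counts = Counter(
    date.fromisoformat(d).year // 10 * 10 for d in release_dates
)

# Print in ascending decade order, like the expected output table.
for decade, count in sorted(decade_counts.items()):
    print(decade, count)
# prints:
# 1990 1
# 2000 1
# 2010 5
```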
