Back to questions
You are working as a data analyst at Netflix, and you have a DataFrame containing information about movies. The DataFrame has a column called , which stores the date each movie was released. Your task is to count how many movies were released in each decade.
At the end, return a DataFrame with two columns:
Sort the final output by and respectively — both in ascending order.
| Column Name | Description | Type |
|---|---|---|
| title | The title of the movie | object |
| release_date | The release date of the movie | datetime64[ns] |
| genres | The genres of the movie | object |
| budget | The budget of the movie | int64 |
| revenue | The revenue generated by the movie | int64 |
| popularity | The popularity score of the movie | float64 |
| title | release_date | genres | budget | revenue | popularity |
|---|---|---|---|---|---|
| Minions | 2015-06-17 | Family Animation Adventure Comedy | 74000000 | 1156730962 | 875.581305 |
| Interstellar | 2014-11-05 | Adventure Drama Science Fiction | 165000000 | 675120017 | 724.247784 |
| Deadpool | 2016-02-09 | Action Adventure Comedy | 58000000 | 783112979 | 514.5699559999998 |
| Guardians of the Galaxy | 2014-07-30 | Action Science Fiction Adventure | 170000000 | 773328629 | 481.098624 |
| Mad Max: Fury Road | 2015-05-13 | Action Adventure Science Fiction Thriller | 150000000 | 378858340 | 434.278564 |
| Avatar | 2009-12-10 | Action Adventure Fantasy Science Fiction | 237000000 | 2787965087 | 150.437577 |
| Fight Club | 1999-10-15 | Drama | 63000000 | 100853753 | 146.75739099999996 |
| decade | count |
|---|---|
| 1990 | 1 |
| 2000 | 1 |
| 2010 | 5 |
When it comes to deciding which movies belong to which decade, we only care about the year they were released in. So, the first step is to extract the year from the column.
To do this, pandas provides the accessor, which lets us work with datetime objects. We will create a new column called that will store the extracted year for each movie.
Here’s how to do that:
Here is the example input after creating a new column called :
| title | release_date | genres | budget | revenue | popularity | release_year |
|---|---|---|---|---|---|---|
| Minions | 2015-06-17 | Family Animation Adventure Comedy | 74000000 | 1156730962 | 875.581305 | 2015 |
| Interstellar | 2014-11-05 | Adventure Drama Science Fiction | 165000000 | 675120017 | 724.247784 | 2014 |
| Deadpool | 2016-02-09 | Action Adventure Comedy | 58000000 | 783112979 | 514.5699559999998 | 2016 |
| Guardians of the Galaxy | 2014-07-30 | Action Science Fiction Adventure | 170000000 | 773328629 | 481.098624 | 2014 |
| Mad Max: Fury Road | 2015-05-13 | Action Adventure Science Fiction Thriller | 150000000 | 378858340 | 434.278564 | 2015 |
| Avatar | 2009-12-10 | Action Adventure Fantasy Science Fiction | 237000000 | 2787965087 | 150.437577 | 2009 |
| Fight Club | 1999-10-15 | Drama | 63000000 | 100853753 | 146.75739099999996 | 1999 |
Now we have the year, but we want to group the years into decades. For example:
1995 → 1990
2013 → 2010
To achieve this, we can use a math trick:
We divide the year by 10 to remove the last digit (integer division), then multiply it by 10 to get the decade. This way, years like 1991, 1992, …, 1999 will all be reduced to 1990.
We will create a new column called that will contain these decade values.
Here’s how to do that:
Here is the example input after creating a new column called :
| title | release_date | genres | budget | revenue | popularity | release_year | decade |
|---|---|---|---|---|---|---|---|
| Minions | 2015-06-17 | Family Animation Adventure Comedy | 74000000 | 1156730962 | 875.581305 | 2015 | 2010 |
| Interstellar | 2014-11-05 | Adventure Drama Science Fiction | 165000000 | 675120017 | 724.247784 | 2014 | 2010 |
| Deadpool | 2016-02-09 | Action Adventure Comedy | 58000000 | 783112979 | 514.5699559999998 | 2016 | 2010 |
| Guardians of the Galaxy | 2014-07-30 | Action Science Fiction Adventure | 170000000 | 773328629 | 481.098624 | 2014 | 2010 |
| Mad Max: Fury Road | 2015-05-13 | Action Adventure Science Fiction Thriller | 150000000 | 378858340 | 434.278564 | 2015 | 2010 |
| Avatar | 2009-12-10 | Action Adventure Fantasy Science Fiction | 237000000 | 2787965087 | 150.437577 | 2009 | 2000 |
| Fight Club | 1999-10-15 | Drama | 63000000 | 100853753 | 146.75739099999996 | 1999 | 1990 |
Now that we have a column (where years like 1991, 1992, …, 1999 are all reduced to 1990), we can group the DataFrame by this column and count the number of movies in each decade.
We’ll use the method. The method will return a special grouped object (not a regular DataFrame). On this object, we can call the method to get the count of rows (movies) in each group. This count will be returned as a Series, and to turn it back into a DataFrame with named columns, we use and give the count column a name (like ).
If you’re not familiar with , you can learn more here.
Here’s how to do that:
Here is the example input after grouping and counting:
| decade | count |
|---|---|
| 1990 | 1 |
| 2010 | 5 |
| 2000 | 1 |
Finally, we need to sort the result first by the column and then by the column. For this step, we will use Pandas’ method.
When sorting a DataFrame by more than one column, we pass a list of column names to the argument in the order we want to sort the DataFrame.
If you’re not familiar with , you can learn more here.
Here is the full code:
Here is the example input after sorting by and respectively:
| decade | count |
|---|---|
| 1990 | 1 |
| 2000 | 1 |
| 2010 | 5 |
Instead of using math to group years into decades (like in the first approach), we can use the method in pandas. This method allows us to divide numeric data into bins (intervals).
What are bins? Bins are ranges of values — for example, a bin for years 1990–2000 includes all years starting from 1990 up to (but not including) 2000.
Why use ? With , we don’t need to do manual math to calculate the decade — it automatically places each year in the correct bin based on the ranges we define.
If you are not familiar with , you can learn more here.
We first find the minimum and maximum years in the data. Then, we calculate the minimum decade and maximum decade to make sure our bins cover the entire range of decades.
For example:
To create the list, we use , which generates numbers from the first decade to the last decade (inclusive), in steps of 10 years (a decade). For example, will create:
We use these same numbers as labels for the decades, like , , etc. To do this, we take all the bin edges except the last one (because the number of intervals is always one less than the number of bin edges). So, we use:
Now that we have a column with numeric labels, we can count the number of movies in each decade using .
Here’s the Full Code: