Back to questions
Manufacturing Plant You are given two DataFrames related to a manufacturing company. The first, products, contains information about the products manufactured by the company, and the second, sales, contains information about the sales of these products. Write a function that returns the top 3 selling products in each product category based on the revenue generated, without any gaps in the ranking sequence. The input DataFrames have the following schemas:
| Column Name | Type |
|---|---|
| product_id | integer |
| category | string |
| product_name | string |
| Column Name | Type |
|---|---|
| sale_id | integer |
| product_id | integer |
| quantity | integer |
| revenue | double |
The output DataFrame should have the following schema:
| Column Name | Type |
|---|---|
| category | string |
| product_name | string |
| revenue | integer |
| rank | integer |
| category | product_name | rank | revenue |
|---|---|---|---|
| A | Product1 | 3 | 100 |
| B | Product4 | 2 | 50 |
| C | Product7 | 1 | 150 |
The Pandas solution follows these steps:
First, we aggregate the sales DataFrame by grouping it by product_id and calculating the sum of the revenue column. The result is stored in the sales_agg DataFrame.
Next, we merge the products DataFrame with the sales_agg DataFrame on the product_id column using an inner join. This gives us a new DataFrame, products_with_revenue, which contains information about the products and their corresponding revenues.
We then use the rank() method to create a new column named rank in the products_with_revenue DataFrame. This column contains the dense ranking of the products within their respective categories based on their revenue. We specify the method parameter as "dense" and the ascending parameter as False to achieve this.
Finally, we filter the products_with_revenue DataFrame to keep only the rows with a rank less than or equal to 3. This gives us the top 3 selling products in each product category based on revenue. The filtered DataFrame is returned as the final result after selecting the required columns: "category", "product_name", "revenue", and "rank".