4. Optimizing Marginal Revenue from the Demand Curve, Kaggle

Let’s have fun and model profit optimization using the demand curve. The data is take from Kaggle.

4.1. Load data

Let’s load the data.

[1]:
import pandas as pd
import numpy as np

df = pd.read_csv('./data/retail_price.csv') \
    .assign(
        month_year=lambda d: pd.to_datetime(d['month_year']),
        year=lambda d: d['month_year'].dt.year,
        month=lambda d: d['month_year'].dt.month,
        weekend=lambda d: d['weekday'].apply(lambda x: 1 if x >= 5 else 0),
        text=lambda d: d['product_category_name'].apply(lambda s: ' '.join(s.split('_')))
    )
df.shape
[1]:
(676, 31)

4.2. Product categories

There are 9 product categories. Let’s model 9 demand curves corresponding to each of these product categories.

[2]:
df.rename(columns={'product_category_name': 'category'}) \
    .groupby(['category', 'unit_price']) \
    .size() \
    .reset_index() \
    .groupby(['category']) \
    .size()
[2]:
category
bed_bath_table           26
computers_accessories    48
consoles_games           10
cool_stuff               18
furniture_decor          13
garden_tools             56
health_beauty            33
perfumery                14
watches_gifts            78
dtype: int64

4.3. Demand curves by category and price

Here, we shape the data by grouping records into product category and price.

[3]:
pq_df = df.groupby(['product_category_name', 'unit_price']) \
    .agg(
        q=pd.NamedAgg('qty', 'sum')
    ) \
    .reset_index() \
    .rename(columns={'unit_price': 'p', 'product_category_name': 'category'}) \
    .set_index(['category']) \
    .join(df \
        .rename(columns={'product_category_name': 'category'}) \
        .groupby(['category']) \
        .agg(c=pd.NamedAgg('freight_price', 'mean'))
    )
pq_df.shape
[3]:
(296, 3)

4.4. Optimal prices

The table below shows the optimal price for each product category.

[4]:
from sklearn.linear_model import LinearRegression

def get_mc(category):
    return pq_df[pq_df.index==category].iloc[0].c

def get_Xy(category):
    Xy = pq_df[pq_df.index==category]
    X = Xy[['p']]
    y = Xy['q']

    return X, y

def get_model(X, y):
    m = LinearRegression()
    m.fit(X, y)

    return m

def debug_params(model, mc):
    b_0, b_1 = model.intercept_, model.coef_[0]
    qp_params = f'{b_0:.5f} + {b_1:.5f}p'

    z_0, z_1 = (-b_0 / b_1), (1 / b_1)
    pq_params = f'{z_0:.5f} + {z_1:.5f}q'
    tr_params = f'{z_0:.5f}q + {z_1:.5f}q^2'

    z_0, z_1 = (-b_0 / b_1), 2 * (1 / b_1)
    mr_params = f'{z_0:.5f} + {z_1:.5f}q'

    z_0, z_1 = (-b_0 / b_1), 2 * (1 / b_1)
    qo_params = f'({mc:.5f} - {z_0:.5f}) / {z_1:.5f}'

    return {
        'q': qp_params,
        'p': pq_params,
        'tr': tr_params,
        'mr': mr_params,
        'qo': qo_params
    }

def get_params(model):
    return pd.Series([model.intercept_, model.coef_[0]], ['b_0', 'b_1'])

def get_pq(b_0, b_1):
    return lambda q: (-b_0 / b_1) + (1 / b_1) * q

def get_qp(b_0, b_1):
    return lambda p: b_0 + (b_1 * p)

def get_mr(b_0, b_1):
    return lambda q: (-b_0 / b_1) + (2 * (1 / b_1) * q)

def get_qo(b_0, b_1):
    z_0 = (-b_0 / b_1)
    z_1 = 2 * (1 / b_1)
    return lambda mc: (mc - z_0) / z_1

def get_funcs(b_0, b_1):
    pq = get_pq(b_0, b_1)
    qp = get_qp(b_0, b_1)
    mr = get_mr(b_0, b_1)
    qo = get_qo(b_0, b_1)
    r = lambda p, q: p * q
    t = lambda p, q, mc: (p * q) - (mc * q)

    return pq, qp, mr, qo, r, t

def get_opt(f, mc):
    pq_f, qp_f, mr_f, qo_f, r_f, t_f = f

    q_opt = qo_f(mc)
    p_opt = pq_f(q_opt)
    mr_opt = mr_f(q_opt)
    r_opt = r_f(p_opt, q_opt)
    t_opt = t_f(p_opt, q_opt, mc)

    return pd.Series({
        'mc': mc,
        'p': p_opt,
        'q': q_opt,
        'mr': mr_opt,
        'tr': r_opt,
        'pr': t_opt
    })

def optimize(category):
    X, y = get_Xy(category)
    m = get_model(X, y)
    p = get_params(m)
    f = get_funcs(p.b_0, p.b_1)
    mc = get_mc(category)

    return {**{'category': category}, **get_opt(f, mc).to_dict()}

The category computer_acessories is the outlier here with an optimal price recommended to be a negative price. We will see later that there is an outlier that causes this problem. The columns of the table below are as follows.

  • mc is marginal cost computed by taking the average cost (average freight cost of all the product items for the particular product catgeory and price point)

  • p is the optimal price

  • q is the corresponding optimal quantity

  • mr is the corresponding marginal revenue

  • tr is the corresponding total revenue

  • pr is the corresponding profit

[5]:
opt_df = pd.DataFrame([optimize(c) for c in pq_df.index.unique()])
opt_df
[5]:
category mc p q mr tr pr
0 bed_bath_table 16.139718 413.954170 21.954253 16.139718 9088.054630 8733.719184
1 computers_accessories 25.103741 -27.221815 6.479547 25.103741 -176.385023 -339.045890
2 consoles_games 14.809415 26.068586 30.257496 14.809415 788.770143 340.674335
3 cool_stuff 18.975096 151.269261 24.424237 18.975096 3694.636306 3231.184067
4 furniture_decor 16.944617 222.175809 38.990538 16.944617 8662.754232 8002.074509
5 garden_tools 28.458310 99.268009 36.931818 28.458310 3666.148045 2615.130919
6 health_beauty 18.607448 849.975469 29.792779 18.607448 25323.131451 24768.763874
7 perfumery 14.336311 78.183209 30.772484 14.336311 2405.891570 1964.727677
8 watches_gifts 16.492840 159.459065 21.486980 16.492840 3426.293739 3071.912418

4.5. Visualizing optimal prices

Let’s visualize the total revenue and profit for each product category.

[6]:
import matplotlib.pyplot as plt

def plot_rev_profit_curves(category, min_p=1, max_p=800, ax=None):
    X, y = get_Xy(category)
    m = get_model(X, y)
    p = get_params(m)
    f = get_funcs(p.b_0, p.b_1)
    mc = get_mc(category)
    opt = get_opt(f, mc)

    if ax is None:
        fig, ax = plt.subplots(figsize=(7, 3))
    else:
        fig = None

    pd.DataFrame({
        'p': np.arange(min_p, max_p + 1, 1),

    }).assign(
        q=lambda d: f[1](d['p']),
        tr=lambda d: d['p'] * d['q'],
        mr=lambda d: f[2](d['q']),
        pr=lambda d: f[5](d['p'], d['q'], mc)
    ) \
    .set_index(['p'])[['tr', 'pr']] \
    .plot(kind='line', ax=ax)

    ax.axvline(x=opt.p, color='r', alpha=0.5, linestyle='dotted')
    ax.legend(bbox_to_anchor=(1.1, 1.05))
    ax.set_title(f'{category}, p_opt=${opt.p:.2f}')
    ax.set_ylabel('USD')

    if fig is not None:
        fig.tight_layout()
[7]:
fig, axes = plt.subplots(4, 2, figsize=(10, 7))

categories = [
    'bed_bath_table', 'consoles_games', 'cool_stuff', 'furniture_decor',
    'garden_tools', 'health_beauty', 'perfumery', 'watches_gifts'
]
min_p = [
    300, 1, 50, 50,
    1, 750, 1, 100
]
max_p = [
    550, 50, 300, 400,
    200, 1_000, 150, 250
]

for c, _min_p, _max_p, ax in zip(categories, min_p, max_p, np.ravel(axes)):
    plot_rev_profit_curves(c, min_p=_min_p, max_p=_max_p, ax=ax)

fig.tight_layout()
_images/optimizing-demand-curve-kaggle_13_0.png

4.6. Computer accessories

Now, what’s wrong with the computers_accesories category? Notice that its total revenue and profit curves are convex (and not concave).

[8]:
plot_rev_profit_curves('computers_accessories', min_p=-100, max_p=100)
_images/optimizing-demand-curve-kaggle_15_0.png

Let’s dig a big deeper.

[9]:
category = 'computers_accessories'
X, y = get_Xy(category)
m = get_model(X, y)
p = get_params(m)
f = get_funcs(p.b_0, p.b_1)
mc = get_mc(category)
opt = get_opt(f, mc)
[10]:
opt
[10]:
mc     25.103741
p     -27.221815
q       6.479547
mr     25.103741
tr   -176.385023
pr   -339.045890
dtype: float64

Look at the relationships below. You will notice that the slopes of q and p are NOT negative but positive (that explains the convex curves).

[11]:
debug_params(m, mc)
[11]:
{'q': '9.85046 + 0.12383p',
 'p': '-79.54737 + 8.07550q',
 'tr': '-79.54737q + 8.07550q^2',
 'mr': '-79.54737 + 16.15099q',
 'qo': '(25.10374 - -79.54737) / 16.15099'}

If we visually inspect the data (red dots are scatter plot of true quantity), we notice an outlier quantity around 250. The predicted curve from this data slopes up (positive slope, blue line).

[12]:
fig, ax = plt.subplots(figsize=(7, 3))

_temp = pq_df[pq_df.index==category] \
    .assign(
        q_true=lambda d: d['q'],
        q_pred=lambda d: f[1](d['p'])
    )[['p', 'q_true', 'q_pred']]

_temp.plot(kind='scatter', x='p', y='q_true', label='q_true', ax=ax, color='r')
_temp.plot(kind='line', x='p', y='q_pred', label='q_pred', ax=ax, color='b')

fig.tight_layout()
/opt/anaconda3/lib/python3.9/site-packages/pandas/plotting/_matplotlib/core.py:1114: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
  scatter = ax.scatter(
_images/optimizing-demand-curve-kaggle_22_1.png

4.7. Remove outlier

Let’s simply remove the outlier and see what happens.

[13]:
Xy = pq_df[pq_df.index==category].query('q < 252')
X = Xy[['p']]
y = Xy['q']

m = get_model(X, y)
p = get_params(m)
f = get_funcs(p.b_0, p.b_1)
mc = get_mc(category)
opt = get_opt(f, mc)
[14]:
opt
[14]:
mc      25.103741
p      280.092646
q       11.847662
mr      25.103741
tr    3318.442913
pr    3021.022281
dtype: float64

The relationships look better (slopes are negative).

[15]:
debug_params(m, mc)
[15]:
{'q': '24.86173 + -0.04646p',
 'p': '535.08155 + -21.52230q',
 'tr': '535.08155q + -21.52230q^2',
 'mr': '535.08155 + -43.04460q',
 'qo': '(25.10374 - 535.08155) / -43.04460'}

The total revenue and profit curves are now concave.

[16]:
min_p, max_p = 200, 400

fig, ax = plt.subplots(figsize=(7, 3))

pd.DataFrame({
    'p': np.arange(min_p, max_p + 1, 1),

}).assign(
    q=lambda d: f[1](d['p']),
    tr=lambda d: d['p'] * d['q'],
    mr=lambda d: f[2](d['q']),
    pr=lambda d: f[5](d['p'], d['q'], mc)
) \
.set_index(['p'])[['tr', 'pr']] \
.plot(kind='line', ax=ax)

ax.axvline(x=opt.p, color='r', alpha=0.5, linestyle='dotted')
ax.legend(bbox_to_anchor=(1.1, 1.05))
ax.set_title(f'{category}, p_opt=${opt.p:.2f}')
ax.set_ylabel('USD')

fig.tight_layout()
_images/optimizing-demand-curve-kaggle_29_0.png