import pandas as pd
import numpy as np
fp = "../data/daily_ice_cream_sales.csv"
df = pd.read_csv(fp)
df["Date"] = pd.to_datetime(df["Date"])
df["log_daily_sales"] = np.log(df["ice_cream_purchases"])
Analysis Notebook
The purpose of this notebook is to analyze daily ice cream sales in metro areas in the USA. Subsets corresponding to yearly sales are also profiled. Plots of autocorrelation and of maximum daily sales per week are also provided.
df["Date"].max()
df["Date"].min()
import matplotlib.pyplot as plt
Extract yearly data subsets
df_2020 = df[df.Date.dt.year == 2020]
df_2021 = df[df.Date.dt.year == 2021]
df_2022 = df[df.Date.dt.year == 2022]
df_2023 = df[df.Date.dt.year == 2023]
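The four yearly subsets can also be built in one pass with `groupby`. The following is a minimal sketch, assuming a frame shaped like `df` with a datetime `Date` column; the synthetic data and the names `demo` and `yearly` are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the daily sales frame (assumption: the real data
# has a datetime "Date" column and a positive "ice_cream_purchases" column).
rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-01", "2023-12-31", freq="D")
demo = pd.DataFrame({
    "Date": dates,
    "ice_cream_purchases": rng.integers(50, 500, size=len(dates)),
})

# One sub-frame per calendar year, keyed by the year as a plain int.
yearly = {
    int(year): group
    for year, group in demo.groupby(demo["Date"].dt.year)
}

print(sorted(yearly))     # [2020, 2021, 2022, 2023]
print(len(yearly[2020]))  # 366 (2020 is a leap year)
```

A dictionary keyed by year avoids one variable per year and extends to new years without code changes; `yearly[2020]` plays the role of `df_2020`.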
%matplotlib inline
import statsmodels.api as sm
df_2020["log_daily_sales"].plot()
Plot Autocorrelation
The autocorrelation and partial autocorrelation plots are shown below.
sm.graphics.tsa.plot_acf(df_2020["log_daily_sales"], lags=40)
sm.graphics.tsa.plot_pacf(df_2020["log_daily_sales"], lags=40, method="ywm")
sm.graphics.tsa.plot_pacf(df["log_daily_sales"], lags=40, method="ywm")
Observation
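The daily sales show no significant autocorrelation and look like independent draws from a single distribution, i.e. a white noise process. This is consistent with synthetically generated data. An augmented Dickey-Fuller test with a constant-only regression (the statsmodels default, regression="c") points the same way: it tests for a unit root, so a small p-value supports stationarity, though it does not by itself establish independence.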
from statsmodels.tsa.stattools import adfuller
result = adfuller(df["log_daily_sales"])
print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])
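The white-noise reading of the autocorrelation plots can be made concrete by counting how many sample autocorrelations fall outside the approximate 95% band of ±1.96/√n. The sketch below is an illustration on synthetic series, not a result from the sales data; the helper `share_outside_band` and the series `noise` and `walk` are hypothetical names introduced here.

```python
import numpy as np
import pandas as pd


def share_outside_band(series: pd.Series, lags: int = 40) -> float:
    """Fraction of lags 1..lags whose sample autocorrelation lies outside
    the approximate 95% white-noise band of +/- 1.96 / sqrt(n)."""
    n = series.shape[0]
    band = 1.96 / np.sqrt(n)
    acf = np.array([series.autocorr(lag=k) for k in range(1, lags + 1)])
    return float(np.mean(np.abs(acf) > band))


rng = np.random.default_rng(42)
noise = pd.Series(rng.normal(size=1000))  # independent draws
walk = noise.cumsum()                     # random walk, strongly autocorrelated

print(share_outside_band(noise))  # expected to be small, near 0.05
print(share_outside_band(walk))   # expected to be close to 1
```

Applied to the notebook's data, the call would be `share_outside_band(df["log_daily_sales"])`; a share near 0.05 is what independent draws would typically produce.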
df_2020
df["log_daily_sales"] = np.log(df["ice_cream_purchases"])
df_yearly_ice_cream_sales = df.set_index("Date").resample("Y").sum()
df_yearly_ice_cream_sales
df["ice_cream_purchases"].plot()
fpmeanweekly = "../data/mean_weekly_ice_cream_sales.csv"
dfwmics = pd.read_csv(fpmeanweekly)
dfwmics["ice_cream_purchases"].plot()
dfwmics["weekno"] = range(1, dfwmics.shape[0] + 1)
dfwmics
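The introduction mentions maximum daily sales per week. A minimal sketch of how a weekly series could be derived from daily data with `resample` follows. The synthetic frame and the names `demo`, `weekly_max`, and `weekly_mean` are assumptions for illustration, and this is not necessarily how `mean_weekly_ice_cream_sales.csv` was produced.

```python
import numpy as np
import pandas as pd

# Synthetic daily data covering four full weeks; 2020-01-06 is a Monday.
rng = np.random.default_rng(1)
dates = pd.date_range("2020-01-06", periods=28, freq="D")
demo = pd.DataFrame({
    "Date": dates,
    "ice_cream_purchases": rng.integers(50, 500, size=len(dates)),
})

# "W" bins end on Sunday, so each bin here is one Monday-to-Sunday week.
weekly = demo.set_index("Date")["ice_cream_purchases"].resample("W")
weekly_max = weekly.max()    # highest single-day sales in each week
weekly_mean = weekly.mean()  # average daily sales in each week

print(weekly_max)
```

Both aggregates share the same weekly index, so they can be plotted together or numbered with a `weekno` column as is done for `dfwmics`.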