import pandas as pd
fp = "../data/coffee_prices_index.csv"
df = pd.read_csv(fp)
df.columns = ["date", "cindex"]
df["date"] = pd.to_datetime(df.date)
df["cindex"] = df["cindex"].astype(float).round(3)
%matplotlib inline
import statsmodels.api as sm

Observations
- Prices for mild arabica coffee show clear patterns.
- Cycles of price increases have occurred in the past. In the plot below, the first cycle starts around 2016 and ends in mid-2019; the second ends in the third quarter of 2023. In general, coffee prices trend upward. The series also has stochastic components, as the autocorrelation plot confirms.
import matplotlib.pyplot as plt
plt.plot(df.date, df.cindex)
plt.grid(True)
plt.xlabel("Year")
plt.ylabel("cents per pound")
plt.title("Price of mild arabica coffee") # from https://fred.stlouisfed.org/series/PCOFFOTMUSDM
#plt.xlabel(df["date"])
sm.graphics.tsa.plot_acf(df.cindex, lags=24);
coffee_prices = pd.Series(df.cindex.values, index=df.date)
coffee_prices
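
The autocorrelation plot summarizes how strongly each observation is related to its own past values. As a rough illustration of what to look for, the sketch below computes the sample autocorrelation by hand for two synthetic series: white noise and a random walk with drift. The series, the seed, and the helper name `autocorr` are assumptions made for this example; none of them come from the coffee data.

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag (lag >= 1)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    # Lagged cross-product divided by the total sum of squares.
    return float(np.dot(centered[:-lag], centered[lag:]) / np.dot(centered, centered))

rng = np.random.default_rng(0)
n = 240  # 20 years of monthly observations (synthetic, not the coffee index)
white_noise = rng.normal(size=n)
random_walk = np.cumsum(0.5 + rng.normal(size=n))  # drift of 0.5 plus random shocks

# Print lag, white-noise autocorrelation, random-walk autocorrelation.
for lag in (1, 12, 24):
    print(lag, round(autocorr(white_noise, lag), 3), round(autocorr(random_walk, lag), 3))
```

A trending series like the random walk keeps a high autocorrelation that decays slowly with the lag, while white noise stays close to zero at every lag. A slowly decaying pattern in the coffee plot would point to trend rather than pure noise.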
Observations
Post-COVID, the trend has been stronger and the seasonal variations larger. We have been experiencing this at grocery stores and everywhere else. It looks like, pre-COVID, prices did have trend-cycles and seasonality, but the changes were gradual and similar from one cycle to the next. I am not an economist, so I cannot say what is driving this.

The point here is that the right data tools can surface the problems that need analysis. They also provide a basis for determining which characteristics we need to account for in downstream analysis such as forecasting. Building predictive models without rigorous data analysis that documents the sources of variation is like carpet bombing or driving blind: either you are using more computational sophistication than the problem needs, or, if you get a reasonable answer, you were just lucky to pick a model that had the right features.
from statsmodels.tsa.seasonal import STL
# monthly data, so one seasonal cycle spans 12 observations
stl = STL(coffee_prices, period=12)
res = stl.fit()
#fig = res.plot()
# use the public trend/seasonal/resid attributes of the fitted result
decomp_res = {"Trend": res.trend, "Seasonality": res.seasonal, "Noise": res.resid}
df_res = pd.DataFrame.from_dict(decomp_res, orient="columns")
df_res = df_res.reset_index()
df_res
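
One way to put numbers on how strong the trend and the seasonality are is the strength measure described in Hyndman and Athanasopoulos's forecasting text: one minus the ratio of the residual variance to the variance of the component plus the residual, floored at zero. This measure is swapped in here for illustration and is not part of the original analysis. The sketch uses hand-built synthetic components, and the helper name `component_strength` is hypothetical.

```python
import numpy as np

def component_strength(component, resid):
    """Strength of a component: max(0, 1 - Var(resid) / Var(component + resid))."""
    component = np.asarray(component, dtype=float)
    resid = np.asarray(resid, dtype=float)
    return max(0.0, 1.0 - resid.var() / (component + resid).var())

# Synthetic components (made-up values, not the coffee decomposition).
rng = np.random.default_rng(1)
months = np.arange(120)
trend = 100 + 0.8 * months                      # steady upward trend
seasonal = 5 * np.sin(2 * np.pi * months / 12)  # 12-month cycle
resid = rng.normal(scale=3.0, size=months.size)

trend_strength = component_strength(trend, resid)
seasonal_strength = component_strength(seasonal, resid)
print("trend strength:", round(trend_strength, 3))
print("seasonal strength:", round(seasonal_strength, 3))
```

Values close to 1 mean the component dominates the noise; values close to 0 mean it is barely distinguishable from it. With the decomposition fitted above, the same function could presumably be called with the trend, seasonal, and residual series from the STL result.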
# Using plotly.express
import plotly.express as px
fig = px.line(df_res, x='date', y="Trend", title="Trend Cycle Component of Coffee Prices",
              labels={"Trend": "cents per pound", "date": "date"})
fig.show()
fig = px.line(df_res, x='date', y="Seasonality", title="Seasonality Component of Coffee Prices",
              labels={"Seasonality": "cents per pound", "date": "date"})
fig.show()
fig = px.line(df_res, x='date', y="Noise", title="Noise Component of Coffee Prices",
              labels={"Noise": "cents per pound", "date": "date"})
fig.show()
Errors Post Decomposition
from matplotlib import pyplot as plt
# looks reasonable
res.resid.plot.kde()
plt.grid(True)
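
The density plot is a visual check. A few summary numbers can complement it. The sketch below is one possible numeric check, not something the original analysis does: it computes the mean, standard deviation, skewness, and excess kurtosis of a residual series. It runs on synthetic residuals drawn from a normal distribution, and the helper name `residual_summary` is hypothetical.

```python
import numpy as np

def residual_summary(resid):
    """Mean, standard deviation, skewness, and excess kurtosis of residuals."""
    r = np.asarray(resid, dtype=float)
    r = r[~np.isnan(r)]  # drop missing values, if any
    mean, std = r.mean(), r.std()
    z = (r - mean) / std  # standardized residuals
    return {
        "mean": float(mean),
        "std": float(std),
        "skewness": float(np.mean(z ** 3)),
        "excess_kurtosis": float(np.mean(z ** 4) - 3.0),
    }

# Synthetic residuals (made up for illustration, not the STL residuals).
rng = np.random.default_rng(2)
fake_resid = rng.normal(loc=0.0, scale=4.0, size=500)
summary = residual_summary(fake_resid)
print(summary)
```

A mean near zero, with skewness and excess kurtosis also near zero, is consistent with residuals that are roughly symmetric and free of heavy tails. Large values would suggest the decomposition left structure or outliers behind.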