Observations

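import pandas as pd

# Load the coffee price series and give the columns short names
fp = "../data/coffee_prices_index.csv"
df = pd.read_csv(fp)
df.columns = ["date", "cindex"]
# Parse the dates and store prices as floats rounded to three decimals
df["date"] = pd.to_datetime(df["date"])
df["cindex"] = df["cindex"].astype(float).round(3)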
%matplotlib inline
import statsmodels.api as sm
  1. Prices of mild arabica coffee show clear patterns over time.
  2. Cycles of price increases have occurred in the past. In the plot below, the first cycle starts around 2016 and ends in mid-2019; the second ends in the third quarter of 2023. In general, coffee prices trend upward. The pattern also has stochastic components, as the autocorrelation plot indicates.
import matplotlib.pyplot as plt
plt.plot(df.date, df.cindex)
plt.grid(True)
plt.xlabel("Year")
plt.ylabel("cents per pound")
plt.title("Price of mild arabica coffee")  # source: https://fred.stlouisfed.org/series/PCOFFOTMUSDM
sm.graphics.tsa.plot_acf(df.cindex, lags=24);
coffee_prices = pd.Series(df.cindex.values, index=df.date)
coffee_prices
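
The autocorrelation plot can be backed up with a few numbers. The sketch below is only illustrative: `autocorr_at_lags` is a hypothetical helper, and `demo_prices` is a synthetic stand-in (a linear trend plus a 12-month cycle), not the coffee data. It uses pandas' `Series.autocorr`, which correlates the series with a shifted copy of itself, so the values can differ slightly from what `plot_acf` draws.

```python
import numpy as np
import pandas as pd


def autocorr_at_lags(series: pd.Series, lags=(1, 6, 12, 24)) -> pd.Series:
    """Return the sample autocorrelation of `series` at each requested lag."""
    return pd.Series({lag: series.autocorr(lag=lag) for lag in lags}, name="autocorr")


# Synthetic stand-in for coffee_prices (assumption: monthly data, 10 years)
idx = pd.date_range("2014-01-01", periods=120, freq="MS")
t = np.arange(120)
demo_prices = pd.Series(100 + 0.8 * t + 5 * np.sin(2 * np.pi * t / 12), index=idx)

demo_acf = autocorr_at_lags(demo_prices)
print(demo_acf.round(3))
```

On the real series, `autocorr_at_lags(coffee_prices)` should work the same way. Values that stay high and decay slowly across lags are what one would expect from a series with a persistent trend.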

Observations

Post-COVID, the trend is stronger and the seasonal variations are larger. We have been experiencing this at grocery stores and everywhere else. Pre-COVID, prices also had trend-cycles and seasonality, but the changes were gradual and similar from one cycle to the next. I am not an economist, so I don’t know the reasons behind this. The point here is that the right data tools can surface the problems that need analysis. They also provide a basis for determining the characteristics we need to account for in downstream analysis such as forecasting. Building predictive models without rigorous data analysis to document the sources of variation we need to account for is like carpet bombing or driving blind. Either you are using more computational sophistication than the problem needs or, if you get a reasonable answer, you are just lucky that you picked a model with the right features.

from statsmodels.tsa.seasonal import STL

stl = STL(coffee_prices, period=12)
res = stl.fit()
#fig = res.plot()
# Use the public attributes of the decomposition result rather than the private ones
decomp_res = {"Trend": res.trend, "Seasonality": res.seasonal, "Noise": res.resid}
df_res = pd.DataFrame.from_dict(decomp_res, orient="columns")
df_res = df_res.reset_index()
df_res
# Plot each component with plotly.express
import plotly.express as px
fig = px.line(df_res, x="date", y="Trend", title="Trend Cycle Component of Coffee Prices",
              labels={"Trend": "cents per pound", "date": "date"})
fig.show()
fig = px.line(df_res, x="date", y="Seasonality", title="Seasonality Component of Coffee Prices",
              labels={"Seasonality": "cents per pound", "date": "date"})
fig.show()
fig = px.line(df_res, x="date", y="Noise", title="Noise Component of Coffee Prices",
              labels={"Noise": "cents per pound", "date": "date"})
fig.show()
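
The pre- versus post-COVID difference noted in the observations can also be put into numbers by summarising a component on either side of a break date. This is a minimal sketch: `spread_by_period` is a hypothetical helper, the `2020-03-01` cutoff is my assumption rather than something the data dictates, and `demo` is a synthetic frame shaped like `df_res` (a `date` column plus a component column), not the actual decomposition.

```python
import numpy as np
import pandas as pd


def spread_by_period(frame: pd.DataFrame, column: str, cutoff: str = "2020-03-01") -> pd.DataFrame:
    """Summarise `column` before and after `cutoff` (an assumed break date)."""
    period = np.where(frame["date"] < pd.Timestamp(cutoff), "pre", "post")
    return frame.groupby(period)[column].agg(["std", "min", "max"])


# Synthetic stand-in for df_res: seasonal swings that triple after the cutoff
dates = pd.date_range("2016-01-01", periods=96, freq="MS")
t = np.arange(96)
amplitude = np.where(dates < pd.Timestamp("2020-03-01"), 2.0, 6.0)
demo = pd.DataFrame({"date": dates, "Seasonality": amplitude * np.sin(2 * np.pi * t / 12)})

spread = spread_by_period(demo, "Seasonality")
print(spread)
```

Calling `spread_by_period(df_res, "Seasonality")` on the real decomposition would give the same kind of table. A clearly larger standard deviation in the `post` row would support what the plot suggests visually.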

Errors Post Decomposition

from matplotlib import pyplot as plt
res.resid.plot.kde() # looks reasonable
plt.grid(True)
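
A few summary statistics can complement the visual check of the density plot. The sketch below assumes that "reasonable" means roughly zero mean, little skew, and little leftover autocorrelation; `residual_summary` is a hypothetical helper and `demo_resid` is simulated Gaussian noise, not the actual STL residuals.

```python
import numpy as np
import pandas as pd


def residual_summary(resid: pd.Series) -> pd.Series:
    """Collect basic diagnostics for a residual series."""
    return pd.Series({
        "mean": resid.mean(),
        "std": resid.std(),
        "skew": resid.skew(),
        "excess_kurtosis": resid.kurt(),          # 0 for a normal distribution
        "lag1_autocorr": resid.autocorr(lag=1),   # near 0 if little structure remains
    })


# Simulated residuals for illustration only (assumption: mean 0, std 2)
rng = np.random.default_rng(0)
demo_resid = pd.Series(rng.normal(0.0, 2.0, size=240))

resid_stats = residual_summary(demo_resid)
print(resid_stats.round(3))
```

With the real decomposition, `residual_summary(res.resid)` would apply directly. A lag-1 autocorrelation well away from zero would hint that the decomposition left structure in the residuals.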