Introduction
This problem is a typical econometric modeling situation. The Phillips Curve says that inflation and unemployment move
in opposite directions. It was the basis for the Fed fighting
unemployment by allowing a bit of inflation.
In recent years, especially since 2008, this theory has been
somewhat discredited, though the Trump administration seems to believe
it. Few economists believe the link is that close anymore.
The Problem
The Phillips curve and modified Phillips curve relating inflation and
unemployment are presented in Example 6.6 in the book.
I’d read that first. Now we can examine Example 6.7, which presents some
data to test for the Phillips curve.
We quote from the book:
As an illustration of the modified Phillips curve, we present in
Table 6.5 data on inflation as measured by year-to-year percentage in
the Consumer Price Index (CPI) and the unemployment rate for the period
1960–2006. The unemployment rate represents the civilian unemployment
rate. From these data we obtained the change in the inflation rate \((\pi_t - \pi_{t-1})\) and plotted it
against the civilian unemployment rate; we are using the CPI as a
measure of inflation. The resulting graph appears in Figure 6.9.
As expected, the relation between the change in inflation rate and
the unemployment rate is negative: a low unemployment rate leads to an
increase in the inflation rate and therefore an acceleration of the
price level, hence the name accelerationist Phillips curve. Looking at
Figure 6.9, it is not obvious whether a linear (straight line)
regression model or a reciprocal model fits the data; there may be a
curvilinear relationship between the two variables. We present below
regressions based on both the models. However, keep in mind that for the
reciprocal model the intercept term is expected to be negative and the
slope positive, as noted in footnote 20.
Linear model:
\[(\pi_t - \pi_{t-1}) = 3.7844 - 0.6385\, UN_t \qquad\qquad (6.7.5)\]
\[t = (4.1912)\ (-4.2756) \qquad R^2 = 0.2935\]
Reciprocal model:
\[(\pi_t - \pi_{t-1}) = -3.0684 + 17.2077\, (1/UN_t) \qquad\qquad (6.7.6)\]
\[t = (-3.1635)\ (3.2886) \qquad R^2 = 0.1973\]
All the estimated coefficients in both the models are individually
statistically significant, all the p values being lower than the 0.005
level.
We are supposed to use the data in Table 6.5 to create these
regressions and determine their quality.
Packages
We load the standard packages for the MBAD637 course.
library(tidyverse) # used to manipulate data and plot
library(ggfortify) # some extra ggplot functions
library(GGally) # pairs ggplot and other options
library(skimr) # nice summaries
library(broom) # neat model summary
library(gt) # nice displays of printed data
library(patchwork) # arranging graphs
library(car) # useful tools for regression analysis
library(yardstick) # for model performance metrics
library(ggResidpanel) # residual plots
library(mbadtools) # custom package for this course; install from GitHub
Data
We use RStudio's Import Dataset button to import Table 6.5,
being sure to copy the code generated for the import.
library(readxl)
Table_6_5 <- read_excel("~/stfrancis/MBAD637 Business Forecasting and Econometrics/GP Data Sets/Table 6_5.xls",
skip = 2)
View(Table_6_5)
Looking at the View() tab for the file, we see that the data are
actually in two halves, side by side. We have to put the right
half below the left half. The column names are a bit funky too.
Examine Data
Let’s do some basic examination of the data.
Table_6_5
glimpse(Table_6_5)
Rows: 24
Columns: 7
$ Year...1 <chr> "1960", "1961", "1962", "1963", "1964", "1965", "1966…
$ INFLRATE...2 <dbl> 1.718213, 1.013514, 1.003344, 1.324503, 1.307190, 1.6…
$ UNRATE...3 <dbl> 5.5, 6.7, 5.5, 5.7, 5.2, 4.5, 3.8, 3.8, 3.6, 3.5, 4.9…
$ ...4 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ Year...5 <chr> "1984", "1985", "1986", "1987", "1988", "1989", "1990…
$ INFLRATE...6 <dbl> 4.317269, 3.561116, 1.858736, 3.649635, 4.137324, 4.8…
$ UNRATE...7 <dbl> 7.5, 7.2, 7.0, 6.2, 5.5, 5.3, 5.6, 6.8, 7.5, 6.9, 6.1…
By looking at the end of the file we see that in the right half there
is a row of NAs at the end. The data frame has 24 rows: the left half
holds 24 years (1960–1983) and the right half 23 years (1984–2006), so
there are 47 actual years of data. So there is some wrangling to do
to make a new data frame just as we like it.
Data Wrangling
We will make a new data frame phillips by moving in parts of the data
from Table_6_5.
We will make a temporary data frame phillips_1 with the left half of
the data and phillips_2 with the right half.
Because of the weird column names, we will also fix those as we go. And
we will make Year a numeric variable.
phillips_1 = Table_6_5 %>%
select(1:3)
names(phillips_1) = c("Year", "INFLRATE", "UNRATE")
glimpse(phillips_1)
Rows: 24
Columns: 3
$ Year <chr> "1960", "1961", "1962", "1963", "1964", "1965", "1966", "…
$ INFLRATE <dbl> 1.718213, 1.013514, 1.003344, 1.324503, 1.307190, 1.61290…
$ UNRATE <dbl> 5.5, 6.7, 5.5, 5.7, 5.2, 4.5, 3.8, 3.8, 3.6, 3.5, 4.9, 5.…
phillips_2 = Table_6_5 %>%
select(5:7)
names(phillips_2)=c("Year", "INFLRATE", "UNRATE")
glimpse(phillips_2)
Rows: 24
Columns: 3
$ Year <chr> "1984", "1985", "1986", "1987", "1988", "1989", "1990", "…
$ INFLRATE <dbl> 4.317269, 3.561116, 1.858736, 3.649635, 4.137324, 4.81825…
$ UNRATE <dbl> 7.5, 7.2, 7.0, 6.2, 5.5, 5.3, 5.6, 6.8, 7.5, 6.9, 6.1, 5.…
This looks good.
Now we make our desired data frame by putting phillips_1 on top of
phillips_2, using tidyverse tools.
Then we remove any NAs. And we make Year a numeric column.
phillips = bind_rows(phillips_1, phillips_2) %>%
na.omit %>%
mutate(Year=as.numeric(Year))
glimpse(phillips)
Rows: 47
Columns: 3
$ Year <dbl> 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 196…
$ INFLRATE <dbl> 1.718213, 1.013514, 1.003344, 1.324503, 1.307190, 1.61290…
$ UNRATE <dbl> 5.5, 6.7, 5.5, 5.7, 5.2, 4.5, 3.8, 3.8, 3.6, 3.5, 4.9, 5.…
I think we’ve got it!!
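The same stacking pattern can be tried on a tiny example first. The sketch below is only an illustration: the data frame `wide` and its column names are hypothetical stand-ins for the side-by-side spreadsheet layout, and the values are made up rather than taken from Table 6.5.

```r
library(dplyr)

# Hypothetical miniature of the spreadsheet layout: two halves side by side,
# with the right half one row shorter (so it is padded with NA).
wide <- data.frame(
  Year_left  = c("1960", "1961", "1962"),
  Rate_left  = c(1.7, 1.0, 1.0),
  Year_right = c("1963", "1964", NA),
  Rate_right = c(1.3, 1.3, NA)
)

# Pull out each half and give both the same column names.
left  <- wide %>% select(Year = Year_left,  Rate = Rate_left)
right <- wide %>% select(Year = Year_right, Rate = Rate_right)

# Stack the halves, drop the padding row, and make Year numeric.
long <- bind_rows(left, right) %>%
  na.omit() %>%
  mutate(Year = as.numeric(Year))

print(long)
```

Renaming inside `select()` does the job of the separate `names()<-` step used above; either approach should give the same result.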
Now the research problem specifies that the models have as target not
INFLRATE, but the change in inflation rate from the previous year. Let’s
add a column to phillips that calculates the change in the INFLRATE
column over the past year. Observe we need to put an NA at the start of
the column, because there isn’t anything to take a difference from.
phillips = phillips %>%
mutate(INFLCHANGE=c(NA,diff(phillips$INFLRATE)) )
glimpse(phillips)
Rows: 47
Columns: 4
$ Year <dbl> 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1…
$ INFLRATE <dbl> 1.718213, 1.013514, 1.003344, 1.324503, 1.307190, 1.612…
$ UNRATE <dbl> 5.5, 6.7, 5.5, 5.7, 5.2, 4.5, 3.8, 3.8, 3.6, 3.5, 4.9, …
$ INFLCHANGE <dbl> NA, -0.70469954, -0.01016903, 0.32115883, -0.01731377, …
Now we have the difference each year from the last. We can check that
by trying the first few:
d=c()
for (i in 1:6) d[i] = phillips$INFLRATE[i+1] - phillips$INFLRATE[i]
d
[1] -0.70469954 -0.01016903 0.32115883 -0.01731377 0.30571368 1.24423963
The differences match the values in the INFLCHANGE column.
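As a small self-contained check of the differencing step, the sketch below uses only the first five inflation rates as they appear (rounded) in the glimpse() output. The vector names `infl`, `infl_change` and `infl_change_lag` are introduced here just for illustration.

```r
# First five inflation rates as printed (rounded) in the glimpse() output.
infl <- c(1.718213, 1.013514, 1.003344, 1.324503, 1.307190)

# diff() returns n - 1 differences, so pad with NA to keep the length at n.
infl_change <- c(NA, diff(infl))

# Equivalent "lag" formulation: current value minus the previous value.
infl_change_lag <- infl - c(NA, head(infl, -1))

print(round(infl_change, 6))
print(all.equal(infl_change, infl_change_lag))
```

Both formulations give the same vector; the padded NA is what keeps the new column aligned with the Year column.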
Data Description
Looking at the data we have
phillips %>% glimpse
Rows: 47
Columns: 4
$ Year <dbl> 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1…
$ INFLRATE <dbl> 1.718213, 1.013514, 1.003344, 1.324503, 1.307190, 1.612…
$ UNRATE <dbl> 5.5, 6.7, 5.5, 5.7, 5.2, 4.5, 3.8, 3.8, 3.6, 3.5, 4.9, …
$ INFLCHANGE <dbl> NA, -0.70469954, -0.01016903, 0.32115883, -0.01731377, …
phillips %>% skim
── Data Summary ────────────────────────
Values
Name Piped data
Number of rows 47
Number of columns 4
_______________________
Column type frequency:
numeric 4
________________________
Group variables None
The only NA is the first INFLCHANGE observation, since there is no
earlier year to subtract from it.
We could throw in an estimate for the first entry, like the mean or
median of the differences, or we could just leave it and see if it
messes us up and fix it then. Hint: in R, it won’t matter for linear
models, because lm() drops rows with missing values by default.
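To see why the missing first value is harmless for a linear model, here is a minimal sketch on simulated data (not the Table 6.5 values). It assumes the default setting `options(na.action = "na.omit")`, under which `lm()` silently drops incomplete rows.

```r
set.seed(1)  # simulated data for illustration only
x <- runif(20, 3, 10)
y <- 4 - 0.6 * x + rnorm(20)
y[1] <- NA   # mimic the missing first difference

d <- data.frame(x = x, y = y)

fit_default <- lm(y ~ x, data = d)        # incomplete row dropped automatically
fit_dropped <- lm(y ~ x, data = d[-1, ])  # same row removed by hand

print(nobs(fit_default))  # number of observations actually used in the fit
print(all.equal(coef(fit_default), coef(fit_dropped)))
```

The two fits agree, so leaving the NA in place costs one observation and nothing else.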
Visualizing Data
A pairs plot is a quick way to see relations between the columns.
phillips %>% ggpairs(progress = FALSE)

Is there a relation between INFLCHANGE and UNRATE? It’s kind of there,
and downward, as anticipated.
The correlation is around -0.54.
Because the research question poses both a Linear and a Reciprocal
model, let’s graph the linear model INFLCHANGE~UNRATE and the Reciprocal
model INFLCHANGE~I(1/UNRATE). The reciprocal formula has an I() function
around it because operators such as / have a special meaning inside a
model formula. I() tells R to evaluate the expression arithmetically,
exactly as it is written; it marks its argument to be treated "as is".
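One way to convince yourself of what the wrapper does in a formula is to compare it with a model where the reciprocal is computed ahead of time. This is a sketch on simulated data; the names `un`, `dinfl` and `inv_un` are made up for the illustration.

```r
set.seed(2)  # simulated data for illustration only
un    <- runif(30, 3, 10)
dinfl <- -3 + 17 / un + rnorm(30)
d <- data.frame(un = un, dinfl = dinfl, inv_un = 1 / un)

fit_I   <- lm(dinfl ~ I(1 / un), data = d)  # reciprocal computed inside the formula
fit_pre <- lm(dinfl ~ inv_un, data = d)     # reciprocal computed beforehand

print(attr(terms(fit_I), "term.labels"))    # the wrapped expression is a single term
print(all.equal(unname(coef(fit_I)), unname(coef(fit_pre))))
```

The estimates agree, so the wrapped formula is simply a shortcut for adding a reciprocal column to the data frame first.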
We will put the plots in a patchwork object which shows them side by
side in two columns. The left one is Figure 6.9 in the text.
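g_linear = ggplot(phillips) +
  geom_point(aes(UNRATE, INFLCHANGE), na.rm = TRUE)
# I() is only needed in model formulas; aes() evaluates 1/UNRATE directly,
# and recent ggplot2 versions give I() inside aes() a different meaning
g_recip = ggplot(phillips) +
  geom_point(aes(1/UNRATE, INFLCHANGE), na.rm = TRUE)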
# basic patchwork
g_linear + g_recip

Advanced patchwork, and advanced GGally::ggmatrix methods:
list(g_linear, g_recip) %>%
wrap_plots(nrow = 1) +
plot_annotation(title = "Figure 6.9",
tag_levels=list(c("Linear","Reciprocal")))

list(g_linear, g_recip) %>%
ggmatrix(ncol=2, nrow=1,
title="Figure 6.9",
ylab="Inflation Change (%)",
xAxisLabels = c("Linear", "Reciprocal"),
showXAxisPlotLabels = TRUE
)

We can see some slight linearity in each model. The relation is
positive for 1/UNRATE, and negative for UNRATE itself.
Modeling
We estimate the Linear and Reciprocal models.
Linear Model
First we estimate the Linear Model.
fit.lin = lm(INFLCHANGE~UNRATE, data=phillips)
fit.lin %>% glance()
fit.lin %>% tidy(conf.int=T) %>% gt_add_significance()
term | estimate | std.error | statistic | p.value | conf.low | conf.high
(Intercept) | 3.7844 | 0.9029 | 4.1912 | 0.0001 | 1.9647 | 5.6042
UNRATE | −0.6385 | 0.1493 | −4.2756 | 0.0001 | −0.9394 | −0.3375
The coefficient of UNRATE in the Linear Model is -0.64,
and it is significant at better than the .005 level. The R-squared of
the Linear model is only 0.29, however, which leads us to wonder if this
regression is useful. The F-statistic test does tell us the
regression is meaningful (naturally: with only one driver in the
regression, the F test is equivalent to the t test on the slope, and
the coefficient estimate is probably not zero).
anova(fit.lin) %>% tidy %>% gt_add_significance()
term | df | sumsq | meansq | statistic | p.value
UNRATE | 1.0000 | 38.1475 | 38.1475 | 18.2805 | 0.0001
Residuals | 44.0000 | 91.8187 | 2.0868 | NA | NA
The anova() table tells us the same thing.
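The link between the two tables can be checked with a few lines of arithmetic. The sketch below just re-uses the rounded numbers reported above for the linear model, so small rounding differences are expected; the variable names are introduced here for illustration.

```r
# Values copied from the tidy() and anova() output for the linear model.
t_slope  <- -4.2756
ss_model <- 38.1475
ss_resid <- 91.8187
df_resid <- 44

f_from_t  <- t_slope^2                         # with one regressor, F equals t squared
f_from_ss <- ss_model / (ss_resid / df_resid)  # ratio of the two mean squares
r_squared <- ss_model / (ss_model + ss_resid)  # share of variation explained

print(round(c(F_from_t = f_from_t, F_from_SS = f_from_ss, R2 = r_squared), 4))
```

Both routes to F land near the reported 18.28, and the sums of squares reproduce the R-squared of about 0.29.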
We are obliged to look at the residual distribution to assess any
regression. I like using ggResidpanel::resid_panel.
Messages are turned off because the smoother prints a lot of output.
resid_panel(fit.lin, plots=c("resid","qq","ls","hist"), smooth=T)

The QQ plot leads to some questions about whether the residuals are
normally distributed. For large quantiles the points are clearly
off the dotted line. This is a problem for the regression.
The residuals-versus-fitted (RF) plot shows that the mean of residuals
may be a bit below zero, but it is pretty uniform across the fitted
values. The residuals are pretty well scattered about zero and look random.
The location-scale (LS) plot shows that the standard deviation is
declining rather than being constant, but it isn’t too awful.
In short, the residual analysis is not giving us much more confidence
than the low R-squared in the statistics did. The Linear regression is
not too likely to be a good model for prediction.
Reciprocal Model
Now we estimate the Reciprocal Model using R.
fit.rec = lm(INFLCHANGE~I(1/UNRATE), data=phillips)
fit.rec %>% glance
fit.rec %>% tidy(conf.int=T) %>% gt_add_significance()
term | estimate | std.error | statistic | p.value | conf.low | conf.high
(Intercept) | −3.0684 | 0.9699 | −3.1635 | 0.0028 | −5.0231 | −1.1136
I(1/UNRATE) | 17.2077 | 5.2325 | 3.2886 | 0.0020 | 6.6623 | 27.7530
The coefficient of 1/UNRATE in the Reciprocal Model is
17.2, and it is significant at better than the .005 level (p = 0.002).
The R-squared of the Reciprocal model is only 0.20, however, which leads
us to wonder if this regression is useful. The F-statistic test does
tell us the regression is meaningful (naturally: with only one driver
in the regression, the F test is equivalent to the t test on the slope,
and the coefficient estimate is probably not zero).
anova(fit.rec) %>% tidy %>% gt_add_significance()
term | df | sumsq | meansq | statistic | p.value
I(1/UNRATE) | 1.0000 | 25.6426 | 25.6426 | 10.8151 | 0.0020
Residuals | 44.0000 | 104.3237 | 2.3710 | NA | NA
The anova() table tells us the same thing.
We are obliged to look at the residual distribution to assess any
regression.
resid_panel(fit.rec, plots=c("resid","qq","ls","hist"), smooth=T)

The QQ plot leads to some questions about whether the residuals are
normally distributed. For large quantiles the points are clearly off the
dotted line. This is a problem for the regression.
The residuals-versus-fitted (RF) plot shows that the mean of residuals
may be a bit below zero, and is not uniform across the fitted values.
The residuals are pretty well scattered about zero, but don’t look
random; they are wide in the middle and narrow at the ends.
The location-scale (LS) plot shows that the standard deviation is
declining rather than being constant, and is way up for small fitted values.
In short, the residual analysis is giving us even less confidence
than the low R-squared in the statistics did. The Reciprocal regression
is even less likely than the Linear one to be a good model for prediction.
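For a quick numerical side-by-side of the two fits, the ANOVA sums of squares reported above are enough. This is a rough sketch using the rounded table values; the data frame `ss` and its column names are made up for the illustration.

```r
# Sums of squares and residual degrees of freedom copied from the anova() tables.
ss <- data.frame(
  model    = c("Linear", "Reciprocal"),
  ss_model = c(38.1475, 25.6426),
  ss_resid = c(91.8187, 104.3237),
  df_resid = c(44, 44)
)

# R-squared: share of the total variation explained by each model.
ss$r_squared <- ss$ss_model / (ss$ss_model + ss$ss_resid)

# Residual standard error: typical size of a residual, in percentage points.
ss$resid_se <- sqrt(ss$ss_resid / ss$df_resid)

print(ss)
```

On both measures the Linear model comes out slightly ahead, though a typical miss of roughly one and a half percentage points is large relative to most of the year-to-year changes being modeled.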
Visualizing the models
Here we will graph the Linear and Reciprocal models on top of the
data, so we can get a picture of how well they estimate the data itself.
Of course the real question is whether they could be used, by the Fed
for instance, to estimate future inflation change from unemployment now.
These models probably would not be good choices.
We’ll grab the code for the side-by-side plots above and add a layer
for the regression line with the geom_smooth() function.
We don’t use it for the reciprocal plot; there we just draw the fitted
regression line from the model coefficients.
gr_linear = ggplot(phillips, aes(UNRATE, INFLCHANGE)) +
  geom_point(na.rm = TRUE) +
  annotate("text", x = 8, y = 5, label = "Linear") +  # one label, not one per row
  geom_smooth(method = "lm", se = TRUE, formula = y ~ x, na.rm = TRUE)
gr_recip = ggplot(phillips, aes(1/UNRATE, INFLCHANGE)) +
  geom_point(na.rm = TRUE) +
  annotate("text", x = 0.25, y = 5, label = "Reciprocal") +
  # fitted line taken directly from the reciprocal model's coefficients
  geom_abline(intercept = coef(fit.rec)[1],
              slope = coef(fit.rec)[2], color = "red")
(gr_linear + gr_recip) +
plot_annotation(title="Linear and Reciprocal Estimates")

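Many of the data points are outside the confidence bands. Those bands
are for the mean estimate, not for individual observations, so some
points outside are expected; still, the wide scatter around the line
signals pretty poor forecasting ability.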
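The shaded band drawn by geom_smooth() covers the estimated mean response, which is narrower than the range in which a single new observation is expected to fall. The sketch below illustrates the difference with `predict()` on simulated data (not the Table 6.5 values); the names `sim`, `conf_int` and `pred_int` are hypothetical.

```r
set.seed(3)  # simulated data for illustration only
sim <- data.frame(un = runif(46, 3, 10))
sim$dinfl <- 3.8 - 0.64 * sim$un + rnorm(46, sd = 1.4)

fit <- lm(dinfl ~ un, data = sim)
new <- data.frame(un = 6)  # an unemployment rate to predict at

# Interval for the mean response at un = 6.
conf_int <- predict(fit, newdata = new, interval = "confidence")
# Interval for one new observation at un = 6.
pred_int <- predict(fit, newdata = new, interval = "prediction")

print(rbind(confidence = conf_int[1, ], prediction = pred_int[1, ]))
```

The prediction interval is the relevant one for forecasting a single year, and it is always the wider of the two.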
Another way to look at the fit is to use response-versus-predicted
(yvp) plots: we present them side by side with patchwork.
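# Linear first, then Reciprocal, so the order matches the labels below
P = list(
  resid_panel(fit.lin, plots = c("yvp"), smooth = TRUE),
  resid_panel(fit.rec, plots = c("yvp"), smooth = TRUE)
)
P %>% wrap_plots(ncol = 2) +
  plot_annotation(subtitle = "Linear and Reciprocal responses")

# two rows and one column, so the labels go on the rows
P %>% ggmatrix(nrow = 2, ncol = 1,
  yAxisLabels = c("Linear", "Reciprocal"))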

These both show a lot of noise in the predictions; high accuracy
would place the dots close to the blue line.
Conclusion
We looked at a Linear and a Reciprocal model of possible Phillips
Curves. Neither was found to be very good for forecasting future
inflation rate change from present unemployment.
This is in line with current critiques of the Phillips Curve as a
policy tool for the Fed.
