Initialize Notebook

Load required packages.

library(tidyverse)
library(ggthemr)
library(stargazer)
library(Metrics)
library(splines)
library(mgcv)

Set up workspace, i.e., remove all existing data from working memory, initialize the random number generator, turn of scientific notation of large numbers, set a standard theme for plotting.

Problem Description

Ask a home buyer to describe their dream house, and they probably won’t begin with the height of the basement ceiling or the proximity to an east-west railroad. But this dataset proves that much more influences price negotiations than the number of bedrooms or a white-picket fence. With 76 predictor variables describing (almost) every aspect of residential homes in Ames, Iowa, this dataset challenges you to explain the final price of each home. Read more about this dataset: https://www.kaggle.com/c/house-prices-advanced-regression-techniques

From a theoretical perspective, real estate evaluation models are based on hedonic pricing models introduced by Lancaster and Rosen in the 1970s and 80s. The main idea of hedonic pricing models is that the overall price of a good can be estimated by decomposing it into its constituting characteristics, determining the value contribution of each individual characteristic, and summing these contributions up to obtain the final price. As we will see in this case study, hedonic pricing models can easily be implemented using multiple linear regression.

Load Data

Load data from CSV files. For better reproducibility, we load fixed train and test sets from CSV files.

training <- read_csv("ameshousing_training.csv")
Rows: 2346 Columns: 77
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (42): MSSubClass, MSZoning, Street, Alley, LotShape, LandContour, Utilities, LotConfig, LandSlope, Neighborh...
dbl (35): LotFrontage, LotArea, OverallQual, OverallCond, YearBuilt, YearRemodAdd, MasVnrArea, BsmtFinSF1, BsmtF...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
test <- read_csv("ameshousing_test.csv")
Rows: 584 Columns: 77
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (42): MSSubClass, MSZoning, Street, Alley, LotShape, LandContour, Utilities, LotConfig, LandSlope, Neighborh...
dbl (35): LotFrontage, LotArea, OverallQual, OverallCond, YearBuilt, YearRemodAdd, MasVnrArea, BsmtFinSF1, BsmtF...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Modeling

Polynomial Regression

We have already used linear regression with polynomial terms in Week 4. However, for reasons of completeness we will quickly revisit the idea and its implementation in R.

fit_poly <- lm(SalePrice ~ poly(GrLivArea, 3, raw = TRUE), data = training)
stargazer(fit_poly, intercept.bottom = FALSE, single.row = TRUE, type = "text")

===========================================================
                                    Dependent variable:    
                                ---------------------------
                                         SalePrice         
-----------------------------------------------------------
Constant                         32,559.340** (16,240.740) 
poly(GrLivArea, 3, raw = TRUE)1     73.527*** (26.862)     
poly(GrLivArea, 3, raw = TRUE)2        0.021 (0.014)       
poly(GrLivArea, 3, raw = TRUE)3     -0.00000 (0.00000)     
-----------------------------------------------------------
Observations                               2,346           
R2                                         0.523           
Adjusted R2                                0.522           
Residual Std. Error               54,306.370 (df = 2342)   
F Statistic                      855.075*** (df = 3; 2342) 
===========================================================
Note:                           *p<0.1; **p<0.05; ***p<0.01

As it is usually very difficult to interpret regression tables with polynomial terms, it is a good idea to plot the marginal effect of the predictor variable on the target.

# Create a grid of GrLivArea values
GrLivArea_range <- range(training$GrLivArea)
GrLivArea_grid <- data.frame(GrLivArea = seq(from = GrLivArea_range[1], to = GrLivArea_range[2]))

# Feed these values into the predict function; here we not only calculate predictions, but also their standard errors
preds <- predict(fit_poly, newdata = GrLivArea_grid, se.fit = TRUE)

# Gather the results in a data frame
sim_data <- data.frame(GrLivArea = GrLivArea_grid,
                       lci = preds$fit - 2 * preds$se.fit, 
                       fit = preds$fit, 
                       uci = preds$fit + 2 * preds$se.fit) 

# Plot the data
ggplot(data = sim_data) +
  geom_point(data = training, mapping = aes(x = GrLivArea, y = SalePrice), alpha = 0.3) + 
  # in geom_smooth we use 'identity' as we have already calculated the smooth function
  geom_smooth(mapping = aes(x = GrLivArea, y = fit, ymin = lci, ymax = uci), stat = "identity")

NA
NA

Step Functions

Instead of fitting one global polynomial model, we can also divide the predictor space into regions and fit more or less complex individual functions to these regions. The simplest approach is to use piecewise constant functions.

fit_stepfun <- lm(SalePrice ~ cut(GrLivArea, 4), data = training)

Print regression table to see coefficient estimates, standard errors, p-values, and R2. The predictors in a step function regression are binary indicator variables (dummy variables) and, hence, are easy to interpret.

stargazer(fit_stepfun, intercept.bottom = FALSE, single.row = TRUE, type = "text")

================================================================
                                         Dependent variable:    
                                     ---------------------------
                                              SalePrice         
----------------------------------------------------------------
Constant                             140,204.100*** (1,661.766) 
cut(GrLivArea, 4)(1.5e+03,2.56e+03]   80,353.260*** (2,544.750) 
cut(GrLivArea, 4)(2.56e+03,3.62e+03] 205,820.100*** (7,307.622) 
cut(GrLivArea, 4)(3.62e+03,4.68e+03] 336,685.900*** (26,867.220)
----------------------------------------------------------------
Observations                                    2,346           
R2                                              0.418           
Adjusted R2                                     0.417           
Residual Std. Error                    59,961.910 (df = 2342)   
F Statistic                           561.064*** (df = 3; 2342) 
================================================================
Note:                                *p<0.1; **p<0.05; ***p<0.01

We can also manually set the regions for the step functions using the cut function.

break_points <- c(0, 1000, 2000, 6000)
fit_stepfun <- lm(SalePrice ~ cut(GrLivArea, breaks = break_points),
                  data = training)
stargazer(fit_stepfun, intercept.bottom = FALSE, single.row = TRUE, type = "text")

==============================================================================
                                                       Dependent variable:    
                                                   ---------------------------
                                                            SalePrice         
------------------------------------------------------------------------------
Constant                                           112,599.800*** (3,389.495) 
cut(GrLivArea, breaks = break_points)(1e+03,2e+03]  62,448.470*** (3,723.251) 
cut(GrLivArea, breaks = break_points)(2e+03,6e+03] 173,256.300*** (4,950.424) 
------------------------------------------------------------------------------
Observations                                                  2,346           
R2                                                            0.351           
Adjusted R2                                                   0.350           
Residual Std. Error                                  63,321.000 (df = 2343)   
F Statistic                                         633.229*** (df = 2; 2343) 
==============================================================================
Note:                                              *p<0.1; **p<0.05; ***p<0.01

Although the regression coefficient in step function regression are easy to interpret, it is often enlightening to plot the predictions for a set of predictor variable values.

GrLivArea_range <- range(training$GrLivArea)
GrLivArea_grid <- data.frame(GrLivArea = seq(from = GrLivArea_range[1], to = GrLivArea_range[2]))

preds <- predict(fit_stepfun, newdata = GrLivArea_grid, se.fit = TRUE)

sim_data <- data.frame(GrLivArea = GrLivArea_grid,
                       lci = preds$fit - 2 * preds$se.fit, 
                       fit = preds$fit, 
                       uci = preds$fit + 2 * preds$se.fit) 

ggplot(data = sim_data) +
  geom_point(data = training, mapping = aes(x = GrLivArea, y = SalePrice), alpha = 0.3) +
  geom_smooth(mapping = aes(x = GrLivArea, y = fit, ymin = lci, ymax = uci), stat = "identity") +
  geom_vline(data = as.data.frame(break_points), mapping = aes(xintercept = break_points), color = "red", linetype="dashed")

NA
NA

Linear and Cubic Splines

Instead of using piecewise constant functions for the different regions, we can also use linear or polynomial (typically cubic) functions. The bs function can be used to define the knots (boundaries between regions) and the degree of the polynomial (e.g., 1 for linear, 2 for quadratic, 3 for cubic, …) for each predictor variable.

k <- c(1000, 2000)
fit_splines <- lm(SalePrice ~ bs(GrLivArea, knots = k, degree = 1), data = training)

How well does the model fit the training data?

summary(fit_splines)$r.squared
[1] 0.5224987

Note that it is very difficult, if not impossible, to interpret the individual regression coefficients and their combined effect on the target variable from the regression table. Hence, on typically plots the fitted values for set predefined predictor values.

GrLivArea_range <- range(training$GrLivArea)
GrLivArea_grid <- data.frame(GrLivArea = seq(from = GrLivArea_range[1], to = GrLivArea_range[2]))

preds <- predict(fit_splines, newdata = GrLivArea_grid, se.fit = TRUE)

sim_data <- data.frame(GrLivArea = GrLivArea_grid,
                       lci = preds$fit - 2 * preds$se.fit, 
                       fit = preds$fit, 
                       uci = preds$fit + 2 * preds$se.fit) 

ggplot(data = sim_data) +
  geom_point(data = training, mapping = aes(x = GrLivArea, y = SalePrice), alpha = 0.3) +
  geom_smooth(mapping = aes(x = GrLivArea, y = fit, ymin = lci, ymax = uci), stat = "identity") +
  geom_vline(data = as.data.frame(k), mapping = aes(xintercept = k), color = "red", linetype="dashed")

Smoothing Splines

Smoothing splines take the idea of splines to the extreme. In broad terms, the idea is to fit a natural cubic spline with knots at all unique values of the predictor variable. This basically means to connect all the dots on a scatterplot with a cubic function, which of course leads to overfitting. Hence, besides a loss function, smoothing splines also have a penalty term (lambda), which allows to control how jumpy or smooth the function should be. If lambda = 0, the function will be very jumpy and exactly interpolate the training observations, If lambda -> infinity, the function will be perfectly smooth, i.e., a straight line just like in traditional OLS regression. The optimal value for lambda is typically determined empirically via cross validation.

fit_smoothspline <- smooth.spline(y = training$SalePrice, x = training$GrLivArea) 
# Note that we do not have to set lambda manually, it will be automatically determined through cross validation. 
# But if you really wish, you can also manually define lambda

Look at the results of fitting the smoothing spline to the training data.

fit_smoothspline
Call:
smooth.spline(x = training$GrLivArea, y = training$SalePrice)

Smoothing Parameter  spar= 0.6269862  lambda= 0.000007598069 (11 iterations)
Equivalent Degrees of Freedom (Df): 32.99474
Penalized Criterion (RSS): 4180566557911
GCV: 2891025719

Plot the smoothing spline and training data.

GrLivArea_range <- range(training$GrLivArea)
GrLivArea_grid <- data.frame(GrLivArea = seq(from = GrLivArea_range[1], to = GrLivArea_range[2]))

preds <- as.data.frame(predict(fit_smoothspline, newdata = GrLivArea_grid))

ggplot(data = preds) +
  geom_point(data = training, mapping = aes(x = GrLivArea, y = SalePrice), alpha = 0.3) +
  geom_line(mapping = aes(x = x, y = y), color = "red")

Generalized Additive Models (GAM)

GAM allow to flexibly estimate a response variable by modeling it via different non-linear basis functions (e.g. polynomial regression, linear spline, smoothing spline) of each of the predictor variables, while maintaining the additivity assumption. Hence, it can be seen as a tool to unify the above introduced methods in one framework. In the example below, we estimate SalePrice by a linear spline of OverallQual with two automatically defined knots (as indicated by the function bs with df= 3; see documentation of bs() for details) and a smoothing spline of LotArea (indicated by the function s). The optimal lambda values for the smoothing spline are automatically determined via cross validation.

b <- c(2, 5, 8)
fit_gam <- gam(SalePrice ~ bs(OverallQual, df = 3, degree = 1) + 
                           s(LotArea), 
               data = training)

Let’s take a look at the regression tale - but don’t expect to be able to intuitively interpret the estimated coefficients.

summary(fit_gam)

Family: gaussian 
Link function: identity 

Formula:
SalePrice ~ bs(OverallQual, df = 3, degree = 1) + s(LotArea)

Parametric coefficients:
                                     Estimate Std. Error t value             Pr(>|t|)    
(Intercept)                             45556       7873   5.787        0.00000000815 ***
bs(OverallQual, df = 3, degree = 1)1    89102       8357  10.662 < 0.0000000000000002 ***
bs(OverallQual, df = 3, degree = 1)2   154014       7933  19.415 < 0.0000000000000002 ***
bs(OverallQual, df = 3, degree = 1)3   368507       9154  40.257 < 0.0000000000000002 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
             edf Ref.df     F             p-value    
s(LotArea) 8.094  8.749 75.68 <0.0000000000000002 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.757   Deviance explained = 75.9%
GCV = 1.5049e+09  Scale est. = 1.4971e+09  n = 2346

Instead, we plot the model again. The plot.gam function does not produce the nicest plots, but saves us a lot of time for manually simulating and plotting data.

plot.gam(fit_gam, se=TRUE, all.terms = TRUE)

NA

Use the GAM to make predictions on the test set and calculate RMSE.

test_preds_gam <- predict(fit_gam, newdata = test)
rmse(test_preds_gam, test$SalePrice)
[1] 41609.51

Compare the GAM mode with standard OLS regression using the same predictors.

fit_OLS <- lm(SalePrice ~ OverallQual + LotArea, data = training)
test_preds_OLS <- predict(fit_OLS, newdata = test)
rmse(test_preds_OLS, test$SalePrice)
[1] 47278.11
LS0tCnRpdGxlOiAiRGF0YSBTY2llbmNlIGZvciBCdXNpbmVzcyAtIFdlZWsgMDc6IEhvdXNlIFByaWNlcyBCZXlvbmQgTGluZWFyaXR5IgpvdXRwdXQ6IGh0bWxfbm90ZWJvb2sKLS0tCgojIEluaXRpYWxpemUgTm90ZWJvb2sKCkxvYWQgcmVxdWlyZWQgcGFja2FnZXMuCgpgYGB7ciBsb2FkIHBhY2thZ2VzLCB3YXJuaW5nPUZBTFNFLCBtZXNzYWdlPUZBTFNFfQpsaWJyYXJ5KHRpZHl2ZXJzZSkKbGlicmFyeShnZ3RoZW1yKQpsaWJyYXJ5KHN0YXJnYXplcikKbGlicmFyeShNZXRyaWNzKQpsaWJyYXJ5KHNwbGluZXMpCmxpYnJhcnkobWdjdikKCmBgYAoKU2V0IHVwIHdvcmtzcGFjZSwgaS5lLiwgcmVtb3ZlIGFsbCBleGlzdGluZyBkYXRhIGZyb20gd29ya2luZyBtZW1vcnksIGluaXRpYWxpemUgdGhlIHJhbmRvbSBudW1iZXIgZ2VuZXJhdG9yLCB0dXJuIG9mIHNjaWVudGlmaWMgbm90YXRpb24gb2YgbGFyZ2UgbnVtYmVycywgc2V0IGEgc3RhbmRhcmQgdGhlbWUgZm9yIHBsb3R0aW5nLgoKYGBge3Igc2V0dXAsIGluY2x1ZGU9RkFMU0V9CiMgU2V0IHVwIHdvcmtzcGFjZQprbml0cjo6b3B0c19jaHVuayRzZXQoZWNobyA9IFRSVUUsIHdhcm5pbmcgPSBGQUxTRSkKcm0obGlzdD1scygpKQpzZXQuc2VlZCg0MikKb3B0aW9ucyhzY2lwZW49MTAwMDApCmdndGhlbXIoJ2ZyZXNoJykKb3B0aW9ucyh5YXJkc3RpY2suZXZlbnRfZmlyc3QgPSBGQUxTRSkKCmBgYAoKIyBQcm9ibGVtIERlc2NyaXB0aW9uCgpBc2sgYSBob21lIGJ1eWVyIHRvIGRlc2NyaWJlIHRoZWlyIGRyZWFtIGhvdXNlLCBhbmQgdGhleSBwcm9iYWJseSB3b24ndCBiZWdpbiB3aXRoIHRoZSBoZWlnaHQgb2YgdGhlIGJhc2VtZW50IGNlaWxpbmcgb3IgdGhlIHByb3hpbWl0eSB0byBhbiBlYXN0LXdlc3QgcmFpbHJvYWQuIEJ1dCB0aGlzIGRhdGFzZXQgcHJvdmVzIHRoYXQgbXVjaCBtb3JlIGluZmx1ZW5jZXMgcHJpY2UgbmVnb3RpYXRpb25zIHRoYW4gdGhlIG51bWJlciBvZiBiZWRyb29tcyBvciBhIHdoaXRlLXBpY2tldCBmZW5jZS4gV2l0aCA3NiBwcmVkaWN0b3IgdmFyaWFibGVzIGRlc2NyaWJpbmcgKGFsbW9zdCkgZXZlcnkgYXNwZWN0IG9mIHJlc2lkZW50aWFsIGhvbWVzIGluIEFtZXMsIElvd2EsIHRoaXMgZGF0YXNldCBjaGFsbGVuZ2VzIHlvdSB0byBleHBsYWluIHRoZSBmaW5hbCBwcmljZSBvZiBlYWNoIGhvbWUuIFJlYWQgbW9yZSBhYm91dCB0aGlzIGRhdGFzZXQ6IDxodHRwczovL3d3dy5rYWdnbGUuY29tL2MvaG91c2UtcHJpY2VzLWFkdmFuY2VkLXJlZ3Jlc3Npb24tdGVjaG5pcXVlcz4KCkZyb20gYSB0aGVvcmV0aWNhbCBwZXJzcGVjdGl2ZSwgcmVhbCBlc3RhdGUgZXZhbHVhdGlvbiBtb2RlbHMgYXJlIGJhc2VkIG9uIGhlZG9uaWMgcHJpY2luZyBtb2RlbHMgaW50cm9kdWNlZCBieSBMYW5jYXN0ZXIgYW5kIFJvc2VuIGluIHRoZSAxOTcwcyBhbmQgODBzLiBUaGUgbWFpbiBpZGVhIG9mIGhlZG9uaWMgcHJpY2luZyBtb2RlbHMgaXMgdGhhdCB0aGUgb3ZlcmFsbCBwcmljZSBvZiBhIGdvb2QgY2FuIGJlIGVzdGltYXRlZCBieSBkZWNvbXBvc2luZyBpdCBpbnRvIGl0cyBjb25zdGl0dXRpbmcgY2hhcmFjdGVyaXN0aWNzLCBkZXRlcm1pbmluZyB0aGUgdmFsdWUgY29udHJpYnV0aW9uIG9mIGVhY2ggaW5kaXZpZHVhbCBjaGFyYWN0ZXJpc3RpYywgYW5kIHN1bW1pbmcgdGhlc2UgY29udHJpYnV0aW9ucyB1cCB0byBvYnRhaW4gdGhlIGZpbmFsIHByaWNlLiBBcyB3ZSB3aWxsIHNlZSBpbiB0aGlzIGNhc2Ugc3R1ZHksIGhlZG9uaWMgcHJpY2luZyBtb2RlbHMgY2FuIGVhc2lseSBiZSBpbXBsZW1lbnRlZCB1c2luZyBtdWx0aXBsZSBsaW5lYXIgcmVncmVzc2lvbi4KCiMgTG9hZCBEYXRhCgpMb2FkIGRhdGEgZnJvbSBDU1YgZmlsZXMuIEZvciBiZXR0ZXIgcmVwcm9kdWNpYmlsaXR5LCB3ZSBsb2FkIGZpeGVkIHRyYWluIGFuZCB0ZXN0IHNldHMgZnJvbSBDU1YgZmlsZXMuCgpgYGB7cn0KdHJhaW5pbmcgPC0gcmVhZF9jc3YoImFtZXNob3VzaW5nX3RyYWluaW5nLmNzdiIpCnRlc3QgPC0gcmVhZF9jc3YoImFtZXNob3VzaW5nX3Rlc3QuY3N2IikKCmBgYAoKIyBNb2RlbGluZwoKIyMgUG9seW5vbWlhbCBSZWdyZXNzaW9uCgpXZSBoYXZlIGFscmVhZHkgdXNlZCBsaW5lYXIgcmVncmVzc2lvbiB3aXRoIHBvbHlub21pYWwgdGVybXMgaW4gV2VlayA0LiBIb3dldmVyLCBmb3IgcmVhc29ucyBvZiBjb21wbGV0ZW5lc3Mgd2Ugd2lsbCBxdWlja2x5IHJldmlzaXQgdGhlIGlkZWEgYW5kIGl0cyBpbXBsZW1lbnRhdGlvbiBpbiBSLgoKYGBge3J9CmZpdF9wb2x5IDwtIGxtKFNhbGVQcmljZSB+IHBvbHkoR3JMaXZBcmVhLCAzLCByYXcgPSBUUlVFKSwgZGF0YSA9IHRyYWluaW5nKQoKYGBgCgpgYGB7ciwgd2FybmluZz1GQUxTRX0Kc3RhcmdhemVyKGZpdF9wb2x5LCBpbnRlcmNlcHQuYm90dG9tID0gRkFMU0UsIHNpbmdsZS5yb3cgPSBUUlVFLCB0eXBlID0gInRleHQiKQoKYGBgCgpBcyBpdCBpcyB1c3VhbGx5IHZlcnkgZGlmZmljdWx0IHRvIGludGVycHJldCByZWdyZXNzaW9uIHRhYmxlcyB3aXRoIHBvbHlub21pYWwgdGVybXMsIGl0IGlzIGEgZ29vZCBpZGVhIHRvIHBsb3QgdGhlIG1hcmdpbmFsIGVmZmVjdCBvZiB0aGUgcHJlZGljdG9yIHZhcmlhYmxlIG9uIHRoZSB0YXJnZXQuCgpgYGB7cn0KIyBDcmVhdGUgYSBncmlkIG9mIEdyTGl2QXJlYSB2YWx1ZXMKR3JMaXZBcmVhX3JhbmdlIDwtIHJhbmdlKHRyYWluaW5nJEdyTGl2QXJlYSkKR3JMaXZBcmVhX2dyaWQgPC0gZGF0YS5mcmFtZShHckxpdkFyZWEgPSBzZXEoZnJvbSA9IEdyTGl2QXJlYV9yYW5nZVsxXSwgdG8gPSBHckxpdkFyZWFfcmFuZ2VbMl0pKQoKIyBGZWVkIHRoZXNlIHZhbHVlcyBpbnRvIHRoZSBwcmVkaWN0IGZ1bmN0aW9uOyBoZXJlIHdlIG5vdCBvbmx5IGNhbGN1bGF0ZSBwcmVkaWN0aW9ucywgYnV0IGFsc28gdGhlaXIgc3RhbmRhcmQgZXJyb3JzCnByZWRzIDwtIHByZWRpY3QoZml0X3BvbHksIG5ld2RhdGEgPSBHckxpdkFyZWFfZ3JpZCwgc2UuZml0ID0gVFJVRSkKCiMgR2F0aGVyIHRoZSByZXN1bHRzIGluIGEgZGF0YSBmcmFtZQpzaW1fZGF0YSA8LSBkYXRhLmZyYW1lKEdyTGl2QXJlYSA9IEdyTGl2QXJlYV9ncmlkLAogICAgICAgICAgICAgICAgICAgICAgIGxjaSA9IHByZWRzJGZpdCAtIDIgKiBwcmVkcyRzZS5maXQsIAogICAgICAgICAgICAgICAgICAgICAgIGZpdCA9IHByZWRzJGZpdCwgCiAgICAgICAgICAgICAgICAgICAgICAgdWNpID0gcHJlZHMkZml0ICsgMiAqIHByZWRzJHNlLmZpdCkgCgojIFBsb3QgdGhlIGRhdGEKZ2dwbG90KGRhdGEgPSBzaW1fZGF0YSkgKwogIGdlb21fcG9pbnQoZGF0YSA9IHRyYWluaW5nLCBtYXBwaW5nID0gYWVzKHggPSBHckxpdkFyZWEsIHkgPSBTYWxlUHJpY2UpLCBhbHBoYSA9IDAuMykgKyAKICAjIGluIGdlb21fc21vb3RoIHdlIHVzZSAnaWRlbnRpdHknIGFzIHdlIGhhdmUgYWxyZWFkeSBjYWxjdWxhdGVkIHRoZSBzbW9vdGggZnVuY3Rpb24KICBnZW9tX3Ntb290aChtYXBwaW5nID0gYWVzKHggPSBHckxpdkFyZWEsIHkgPSBmaXQsIHltaW4gPSBsY2ksIHltYXggPSB1Y2kpLCBzdGF0ID0gImlkZW50aXR5IikKICAKCmBgYAoKIyMgU3RlcCBGdW5jdGlvbnMKCkluc3RlYWQgb2YgZml0dGluZyBvbmUgZ2xvYmFsIHBvbHlub21pYWwgbW9kZWwsIHdlIGNhbiBhbHNvIGRpdmlkZSB0aGUgcHJlZGljdG9yIHNwYWNlIGludG8gcmVnaW9ucyBhbmQgZml0IG1vcmUgb3IgbGVzcyBjb21wbGV4IGluZGl2aWR1YWwgZnVuY3Rpb25zIHRvIHRoZXNlIHJlZ2lvbnMuIFRoZSBzaW1wbGVzdCBhcHByb2FjaCBpcyB0byB1c2UgcGllY2V3aXNlIGNvbnN0YW50IGZ1bmN0aW9ucy4KCmBgYHtyfQpmaXRfc3RlcGZ1biA8LSBsbShTYWxlUHJpY2UgfiBjdXQoR3JMaXZBcmVhLCA0KSwgZGF0YSA9IHRyYWluaW5nKQoKYGBgCgpQcmludCByZWdyZXNzaW9uIHRhYmxlIHRvIHNlZSBjb2VmZmljaWVudCBlc3RpbWF0ZXMsIHN0YW5kYXJkIGVycm9ycywgcC12YWx1ZXMsIGFuZCBSMi4gVGhlIHByZWRpY3RvcnMgaW4gYSBzdGVwIGZ1bmN0aW9uIHJlZ3Jlc3Npb24gYXJlIGJpbmFyeSBpbmRpY2F0b3IgdmFyaWFibGVzIChkdW1teSB2YXJpYWJsZXMpIGFuZCwgaGVuY2UsIGFyZSBlYXN5IHRvIGludGVycHJldC4KCmBgYHtyLCB3YXJuaW5nPUZBTFNFfQpzdGFyZ2F6ZXIoZml0X3N0ZXBmdW4sIGludGVyY2VwdC5ib3R0b20gPSBGQUxTRSwgc2luZ2xlLnJvdyA9IFRSVUUsIHR5cGUgPSAidGV4dCIpCgpgYGAKCldlIGNhbiBhbHNvIG1hbnVhbGx5IHNldCB0aGUgcmVnaW9ucyBmb3IgdGhlIHN0ZXAgZnVuY3Rpb25zIHVzaW5nIHRoZSAqY3V0KiBmdW5jdGlvbi4KCmBgYHtyLCB3YXJuaW5nPUZBTFNFfQpicmVha19wb2ludHMgPC0gYygwLCAxMDAwLCAyMDAwLCA2MDAwKQpmaXRfc3RlcGZ1biA8LSBsbShTYWxlUHJpY2UgfiBjdXQoR3JMaXZBcmVhLCBicmVha3MgPSBicmVha19wb2ludHMpLAogICAgICAgICAgICAgICAgICBkYXRhID0gdHJhaW5pbmcpCnN0YXJnYXplcihmaXRfc3RlcGZ1biwgaW50ZXJjZXB0LmJvdHRvbSA9IEZBTFNFLCBzaW5nbGUucm93ID0gVFJVRSwgdHlwZSA9ICJ0ZXh0IikKCmBgYAoKQWx0aG91Z2ggdGhlIHJlZ3Jlc3Npb24gY29lZmZpY2llbnQgaW4gc3RlcCBmdW5jdGlvbiByZWdyZXNzaW9uIGFyZSBlYXN5IHRvIGludGVycHJldCwgaXQgaXMgb2Z0ZW4gZW5saWdodGVuaW5nIHRvIHBsb3QgdGhlIHByZWRpY3Rpb25zIGZvciBhIHNldCBvZiBwcmVkaWN0b3IgdmFyaWFibGUgdmFsdWVzLgoKYGBge3J9CkdyTGl2QXJlYV9yYW5nZSA8LSByYW5nZSh0cmFpbmluZyRHckxpdkFyZWEpCkdyTGl2QXJlYV9ncmlkIDwtIGRhdGEuZnJhbWUoR3JMaXZBcmVhID0gc2VxKGZyb20gPSBHckxpdkFyZWFfcmFuZ2VbMV0sIHRvID0gR3JMaXZBcmVhX3JhbmdlWzJdKSkKCnByZWRzIDwtIHByZWRpY3QoZml0X3N0ZXBmdW4sIG5ld2RhdGEgPSBHckxpdkFyZWFfZ3JpZCwgc2UuZml0ID0gVFJVRSkKCnNpbV9kYXRhIDwtIGRhdGEuZnJhbWUoR3JMaXZBcmVhID0gR3JMaXZBcmVhX2dyaWQsCiAgICAgICAgICAgICAgICAgICAgICAgbGNpID0gcHJlZHMkZml0IC0gMiAqIHByZWRzJHNlLmZpdCwgCiAgICAgICAgICAgICAgICAgICAgICAgZml0ID0gcHJlZHMkZml0LCAKICAgICAgICAgICAgICAgICAgICAgICB1Y2kgPSBwcmVkcyRmaXQgKyAyICogcHJlZHMkc2UuZml0KSAKCmdncGxvdChkYXRhID0gc2ltX2RhdGEpICsKICBnZW9tX3BvaW50KGRhdGEgPSB0cmFpbmluZywgbWFwcGluZyA9IGFlcyh4ID0gR3JMaXZBcmVhLCB5ID0gU2FsZVByaWNlKSwgYWxwaGEgPSAwLjMpICsKICBnZW9tX3Ntb290aChtYXBwaW5nID0gYWVzKHggPSBHckxpdkFyZWEsIHkgPSBmaXQsIHltaW4gPSBsY2ksIHltYXggPSB1Y2kpLCBzdGF0ID0gImlkZW50aXR5IikgKwogIGdlb21fdmxpbmUoZGF0YSA9IGFzLmRhdGEuZnJhbWUoYnJlYWtfcG9pbnRzKSwgbWFwcGluZyA9IGFlcyh4aW50ZXJjZXB0ID0gYnJlYWtfcG9pbnRzKSwgY29sb3IgPSAicmVkIiwgbGluZXR5cGU9ImRhc2hlZCIpCiAgCgpgYGAKCiMjIExpbmVhciBhbmQgQ3ViaWMgU3BsaW5lcwoKSW5zdGVhZCBvZiB1c2luZyBwaWVjZXdpc2UgY29uc3RhbnQgZnVuY3Rpb25zIGZvciB0aGUgZGlmZmVyZW50IHJlZ2lvbnMsIHdlIGNhbiBhbHNvIHVzZSBsaW5lYXIgb3IgcG9seW5vbWlhbCAodHlwaWNhbGx5IGN1YmljKSBmdW5jdGlvbnMuIFRoZSAqYnMqIGZ1bmN0aW9uIGNhbiBiZSB1c2VkIHRvIGRlZmluZSB0aGUga25vdHMgKGJvdW5kYXJpZXMgYmV0d2VlbiByZWdpb25zKSBhbmQgdGhlIGRlZ3JlZSBvZiB0aGUgcG9seW5vbWlhbCAoZS5nLiwgMSBmb3IgbGluZWFyLCAyIGZvciBxdWFkcmF0aWMsIDMgZm9yIGN1YmljLCAuLi4pIGZvciBlYWNoIHByZWRpY3RvciB2YXJpYWJsZS4KCmBgYHtyfQprIDwtIGMoMTAwMCwgMjAwMCkKZml0X3NwbGluZXMgPC0gbG0oU2FsZVByaWNlIH4gYnMoR3JMaXZBcmVhLCBrbm90cyA9IGssIGRlZ3JlZSA9IDEpLCBkYXRhID0gdHJhaW5pbmcpCgpgYGAKCkhvdyB3ZWxsIGRvZXMgdGhlIG1vZGVsIGZpdCB0aGUgdHJhaW5pbmcgZGF0YT8KCmBgYHtyfQpzdW1tYXJ5KGZpdF9zcGxpbmVzKSRyLnNxdWFyZWQKCmBgYAoKTm90ZSB0aGF0IGl0IGlzIHZlcnkgZGlmZmljdWx0LCBpZiBub3QgaW1wb3NzaWJsZSwgdG8gaW50ZXJwcmV0IHRoZSBpbmRpdmlkdWFsIHJlZ3Jlc3Npb24gY29lZmZpY2llbnRzIGFuZCB0aGVpciBjb21iaW5lZCBlZmZlY3Qgb24gdGhlIHRhcmdldCB2YXJpYWJsZSBmcm9tIHRoZSByZWdyZXNzaW9uIHRhYmxlLiBIZW5jZSwgb24gdHlwaWNhbGx5IHBsb3RzIHRoZSBmaXR0ZWQgdmFsdWVzIGZvciBzZXQgcHJlZGVmaW5lZCBwcmVkaWN0b3IgdmFsdWVzLgoKYGBge3J9CkdyTGl2QXJlYV9yYW5nZSA8LSByYW5nZSh0cmFpbmluZyRHckxpdkFyZWEpCkdyTGl2QXJlYV9ncmlkIDwtIGRhdGEuZnJhbWUoR3JMaXZBcmVhID0gc2VxKGZyb20gPSBHckxpdkFyZWFfcmFuZ2VbMV0sIHRvID0gR3JMaXZBcmVhX3JhbmdlWzJdKSkKCnByZWRzIDwtIHByZWRpY3QoZml0X3NwbGluZXMsIG5ld2RhdGEgPSBHckxpdkFyZWFfZ3JpZCwgc2UuZml0ID0gVFJVRSkKCnNpbV9kYXRhIDwtIGRhdGEuZnJhbWUoR3JMaXZBcmVhID0gR3JMaXZBcmVhX2dyaWQsCiAgICAgICAgICAgICAgICAgICAgICAgbGNpID0gcHJlZHMkZml0IC0gMiAqIHByZWRzJHNlLmZpdCwgCiAgICAgICAgICAgICAgICAgICAgICAgZml0ID0gcHJlZHMkZml0LCAKICAgICAgICAgICAgICAgICAgICAgICB1Y2kgPSBwcmVkcyRmaXQgKyAyICogcHJlZHMkc2UuZml0KSAKCmdncGxvdChkYXRhID0gc2ltX2RhdGEpICsKICBnZW9tX3BvaW50KGRhdGEgPSB0cmFpbmluZywgbWFwcGluZyA9IGFlcyh4ID0gR3JMaXZBcmVhLCB5ID0gU2FsZVByaWNlKSwgYWxwaGEgPSAwLjMpICsKICBnZW9tX3Ntb290aChtYXBwaW5nID0gYWVzKHggPSBHckxpdkFyZWEsIHkgPSBmaXQsIHltaW4gPSBsY2ksIHltYXggPSB1Y2kpLCBzdGF0ID0gImlkZW50aXR5IikgKwogIGdlb21fdmxpbmUoZGF0YSA9IGFzLmRhdGEuZnJhbWUoayksIG1hcHBpbmcgPSBhZXMoeGludGVyY2VwdCA9IGspLCBjb2xvciA9ICJyZWQiLCBsaW5ldHlwZT0iZGFzaGVkIikKCmBgYAoKIyMgU21vb3RoaW5nIFNwbGluZXMKClNtb290aGluZyBzcGxpbmVzIHRha2UgdGhlIGlkZWEgb2Ygc3BsaW5lcyB0byB0aGUgZXh0cmVtZS4gSW4gYnJvYWQgdGVybXMsIHRoZSBpZGVhIGlzIHRvIGZpdCBhIG5hdHVyYWwgY3ViaWMgc3BsaW5lIHdpdGgga25vdHMgYXQgYWxsIHVuaXF1ZSB2YWx1ZXMgb2YgdGhlIHByZWRpY3RvciB2YXJpYWJsZS4gVGhpcyBiYXNpY2FsbHkgbWVhbnMgdG8gY29ubmVjdCBhbGwgdGhlIGRvdHMgb24gYSBzY2F0dGVycGxvdCB3aXRoIGEgY3ViaWMgZnVuY3Rpb24sIHdoaWNoIG9mIGNvdXJzZSBsZWFkcyB0byBvdmVyZml0dGluZy4gSGVuY2UsIGJlc2lkZXMgYSBsb3NzIGZ1bmN0aW9uLCBzbW9vdGhpbmcgc3BsaW5lcyBhbHNvIGhhdmUgYSBwZW5hbHR5IHRlcm0gKGBsYW1iZGFgKSwgd2hpY2ggYWxsb3dzIHRvIGNvbnRyb2wgaG93IGp1bXB5IG9yIHNtb290aCB0aGUgZnVuY3Rpb24gc2hvdWxkIGJlLiBJZiBsYW1iZGEgPSAwLCB0aGUgZnVuY3Rpb24gd2lsbCBiZSB2ZXJ5IGp1bXB5IGFuZCBleGFjdGx5IGludGVycG9sYXRlIHRoZSB0cmFpbmluZyBvYnNlcnZhdGlvbnMsIElmIGxhbWJkYSAtXD4gaW5maW5pdHksIHRoZSBmdW5jdGlvbiB3aWxsIGJlIHBlcmZlY3RseSBzbW9vdGgsIGkuZS4sIGEgc3RyYWlnaHQgbGluZSBqdXN0IGxpa2UgaW4gdHJhZGl0aW9uYWwgT0xTIHJlZ3Jlc3Npb24uIFRoZSBvcHRpbWFsIHZhbHVlIGZvciBsYW1iZGEgaXMgdHlwaWNhbGx5IGRldGVybWluZWQgZW1waXJpY2FsbHkgdmlhIGNyb3NzIHZhbGlkYXRpb24uCgpgYGB7cn0KZml0X3Ntb290aHNwbGluZSA8LSBzbW9vdGguc3BsaW5lKHkgPSB0cmFpbmluZyRTYWxlUHJpY2UsIHggPSB0cmFpbmluZyRHckxpdkFyZWEpIAojIE5vdGUgdGhhdCB3ZSBkbyBub3QgaGF2ZSB0byBzZXQgbGFtYmRhIG1hbnVhbGx5LCBpdCB3aWxsIGJlIGF1dG9tYXRpY2FsbHkgZGV0ZXJtaW5lZCB0aHJvdWdoIGNyb3NzIHZhbGlkYXRpb24uIAojIEJ1dCBpZiB5b3UgcmVhbGx5IHdpc2gsIHlvdSBjYW4gYWxzbyBtYW51YWxseSBkZWZpbmUgbGFtYmRhCgpgYGAKCkxvb2sgYXQgdGhlIHJlc3VsdHMgb2YgZml0dGluZyB0aGUgc21vb3RoaW5nIHNwbGluZSB0byB0aGUgdHJhaW5pbmcgZGF0YS4KCmBgYHtyfQpmaXRfc21vb3Roc3BsaW5lCgpgYGAKClBsb3QgdGhlIHNtb290aGluZyBzcGxpbmUgYW5kIHRyYWluaW5nIGRhdGEuCgpgYGB7cn0KR3JMaXZBcmVhX3JhbmdlIDwtIHJhbmdlKHRyYWluaW5nJEdyTGl2QXJlYSkKR3JMaXZBcmVhX2dyaWQgPC0gZGF0YS5mcmFtZShHckxpdkFyZWEgPSBzZXEoZnJvbSA9IEdyTGl2QXJlYV9yYW5nZVsxXSwgdG8gPSBHckxpdkFyZWFfcmFuZ2VbMl0pKQoKcHJlZHMgPC0gYXMuZGF0YS5mcmFtZShwcmVkaWN0KGZpdF9zbW9vdGhzcGxpbmUsIG5ld2RhdGEgPSBHckxpdkFyZWFfZ3JpZCkpCgpnZ3Bsb3QoZGF0YSA9IHByZWRzKSArCiAgZ2VvbV9wb2ludChkYXRhID0gdHJhaW5pbmcsIG1hcHBpbmcgPSBhZXMoeCA9IEdyTGl2QXJlYSwgeSA9IFNhbGVQcmljZSksIGFscGhhID0gMC4zKSArCiAgZ2VvbV9saW5lKG1hcHBpbmcgPSBhZXMoeCA9IHgsIHkgPSB5KSwgY29sb3IgPSAicmVkIikKCmBgYAoKIyMgR2VuZXJhbGl6ZWQgQWRkaXRpdmUgTW9kZWxzIChHQU0pCgpHQU0gYWxsb3cgdG8gZmxleGlibHkgZXN0aW1hdGUgYSByZXNwb25zZSB2YXJpYWJsZSBieSBtb2RlbGluZyBpdCB2aWEgZGlmZmVyZW50IG5vbi1saW5lYXIgYmFzaXMgZnVuY3Rpb25zIChlLmcuIHBvbHlub21pYWwgcmVncmVzc2lvbiwgbGluZWFyIHNwbGluZSwgc21vb3RoaW5nIHNwbGluZSkgb2YgZWFjaCBvZiB0aGUgcHJlZGljdG9yIHZhcmlhYmxlcywgd2hpbGUgbWFpbnRhaW5pbmcgdGhlIGFkZGl0aXZpdHkgYXNzdW1wdGlvbi4gSGVuY2UsIGl0IGNhbiBiZSBzZWVuIGFzIGEgdG9vbCB0byB1bmlmeSB0aGUgYWJvdmUgaW50cm9kdWNlZCBtZXRob2RzIGluIG9uZSBmcmFtZXdvcmsuIEluIHRoZSBleGFtcGxlIGJlbG93LCB3ZSBlc3RpbWF0ZSBgU2FsZVByaWNlYCBieSBhIGxpbmVhciBzcGxpbmUgb2YgYE92ZXJhbGxRdWFsYCB3aXRoIHR3byBhdXRvbWF0aWNhbGx5IGRlZmluZWQga25vdHMgKGFzIGluZGljYXRlZCBieSB0aGUgZnVuY3Rpb24gYGJzYCB3aXRoIGBkZmA9IDM7IHNlZSBkb2N1bWVudGF0aW9uIG9mIGJzKCkgZm9yIGRldGFpbHMpIGFuZCBhIHNtb290aGluZyBzcGxpbmUgb2YgYExvdEFyZWFgIChpbmRpY2F0ZWQgYnkgdGhlIGZ1bmN0aW9uIGBzYCkuIFRoZSBvcHRpbWFsIGBsYW1iZGFgIHZhbHVlcyBmb3IgdGhlIHNtb290aGluZyBzcGxpbmUgYXJlIGF1dG9tYXRpY2FsbHkgZGV0ZXJtaW5lZCB2aWEgY3Jvc3MgdmFsaWRhdGlvbi4KCmBgYHtyfQpiIDwtIGMoMiwgNSwgOCkKZml0X2dhbSA8LSBnYW0oU2FsZVByaWNlIH4gYnMoT3ZlcmFsbFF1YWwsIGRmID0gMywgZGVncmVlID0gMSkgKyAKICAgICAgICAgICAgICAgICAgICAgICAgICAgcyhMb3RBcmVhKSwgCiAgICAgICAgICAgICAgIGRhdGEgPSB0cmFpbmluZykKCmBgYAoKTGV0J3MgdGFrZSBhIGxvb2sgYXQgdGhlIHJlZ3Jlc3Npb24gdGFsZSAtIGJ1dCBkb24ndCBleHBlY3QgdG8gYmUgYWJsZSB0byBpbnR1aXRpdmVseSBpbnRlcnByZXQgdGhlIGVzdGltYXRlZCBjb2VmZmljaWVudHMuCgpgYGB7cn0Kc3VtbWFyeShmaXRfZ2FtKQoKYGBgCgpJbnN0ZWFkLCB3ZSBwbG90IHRoZSBtb2RlbCBhZ2Fpbi4gVGhlIGBwbG90LmdhbWAgZnVuY3Rpb24gZG9lcyBub3QgcHJvZHVjZSB0aGUgbmljZXN0IHBsb3RzLCBidXQgc2F2ZXMgdXMgYSBsb3Qgb2YgdGltZSBmb3IgbWFudWFsbHkgc2ltdWxhdGluZyBhbmQgcGxvdHRpbmcgZGF0YS4KCmBgYHtyfQpwbG90LmdhbShmaXRfZ2FtLCBzZT1UUlVFLCBhbGwudGVybXMgPSBUUlVFKQoKYGBgCgpVc2UgdGhlIEdBTSB0byBtYWtlIHByZWRpY3Rpb25zIG9uIHRoZSB0ZXN0IHNldCBhbmQgY2FsY3VsYXRlIFJNU0UuCgpgYGB7ciwgd2FybmluZz1GQUxTRX0KdGVzdF9wcmVkc19nYW0gPC0gcHJlZGljdChmaXRfZ2FtLCBuZXdkYXRhID0gdGVzdCkKcm1zZSh0ZXN0X3ByZWRzX2dhbSwgdGVzdCRTYWxlUHJpY2UpCgpgYGAKCkNvbXBhcmUgdGhlIEdBTSBtb2RlIHdpdGggc3RhbmRhcmQgT0xTIHJlZ3Jlc3Npb24gdXNpbmcgdGhlIHNhbWUgcHJlZGljdG9ycy4KCmBgYHtyfQpmaXRfT0xTIDwtIGxtKFNhbGVQcmljZSB+IE92ZXJhbGxRdWFsICsgTG90QXJlYSwgZGF0YSA9IHRyYWluaW5nKQp0ZXN0X3ByZWRzX09MUyA8LSBwcmVkaWN0KGZpdF9PTFMsIG5ld2RhdGEgPSB0ZXN0KQpybXNlKHRlc3RfcHJlZHNfT0xTLCB0ZXN0JFNhbGVQcmljZSkKCmBgYAo=