Title: | Estimate Hierarchical Feature Regression Models |
---|---|
Description: | Provides functions for the estimation, plotting, predicting and cross-validation of hierarchical feature regression models as described in Pfitzinger (2024). Cluster Regularization via a Hierarchical Feature Regression. Econometrics and Statistics (in press). <doi:10.1016/j.ecosta.2024.01.003>. |
Authors: | Johann Pfitzinger [aut, cre] |
Maintainer: | Johann Pfitzinger <[email protected]> |
License: | GPL-2 |
Version: | 0.7.1 |
Built: | 2024-11-23 04:02:40 UTC |
Source: | https://github.com/jpfitzinger/hfr |
HFR is a regularized regression estimator that decomposes a least squares regression along a supervised hierarchical graph, and shrinks the edges of the estimated graph to regularize parameters. The algorithm leads to group shrinkage in the regression parameters and a reduction in the effective model degrees of freedom.
cv.hfr(
  x,
  y,
  weights = NULL,
  kappa = seq(0, 1, by = 0.1),
  q = NULL,
  intercept = TRUE,
  standardize = TRUE,
  nfolds = 10,
  foldid = NULL,
  partial_method = c("pairwise", "shrinkage"),
  l2_penalty = 0,
  ...
)
x |
Input matrix or data.frame, of dimension |
y |
Response variable. |
weights |
an optional vector of weights to be used in the fitting process. Should be NULL or a numeric vector. If non-NULL, weighted least squares is used for the level-specific regressions. |
kappa |
A vector of target effective degrees of freedom of the regression. |
q |
Thinning parameter representing the quantile cut-off (in terms of contributed variance) above which to consider levels in the hierarchy. This can be used to reduce the number of levels in high-dimensional problems. Default is no thinning. |
intercept |
Should intercept be fitted. Default is TRUE. |
standardize |
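Logical flag for x variable standardization prior to fitting the model. The coefficients are always returned on the original scale. Default is TRUE. |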
nfolds |
The number of folds for k-fold cross validation. Default is 10. |
foldid |
An optional vector of values between |
partial_method |
Indicate whether to use pairwise partial correlations, or shrinkage partial correlations. |
l2_penalty |
Optional penalty for level-specific regressions (useful in high-dimensional case) |
... |
Additional arguments passed to |
This function fits an HFR to a grid of kappa hyperparameter values. The result is a matrix of coefficients with one column for each hyperparameter. By evaluating all hyperparameters in a single function, the speed of the cross-validation procedure is improved substantially (since level-specific regressions are estimated only once).
When nfolds > 1, a cross validation is performed with shuffled data. Alternatively, test slices can be passed to the function using the foldid argument. The result of the cross validation is given by best_kappa in the output object.
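The two ways of defining folds described above can be sketched as follows. This is a minimal illustration, not a prescribed workflow: it assumes that foldid holds integer fold labels between 1 and the number of folds (one per observation), and that the selected value is stored as the best_kappa element of the returned object. The seed, the simulated data and the five-fold split are arbitrary choices.

```r
library(hfr)

set.seed(123)
x <- matrix(rnorm(100 * 20), 100, 20)
y <- rnorm(100)

# Assumption: fold labels are integers 1..5, one label per observation
foldid <- sample(rep(1:5, length.out = nrow(x)))

kappa_grid <- seq(0, 1, by = 0.1)

# User-defined test slices instead of internally shuffled folds
fit <- cv.hfr(x, y, kappa = kappa_grid, nfolds = 5, foldid = foldid)

# Cross-validated choice of kappa (assumed to be a list element)
fit$best_kappa
```

Fixing foldid in this way makes the cross-validation reproducible and allows several models to be compared on identical test slices.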
A 'cv.hfr' regression object.
Johann Pfitzinger
Pfitzinger, Johann (2024). Cluster Regularization via a Hierarchical Feature Regression. _Econometrics and Statistics_ (in press). URL https://doi.org/10.1016/j.ecosta.2024.01.003.
hfr, coef, plot and predict methods
x = matrix(rnorm(100 * 20), 100, 20)
y = rnorm(100)
fit = cv.hfr(x, y, kappa = seq(0, 1, by = 0.1))
coef(fit)
HFR is a regularized regression estimator that decomposes a least squares regression along a supervised hierarchical graph, and shrinks the edges of the estimated graph to regularize parameters. The algorithm leads to group shrinkage in the regression parameters and a reduction in the effective model degrees of freedom.
hfr(
  x,
  y,
  weights = NULL,
  kappa = 1,
  q = NULL,
  intercept = TRUE,
  standardize = TRUE,
  partial_method = c("pairwise", "shrinkage"),
  l2_penalty = 0,
  ...
)
x |
Input matrix or data.frame, of dimension |
y |
Response variable. |
weights |
an optional vector of weights to be used in the fitting process. Should be NULL or a numeric vector. If non-NULL, weighted least squares is used for the level-specific regressions. |
kappa |
The target effective degrees of freedom of the regression as a percentage of |
q |
Thinning parameter representing the quantile cut-off (in terms of contributed variance) above which to consider levels in the hierarchy. This can be used to reduce the number of levels in high-dimensional problems. Default is no thinning. |
intercept |
Should intercept be fitted. Default is TRUE. |
standardize |
Logical flag for x variable standardization prior to fitting the model. The coefficients are always returned on the original scale. Default is TRUE. |
partial_method |
Indicate whether to use pairwise partial correlations, or shrinkage partial correlations. |
l2_penalty |
Optional penalty for level-specific regressions (useful in high-dimensional case) |
... |
Additional arguments passed to |
Shrinkage can be imposed by targeting an explicit effective degrees of freedom. Setting the argument kappa to a value between 0 and 1 controls the effective degrees of freedom of the fitted object as a percentage of the number of covariates. When kappa is 1 the result is equivalent to the result from an ordinary least squares regression (no shrinkage). Conversely, kappa set to 0 represents maximum shrinkage.
When
kappa
is a percentage of .
If no kappa is set, a linear regression with kappa = 1 is estimated.
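The role of kappa can be checked numerically. The sketch below uses simulated data chosen purely for illustration, fits the model at three kappa values and compares the kappa = 1 fit with lm(). It assumes that coef() returns the intercept followed by the slopes in the same order as lm(); if the ordering differs, the comparison needs to be adapted.

```r
library(hfr)

set.seed(42)
x <- matrix(rnorm(100 * 20), 100, 20)
y <- drop(x %*% rnorm(20)) + rnorm(100)

fit_full <- hfr(x, y, kappa = 1)    # no shrinkage
fit_half <- hfr(x, y, kappa = 0.5)  # intermediate shrinkage
fit_none <- hfr(x, y, kappa = 0)    # maximum shrinkage

# Ordinary least squares benchmark
ols <- lm(y ~ x)

# Largest absolute difference between the kappa = 1 fit and OLS
# (assumes both coefficient vectors share the same ordering)
max(abs(unname(coef(fit_full)) - unname(coef(ols))))

# Squared norm of the coefficient vector at each shrinkage level
sapply(list(fit_none, fit_half, fit_full), function(f) sum(coef(f)^2))
```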
Hierarchical clustering is performed using hclust. The default is set to ward.D2 clustering but can be overridden by passing a method argument to `...`.
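As a small illustration of overriding the clustering method, the following sketch fits the same model with the default linkage and with "average" linkage, which is one of the methods accepted by stats::hclust. The data and the choice of "average" are arbitrary; no claim is made about which linkage performs better.

```r
library(hfr)

set.seed(1)
x <- matrix(rnorm(100 * 20), 100, 20)
y <- rnorm(100)

# Default: ward.D2 linkage
fit_ward <- hfr(x, y, kappa = 0.5)

# Override: method is forwarded through `...` to the clustering step
fit_avg <- hfr(x, y, kappa = 0.5, method = "average")

# Side-by-side comparison of the estimated coefficients
cbind(ward = coef(fit_ward), average = coef(fit_avg))
```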
For high-dimensional problems, the hierarchy becomes very large. Setting q to a value below 1 reduces the number of levels used in the hierarchy. q represents a quantile-cutoff of the amount of variation contributed by the levels. The default (q = NULL) considers all levels.
When data exhibits multicollinearity it can be useful to include a penalty on the l2 norm in the level-specific regressions. This can be achieved by setting the l2_penalty parameter.
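The q and l2_penalty arguments can be tried out as sketched below. The data are simulated with a shared factor so that the columns are strongly correlated; the values q = 0.5 and l2_penalty = 0.1 are arbitrary illustrations, not recommendations.

```r
library(hfr)

set.seed(7)
n <- 100
p <- 20

# Illustrative collinear design: a common factor plus idiosyncratic noise
common <- rnorm(n)
x <- matrix(rnorm(n * p, sd = 0.3), n, p) + common
y <- rnorm(n)

# Thinning: keep only a subset of levels in the hierarchy
fit_thin <- hfr(x, y, kappa = 0.5, q = 0.5)

# Penalty on the l2 norm in the level-specific regressions
fit_l2 <- hfr(x, y, kappa = 0.5, l2_penalty = 0.1)

coef(fit_thin)
coef(fit_l2)
```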
An 'hfr' regression object.
Johann Pfitzinger
Pfitzinger, Johann (2024). Cluster Regularization via a Hierarchical Feature Regression. _Econometrics and Statistics_ (in press). URL https://doi.org/10.1016/j.ecosta.2024.01.003.
cv.hfr, se.avg, coef, plot and predict methods
x = matrix(rnorm(100 * 20), 100, 20)
y = rnorm(100)
fit = hfr(x, y, kappa = 0.5)
coef(fit)
Plots the dendrogram of a fitted cv.hfr model. The heights of the levels in the dendrogram are given by a shrinkage vector, with a maximum (unregularized) overall graph height equal to the number of covariates in the regression. Stronger shrinkage leads to a shallower hierarchy.
## S3 method for class 'cv.hfr'
plot(x, kappa = NULL, show_details = TRUE, max_leaf_size = 3, ...)
x |
Fitted 'cv.hfr' model. |
kappa |
The hyperparameter used for plotting. If empty, the optimal value is used. |
show_details |
print model details on the plot. |
max_leaf_size |
maximum size of the leaf nodes. Default is 3. |
... |
additional methods passed to |
The dendrogram is generated using hierarchical clustering and modified so that the height differential between any two splits is the shrinkage weight of the lower split (ranging between 0 and 1). With no shrinkage, all shrinkage weights are equal to 1 and the dendrogram has a height equal to the number of covariates. With shrinkage
the dendrogram has a height of
.
The leaf nodes are colored to indicate the coefficient sign, with the size indicating the absolute magnitude of the coefficients.
A color bar on the right indicates the relative contribution of each level to the coefficient of determination, with darker hues representing a larger contribution.
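To see how the hierarchy flattens as shrinkage increases, the dendrogram can be drawn for several values from the fitted grid. The sketch below uses simulated data and arbitrary kappa values taken from the grid.

```r
library(hfr)

set.seed(1)
x <- matrix(rnorm(100 * 20), 100, 20)
y <- rnorm(100)
fit <- cv.hfr(x, y, kappa = seq(0, 1, by = 0.1))

# kappa left empty: the cross-validated optimum is used
plot(fit)

# Strong shrinkage: a shallower hierarchy is expected
plot(fit, kappa = 0.1)

# No shrinkage, with the model details switched off
plot(fit, kappa = 1, show_details = FALSE)
```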
A plotted dendrogram.
Johann Pfitzinger
cv.hfr, predict and coef methods
x = matrix(rnorm(100 * 20), 100, 20)
y = rnorm(100)
fit = cv.hfr(x, y, kappa = seq(0, 1, by = 0.1))
plot(fit, kappa = 0.5)
Plots the dendrogram of a fitted hfr model. The heights of the levels in the dendrogram are given by a shrinkage vector, with a maximum (unregularized) overall graph height equal to the number of covariates in the regression. Stronger shrinkage leads to a shallower hierarchy.
## S3 method for class 'hfr'
plot(x, show_details = TRUE, confidence_level = 0, max_leaf_size = 3, ...)
x |
Fitted 'hfr' model. |
show_details |
print model details on the plot. |
confidence_level |
coefficients with a lower approximate statistical confidence are highlighted in the plot, see details. Default is 0. |
max_leaf_size |
maximum size of the leaf nodes. Default is 3. |
... |
additional methods passed to |
The dendrogram is generated using hierarchical clustering and modified so that the height differential between any two splits is the shrinkage weight of the lower split (ranging between 0 and 1). With no shrinkage, all shrinkage weights are equal to 1 and the dendrogram has a height equal to the number of covariates. With shrinkage
the dendrogram has a height of
.
The leaf nodes are colored to indicate the coefficient sign, with the size indicating the absolute magnitude of the coefficients.
The average standard errors along the branch of each coefficient can be used to highlight coefficients that are not statistically significant. When confidence_level > 0, branches with a lower confidence are plotted as dotted lines.
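A brief sketch of the highlighting option follows. It assumes that confidence_level is expressed on a 0 to 1 scale, so that 0.95 corresponds to a 95% level; the value itself is an arbitrary illustration.

```r
library(hfr)

set.seed(1)
x <- matrix(rnorm(100 * 20), 100, 20)
y <- rnorm(100)
fit <- hfr(x, y, kappa = 0.5)

# Default: no branches are singled out
plot(fit)

# Assumed 0-1 scale: branches below this confidence are drawn dotted
plot(fit, confidence_level = 0.95)
```

Since y is pure noise in this simulation, most branches would be expected to fall below the chosen level.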
A color bar on the right indicates the relative contribution of each level to the coefficient of determination, with darker hues representing a larger contribution.
A plotted dendrogram.
Johann Pfitzinger
hfr, se.avg, predict and coef methods
x = matrix(rnorm(100 * 20), 100, 20)
y = rnorm(100)
fit = hfr(x, y, kappa = 0.5)
plot(fit)
Predict values using a fitted cv.hfr model
## S3 method for class 'cv.hfr'
predict(object, newdata = NULL, kappa = NULL, ...)
object |
Fitted 'cv.hfr' model. |
newdata |
Matrix or data.frame of new values for |
kappa |
The hyperparameter used for prediction. If empty, the optimal value is used. |
... |
additional methods passed to |
Predictions are made by multiplying the newdata object with the estimated coefficients. The chosen hyperparameter value to use for predictions can be passed to the kappa argument.
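The following sketch shows predictions on held-out rows for several kappa values from the fitted grid. The train/test split, the simulated data and the kappa values are illustrative assumptions; newdata is taken to have the same columns as the training matrix.

```r
library(hfr)

set.seed(11)
x <- matrix(rnorm(120 * 20), 120, 20)
y <- drop(x %*% rnorm(20)) + rnorm(120)
train <- 1:100
test <- 101:120

fit <- cv.hfr(x[train, ], y[train], kappa = seq(0, 1, by = 0.1))

# kappa left empty: the cross-validated optimum is used
pred_best <- predict(fit, newdata = x[test, ])

# Out-of-sample RMSE for a few explicitly chosen kappa values
kappas <- c(0.1, 0.5, 1)
rmse <- sapply(kappas, function(k) {
  pred <- as.numeric(predict(fit, newdata = x[test, ], kappa = k))
  sqrt(mean((y[test] - pred)^2))
})
data.frame(kappa = kappas, rmse = rmse)
```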
A vector of predicted values.
Johann Pfitzinger
x = matrix(rnorm(100 * 20), 100, 20)
y = rnorm(100)
fit = cv.hfr(x, y, kappa = seq(0, 1, by = 0.1))
predict(fit, kappa = 0.1)
Predict values using a fitted hfr model
## S3 method for class 'hfr'
predict(object, newdata = NULL, ...)
object |
Fitted 'hfr' model. |
newdata |
Matrix or data.frame of new values for |
... |
additional methods passed to |
Predictions are made by multiplying the newdata object with the estimated coefficients.
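The multiplication described above can be reproduced by hand, as in the sketch below. It assumes that coef() returns the intercept first, followed by one slope per column of x; the manual check is skipped if the coefficient vector does not have that length. The data and the train/test split are illustrative.

```r
library(hfr)

set.seed(99)
x <- matrix(rnorm(120 * 20), 120, 20)
y <- drop(x %*% rnorm(20)) + rnorm(120)
train <- 1:100
test <- 101:120

fit <- hfr(x[train, ], y[train], kappa = 0.5)
pred <- as.numeric(predict(fit, newdata = x[test, ]))

# Manual prediction, assuming coef(fit) is (intercept, slopes)
b <- coef(fit)
if (length(b) == ncol(x) + 1) {
  manual <- drop(cbind(1, x[test, ]) %*% b)
  # Largest absolute discrepancy between predict() and the manual product
  print(max(abs(manual - pred)))
}

# Out-of-sample root mean squared error
sqrt(mean((y[test] - pred)^2))
```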
A vector of predicted values.
Johann Pfitzinger
x = matrix(rnorm(100 * 20), 100, 20)
y = rnorm(100)
fit = hfr(x, y, kappa = 0.5)
predict(fit)
Print summary statistics for a fitted cv.hfr model
## S3 method for class 'cv.hfr'
print(x, ...)
x |
Fitted |
... |
additional methods passed to |
The call that produced the object x is printed, followed by a data.frame of summary statistics, including the effective degrees of freedom of the model, the R.squared and the regularization parameter.
Summary statistics of HFR model
Johann Pfitzinger
x = matrix(rnorm(100 * 20), 100, 20)
y = rnorm(100)
fit = cv.hfr(x, y, kappa = seq(0, 1, by = 0.1))
print(fit)
Print summary statistics for a fitted hfr model
## S3 method for class 'hfr'
print(x, ...)
x |
Fitted |
... |
additional methods passed to |
The call that produced the object x is printed, followed by a data.frame of summary statistics, including the effective degrees of freedom of the model, the R.squared and the regularization parameter.
Summary statistics of HFR model
Johann Pfitzinger
x = matrix(rnorm(100 * 20), 100, 20)
y = rnorm(100)
fit = hfr(x, y, kappa = 0.5)
print(fit)
This function computes the weighted average standard errors across levels using Burnham & Anderson (2004).
se.avg(object)
object |
Fitted |
The HFR computes linear regressions over several levels of an estimated hierarchy. By averaging the standard errors across hierarchical levels, an indication can be obtained about the average significance of the variables.
Standard errors are understated, since the uncertainty in the hierarchy estimation is not reflected.
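One possible use of the averaged standard errors is to form rough coefficient-to-error ratios, as sketched below. This assumes that se.avg() returns one standard error per element of coef(), in the same order; the table is only printed if the lengths match. Because the standard errors are understated, the ratios should be read as optimistic indications rather than formal test statistics.

```r
library(hfr)

set.seed(5)
x <- matrix(rnorm(100 * 20), 100, 20)
y <- drop(x[, 1:3] %*% c(1, -1, 0.5)) + rnorm(100)

fit <- hfr(x, y, kappa = 0.5)

b <- coef(fit)
se <- se.avg(fit)

# Assumption: se and b are aligned element by element
if (length(se) == length(b)) {
  print(round(cbind(estimate = b, se = se, ratio = b / se), 3))
}
```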
A vector of standard errors.
Johann Pfitzinger
Pfitzinger, J. (2022). Cluster Regularization via a Hierarchical Feature Regression. arXiv 2107.04831[statML]
Burnham, K. P. and Anderson, D. R. (2004). Multimodel inference - understanding AIC and BIC in model selection. Sociological Methods & Research 33(2): 261-304.
hfr method
x = matrix(rnorm(100 * 20), 100, 20)
y = rnorm(100)
fit = hfr(x, y, kappa = 0.5)
se.avg(fit)