Title: | Regularized Linear Modeling with Tidy Data |
---|---|
Description: | An extension to the 'R' tidy data environment for automated machine learning. The package allows fitting and cross validation of linear regression and classification algorithms on grouped data. |
Authors: | Johann Pfitzinger [aut, cre] |
Maintainer: | Johann Pfitzinger |
License: | GPL-3 |
Version: | 0.7.2 |
Built: | 2024-11-03 06:31:22 UTC |
Source: | https://github.com/jpfitzinger/tidyfit |
tidyfit
Fits an adaptive Lasso regression or classification on a 'tidyFit' R6 class. The function can be used with regress and classify.
## S3 method for class 'adalasso'
.fit(self, data = NULL)
self | a 'tidyFit' R6 class. |
data | a data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). |
Hyperparameters:
lambda (L1 penalty)
lambda_ridge (L2 penalty used in the first step to determine the penalty factor; default = 0.01)
Important method arguments (passed to m)
The adaptive Lasso is a weighted implementation of the Lasso algorithm, with covariate-specific weights obtained from an initial regression fit (in this case, a ridge regression with lambda = lambda_ridge, where lambda_ridge can be passed as an argument). The adaptive Lasso is computed using the glmnet::glmnet function. See ?glmnet for more details. For classification pass family = "binomial" to ... in m or use classify.
Implementation
Features are standardized by default with coefficients transformed to the original scale.
If no hyperparameter grid is passed (is.null(control$lambda)), dials::grid_regular() is used to determine a sensible default grid. The grid size is 100. Note that the grid selection tools provided by glmnet::glmnet cannot be used (e.g. dfmax). This is to guarantee identical grids across groups in the tibble.
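The two-step procedure described above can be sketched directly with glmnet. This is a minimal illustration on simulated data, not tidyfit's internal code: the simulated data, the lambda value of the second step, and the weight formula 1 / |ridge coefficient| (the adaptive Lasso weights of Zou, 2006, with exponent 1) are assumptions made for the example.

```r
library(glmnet)

# Simulated data (assumption for illustration): 5 features, 2 of them relevant.
set.seed(1)
x <- matrix(rnorm(200 * 5), ncol = 5)
y <- drop(x %*% c(2, 0, 0, -1, 0)) + rnorm(200)

# Step 1: ridge regression (alpha = 0) with lambda = lambda_ridge.
lambda_ridge <- 0.01
ridge <- glmnet(x, y, alpha = 0, lambda = lambda_ridge)
beta_ridge <- as.matrix(coef(ridge))[-1, 1]  # drop the intercept

# Step 2: Lasso (alpha = 1) with covariate-specific penalty weights.
# Weights of 1 / |beta_ridge| are an assumption, not confirmed tidyfit internals.
w <- 1 / abs(beta_ridge)
ada <- glmnet(x, y, alpha = 1, lambda = 0.1, penalty.factor = w)
beta_ada <- as.matrix(coef(ada))[-1, 1]
beta_ada
```

Features with a small ridge coefficient receive a large weight and are therefore penalized more heavily in the second step.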
A fitted 'tidyFit' class model.
Johann Pfitzinger
Zou, H. (2006). The Adaptive Lasso and Its Oracle Properties. Journal of the American Statistical Association, 101(476), 1418-1429.
Jerome Friedman, Trevor Hastie, Robert Tibshirani (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1-22. URL https://www.jstatsoft.org/v33/i01/.
.fit.lasso, .fit.enet, .fit.ridge and m methods
# Load data
data <- tidyfit::Factor_Industry_Returns

# Stand-alone function
fit <- m("adalasso", Return ~ ., data, lambda = 0.5)
fit

# Within 'regress' function
fit <- regress(data, Return ~ ., m("adalasso", lambda = c(0.1, 0.5)),
               .mask = c("Date", "Industry"))
coef(fit)
tidyfit
Performs Analysis of Variance on a 'tidyFit' R6 class. The function can be used with regress or classify.
## S3 method for class 'anova'
.fit(self, data = NULL)
self | a 'tidyFit' R6 class. |
data | a data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). |
Hyperparameters:
None. Cross validation not applicable.
Important method arguments (passed to m)
The function provides a wrapper for stats::anova. See ?anova for more details.
A glm model is fitted first and the fitted object is then passed to anova.
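The fit-then-test sequence can be reproduced in base R. The sketch below uses the built-in mtcars data and a Gaussian glm as stand-ins; the data set and formula are assumptions for illustration and are unrelated to tidyfit's own examples.

```r
# Fit a GLM first (Gaussian family by default) ...
fit <- glm(mpg ~ wt + hp, data = mtcars)

# ... then pass the fitted object to anova() for the analysis-of-deviance table.
tab <- anova(fit)
tab
```

The table has one row for the null model and one row per term added sequentially.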
A fitted 'tidyFit' class model.
Johann Pfitzinger
.fit.lm, .fit.glm and m methods
# Load data
data <- tidyfit::Factor_Industry_Returns

# Stand-alone function
fit <- m("anova", Return ~ `Mkt-RF` + HML + SMB, data)
fit

# Within 'regress' function
fit <- regress(data, Return ~ ., m("anova"), .mask = c("Date", "Industry"))
tidyr::unnest(coef(fit), model_info)
tidyfit
Fits a Bayesian regression on a 'tidyFit' R6 class. The function can be used with regress and classify.
## S3 method for class 'bayes'
.fit(self, data = NULL)
self | a 'tidyFit' R6 class. |
data | a data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). |
Hyperparameters:
None. Cross validation not applicable.
Important method arguments (passed to m)
The function provides a wrapper for arm::bayesglm. See ?bayesglm for more details.
Implementation
No implementation notes
A fitted 'tidyFit' class model.
A 'tibble'.
Johann Pfitzinger
Gelman A, Su Y (2021). arm: Data Analysis Using Regression and Multilevel/Hierarchical Models. R package version 1.12-2, https://CRAN.R-project.org/package=arm.
# Load data
data <- tidyfit::Factor_Industry_Returns

# Stand-alone function
fit <- m("bayes", Return ~ ., data)
fit

# Within 'regress' function
fit <- regress(data, Return ~ ., m("bayes"), .mask = c("Date", "Industry"))
coef(fit)
tidyfit
Fits a Bayesian Lasso regression on a 'tidyFit' R6 class. The function can be used with regress.
## S3 method for class 'blasso'
.fit(self, data = NULL)
self | a 'tidyFit' R6 class. |
data | a data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). |
Hyperparameters:
None. Cross validation not applicable.
Important method arguments (passed to m)
The function provides a wrapper for monomvn::blasso. See ?blasso for more details.
Implementation
Features are standardized by default with coefficients transformed to the original scale.
A fitted 'tidyFit' class model.
Johann Pfitzinger
Gramacy RB (2023). monomvn: Estimation for MVN and Student-t Data with Monotone Missingness. R package version 1.9-17, https://CRAN.R-project.org/package=monomvn.
.fit.lasso, .fit.bridge and m methods
# Load data
data <- tidyfit::Factor_Industry_Returns

# Stand-alone function
fit <- m("blasso", Return ~ ., data, T = 100)
fit

# Within 'regress' function
fit <- regress(data, Return ~ ., m("blasso", T = 100), .mask = c("Date", "Industry"))
coef(fit)
tidyfit
Fits a Bayesian model averaging regression on a 'tidyFit' R6 class. The function can be used with regress.
## S3 method for class 'bma'
.fit(self, data = NULL)
self | a 'tidyFit' R6 class. |
data | a data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). |
Hyperparameters:
None. Cross validation not applicable.
Important method arguments (passed to m)
iter (number of iteration draws)
mcmc (model sampler used; default 'bd')
The function provides a wrapper for BMS::bms. See ?bms for more details.
Implementation
The underlying function automatically generates plotting output, which is not suppressed.
Use coef(fit) to obtain the posterior mean and standard deviation as well as the posterior inclusion probabilities for the features.
A fitted 'tidyFit' class model.
Johann Pfitzinger
Feldkircher, M. and S. Zeugner (2015). Bayesian Model Averaging Employing Fixed and Flexible Priors: The BMS Package for R, Journal of Statistical Software 68(4).
.fit.bayes and m methods
# Load data
data <- tidyfit::Factor_Industry_Returns

# Stand-alone function
fit <- m("bma", Return ~ `Mkt-RF` + HML + SMB + RMW + CMA, data)
fit

# Within 'regress' function
fit <- regress(data, Return ~ ., m("bma"), .mask = c("Date", "Industry"))
coef(fit)
tidyfit
Fits a gradient boosting regression or classification on a 'tidyFit' R6 class. The function can be used with regress and classify.
## S3 method for class 'boost'
.fit(self, data = NULL)
self | a 'tidyFit' R6 class. |
data | a data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). |
Hyperparameters:
mstop (number of boosting iterations)
nu (step size)
Important method arguments (passed to m)
The gradient boosting regression is performed using mboost::glmboost. See ?glmboost for more details.
Implementation
Features are standardized by default with coefficients transformed to the original scale.
If no hyperparameter grid is passed (is.null(control$mstop) and is.null(control$nu)), the default grid is used with mstop = c(100, 500, 1000, 5000) and nu = c(0.01, 0.05, 0.1, 0.15, 0.2, 0.25).
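To get a sense of the size of the default search space, the two default vectors can be expanded in base R. The sketch assumes that the vectors are combined as a full Cartesian product, which is not stated explicitly here.

```r
# Default hyperparameter values quoted above.
mstop <- c(100, 500, 1000, 5000)
nu <- c(0.01, 0.05, 0.1, 0.15, 0.2, 0.25)

# Assumption: every mstop value is paired with every nu value.
grid <- expand.grid(mstop = mstop, nu = nu)
nrow(grid)
head(grid)
```

Under that assumption the default grid contains 4 x 6 = 24 candidate settings per group.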
A fitted 'tidyFit' class model.
A 'tibble'.
Johann Pfitzinger
T. Hothorn, P. Buehlmann, T. Kneib, M. Schmid, and B. Hofner (2022). mboost: Model-Based Boosting, R package version 2.9-7, https://CRAN.R-project.org/package=mboost.
m method
# Load data
data <- tidyfit::Factor_Industry_Returns

# Stand-alone function
fit <- m("boost", Return ~ ., data, nu = 0.1, mstop = 100)
fit

# Within 'regress' function
fit <- regress(data, Return ~ ., m("boost", nu = c(0.1, 0.05), mstop = 100),
               .mask = c("Date", "Industry"))
coef(fit)
tidyfit
Fits a Bayesian ridge regression on a 'tidyFit' R6 class. The function can be used with regress.
## S3 method for class 'bridge'
.fit(self, data = NULL)
self | a 'tidyFit' R6 class. |
data | a data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). |
Hyperparameters:
None. Cross validation not applicable.
Important method arguments (passed to m)
The function provides a wrapper for monomvn::bridge. See ?bridge for more details.
Implementation
Features are standardized by default with coefficients transformed to the original scale.
A fitted 'tidyFit' class model.
Johann Pfitzinger
Gramacy RB (2023). monomvn: Estimation for MVN and Student-t Data with Monotone Missingness. R package version 1.9-17, https://CRAN.R-project.org/package=monomvn.
.fit.ridge, .fit.blasso and m methods
# Load data
data <- tidyfit::Factor_Industry_Returns

# Stand-alone function
fit <- m("bridge", Return ~ ., data, T = 100)
fit

# Within 'regress' function
fit <- regress(data, Return ~ ., m("bridge", T = 100), .mask = c("Date", "Industry"))
coef(fit)
tidyfit
Calculates Pearson's Chi-squared test on a 'tidyFit' R6 class. The function can be used with classify.
## S3 method for class 'chisq'
.fit(self, data = NULL)
self | a 'tidyFit' R6 class. |
data | a data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). |
Hyperparameters:
None. Cross validation not applicable.
Important method arguments (passed to m)
The function provides a wrapper for stats::chisq.test. See ?chisq.test for more details.
Implementation
Results can be viewed using coef.
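For orientation, the underlying test can be run directly with stats::chisq.test on a pair of discretized variables. This is a sketch on simulated data: the data, the quintile binning helper to_bins (hypothetical, a base-R stand-in for dplyr::ntile), and the single-feature setup are assumptions, and the way tidyfit iterates over features is not shown.

```r
# Simulated data (assumption for illustration): two correlated continuous variables.
set.seed(42)
x <- rnorm(500)
y <- x + rnorm(500)

# Hypothetical helper: discretize a numeric vector into n quantile bins.
to_bins <- function(v, n = 5) {
  breaks <- quantile(v, probs = seq(0, 1, length.out = n + 1))
  cut(v, breaks = breaks, include.lowest = TRUE, labels = FALSE)
}

# Pearson's Chi-squared test on the contingency table of the binned variables.
res <- chisq.test(table(to_bins(x), to_bins(y)))
res$statistic
res$p.value
```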
A fitted 'tidyFit' class model.
Johann Pfitzinger
# Load data
data <- tidyfit::Factor_Industry_Returns
data <- dplyr::mutate_at(data, dplyr::vars(-Date, -Industry), dplyr::ntile, n = 10)

# Within 'classify' function
fit <- classify(data, Return ~ ., m("chisq"), .mask = c("Date", "Industry"))
tidyr::unnest(coef(fit), model_info)
tidyfit
Calculates Pearson's correlation coefficient on a 'tidyFit' R6 class. The function can be used with regress.
## S3 method for class 'cor'
.fit(self, data = NULL)
self | a 'tidyFit' R6 class. |
data | a data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). |
Hyperparameters:
None. Cross validation not applicable.
Important method arguments (passed to m)
The function provides a wrapper for stats::cor.test. See ?cor.test for more details.
Implementation
Results can be viewed using coef.
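The wrapped test can also be called on its own, which shows the quantities that the wrapper has available to report. The sketch uses the built-in mtcars data as an assumption for illustration; how tidyfit arranges the results per feature is not shown.

```r
# Pearson correlation test between the response and a single feature.
res <- cor.test(mtcars$mpg, mtcars$wt)  # method = "pearson" is the default

res$estimate  # correlation coefficient
res$p.value   # p-value of the test against zero correlation
```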
A fitted 'tidyFit' class model.
Johann Pfitzinger
.fit.chisq and m methods
# Load data
data <- tidyfit::Factor_Industry_Returns

# Stand-alone function
fit <- m("cor", Return ~ `Mkt-RF` + HML + SMB, data)
fit

# Within 'regress' function
fit <- regress(data, Return ~ ., m("cor"), .mask = c("Date", "Industry"))
tidyr::unnest(coef(fit), model_info)
tidyfit
Fits an ElasticNet regression or classification on a 'tidyFit' R6 class. The function can be used with regress and classify.
## S3 method for class 'enet'
.fit(self, data = NULL)
self | a 'tidyFit' R6 class. |
data | a data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). |
Hyperparameters:
lambda (penalty)
alpha (L1-L2 mixing parameter)
Important method arguments (passed to m)
The ElasticNet regression is estimated using glmnet::glmnet. See ?glmnet for more details. For classification pass family = "binomial" to ... in m or use classify.
Implementation
If the response variable contains more than 2 classes, a multinomial response is used automatically.
An intercept is always included and features are standardized with coefficients transformed to the original scale.
If no hyperparameter grid is passed (is.null(control$lambda) and is.null(control$alpha)), dials::grid_regular() is used to determine a sensible default grid. The grid size is 100 for lambda and 5 for alpha. Note that the grid selection tools provided by glmnet::glmnet cannot be used (e.g. dfmax). This is to guarantee identical grids across groups in the tibble.
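The reason for a fixed grid can be illustrated in base R. In the sketch below, a data-dependent starting penalty (a simplified version of the kind of quantity that automatic lambda sequences are built from) differs between groups, whereas a grid built once is identical for all of them. The grid ranges, the mtcars data, and the grouping by cyl are assumptions for illustration and are not the values produced by dials::grid_regular().

```r
# Assumed ranges, for illustration only.
lambda_grid <- 10^seq(-4, 0, length.out = 100)  # 100 log-spaced penalty values
alpha_grid <- seq(0, 1, length.out = 5)         # 5 mixing values

# One fixed grid, built once and shared by every group.
grid <- expand.grid(lambda = lambda_grid, alpha = alpha_grid)
nrow(grid)

# A data-dependent largest penalty, computed separately per group.
groups <- split(mtcars, mtcars$cyl)
lambda_max <- sapply(groups, function(d) {
  x <- scale(as.matrix(d[, c("wt", "hp")]))
  y <- d$mpg - mean(d$mpg)
  max(abs(crossprod(x, y))) / nrow(d)
})
lambda_max  # differs by group, so data-driven grids would not line up
```

With a shared grid, the cross validation results of different groups refer to the same hyperparameter values and can be compared row by row.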
A fitted 'tidyFit' class model.
Johann Pfitzinger
Jerome Friedman, Trevor Hastie, Robert Tibshirani (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1-22. URL https://www.jstatsoft.org/v33/i01/.
.fit.lasso, .fit.adalasso, .fit.ridge and m methods
# Load data
data <- tidyfit::Factor_Industry_Returns

# Stand-alone function
fit <- m("enet", Return ~ ., data, lambda = c(0, 0.1), alpha = 0.5)
fit

# Within 'regress' function
fit <- regress(data, Return ~ ., m("enet", alpha = c(0, 0.5), lambda = c(0.1)),
               .mask = c("Date", "Industry"), .cv = "vfold_cv")
coef(fit)
tidyfit
Fits a linear regression with variable selection using a genetic algorithm on a 'tidyFit' R6 class. The function can be used with regress.
## S3 method for class 'genetic'
.fit(self, data = NULL)
self | a 'tidyFit' R6 class. |
data | a data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). |
Hyperparameters:
None. Cross validation not applicable.
Important method arguments (passed to m)
statistic
populationSize
numGenerations
minVariables
maxVariables
The function provides a wrapper for gaselect::genAlg. See ?genAlg for more details.
Implementation
Control arguments are passed to gaselect::genAlgControl (the function automatically identifies which arguments are for the control object, and which for gaselect::genAlg).
gaselect::evaluatorLM is used as the evaluator with the relevant arguments automatically identified by the function.
A fitted 'tidyFit' class model.
Johann Pfitzinger
Kepplinger D (2023). gaselect: Genetic Algorithm (GA) for Variable Selection from High-Dimensional Data. R package version 1.0.21, https://CRAN.R-project.org/package=gaselect.
.fit.lm, .fit.bayes and m methods
# Load data
data <- tidyfit::Factor_Industry_Returns

# Stand-alone function
fit <- m("genetic", Return ~ ., data)
fit

# Within 'regress' function
fit <- regress(data, Return ~ ., m("genetic"), .mask = c("Date", "Industry"))
coef(fit)
tidyfit
Fits a general-to-specific (GETS) regression on a 'tidyFit' R6 class. The function can be used with regress.
## S3 method for class 'gets'
.fit(self, data = NULL)
self | a 'tidyFit' R6 class. |
data | a data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). |
Hyperparameters:
None. Cross validation not applicable.
Important method arguments (passed to m)
max.paths (number of paths to search)
The function provides a wrapper for gets::gets. See ?gets for more details.
Implementation
Print output is suppressed by default. Use 'print.searchinfo = TRUE' for print output.
A fitted 'tidyFit' class model.
Johann Pfitzinger
Pretis F, Reade JJ, Sucarrat G (2018). Automated General-to-Specific (GETS) Regression Modeling and Indicator Saturation for Outliers and Structural Breaks. Journal of Statistical Software 86(3), 1-44.
.fit.robust, .fit.glm and m methods
# Load data
data <- tidyfit::Factor_Industry_Returns

# Stand-alone function
fit <- m("gets", Return ~ `Mkt-RF` + HML + SMB, data)
fit

# Within 'regress' function
fit <- regress(data, Return ~ ., m("gets"), .mask = c("Date", "Industry"))
coef(fit)
tidyfit
Fits a linear or logistic regression on a 'tidyFit' R6 class. The function can be used with regress and classify.
## S3 method for class 'glm'
.fit(self, data = NULL)
self | a 'tidyFit' R6 class. |
data | a data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). |
Hyperparameters:
None. Cross validation not applicable.
Important method arguments (passed to m)
The function provides a wrapper for stats::glm. See ?glm for more details.
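The difference between the linear and the logistic case comes down to the family argument of stats::glm. The sketch below shows both calls in base R on the built-in mtcars data; the data, the formulas, and the assumption that classification corresponds to family = binomial are illustrative and not taken from tidyfit's source.

```r
# Linear regression: Gaussian family (the glm default).
fit_lin <- glm(mpg ~ wt + hp, data = mtcars)

# Logistic regression: binary response with a binomial family.
fit_logit <- glm(am ~ wt + hp, data = mtcars, family = binomial)

# Predicted probabilities from the logistic fit.
p <- predict(fit_logit, type = "response")
coef(fit_lin)
coef(fit_logit)
```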
Implementation
No implementation notes
A fitted 'tidyFit' class model.
Johann Pfitzinger
# Load data
data <- tidyfit::Factor_Industry_Returns
data$Return <- ifelse(data$Return > 0, 1, 0)

# Stand-alone function
fit <- m("glm", Return ~ ., data)
fit

# Within 'classify' function
fit <- classify(data, Return ~ ., m("glm"), .mask = c("Date", "Industry"))
coef(fit)
tidyfit
Fits a linear or logistic mixed-effects model (GLMM) on a 'tidyFit' R6 class. The function can be used with regress and classify.
## S3 method for class 'glmm'
.fit(self, data = NULL)
self | a 'tidyFit' R6 class. |
data | a data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). |
Hyperparameters:
None. Cross validation not applicable.
Important method arguments (passed to m)
The function provides a wrapper for lme4::glmer. See ?glmer for more details.
Implementation
No implementation notes
A fitted 'tidyFit' class model.
Johann Pfitzinger
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
# Load data
data <- tidyfit::Factor_Industry_Returns
data$Return <- ifelse(data$Return > 0, 1, 0)

# Estimate model with random effects
fit <- classify(data, Return ~ CMA + (CMA | Industry), logit = m("glmm"), .mask = "Date")
fit
tidyfit
Fits a linear regression or classification with a grouped L1 penalty on a 'tidyFit' R6 class. The function can be used with regress and classify.
## S3 method for class 'group_lasso'
.fit(self, data = NULL)
self | a 'tidyFit' R6 class. |
data | a data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). |
Hyperparameters:
lambda (L1 penalty)
Important method arguments (passed to m)
The Group Lasso regression is estimated using gglasso::gglasso. The 'group' argument is a named vector passed directly to m() (see examples). See ?gglasso for more details. Only binomial classification is possible. Weights are ignored for classification.
Implementation
Features are standardized by default with coefficients transformed to the original scale.
If no hyperparameter grid is passed (is.null(control$lambda)), dials::grid_regular() is used to determine a sensible default grid. The grid size is 100. Note that the grid selection tools provided by gglasso::gglasso cannot be used (e.g. dfmax). This is to guarantee identical grids across groups in the tibble.
A fitted 'tidyFit' class model.
Johann Pfitzinger
Yang Y, Zou H, Bhatnagar S (2020). gglasso: Group Lasso Penalized Learning Using a Unified BMD Algorithm. R package version 1.5, https://CRAN.R-project.org/package=gglasso.
.fit.lasso, .fit.blasso, .fit.adalasso and m methods
# Load data
data <- tidyfit::Factor_Industry_Returns
groups <- setNames(c(1, 2, 2, 3, 3, 1), c("Mkt-RF", "SMB", "HML", "RMW", "CMA", "RF"))

# Stand-alone function
fit <- m("group_lasso", Return ~ ., data, lambda = 0.5, group = groups)
fit

# Within 'regress' function
fit <- regress(data, Return ~ ., m("group_lasso", lambda = c(0.1, 0.5), group = groups),
               .mask = c("Date", "Industry"))
coef(fit)
tidyfit
Fits a hierarchical feature regression on a 'tidyFit' R6 class. The function can be used with regress.
## S3 method for class 'hfr'
.fit(self, data = NULL)
self | a 'tidyFit' R6 class. |
data | a data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). |
Hyperparameters:
kappa (proportional size of regression graph)
Important method arguments (passed to m)
The hierarchical feature regression is estimated using the hfr::cv.hfr function. See ?cv.hfr for more details.
Implementation
Features are standardized by default with coefficients transformed to the original scale.
If no hyperparameter grid is provided (is.null(control$kappa)), the default is seq(0, 1, by = 0.1).
A fitted 'tidyFit' class model.
Johann Pfitzinger
Pfitzinger J (2022). hfr: Estimate Hierarchical Feature Regression Models. R package version 0.5.0, https://CRAN.R-project.org/package=hfr.
# Load data
data <- tidyfit::Factor_Industry_Returns

# Stand-alone function
fit <- m("hfr", Return ~ ., data, kappa = 0.5)
fit

# Within 'regress' function
fit <- regress(data, Return ~ ., m("hfr", kappa = c(0.1, 0.5)),
               .mask = c("Date", "Industry"))
coef(fit)
tidyfit
Fits a linear regression or classification with L1 penalty on a 'tidyFit' R6 class. The function can be used with regress and classify.
## S3 method for class 'lasso'
.fit(self, data = NULL)
self | a 'tidyFit' R6 class. |
data | a data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). |
Hyperparameters:
lambda (L1 penalty)
Important method arguments (passed to m)
The Lasso regression is estimated using glmnet::glmnet with alpha = 1. See ?glmnet for more details. For classification pass family = "binomial" to ... in m or use classify.
Implementation
If the response variable contains more than 2 classes, a multinomial response is used automatically.
Features are standardized by default with coefficients transformed to the original scale.
If no hyperparameter grid is passed (is.null(control$lambda)), dials::grid_regular() is used to determine a sensible default grid. The grid size is 100. Note that the grid selection tools provided by glmnet::glmnet cannot be used (e.g. dfmax). This is to guarantee identical grids across groups in the tibble.
A fitted 'tidyFit' class model.
Johann Pfitzinger
Jerome Friedman, Trevor Hastie, Robert Tibshirani (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1-22. URL https://www.jstatsoft.org/v33/i01/.
.fit.enet, .fit.ridge, .fit.adalasso and m methods
# Load data
data <- tidyfit::Factor_Industry_Returns

# Stand-alone function
fit <- m("lasso", Return ~ ., data, lambda = 0.5)
fit

# Within 'regress' function
fit <- regress(data, Return ~ ., m("lasso", lambda = c(0.1, 0.5)),
               .mask = c("Date", "Industry"))
coef(fit)
tidyfit
Fits a linear regression on a 'tidyFit' R6 class. The function can be used with regress.
## S3 method for class 'lm'
.fit(self, data = NULL)
self | a 'tidyFit' R6 class. |
data | a data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). |
Hyperparameters:
None. Cross validation not applicable.
Important method arguments (passed to m)
The function provides a wrapper for stats::lm. See ?lm for more details.
Implementation
An argument vcov. can be passed in control or to ... in m to estimate the model with robust standard errors. vcov. can be one of "BS", "HAC", "HC" and "OPG" and is passed to the sandwich package.
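What a robust covariance estimator changes can be seen by calling the sandwich package directly on an lm fit. The sketch below compares classical standard errors with HAC and HC ones on the built-in mtcars data. That the option strings map onto sandwich::vcovHAC and sandwich::vcovHC (and analogously vcovBS and vcovOPG) is an assumption based on the naming, not something confirmed here.

```r
library(sandwich)

fit <- lm(mpg ~ wt + hp, data = mtcars)

# Classical standard errors from the usual OLS covariance matrix.
se_ols <- sqrt(diag(vcov(fit)))

# Robust alternatives computed by the sandwich package.
se_hac <- sqrt(diag(vcovHAC(fit)))  # heteroskedasticity and autocorrelation consistent
se_hc <- sqrt(diag(vcovHC(fit)))    # heteroskedasticity consistent

cbind(se_ols, se_hac, se_hc)
```

The coefficient estimates are identical in all three cases; only the standard errors, and hence the test statistics, change.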
A fitted 'tidyFit' class model.
Johann Pfitzinger
.fit.robust, .fit.glm and m methods
# Load data
data <- tidyfit::Factor_Industry_Returns

# Stand-alone function
fit <- m("lm", Return ~ `Mkt-RF` + HML + SMB, data)
fit

# Within 'regress' function
fit <- regress(data, Return ~ ., m("lm"), .mask = c("Date", "Industry"))
coef(fit)

# With robust standard errors
fit <- m("lm", Return ~ `Mkt-RF` + HML + SMB, data, vcov. = "HAC")
fit
tidyfit
Selects features for continuous or (ordered) factor data using MRMR on a 'tidyFit' R6 class. The function can be used with regress and classify.
## S3 method for class 'mrmr'
.fit(self, data = NULL)
self | a 'tidyFit' R6 class. |
data | a data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). |
Hyperparameters:
None. Cross validation not applicable.
Important method arguments (passed to m)
feature_count (number of features to select)
solution_count (ensemble size)
The MRMR algorithm is estimated using the mRMRe::mRMR.ensemble function. See ?mRMR.ensemble for more details.
Implementation
Use with regress for regression problems and with classify for classification problems. The selected features can be obtained using coef.
The MRMR objects have no predict and related methods.
A fitted 'tidyFit' class model.
Johann Pfitzinger
De Jay N, Papillon-Cavanagh S, Olsen C, Bontempi G and Haibe-Kains B (2012). mRMRe: an R package for parallelized mRMR ensemble feature selection.
m methods
# Load data
data <- tidyfit::Factor_Industry_Returns
data <- dplyr::filter(data, Industry == "HiTec")
data <- dplyr::select(data, SMB, HML, RMW, CMA, Return)

## Not run:
fit <- m("mrmr", Return ~ ., data, feature_count = 2)

# Retrieve selected features
coef(fit)
## End(Not run)
tidyfit
Fits a Markov-Switching regression on a 'tidyFit' R6 class. The function can be used with regress.
## S3 method for class 'mslm'
.fit(self, data = NULL)
self | a 'tidyFit' R6 class. |
data | a data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). |
Hyperparameters:
None. Cross validation not applicable.
Important method arguments (passed to m)
k (the number of regimes)
sw (logical vector indicating which coefficients switch)
control (additional fitting parameters)
The function provides a wrapper for MSwM::msmFit. See ?msmFit for more details.
Implementation
Note that only the regression method with 'lm' is implemented at this stage.
An argument index_col can be passed, which allows a custom index to be added to coef(m("mslm")) (e.g. a date index).
If no sw argument is passed, all coefficients are permitted to switch between regimes.
A fitted 'tidyFit' class model.
Johann Pfitzinger
Sanchez-Espigares JA, Lopez-Moreno A (2021). MSwM: Fitting Markov Switching Models. R package version 1.5, https://CRAN.R-project.org/package=MSwM.
# Load data
data <- tidyfit::Factor_Industry_Returns
data <- dplyr::filter(data, Industry == "HiTec", Date >= 201801)
data <- dplyr::select(data, -Industry)

ctr <- list(maxiter = 100, parallelization = FALSE)

# Stand-alone function
fit <- m("mslm", Return ~ HML, data, index_col = "Date", k = 2, control = ctr)
fit

# Within 'regress' function
fit <- regress(data, Return ~ HML, m("mslm", index_col = "Date", k = 2, control = ctr))
tidyr::unnest(coef(fit), model_info)
tidyfit
Fits a single-hidden-layer neural network regression or classification on a 'tidyFit' R6
class.
The function can be used with regress
and classify
.
## S3 method for class 'nnet'
.fit(self, data = NULL)
self |
a 'tidyFit' R6 class. |
data |
a data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). |
Hyperparameters:
size
(number of units in the hidden layer)
decay
(parameter for weight decay)
maxit
(maximum number of iterations)
Important method arguments (passed to m
)
The function provides a wrapper for nnet::nnet.formula
. See ?nnet
for more details.
Implementation
For regress
, linear output units (linout = TRUE
) are used, while classify
implements
the default logic of nnet
(entropy=TRUE
for 2 target classes and softmax=TRUE
for more classes).
A fitted 'tidyFit' class model.
Phil Holzmeister
# Load data
data <- tidyfit::Factor_Industry_Returns

# Stand-alone function
fit <- m("nnet", Return ~ ., data)
fit

# Within 'regress' function
fit <- regress(data, Return ~ ., m("nnet", decay = 0.5, size = 8),
               .mask = c("Date", "Industry"))

# Within 'classify' function
fit <- classify(iris, Species ~ ., m("nnet", decay = 0.5, size = 8))
tidyfit
Fits a principal components regression on a 'tidyFit' R6
class. The function can be used with regress
.
## S3 method for class 'pcr'
.fit(self, data = NULL)
self |
a 'tidyFit' R6 class. |
data |
a data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). |
Hyperparameters:
ncomp
(number of components)
ncomp_pct
(number of components, percentage of features)
Important method arguments (passed to m
)
The principal components regression is fitted using the pls
package. See ?pcr
for more details.
Implementation
Covariates are standardized, with coefficients back-transformed to the original scale. An intercept is always included.
If no hyperparameter grid is passed (is.null(control$ncomp) & is.null(control$ncomp_pct)
), the default is ncomp_pct = seq(0, 1, length.out = 20)
, where 0 results in one component and 1 results in the number of features.
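The mapping from ncomp_pct to a component count can be sketched in base R. The helper name ncomp_from_pct, the rounding rule and the feature count of 6 are assumptions for illustration; only the end points (0 gives one component, 1 gives the number of features) come from the description above.

```r
# Hypothetical helper (not part of tidyfit): convert a share of features into
# a number of components, assuming rounding and a floor of one component.
ncomp_from_pct <- function(ncomp_pct, n_features) {
  pmax(1L, as.integer(round(ncomp_pct * n_features)))
}

# Default grid described above, applied to an assumed 6 features
grid <- seq(0, 1, length.out = 20)
ncomp <- ncomp_from_pct(grid, n_features = 6)

# Several grid points collapse onto the same component count
unique(ncomp)
```

With few features, many of the 20 grid values map to the same number of components, so the number of distinct models actually compared can be much smaller than the grid size.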
When 'jackknife = TRUE' is passed (and a 'validation' method is chosen), coef
also returns the jack-knife standard errors, t-statistics and p-values.
Note that at present pls
does not offer weighted implementations or non-Gaussian responses. The method can therefore only be used with regress.
A fitted 'tidyFit' class model.
Johann Pfitzinger
Liland K, Mevik B, Wehrens R (2022). pls: Partial Least Squares and Principal Component Regression. R package version 2.8-1, https://CRAN.R-project.org/package=pls.
# Load data
data <- tidyfit::Factor_Industry_Returns
data <- dplyr::filter(data, Industry == "HiTec")
data <- dplyr::select(data, -Industry)

# Stand-alone function
fit <- m("pcr", Return ~ ., data, ncomp = 1:3)
fit

# Within 'regress' function
fit <- regress(data, Return ~ .,
               m("pcr", jackknife = TRUE, validation = "LOO", ncomp_pct = 0.5),
               .mask = c("Date"))
tidyr::unnest(coef(fit), model_info)
tidyfit
Fits a partial least squares regression on a 'tidyFit' R6
class. The function can be used with regress
.
## S3 method for class 'plsr'
.fit(self, data = NULL)
self |
a 'tidyFit' R6 class. |
data |
a data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). |
Hyperparameters:
ncomp
(number of components)
ncomp_pct
(number of components, percentage of features)
Important method arguments (passed to m
)
The partial least squares regression is fitted using the pls
package. See ?plsr
for more details.
Implementation
Covariates are standardized, with coefficients back-transformed to the original scale. An intercept is always included.
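The back-transformation of coefficients can be illustrated with ordinary least squares in base R. This is a sketch of the general idea only, not the package's internal code; the simulated data and all variable names are assumptions.

```r
set.seed(1)
X <- cbind(x1 = rnorm(50, mean = 10, sd = 5), x2 = rnorm(50, sd = 0.1))
y <- 2 + 0.5 * X[, "x1"] - 3 * X[, "x2"] + rnorm(50)

# Fit on standardized features
mu <- colMeans(X)
sdev <- apply(X, 2, sd)
Xs <- scale(X, center = mu, scale = sdev)
fit_std <- lm(y ~ Xs)

# Back-transform: slopes are divided by the feature standard deviations,
# and the intercept absorbs the centering
beta_std <- coef(fit_std)[-1]
beta_orig <- beta_std / sdev
intercept_orig <- coef(fit_std)[1] - sum(beta_orig * mu)

# For unpenalized least squares this reproduces the raw-scale fit
fit_raw <- lm(y ~ X)
all.equal(unname(c(intercept_orig, beta_orig)), unname(coef(fit_raw)))
```

For penalized or component-based methods the standardized and raw fits differ, which is why the transformation is applied after fitting rather than skipped.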
If no hyperparameter grid is passed (is.null(control$ncomp) & is.null(control$ncomp_pct)
), the default is ncomp_pct = seq(0, 1, length.out = 20)
, where 0 results in one component and 1 results in the number of features.
When 'jackknife = TRUE' is passed (and a 'validation' method is chosen), coef
also returns the jack-knife standard errors, t-statistics and p-values.
Note that at present pls
does not offer weighted implementations or non-Gaussian responses. The method can therefore only be used with regress.
A fitted 'tidyFit' class model.
Johann Pfitzinger
Liland K, Mevik B, Wehrens R (2022). pls: Partial Least Squares and Principal Component Regression. R package version 2.8-1, https://CRAN.R-project.org/package=pls.
# Load data
data <- tidyfit::Factor_Industry_Returns
data <- dplyr::filter(data, Industry == "HiTec")
data <- dplyr::select(data, -Industry)

# Stand-alone function
fit <- m("plsr", Return ~ ., data, ncomp = 1:3)
fit

# Within 'regress' function
fit <- regress(data, Return ~ .,
               m("plsr", jackknife = TRUE, validation = "LOO", ncomp_pct = 0.5),
               .mask = c("Date"))
tidyr::unnest(coef(fit), model_info)
tidyfit
Fits a linear quantile regression on a 'tidyFit' R6
class. The function can be used with regress
.
## S3 method for class 'quantile'
.fit(self, data = NULL)
self |
a 'tidyFit' R6 class. |
data |
a data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). |
Hyperparameters:
None. Cross validation not applicable.
Important method arguments (passed to m
)
tau
(the quantile(s) to be estimated)
The function provides a wrapper for quantreg::rq
. See ?rq
for more details. The argument tau
is the chosen quantile (default tau = 0.5
).
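The role of tau can be illustrated with the check (pinball) loss that quantile regression minimizes. The base R sketch below fits a constant only; the function names and simulated data are assumptions, and quantreg::rq itself is not called.

```r
# Check (pinball) loss for quantile level tau
pinball_loss <- function(u, tau) sum(u * (tau - (u < 0)))

set.seed(42)
y <- rnorm(501)

# Minimizing the loss over a constant recovers the sample quantile
fit_const <- function(tau) {
  optimize(function(q) pinball_loss(y - q, tau), interval = range(y))$minimum
}

taus <- c(0.1, 0.5, 0.9)
est <- sapply(taus, fit_const)
cbind(estimate = est, sample_quantile = quantile(y, taus))
```

Passing several values of tau therefore corresponds to solving one such minimization per quantile, each with its own set of coefficients.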
Implementation
No implementation notes
A fitted 'tidyFit' class model.
Johann Pfitzinger
Koenker R (2022). quantreg: Quantile Regression. R package version 5.94, https://CRAN.R-project.org/package=quantreg.
.fit.lm
, .fit.bayes
and m
methods
# Load data
data <- tidyfit::Factor_Industry_Returns

fit <- regress(data, Return ~ ., m("quantile", tau = c(0.1, 0.5, 0.9)),
               .mask = c("Date", "Industry"))
coef(fit)
tidyfit
Fits a nonlinear quantile regression forest on a 'tidyFit' R6
class. The function can be used with regress
.
## S3 method for class 'quantile_rf'
.fit(self, data = NULL)
self |
a 'tidyFit' R6 class. |
data |
a data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). |
Hyperparameters:
ntree (number of trees)
mtry (number of variables randomly sampled at each split)
Important method arguments (passed to m
)
tau
(the quantile(s) to be estimated)
The function provides a wrapper for quantregForest::quantregForest
. See ?quantregForest
for more details.
The argument tau
is the chosen quantile (default tau = 0.5
).
tau
is passed directly to m('quantile_rf', tau = c(0.1, 0.5, 0.9))
and is not passed to predict
as in the quantregForest::quantregForest
package. This is done to ensure a consistent interface with the quantile regression from quantreg
.
Implementation
No implementation notes
A fitted 'tidyFit' class model.
Johann Pfitzinger
Meinshausen N (2017). quantregForest: Quantile Regression Forests. R package version 1.3-7, https://CRAN.R-project.org/package=quantregForest.
.fit.quantile
, .fit.rf
and m
methods
# Load data
data <- tidyfit::Factor_Industry_Returns
data <- dplyr::filter(data, Industry == "HiTec")
data <- dplyr::select(data, -Date, -Industry)

# Stand-alone function
fit <- m("quantile_rf", Return ~ ., data, tau = 0.5, ntree = 50)
fit

# Within 'regress' function
fit <- regress(data, Return ~ .,
               m("quantile_rf", tau = c(0.1, 0.5, 0.9), ntree = 50))
explain(fit)
tidyfit
Selects features for continuous or factor data using ReliefF on a 'tidyFit' R6
class. The function can be used with regress
and classify
.
## S3 method for class 'relief'
.fit(self, data = NULL)
self |
a 'tidyFit' R6 class. |
data |
a data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). |
Hyperparameters:
None. Cross validation not applicable.
Important method arguments (passed to m
)
estimator (selection algorithm to use (default is 'ReliefFequalK'))
The ReliefF algorithm is estimated using the CORElearn::attrEval
function. See ?attrEval
for more details.
Implementation
Use with regress
for regression problems and with classify
for classification problems. coef
returns the score for each feature. Select the required number of features with the largest scores.
The Relief objects have no predict
and related methods.
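Selecting the features with the largest scores can be sketched in base R. The score values below are made up for illustration and stand in for the per-feature scores returned by coef; the number of features to keep (k) is likewise an assumption.

```r
# Hypothetical ReliefF scores (illustrative values only)
scores <- c(`Mkt-RF` = 0.21, SMB = 0.04, HML = 0.09, RMW = 0.01, CMA = 0.02)

# Keep the k features with the largest scores
k <- 2
selected <- names(sort(scores, decreasing = TRUE))[seq_len(k)]
selected
```

The selected names can then be used to build the formula for a downstream model.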
A fitted 'tidyFit' class model.
Johann Pfitzinger
Robnik-Sikonja M, Savicky P (2021). CORElearn: Classification, Regression and Feature Evaluation. R package version 1.56.0, https://CRAN.R-project.org/package=CORElearn.
# Load data
data <- tidyfit::Factor_Industry_Returns
data <- dplyr::filter(data, Industry == "HiTec")
data <- dplyr::select(data, -Date, -Industry)

# Stand-alone function
fit <- m("relief", Return ~ ., data)
coef(fit)

# Within 'regress' function
fit <- regress(data, Return ~ ., m("relief"))
coef(fit)
tidyfit
Fits a random forest on a 'tidyFit' R6
class. The function can be used with regress
and classify
.
## S3 method for class 'rf'
.fit(self, data = NULL)
self |
a 'tidyFit' R6 class. |
data |
a data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). |
Hyperparameters:
ntree (number of trees)
mtry (number of variables randomly sampled at each split)
Important method arguments (passed to m
)
The function provides a wrapper for randomForest::randomForest
. See ?randomForest
for more details.
Implementation
The random forest is always fit with importance = TRUE
. The feature importance values are extracted using coef()
.
A fitted 'tidyFit' class model.
Johann Pfitzinger
Liaw, A. and Wiener, M. (2002). Classification and Regression by randomForest. R News 2(3), 18–22.
.fit.svm
, .fit.boost
and m
methods
# Load data
data <- tidyfit::Factor_Industry_Returns
data <- dplyr::filter(data, Industry == "HiTec")
data <- dplyr::select(data, -Date, -Industry)

# Stand-alone function
fit <- m("rf", Return ~ ., data)
fit

# Within 'regress' function
fit <- regress(data, Return ~ ., m("rf"))
explain(fit)
tidyfit
Fits a linear regression or classification with L2 penalty on a 'tidyFit' R6
class. The function can be used with regress
and classify
.
## S3 method for class 'ridge'
.fit(self, data = NULL)
self |
a tidyFit R6 class. |
data |
a data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). |
Hyperparameters:
lambda
(L2 penalty)
Important method arguments (passed to m
)
The ridge regression is estimated using glmnet::glmnet
with alpha = 0
. See ?glmnet
for more details. For classification pass family = "binomial"
to ...
in m
or use classify
.
Implementation
If the response variable contains more than 2 classes, a multinomial response is used automatically.
Features are standardized by default with coefficients transformed to the original scale.
If no hyperparameter grid is passed (is.null(control$lambda)
), dials::grid_regular()
is used to determine a sensible default grid. The grid size is 100. Note that the grid selection tools provided by glmnet::glmnet
cannot be used (e.g. dfmax
). This is to guarantee identical grids across groups in the tibble.
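The reason for a fixed grid can be sketched in base R: a data-dependent penalty path differs between groups, whereas a preset grid does not. The example uses the upper bound of an L1 path for simplicity, and the grid range (1e-10 to 1 on a log scale) is an assumption; the text above only states the grid size of 100.

```r
set.seed(1)

# Two hypothetical groups with different signal strength
make_group <- function(beta) {
  x <- scale(matrix(rnorm(200), ncol = 2))
  list(x = x, y = drop(x %*% c(beta, 0)) + rnorm(100))
}
groups <- list(a = make_group(0.5), b = make_group(3))

# Data-dependent upper end of an L1 penalty path: differs from group to group
lambda_max <- sapply(groups, function(g) {
  max(abs(crossprod(g$x, g$y - mean(g$y)))) / length(g$y)
})
lambda_max

# A fixed log-spaced grid is the same for every group (range is assumed)
lambda_grid <- 10^seq(-10, 0, length.out = 100)
length(lambda_grid)
```

With a shared grid, a given grid ID refers to the same penalty value in every group, which makes results comparable across groups.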
A fitted tidyFit class model.
Johann Pfitzinger
Jerome Friedman, Trevor Hastie, Robert Tibshirani (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1-22. URL https://www.jstatsoft.org/v33/i01/.
.fit.lasso
, .fit.adalasso
, .fit.enet
and m
methods
# Load data
data <- tidyfit::Factor_Industry_Returns

# Stand-alone function
fit <- m("ridge", Return ~ ., data, lambda = 0.5)
fit

# Within 'regress' function
fit <- regress(data, Return ~ ., m("ridge", lambda = c(0.1, 0.5)),
               .mask = c("Date", "Industry"))
coef(fit)
tidyfit
Fits a robust linear regression on a 'tidyFit' R6
class. The function can be used with regress
.
## S3 method for class 'robust'
.fit(self, data = NULL)
self |
a 'tidyFit' R6 class. |
data |
a data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). |
Hyperparameters:
None. Cross validation not applicable.
Important method arguments (passed to m
)
method
(estimation algorithm, e.g. 'M', 'MM')
The function provides a wrapper for MASS::rlm
. See ?rlm
for more details.
Implementation
An argument vcov.
can be passed in control or to ...
in m
to estimate the model with robust standard errors. vcov.
can be one of "BS", "HAC", "HC" and "OPG" and is passed to the sandwich
package.
A fitted 'tidyFit' class model.
Johann Pfitzinger
W. N. Venables and B. D. Ripley (2002).
Modern Applied Statistics with S. 4th ed., Springer, New York.
URL https://www.stats.ox.ac.uk/pub/MASS4/.
# Load data
data <- tidyfit::Factor_Industry_Returns

fit <- regress(data, Return ~ ., m("robust"), .mask = c("Date", "Industry"))
coef(fit)

# With robust standard errors
fit <- m("robust", Return ~ `Mkt-RF` + HML + SMB, data, vcov. = "HAC")
tidyr::unnest(coef(fit), model_info)
tidyfit
Fits a Bayesian Spike and Slab regression or classification on a 'tidyFit' R6
class. The function can be used with regress
and classify
.
## S3 method for class 'spikeslab'
.fit(self, data = NULL)
self |
a tidyFit R6 class. |
data |
a data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). |
Hyperparameters:
None. Cross validation not applicable.
Important method arguments (passed to m
)
In the case of regression, arguments are passed to BoomSpikeSlab::lm.spike
and BoomSpikeSlab::SpikeSlabPrior
. Check those functions for details.
BoomSpikeSlab::SpikeSlabPrior
expected.r2
prior.df
expected.model.size
BoomSpikeSlab::lm.spike
niter
In the case of classification, arguments are passed to BoomSpikeSlab::logit.spike
and BoomSpikeSlab::SpikeSlabGlmPrior
. Check those functions for details.
BoomSpikeSlab::logit.spike
niter
I advise against the use of BoomSpikeSlab::SpikeSlabGlmPrior
at the moment, since it appears to be buggy.
The function provides wrappers for BoomSpikeSlab::lm.spike
and BoomSpikeSlab::logit.spike
. See ?lm.spike
and ?logit.spike
for more details.
Implementation
Prior arguments are passed to BoomSpikeSlab::SpikeSlabPrior
and BoomSpikeSlab::SpikeSlabGlmPrior
(the function automatically identifies which arguments are for the prior, and which for BoomSpikeSlab::lm.spike
or BoomSpikeSlab::logit.spike
).
BoomSpikeSlab::logit.spike
is automatically selected when using classify
.
A fitted tidyFit class model.
Johann Pfitzinger
Scott SL (2022). BoomSpikeSlab: MCMC for Spike and Slab Regression. R package version 1.2.5, https://CRAN.R-project.org/package=BoomSpikeSlab.
.fit.lasso
, .fit.blasso
and m
methods
# Load data
data <- tidyfit::Factor_Industry_Returns

# Stand-alone function
fit <- m("spikeslab", Return ~ ., data, niter = 100)
fit

# Within 'regress' function
fit <- regress(data, Return ~ ., m("spikeslab", niter = 100),
               .mask = c("Date", "Industry"))
coef(fit)
tidyfit
Fits a best subset regression or classification on a 'tidyFit' R6
class. The function can be used with regress
and classify
.
## S3 method for class 'subset'
.fit(self, data = NULL)
self |
a 'tidyFit' R6 class. |
data |
a data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). |
Hyperparameters:
None. Cross validation not applicable.
Important method arguments (passed to m
)
method
(e.g. 'forward', 'backward')
IC
(information criterion, e.g. 'AIC')
The best subset regression is estimated using bestglm::bestglm
which is a wrapper around leaps::regsubsets
for the regression case, and performs an exhaustive search for the classification case. See ?bestglm
for more details.
Implementation
Forward or backward selection can be performed by passing method = "forward"
or method = "backward"
to m
.
A fitted 'tidyFit' class model.
Johann Pfitzinger
A.I. McLeod, Changjiang Xu and Yuanhao Lai (2020).
bestglm: Best Subset GLM and Regression Utilities.
R package version 0.37.3. URL https://CRAN.R-project.org/package=bestglm.
# Load data
data <- tidyfit::Factor_Industry_Returns

# Stand-alone function
fit <- m("subset", Return ~ ., data, method = c("forward", "backward"))
tidyr::unnest(fit, settings)

# Within 'regress' function
fit <- regress(data, Return ~ ., m("subset", method = "forward"),
               .mask = c("Date", "Industry"))
coef(fit)
tidyfit
Fits a support vector regression or classification on a 'tidyFit' R6
class. The function can be used with regress
or classify
.
## S3 method for class 'svm'
.fit(self, data = NULL)
self |
a 'tidyFit' R6 class. |
data |
a data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). |
Hyperparameters:
cost (cost of constraint violation)
epsilon (epsilon in the insensitive-loss function)
Important method arguments (passed to m
)
The function provides a wrapper for e1071::svm
. See ?svm
for more details.
Implementation
The default value for the kernel
argument is set to 'linear'. If set to a different value, no coefficients will be returned.
A fitted 'tidyFit' class model.
Johann Pfitzinger
Meyer D, Dimitriadou E, Hornik K, Weingessel A, Leisch F (2022). e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien. R package version 1.7-12, https://CRAN.R-project.org/package=e1071.
.fit.boost
, .fit.lasso
and m
methods
# Load data
data <- tidyfit::Factor_Industry_Returns
data <- dplyr::filter(data, Industry == "HiTec")

# Stand-alone function
fit <- m("svm", Return ~ `Mkt-RF` + HML + SMB, data, cost = 0.1)
fit

# Within 'regress' function
fit <- regress(data, Return ~ ., m("svm", cost = 0.1),
               .mask = c("Date", "Industry"))
coef(fit)
tidyfit
Fits a Bayesian time-varying regression on a 'tidyFit' R6
class. The function can be used with regress
.
## S3 method for class 'tvp'
.fit(self, data = NULL)
self |
a 'tidyFit' R6 class. |
data |
a data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). |
Hyperparameters:
None. Cross validation not applicable.
Important method arguments (passed to m
)
mod_type
niter
(number of MCMC iterations)
The function provides a wrapper for shrinkTVP::shrinkTVP
. See ?shrinkTVP
for more details.
Implementation
An argument index_col
can be passed, which allows a custom index to be added to coef(m("tvp"))
(e.g. a date index, see Examples).
A fitted 'tidyFit' class model.
Johann Pfitzinger
Peter Knaus, Angela Bitto-Nemling, Annalisa Cadonna and Sylvia Frühwirth-Schnatter (2021).
Shrinkage in the Time-Varying Parameter Model Framework Using the R Package shrinkTVP.
Journal of Statistical Software 100(13), 1–32.
doi:10.18637/jss.v100.i13.
.fit.bayes
, .fit.mslm
and m
methods
# Load data
data <- tidyfit::Factor_Industry_Returns
data <- dplyr::filter(data, Industry == "HiTec")
data <- dplyr::select(data, -Industry)

# Within 'regress' function (using low niter for illustration)
fit <- regress(data, Return ~ ., m("tvp", niter = 50, index_col = "Date"))
tidyr::unnest(coef(fit), model_info)
This function is a wrapper to fit many different types of linear
classification models on a (grouped) tibble
.
.data |
a data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). The data frame can be grouped. |
formula |
an object of class "formula": a symbolic description of the model to be fitted. |
... |
name-function pairs of models to be estimated. See 'Details'. |
.cv |
type of 'rsample' cross validation procedure to use to determine optimal hyperparameter values. Default is |
.cv_args |
additional settings to pass to the 'rsample' cross validation function. |
.weights |
optional name of column containing sample weights. |
.mask |
optional vector of column names to ignore. Can be useful when using 'y ~ .' formula syntax. |
.return_slices |
logical. Should the output of individual cross validation slices be returned or only the final fit. Default is |
.return_grid |
logical. Should the output of the individual hyperparameter grids be returned or only the best fitting set of hyperparameters. Default is |
.tune_each_group |
logical. Should optimal hyperparameters be selected for each group or once across all groups. Default is |
.force_cv |
logical. Should models be evaluated across all cross validation slices, even if no hyperparameters are tuned. Default is |
classify
fits all models passed in ...
using the m
function. The models can be passed as name-function pairs (e.g. ols = m("lm")
) or without including a name.
Hyperparameters are tuned automatically using the '.cv' and '.cv_args' arguments, or can be passed to m()
(e.g. lasso = m("lasso", lambda = 0.5)
). See the individual model functions (?m()
) for an overview of hyperparameters.
Cross validation is performed using the 'rsample' package with possible methods including
'initial_split' (simple train-test split)
'initial_time_split' (train-test split with retained order)
'vfold_cv' (aka kfold cross validation)
'loo_cv' (leave-one-out)
'rolling_origin' (generalized time series cross validation, e.g. rolling or expanding windows)
'sliding_window', 'sliding_index', 'sliding_period' (specialized time series splits)
'bootstraps'
'group_vfold_cv', 'group_bootstraps'
See package documentation for 'rsample' for all available methods.
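As an illustration of what a 'vfold_cv' split does, the following base R sketch assigns rows to folds by hand. It does not call 'rsample'; the list structure and the names analysis and assessment are used here for illustration only.

```r
set.seed(1)
n <- 20
v <- 5

# Randomly assign each row to one of v folds of equal size
fold_id <- sample(rep(seq_len(v), length.out = n))

# For each fold, the held-out rows form the assessment set
splits <- lapply(seq_len(v), function(k) {
  list(analysis = which(fold_id != k), assessment = which(fold_id == k))
})

# Size of each assessment set
assess_sizes <- lengths(lapply(splits, `[[`, "assessment"))
assess_sizes
```

Each row is held out exactly once, so every hyperparameter setting is evaluated on all observations across the v slices.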
The negative log loss is used to validate performance in the cross validation.
Note that arguments for weights are automatically passed to the functions by setting the '.weights' argument. Weights are also considered during cross validation by calculating weighted versions of the cross validation loss function.
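A weighted log loss for a binary response can be sketched as follows. The function name, the normalization by the sum of weights and the clipping constant are assumptions for illustration; the exact form used internally is not specified above.

```r
# Weighted binary log loss (lower is better); 'eps' guards against log(0)
weighted_log_loss <- function(truth, prob,
                              weights = rep(1, length(truth)),
                              eps = 1e-15) {
  prob <- pmin(pmax(prob, eps), 1 - eps)
  ll <- truth * log(prob) + (1 - truth) * log(1 - prob)
  -sum(weights * ll) / sum(weights)
}

truth <- c(1, 0, 1, 1)
prob <- c(0.9, 0.2, 0.6, 0.4)

# Equal weights versus upweighting the worst-predicted observation
loss_equal <- weighted_log_loss(truth, prob)
loss_weighted <- weighted_log_loss(truth, prob, weights = c(1, 1, 1, 3))
c(equal = loss_equal, weighted = loss_weighted)
```

Upweighting a poorly predicted observation raises the loss, so hyperparameters that fit highly weighted observations well are favored during selection.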
classify
can handle both binomial and multinomial response distributions; however, not all underlying methods are capable of handling a multinomial response.
A tidyfit.models
frame containing model details for each group.
The 'tidyfit.models' frame consists of 4 different components:
A group of identifying columns (e.g. model name, data groups, grid IDs)
A 'model_object' column, which contains the fitted model.
A nested 'settings' column containing model arguments and hyperparameters
Columns showing errors, warnings and messages (if applicable)
Coefficients, predictions, fitted values or residuals can be accessed using the built-in coef
, predict
, fitted
and resid
methods. Note that all coefficients are transformed to ensure comparability across methods.
Johann Pfitzinger
regress
, coef.tidyfit.models
and predict.tidyfit.models
method
data <- tidyfit::Factor_Industry_Returns
data <- dplyr::mutate(data, Return = ifelse(Return > 0, 1, 0))
fit <- classify(data, Return ~ ., m("lasso", lambda = c(0.001, 0.1)),
                .mask = c("Date", "Industry"))

# Print the models frame
tidyr::unnest(fit, settings)

# View coefficients
coef(fit)
tidyfit.models
frame
The function extracts and prepares coefficients from all models in a tidyfit.models
frame and outputs a tidy frame of estimates.
## S3 method for class 'tidyfit.models'
coef(
  object,
  ...,
  .add_bootstrap_interval = FALSE,
  .bootstrap_alpha = 0.05,
  .keep_grid_id = FALSE
)
object |
|
... |
currently not used |
.add_bootstrap_interval |
calculate bootstrap intervals for the parameters. See 'Details'. |
.bootstrap_alpha |
confidence level used for the bootstrap interval. Default is |
.keep_grid_id |
boolean. By default the grid ID column is dropped, if there is only one unique setting per model or group. |
The function uses the 'model_object' column in a tidyfit.models
frame to return a data frame of estimated coefficients.
Results are 'tidied' using broom::tidy
whenever possible.
All coefficients are transformed to ensure statistical comparability. For instance, standardized coefficients are always transformed back to the original data scale, naming conventions are harmonized etc.
Bootstrap intervals can be calculated using rsample::int_pctl
. Only set .add_bootstrap_interval = TRUE
if you are using .cv = "bootstraps"
in combination with .return_slices = TRUE
to generate the model frame.
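The idea behind a percentile bootstrap interval can be sketched in base R. This is not a call to rsample::int_pctl; the simulated data, the number of resamples and the variable names are assumptions for illustration.

```r
set.seed(123)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)

# Refit the model on resampled rows and collect the slope each time
boot_slopes <- replicate(500, {
  idx <- sample.int(length(y), replace = TRUE)
  unname(coef(lm(y[idx] ~ x[idx]))[2])
})

# Percentile interval at confidence level 1 - alpha
alpha <- 0.05
ci <- quantile(boot_slopes, probs = c(alpha / 2, 1 - alpha / 2))
ci
```

The interval is read directly from the distribution of refitted coefficients, which is why the individual bootstrap slices need to be kept in the model frame.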
A 'tibble'.
Johann Pfitzinger
predict.tidyfit.models
, fitted.tidyfit.models
and residuals.tidyfit.models
data <- tidyfit::Factor_Industry_Returns
fit <- regress(data, Return ~ ., m("lm"), .mask = c("Date", "Industry"))
coef(fit)
A generic method for calculating XAI and variable importance methods for tidyfit.models frames.
explain(object, use_package = NULL, use_method = NULL, ...)
object |
|
use_package |
the package to use to calculate variable importance. See 'Details' for possible options. |
use_method |
the method from 'use_package' that should be used to calculate variable importance. |
... |
additional arguments passed to the importance method |
WARNING This function is currently in an experimental stage.
The function uses the 'model_object' column in a tidyfit.models
frame to return variable importance measures for each model.
Possible packages and methods include:
sensitivity
package:
The package provides methods to assess variable importance in linear regressions ('lm') and classifications ('glm').
Usage: use_package="sensitivity"
Methods:
"lmg" (Shapley regression),
"pmvd" (Proportional marginal variance decomposition),
"src" (standardized regression coefficients),
"pcc" (partial correlation coefficients),
"johnson" (Johnson indices)
See ?sensitivity::lmg
for more information and additional arguments.
iml
package:
Integration with iml is currently in progress. The methods can be used for 'nnet', 'rf', 'lasso', 'enet', 'ridge', 'adalasso', 'glm' and 'lm'.
Usage: use_package="iml"
Methods:
"Shapley" (SHAP values)
"LocalModel" (LIME)
"FeatureImp" (Permutation-based feature importance)
The argument 'which_rows' (vector of integer indexes) can be used to explain specific rows in the data set for Shapley and LocalModel methods.
randomForest
package:
This uses the native importance method of the randomForest package and can be used with 'rf' and 'quantile_rf' regression and classification.
Usage: use_package="randomForest"
Methods:
"mean_decrease_accuracy"
partimp
package:
This is currently still experimental. Documentation will be provided when the package is more stable.
A 'tibble'.
Johann Pfitzinger
Molnar C, Bischl B, Casalicchio G (2018). “iml: An R package for Interpretable Machine Learning.” JOSS, 3(26), 786. doi:10.21105/joss.00786.
Iooss B, Veiga SD, Janon A, Pujol G, Broto wcfB, Boumhaout K, Clouvel L, Delage T, Amri RE, Fruth J, Gilquin L, Guillaume J, Herin M, Idrissi MI, Le Gratiet L, Lemaitre P, Marrel A, Meynaoui A, Nelson BL, Monari F, Oomen R, Rakovec O, Ramos B, Rochet P, Roustant O, Sarazin G, Song E, Staum J, Sueur R, Touati T, Verges V, Weber F (2024). sensitivity: Global Sensitivity Analysis of Model Outputs and Importance Measures. R package version 1.30.0, https://CRAN.R-project.org/package=sensitivity.
A. Liaw and M. Wiener (2002). Classification and Regression by randomForest. R News 2(3), 18–22.
data <- dplyr::group_by(tidyfit::Factor_Industry_Returns, Industry)
fit <- regress(data, Return ~ ., m("lm"), .mask = "Date")
tidyfit::explain(fit, use_package = "sensitivity", use_method = "src")

data <- dplyr::filter(tidyfit::Factor_Industry_Returns, Industry == Industry[1])
fit <- regress(data, Return ~ ., m("lm"), .mask = c("Date", "Industry"))
tidyfit::explain(fit, use_package = "iml", use_method = "Shapley", which_rows = c(1))
A generic method for calculating XAI and variable importance methods for tidyfit.models frames.
## S3 method for class 'tidyfit.models'
explain(
  object,
  use_package = NULL,
  use_method = NULL,
  ...,
  .keep_grid_id = FALSE
)
object |
|
use_package |
the package to use to calculate variable importance. See 'Details' for possible options. |
use_method |
the method from 'use_package' that should be used to calculate variable importance. |
... |
additional arguments passed to the importance method |
.keep_grid_id |
boolean. By default the grid ID column is dropped, if there is only one unique setting per model or group. |
WARNING: This function is currently in an experimental stage.
The function uses the 'model_object' column in a tidyfit.models frame to return variable importance measures for each model.
Possible packages and methods include:
sensitivity package: The package provides methods to assess variable importance in linear regressions ('lm') and classifications ('glm').
Usage: use_package="sensitivity"
Methods:
"lmg" (Shapley regression),
"pmvd" (Proportional marginal variance decomposition),
"src" (standardized regression coefficients),
"pcc" (partial correlation coefficients),
"johnson" (Johnson indices)
See ?sensitivity::lmg for more information and additional arguments.
iml package: Integration with iml is currently in progress. The methods can be used for 'nnet', 'rf', 'lasso', 'enet', 'ridge', 'adalasso', 'glm' and 'lm'.
Usage: use_package="iml"
Methods:
"Shapley" (SHAP values)
"LocalModel" (LIME)
"FeatureImp" (Permutation-based feature importance)
The argument 'which_rows' (vector of integer indexes) can be used to explain specific rows in the data set for Shapley and LocalModel methods.
randomForest package: This uses the native importance method of the randomForest package and can be used with 'rf' and 'quantile_rf' regression and classification.
Usage: use_package="randomForest"
Methods:
"mean_decrease_accuracy"
partimp package: This is currently still experimental. Documentation will be provided when the package is more stable.
A 'tibble'.
Johann Pfitzinger
Molnar C, Bischl B, Casalicchio G (2018). “iml: An R package for Interpretable Machine Learning.” JOSS, 3(26), 786. doi:10.21105/joss.00786.
Iooss B, Veiga SD, Janon A, Pujol G, Broto B, Boumhaout K, Clouvel L, Delage T, Amri RE, Fruth J, Gilquin L, Guillaume J, Herin M, Idrissi MI, Le Gratiet L, Lemaitre P, Marrel A, Meynaoui A, Nelson BL, Monari F, Oomen R, Rakovec O, Ramos B, Rochet P, Roustant O, Sarazin G, Song E, Staum J, Sueur R, Touati T, Verges V, Weber F (2024). sensitivity: Global Sensitivity Analysis of Model Outputs and Importance Measures. R package version 1.30.0, https://CRAN.R-project.org/package=sensitivity.
A. Liaw and M. Wiener (2002). Classification and Regression by randomForest. R News 2(3), 18–22.
data <- dplyr::group_by(tidyfit::Factor_Industry_Returns, Industry)
fit <- regress(data, Return ~ ., m("lm"), .mask = "Date")
explain(fit, use_package = "sensitivity", use_method = "src")

data <- dplyr::filter(tidyfit::Factor_Industry_Returns, Industry == Industry[1])
fit <- regress(data, Return ~ ., m("lm"), .mask = c("Date", "Industry"))
explain(fit, use_package = "iml", use_method = "Shapley", which_rows = c(1))
The data set includes monthly returns between 1963 and 2022 for 10 industries, as well as factor values for 5 Fama-French factors.
https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html
tidyfit.models frame
The function generates fitted values for all models in a tidyfit.models frame and outputs a tidy frame.
## S3 method for class 'tidyfit.models'
fitted(object, ...)
object |
|
... |
currently not used |
The function uses the 'model_object' column in a tidyfit.models frame to return fitted values for each model.
A 'tibble'.
Johann Pfitzinger
coef.tidyfit.models, predict.tidyfit.models and residuals.tidyfit.models
data <- dplyr::group_by(tidyfit::Factor_Industry_Returns, Industry)
fit <- regress(data, Return ~ ., m("lm"), .mask = "Date")
fitted(fit)
tidyfit
The function can fit various regression or classification models and returns the results as a tibble. m() can be used in conjunction with regress and classify, or as a stand-alone function.
m(model_method, formula = NULL, data = NULL, ...)
model_method |
The name of the method to fit. See Details. |
formula |
an object of class "formula": a symbolic description of the model to be fitted. |
data |
a data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). |
... |
Additional arguments passed to the underlying method function (e.g. |
model_method specifies the model to fit to the data and can take one of several options:
"lm" performs an OLS regression using stats::lm. See .fit.lm for details.
"glm" performs a generalized regression or classification using stats::glm. See .fit.glm for details.
"anova" performs analysis of variance using stats::anova. See .fit.anova for details.
"robust" performs a robust regression using MASS::rlm. See .fit.robust for details.
"quantile" performs a quantile regression using quantreg::rq. See .fit.quantile for details.
"lasso" performs a linear regression or classification with L1 penalty using glmnet::glmnet. See .fit.lasso for details.
"ridge" performs a linear regression or classification with L2 penalty using glmnet::glmnet. See .fit.ridge for details.
"adalasso" performs an Adaptive Lasso regression or classification using glmnet::glmnet. See .fit.adalasso for details.
"enet" performs a linear regression or classification with L1 and L2 penalties using glmnet::glmnet. See .fit.enet for details.
"group_lasso" performs a linear regression or classification with grouped L1 penalty using gglasso::gglasso. See .fit.group_lasso for details.
"boost" performs gradient boosting regression or classification using mboost::glmboost. See .fit.boost for details.
"rf" performs a random forest regression or classification using randomForest::randomForest. See .fit.rf for details.
"quantile_rf" performs a quantile random forest regression or classification using quantregForest::quantregForest. See .fit.quantile_rf for details.
"svm" performs a support vector regression or classification using e1071::svm. See .fit.svm for details.
"nnet" performs a neural network regression or classification using nnet::nnet. See .fit.nnet for details.
"pcr" performs a principal components regression using pls::pcr. See .fit.pcr for details.
"plsr" performs a partial least squares regression using pls::plsr. See .fit.plsr for details.
"hfr" performs a hierarchical feature regression using hfr::hfr. See .fit.hfr for details.
"subset" performs a best subset regression or classification using bestglm::bestglm (wrapper for leaps). See .fit.subset for details.
"gets" performs a general-to-specific regression using gets::gets. See .fit.gets for details.
"bayes" performs a Bayesian generalized regression or classification using arm::bayesglm. See .fit.bayes for details.
"bridge" performs a Bayesian ridge regression using monomvn::bridge. See .fit.bridge for details.
"blasso" performs a Bayesian Lasso regression using monomvn::blasso. See .fit.blasso for details.
"spikeslab" performs a Bayesian Spike and Slab regression using BoomSpikeSlab::lm.spike. See .fit.spikeslab for details.
"bma" performs a Bayesian model averaging regression using BMS::bms. See .fit.bma for details.
"tvp" performs a Bayesian time-varying parameter regression using shrinkTVP::shrinkTVP. See .fit.tvp for details.
"glmm" performs a mixed-effects GLM using lme4::glmer. See .fit.glmm for details.
"mslm" performs a Markov-switching regression using MSwM::msmFit. See .fit.mslm for details.
"cor" calculates Pearson's correlation coefficient using stats::cor.test. See .fit.cor for details.
"chisq" calculates Pearson's Chi-squared test using stats::chisq.test. See .fit.chisq for details.
"mrmr" performs a minimum redundancy, maximum relevance feature selection routine using mRMRe::mRMR.ensemble. See .fit.mrmr for details.
"relief" performs a ReliefF feature selection routine using CORElearn::attrEval. See .fit.relief for details.
"genetic" performs a linear regression with feature selection using the genetic algorithm implemented in gaselect::genAlg. See .fit.genetic for details.
When called without formula and data arguments, the function returns a 'tidyfit.models' data frame with unfitted models.
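The two calling modes can be sketched as follows. This is a minimal illustration that assumes the tidyfit package is installed and reuses the Factor_Industry_Returns data from the examples; the object names spec and fit are arbitrary.

```r
library(tidyfit)

# Without formula and data, m() returns an unfitted 'tidyfit.models' frame
spec <- m("lm")
spec

# The stored specification can then be handed to regress(),
# which is equivalent to writing m("lm") inline
data <- tidyfit::Factor_Industry_Returns
fit <- regress(data, Return ~ ., spec, .mask = c("Date", "Industry"))
fit
```

Storing the specification first can be convenient when the same model definition is reused across several calls.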
A 'tidyfit.models' data frame.
Johann Pfitzinger
# Load data
data <- tidyfit::Factor_Industry_Returns

# Stand-alone function
fit <- m("lm", Return ~ ., data)
fit

# Within 'regress' function
fit <- regress(data, Return ~ ., m("lm"), .mask = "Date")
fit
tidyfit.models frame
The function generates predictions for all models in a tidyfit.models frame and outputs a tidy frame.
## S3 method for class 'tidyfit.models'
predict(object, newdata, ..., .keep_grid_id = FALSE)
object |
|
newdata |
New values at which predictions are to be made |
... |
currently not used |
.keep_grid_id |
boolean. By default the grid ID column is dropped, if there is only one unique setting per model or group. |
The function uses the 'model_object' column in a tidyfit.models frame to return predictions using the newdata argument for each model.
When the response variable is found in newdata, it is automatically included as a 'truth' column.
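Because the 'truth' column sits next to the predictions, simple error metrics can be computed directly from the output. The sketch below assumes the tidyfit and dplyr packages are installed, that predictions are returned in a column named 'prediction', and that the grouping column Industry is carried through to the output; only the 'truth' column is confirmed by the text above.

```r
library(tidyfit)

data <- dplyr::group_by(tidyfit::Factor_Industry_Returns, Industry)
fit <- regress(data, Return ~ ., m("lm"), .mask = "Date")

# newdata contains the response 'Return', so a 'truth' column is added
pred <- predict(fit, data)

# Assumption: predictions are stored in a column named 'prediction'
mse <- dplyr::summarise(
  dplyr::group_by(pred, Industry),
  mse = mean((truth - prediction)^2)
)
mse
```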
A 'tibble'.
Johann Pfitzinger
coef.tidyfit.models, residuals.tidyfit.models and fitted.tidyfit.models
data <- dplyr::group_by(tidyfit::Factor_Industry_Returns, Industry)
fit <- regress(data, Return ~ ., m("lm"), .mask = "Date")
predict(fit, data)
This function is a wrapper to fit many different types of linear regression models on a (grouped) tibble.
.data |
a data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). The data frame can be grouped. |
formula |
an object of class "formula": a symbolic description of the model to be fitted. |
... |
name-function pairs of models to be estimated. See 'Details'. |
.cv |
type of 'rsample' cross validation procedure to use to determine optimal hyperparameter values. Default is |
.cv_args |
additional settings to pass to the 'rsample' cross validation function. |
.weights |
optional name of column containing sample weights. |
.mask |
optional vector of column names to ignore. Can be useful when using 'y ~ .' formula syntax. |
.return_slices |
logical. Should the output of individual cross validation slices be returned or only the final fit. Default is |
.return_grid |
logical. Should the output of the individual hyperparameter grids be returned or only the best fitting set of hyperparameters. Default is |
.tune_each_group |
logical. Should optimal hyperparameters be selected for each group or once across all groups. Default is |
.force_cv |
logical. Should models be evaluated across all cross validation slices, even if no hyperparameters are tuned. Default is |
regress fits all models passed in ... using the m function. The models can be passed as name-function pairs (e.g. ols = m("lm")) or without including a name.
Hyperparameters are tuned automatically using the '.cv' and '.cv_args' arguments, or can be passed to m() (e.g. lasso = m("lasso", lambda = 0.5)). See the individual model functions (?m()) for an overview of hyperparameters.
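A call with two name-function pairs might look like the sketch below. It assumes the tidyfit and glmnet packages are installed and uses the Factor_Industry_Returns data from the examples; the names ols and lasso are the illustrative labels mentioned above.

```r
library(tidyfit)

data <- tidyfit::Factor_Industry_Returns

fit <- regress(
  data,
  Return ~ .,
  ols = m("lm"),                     # named model without hyperparameters
  lasso = m("lasso", lambda = 0.5),  # named model with a fixed L1 penalty
  .mask = c("Date", "Industry")
)

# One models frame holding both fits
fit
```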
Cross validation is performed using the 'rsample' package with possible methods including
'initial_split' (simple train-test split)
'initial_time_split' (train-test split with retained order)
'vfold_cv' (also known as k-fold cross validation)
'loo_cv' (leave-one-out)
'rolling_origin' (generalized time series cross validation, e.g. rolling or expanding windows)
'sliding_window', 'sliding_index', 'sliding_period' (specialized time series splits)
'bootstraps'
'group_vfold_cv', 'group_bootstraps'
See package documentation for 'rsample' for all available methods.
The mean squared error loss is used to validate performance in the cross validation.
Note that arguments for weights are automatically passed to the functions by setting the '.weights' argument. Weights are also considered during cross validation by calculating weighted versions of the cross validation loss function.
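As a hedged illustration of the cross validation settings, the sketch below tunes a small lasso grid with 5-fold cross validation. It assumes the tidyfit, glmnet, rsample and tidyr packages are installed; the grid values and the seed are arbitrary choices, and v is the fold-count argument of rsample::vfold_cv.

```r
library(tidyfit)

data <- tidyfit::Factor_Industry_Returns

set.seed(123)  # fold assignment is random
fit <- regress(
  data,
  Return ~ .,
  m("lasso", lambda = c(0.001, 0.01, 0.1)),  # candidate penalties
  .cv = "vfold_cv",                          # rsample method name
  .cv_args = list(v = 5),                    # passed on to rsample::vfold_cv
  .mask = c("Date", "Industry")
)

# Inspect the settings that were retained after tuning
tidyr::unnest(fit, settings)
```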
A tidyfit.models frame containing model details for each group.
The 'tidyfit.models' frame consists of 4 different components:
A group of identifying columns (e.g. model name, data groups, grid IDs)
A 'model_object' column, which contains the fitted model.
A nested 'settings' column containing model arguments and hyperparameters
Columns showing errors, warnings and messages (if applicable)
Coefficients, predictions, fitted values or residuals can be accessed using the built-in coef, predict, fitted and resid methods. Note that all coefficients are transformed to ensure comparability across methods.
Johann Pfitzinger
classify, coef.tidyfit.models and predict.tidyfit.models method
data <- tidyfit::Factor_Industry_Returns
fit <- regress(data, Return ~ ., m("lasso", lambda = c(0.001, 0.1)), .mask = c("Date", "Industry"))

# Print the models frame
tidyr::unnest(fit, settings)

# View coefficients
coef(fit)
tidyfit.models frame
The function generates residuals for all models in a tidyfit.models frame and outputs a tidy frame.
## S3 method for class 'tidyfit.models'
residuals(object, ...)
object |
|
... |
currently not used |
The function uses the 'model_object' column in a tidyfit.models frame to return residuals for each model.
A 'tibble'.
Johann Pfitzinger
coef.tidyfit.models, predict.tidyfit.models and fitted.tidyfit.models
data <- dplyr::group_by(tidyfit::Factor_Industry_Returns, Industry)
fit <- regress(data, Return ~ ., m("lm"), .mask = "Date")
resid(fit)