Title: | Model-Based Sliced Inverse Regression |
---|---|
Description: | An R package for dimension reduction based on finite Gaussian mixture modeling of inverse regression. |
Authors: | Luca Scrucca [aut, cre] |
Maintainer: | Luca Scrucca <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.3.3 |
Built: | 2024-10-25 02:44:16 UTC |
Source: | https://github.com/cran/msir |
An R package that implements MSIR, a dimension reduction method based on Gaussian finite mixture models. The basis of the subspace is estimated by modeling the inverse distribution within slice using finite mixtures of Gaussians, with number of components and covariance matrix parameterization selected by BIC or defined by the user. The method provides an extension to sliced inverse regression (SIR) and allows to overcome the main limitation of SIR, i.e., the failure in the presence of regression symmetric relationships, without the need to impose further assumptions.
Luca Scrucca [email protected]
Scrucca, L. (2011) Model-based SIR for dimension reduction. Computational Statistics & Data Analysis, 55(11), 3010-3026.
Nonparametric estimation of mean function with variability bands.
loess.sd(x, y = NULL, nsigma = 1, ...) panel.loess(x, y, col = par("col"), bg = NA, pch = par("pch"), cex = 1, col.smooth = "red", span = 2/3, degree = 2, nsigma = 1, ...)
loess.sd(x, y = NULL, nsigma = 1, ...) panel.loess(x, y, col = par("col"), bg = NA, pch = par("pch"), cex = 1, col.smooth = "red", span = 2/3, degree = 2, nsigma = 1, ...)
x |
a vector of values for the predictor variable |
y |
a vector of values for the response variable |
nsigma |
a multiplier for the standard deviation function. |
col , bg , pch , cex
|
numeric or character codes for the color(s), point type and size of points; see also |
col.smooth |
color to be used by |
span |
smoothing parameter for |
degree |
the degree of the polynomials to be used, see |
... |
further argument passed to the function |
The function loess.sd
computes the loess smooth for the mean function and the mean plus and minus k
times the standard deviation function.
The function panel.loess
can be used to add to a scatterplot matrix panel a smoothing of mean function using loess with variability bands at plus and minus nsigmas
times the standard deviation.
Luca Scrucca [email protected]
Weisberg, S. (2005) Applied Linear Regression, 3rd ed., Wiley, New York, pp. 275-278.
data(cars) plot(cars, main = "lowess.sd(cars)") lines(l <- loess.sd(cars)) lines(l$x, l$upper, lty=2) lines(l$x, l$lower, lty=2)
data(cars) plot(cars, main = "lowess.sd(cars)") lines(l <- loess.sd(cars)) lines(l$x, l$upper, lty=2) lines(l$x, l$lower, lty=2)
A dimension reduction method based on Gaussian finite mixture models which provides an extension to sliced inverse regression (SIR). The basis of the subspace is estimated by modeling the inverse distribution within slice using Gaussian finite mixtures with number of components and covariance matrix parameterization selected by BIC or defined by the user.
msir(x, y, nslices = msir.nslices, slice.function = msir.slices, modelNames = NULL, G = NULL, cov = c("mle", "regularized"), ...)
msir(x, y, nslices = msir.nslices, slice.function = msir.slices, modelNames = NULL, G = NULL, cov = c("mle", "regularized"), ...)
x |
A |
y |
A |
nslices |
The number of slices used, unless |
slice.function |
The slice functions to be used, by default |
modelNames |
A vector of character strings indicating the Gaussian mixture models to be fitted as described in |
G |
An integer vector specifying the numbers of mixture components used in fitting Gaussian mixture models. If a list of vectors is provided then each vector refers to a single slice. |
cov |
The predictors marginal covariance matrix. Possible choices are:
|
... |
other arguments passed to |
Returns an object of class 'msir'
with attributes:
call |
the function call. |
x |
the design matrix. |
y |
the response vector. |
slice.info |
output from slicing function. |
mixmod |
a list of finite mixture model objects as described in |
loglik |
the log-likelihood for the mixture models. |
f |
a vector of length equal to the total number of mixture components containing the fraction of observations in each fitted component within slices. |
mu |
a matrix of component within slices predictors means. |
sigma |
the marginal predictors covariance matrix. |
M |
the msir kernel matrix. |
evalues |
the eigenvalues from the generalized eigen-decomposition of |
evectors |
the raw eigenvectors from the generalized eigen-decomposition of |
basis |
the normalized eigenvectors from the generalized eigen-decomposition of |
std.basis |
standardized basis vectors obtained by multiplying each coefficient of the eigenvectors by the standard deviation of the corresponding predictor. The resulting coefficients are scaled such that all predictors have unit standard deviation. |
numdir |
the maximal number of directions estimated. |
dir |
the estimated MSIR directions from mean-centered predictors. |
Luca Scrucca [email protected]
Scrucca, L. (2011) Model-based SIR for dimension reduction. Computational Statistics & Data Analysis, 55(11), 3010-3026.
# 1-dimensional simple regression n <- 200 p <- 5 b <- as.matrix(c(1,-1,rep(0,p-2))) x <- matrix(rnorm(n*p), nrow = n, ncol = p) y <- exp(0.5 * x%*%b) + 0.1*rnorm(n) MSIR <- msir(x, y) summary(MSIR) plot(MSIR, type = "2Dplot") # 1-dimensional symmetric response curve n <- 200 p <- 5 b <- as.matrix(c(1,-1,rep(0,p-2))) x <- matrix(rnorm(n*p), nrow = n, ncol = p) y <- (0.5 * x%*%b)^2 + 0.1*rnorm(n) MSIR <- msir(x, y) summary(MSIR) plot(MSIR, type = "2Dplot") plot(MSIR, type = "coefficients") # 2-dimensional response curve n <- 300 p <- 5 b1 <- c(1, 1, 1, rep(0, p-3)) b2 <- c(1,-1,-1, rep(0, p-3)) b <- cbind(b1,b2) x <- matrix(rnorm(n*p), nrow = n, ncol = p) y <- x %*% b1 + (x %*% b1)^3 + 4*(x %*% b2)^2 + rnorm(n) MSIR <- msir(x, y) summary(MSIR) plot(MSIR, which = 1:2) ## Not run: plot(MSIR, type = "spinplot") plot(MSIR, which = 1, type = "2Dplot", span = 0.7) plot(MSIR, which = 2, type = "2Dplot", span = 0.7)
# 1-dimensional simple regression n <- 200 p <- 5 b <- as.matrix(c(1,-1,rep(0,p-2))) x <- matrix(rnorm(n*p), nrow = n, ncol = p) y <- exp(0.5 * x%*%b) + 0.1*rnorm(n) MSIR <- msir(x, y) summary(MSIR) plot(MSIR, type = "2Dplot") # 1-dimensional symmetric response curve n <- 200 p <- 5 b <- as.matrix(c(1,-1,rep(0,p-2))) x <- matrix(rnorm(n*p), nrow = n, ncol = p) y <- (0.5 * x%*%b)^2 + 0.1*rnorm(n) MSIR <- msir(x, y) summary(MSIR) plot(MSIR, type = "2Dplot") plot(MSIR, type = "coefficients") # 2-dimensional response curve n <- 300 p <- 5 b1 <- c(1, 1, 1, rep(0, p-3)) b2 <- c(1,-1,-1, rep(0, p-3)) b <- cbind(b1,b2) x <- matrix(rnorm(n*p), nrow = n, ncol = p) y <- x %*% b1 + (x %*% b1)^3 + 4*(x %*% b2)^2 + rnorm(n) MSIR <- msir(x, y) summary(MSIR) plot(MSIR, which = 1:2) ## Not run: plot(MSIR, type = "spinplot") plot(MSIR, which = 1, type = "2Dplot", span = 0.7) plot(MSIR, which = 2, type = "2Dplot", span = 0.7)
BIC-type criterion for selecting the dimensionality of a dimension reduction subspace.
msir.bic(object, type = 1, plot = FALSE) bicDimRed(M, x, nslices, type = 1, tol = sqrt(.Machine$double.eps))
msir.bic(object, type = 1, plot = FALSE) bicDimRed(M, x, nslices, type = 1, tol = sqrt(.Machine$double.eps))
object |
a |
plot |
if |
M |
the kernel matrix. See details below. |
x |
the predictors data matrix. See details below. |
type |
See details below. |
nslices |
the number of slices. See details below. |
tol |
a tolerance value |
This BIC-type criterion for the determination of the structural dimension selects as the maximizer of
where is the log-likelihood for dimensions up to
,
is the number of predictors, and
is the sample size.
The term
is the type of penalty to be used:
type = 1
:
type = 2
: , where
type = 3
: , where
type = 4
Returns a list with components:
evalues |
eigenvalues |
l |
log-likelihood |
crit |
BIC-type criterion |
d |
selected dimensionality |
The msir.bic
also assign the above information to the corresponding 'msir'
object.
Luca Scrucca [email protected]
Zhu, Miao and Peng (2006) "Sliced Inverse Regression for CDR Space Estimation", JASA.
Zhu, Zhu (2007) "On kernel method for SAVE", Journal of Multivariate Analysis.
# 1-dimensional symmetric response curve n <- 200 p <- 5 b <- as.matrix(c(1,-1,rep(0,p-2))) x <- matrix(rnorm(n*p), nrow = n, ncol = p) y <- (0.5 * x%*%b)^2 + 0.1*rnorm(n) MSIR <- msir(x, y) msir.bic(MSIR, plot = TRUE) summary(MSIR) msir.bic(MSIR, type = 3, plot = TRUE) summary(MSIR)
# 1-dimensional symmetric response curve n <- 200 p <- 5 b <- as.matrix(c(1,-1,rep(0,p-2))) x <- matrix(rnorm(n*p), nrow = n, ncol = p) y <- (0.5 * x%*%b)^2 + 0.1*rnorm(n) MSIR <- msir(x, y) msir.bic(MSIR, plot = TRUE) summary(MSIR) msir.bic(MSIR, type = 3, plot = TRUE) summary(MSIR)
This function computes a Sturges' type number of slices to be used as default in the msir
function.
msir.nslices(n, p)
msir.nslices(n, p)
n |
the number of observations in the sample. |
p |
the number of predictors in the sample. |
The function returns a single value, i.e. the number of slices.
Luca Scrucca [email protected]
Approximates marginal dimension test significance levels by sampling from the permutation distribution.
msir.permutation.test(object, npermute = 99, numdir = object$numdir, verbose = TRUE)
msir.permutation.test(object, npermute = 99, numdir = object$numdir, verbose = TRUE)
object |
a |
npermute |
number of permutations to compute. |
numdir |
maximum value of the dimension to test. |
verbose |
if |
The function approximates significance levels of the marginal dimension tests based on a permutation test.
The function returns a list with components:
summary |
a table containing the hypotheses, the test statistics, the permutation p-values. |
npermute |
the number of permutations used. |
Furthermore, it also assigns the above information to the corresponding 'msir'
object.
Luca Scrucca [email protected]
Scrucca, L. (2011) Model-based SIR for dimension reduction. Computational Statistics & Data Analysis, 55(11), 3010-3026.
Function dr()
in package dr.
## Not run: # 1-dimensional simple regression n <- 200 p <- 5 b <- as.matrix(c(1,-1,rep(0,p-2))) x <- matrix(rnorm(n*p), nrow = n, ncol = p) y <- exp(0.5 * x%*%b) + 0.1*rnorm(n) MSIR <- msir(x, y) msir.permutation.test(MSIR) summary(MSIR) ## End(Not run)
## Not run: # 1-dimensional simple regression n <- 200 p <- 5 b <- as.matrix(c(1,-1,rep(0,p-2))) x <- matrix(rnorm(n*p), nrow = n, ncol = p) y <- exp(0.5 * x%*%b) + 0.1*rnorm(n) MSIR <- msir(x, y) msir.permutation.test(MSIR) summary(MSIR) ## End(Not run)
This function computes a regularized version of the covariance matrix of the predictors. Among the possible models the one which maximizes BIC is returned.
msir.regularizedSigma(x, inv = FALSE, model = c("XII", "XXI", "XXX"))
msir.regularizedSigma(x, inv = FALSE, model = c("XII", "XXI", "XXX"))
x |
Ahe predictors data matrix. |
inv |
A logical specifying what must be returned. If |
model |
A character string specifying the available models:
|
A covariance matrix estimate.
Luca Scrucca [email protected]
Function used for slicing a continuous response variable.
msir.slices(y, nslices)
msir.slices(y, nslices)
y |
a vector of |
nslices |
the number of slices, no larger than |
Returns a list with components:
slice.indicator |
an indicator variable for the slices. |
nslices |
the actual number of slices produced. |
slice.sizes |
the number of observations in each slice. |
Luca Scrucca [email protected]
'msir'
objects.Plots directions and other information from MSIR estimation.
## S3 method for class 'msir' plot(x, which, type = c("pairs", "2Dplot", "spinplot", "evalues", "coefficients"), span = NULL, std = TRUE, ylab, xlab, restore.par = TRUE, ...)
## S3 method for class 'msir' plot(x, which, type = c("pairs", "2Dplot", "spinplot", "evalues", "coefficients"), span = NULL, std = TRUE, ylab, xlab, restore.par = TRUE, ...)
x |
a |
which |
a vector of value(s) giving the directions for which the plot should be drawn. |
type |
the type of plot to be drawn. |
span |
the span of smoother (only for |
std |
if |
ylab |
a character string for the y-axis label. |
xlab |
a character string for the x-axis label. |
restore.par |
if |
... |
additional arguments. |
Luca Scrucca [email protected]
Scrucca, L. (2011) Model-based SIR for dimension reduction. Computational Statistics & Data Analysis, 55(11), 3010-3026.
## Not run: # 2-dimensional response curve n <- 300 p <- 5 b1 <- c(1, 1, 1, rep(0, p-3)) b2 <- c(1,-1,-1, rep(0, p-3)) b <- cbind(b1,b2) x <- matrix(rnorm(n*p), nrow = n, ncol = p) y <- x %*% b1 + (x %*% b1)^3 + 4*(x %*% b2)^2 + rnorm(n) MSIR <- msir(x, y) summary(MSIR) plot(MSIR) plot(MSIR, which = 1:2) plot(MSIR, type = "2Dplot", which = 1, span = 0.7) plot(MSIR, type = "2Dplot", which = 2, span = 0.7) plot(MSIR, type = "spinplot") plot(MSIR, type = "evalues") plot(MSIR, type = "coefficients") ## End(Not run)
## Not run: # 2-dimensional response curve n <- 300 p <- 5 b1 <- c(1, 1, 1, rep(0, p-3)) b2 <- c(1,-1,-1, rep(0, p-3)) b <- cbind(b1,b2) x <- matrix(rnorm(n*p), nrow = n, ncol = p) y <- x %*% b1 + (x %*% b1)^3 + 4*(x %*% b2)^2 + rnorm(n) MSIR <- msir(x, y) summary(MSIR) plot(MSIR) plot(MSIR, which = 1:2) plot(MSIR, type = "2Dplot", which = 1, span = 0.7) plot(MSIR, type = "2Dplot", which = 2, span = 0.7) plot(MSIR, type = "spinplot") plot(MSIR, type = "evalues") plot(MSIR, type = "coefficients") ## End(Not run)
MSIR estimates a set of orthogonal direction vectors of length
which are estimates of the basis of the dimensional reduction subspace.
## S3 method for class 'msir' predict(object, dim = 1:object$numdir, newdata, ...)
## S3 method for class 'msir' predict(object, dim = 1:object$numdir, newdata, ...)
object |
an object of class |
dim |
the dimensions of the reduced subspace used for prediction. |
newdata |
a data frame or matrix giving the data. If missing the data obtained from the call to |
... |
further arguments passed to or from other methods. |
The function returns a matrix of points projected on the subspace spanned by the estimated basis vectors.
Luca Scrucca [email protected]
Scrucca, L. (2011) Model-based SIR for dimension reduction. Computational Statistics & Data Analysis, 55(11), 3010-3026.
{msir}
n <- 200 p <- 5 b <- as.matrix(c(1,-1,rep(0,p-2))) x <- matrix(rnorm(n*p), nrow = n, ncol = p) y <- exp(0.5 * x%*%b) + 0.1*rnorm(n) pairs(cbind(y,x), gap = 0) MSIR <- msir(x, y) summary(MSIR) plot(MSIR, which = 1, type = "2Dplot") all.equal(predict(MSIR), MSIR$dir) predict(MSIR, dim = 1:2) x0 <- matrix(rnorm(n*p), nrow = n, ncol = p) y0 <- exp(0.5 * x0%*%b) + 0.1*rnorm(n) plot(predict(MSIR, dim = 1, newdata = x0), y0)
n <- 200 p <- 5 b <- as.matrix(c(1,-1,rep(0,p-2))) x <- matrix(rnorm(n*p), nrow = n, ncol = p) y <- exp(0.5 * x%*%b) + 0.1*rnorm(n) pairs(cbind(y,x), gap = 0) MSIR <- msir(x, y) summary(MSIR) plot(MSIR, which = 1, type = "2Dplot") all.equal(predict(MSIR), MSIR$dir) predict(MSIR, dim = 1:2) x0 <- matrix(rnorm(n*p), nrow = n, ncol = p) y0 <- exp(0.5 * x0%*%b) + 0.1*rnorm(n) plot(predict(MSIR, dim = 1, newdata = x0), y0)
General function to draw a rgl-based rotating 3D scatterplot.
spinplot(x, y, z, scaling = c("abc", "aaa"), rem.lin.trend = FALSE, uncor.vars = FALSE, fit.ols = FALSE, fit.smooth = FALSE, span = 0.75, ngrid = 25, markby, pch.points = 1, col.points = "black", cex.points = 1, col.axis = "gray50", col.smooth = "limegreen", col.ols = "lightsteelblue", background = "white", ...)
spinplot(x, y, z, scaling = c("abc", "aaa"), rem.lin.trend = FALSE, uncor.vars = FALSE, fit.ols = FALSE, fit.smooth = FALSE, span = 0.75, ngrid = 25, markby, pch.points = 1, col.points = "black", cex.points = 1, col.axis = "gray50", col.smooth = "limegreen", col.ols = "lightsteelblue", background = "white", ...)
x |
a vector of values for the variable in the horizontal (H) screen axis. |
y |
a vector of values for the variable in the vertical (V) screen axis. |
z |
a vector of values for the variable in the out-of-screen (O) axis. |
scaling |
the scaling applied. Two possible values are |
rem.lin.trend |
a logical specifying if the linear trend should be remove. If |
uncor.vars |
a logical specifying if uncorrelated H and O variables should be used. If |
fit.ols |
a logical specifying if a fitted OLS plane should be included. |
fit.smooth |
a logical specifying if a nonparametric smoothing plane should be included. |
span |
the span used by |
ngrid |
the number of grid points to use for displaing the fitted plane. |
markby |
a variable (usually a factor) to be used for marking the points. |
pch.points |
a vector of symbols for marking the points. |
col.points |
a vector of colors for marking the points. |
cex.points |
the cex for points. |
col.axis |
the color of the axis. |
col.ols |
the color to be used for drawing the OLS plane. |
col.smooth |
the color to be used for drawing the smoothing plane. |
background |
the color of background space. |
... |
catch further unused arguments. |
This function is mainly based on the functionality of the spin-plot
function once available in XLisp-Stat software https://en.wikipedia.org/wiki/XLispStat, and the adds-on introduced by the Arc software http://www.stat.umn.edu/arc/index.html.
Luca Scrucca [email protected]
Cook R. D., Weisberg S. (1999) Applied Regression Including Computing and Graphics, Wiley, Chapter 8
## Not run: x1 <- rnorm(100) x2 <- rnorm(100) y <- 2*x1 + x2^2 + 0.5*rnorm(100) spinplot(x1, y, x2) spinplot(x1, y, x2, scaling = "aaa") spinplot(x1, y, x2, rem.lin.trend = "TRUE") spinplot(x1, y, x2, fit.smooth = TRUE) spinplot(x1, y, x2, fit.ols = TRUE) x <- iris[,1:3] y <- iris[,5] spinplot(x) spinplot(x, markby = y) spinplot(x, markby = y, col.points = c("dodgerblue2", "orange", "green3")) spinplot(x, markby = y, pch = c(0,3,1), col.points = c("dodgerblue2", "orange", "green3")) # to save plots use # rgl.postscript("plot.pdf", fmt="pdf") # or # rgl.snapshot("plot.png") ## End(Not run)
## Not run: x1 <- rnorm(100) x2 <- rnorm(100) y <- 2*x1 + x2^2 + 0.5*rnorm(100) spinplot(x1, y, x2) spinplot(x1, y, x2, scaling = "aaa") spinplot(x1, y, x2, rem.lin.trend = "TRUE") spinplot(x1, y, x2, fit.smooth = TRUE) spinplot(x1, y, x2, fit.ols = TRUE) x <- iris[,1:3] y <- iris[,5] spinplot(x) spinplot(x, markby = y) spinplot(x, markby = y, col.points = c("dodgerblue2", "orange", "green3")) spinplot(x, markby = y, pch = c(0,3,1), col.points = c("dodgerblue2", "orange", "green3")) # to save plots use # rgl.postscript("plot.pdf", fmt="pdf") # or # rgl.snapshot("plot.png") ## End(Not run)
'msir'
objectsSummary and print methods for 'msir'
objects.
## S3 method for class 'msir' summary(object, numdir = object$numdir, std = FALSE, verbose = TRUE, ...) ## S3 method for class 'summary.msir' print(x, digits = max(5, getOption("digits") - 3), ... )
## S3 method for class 'msir' summary(object, numdir = object$numdir, std = FALSE, verbose = TRUE, ...) ## S3 method for class 'summary.msir' print(x, digits = max(5, getOption("digits") - 3), ... )
object |
a |
numdir |
the number of directions to be shown. |
std |
if |
verbose |
if |
x |
a |
digits |
the significant digits to use. |
... |
additional arguments. |
Luca Scrucca [email protected]