Model-based Sliced Inverse Regression (MSIR) is a dimension reduction method based on Gaussian finite mixture models that extends sliced inverse regression (SIR).
The basis of the MSIR subspace is estimated by modeling the inverse distribution within each slice using Gaussian finite mixtures, with the number of components and the covariance matrix parameterization either selected by BIC or defined by the user.
The msir package implements the methodology described in Scrucca (2011).
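For orientation, the classical SIR estimator that MSIR extends can be sketched in a few lines of base R. This is a simplified illustration under stated assumptions, not the msir implementation: the function name sir_sketch, the equal-count slicing rule, and the simulated data are all hypothetical choices made here. Classical SIR summarizes each slice by a single mean, whereas MSIR fits a Gaussian mixture within each slice.

```r
# Classical SIR in base R (illustrative sketch; `sir_sketch` is a hypothetical name,
# not a function from the msir package).
sir_sketch <- function(x, y, nslices = 6) {
  y <- as.vector(y)
  n <- nrow(x)
  # Standardize predictors: z = (x - mean) %*% Sigma^{-1/2}
  e <- eigen(cov(x), symmetric = TRUE)
  Sinvsqrt <- e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)
  z <- sweep(x, 2, colMeans(x)) %*% Sinvsqrt
  # Slice the observations into groups of roughly equal size by the rank of y
  slice <- cut(rank(y, ties.method = "first"), breaks = nslices, labels = FALSE)
  nh <- as.vector(table(slice))
  # Slice means of z (one row per slice) and slice proportions
  m <- rowsum(z, slice) / nh
  w <- nh / n
  # Kernel matrix: weighted covariance of the slice means
  M <- crossprod(m * sqrt(w))
  ev <- eigen(M, symmetric = TRUE)
  # Map eigenvectors back to the original predictor scale and normalize
  basis <- Sinvsqrt %*% ev$vectors
  basis <- sweep(basis, 2, sqrt(colSums(basis^2)), "/")
  list(evalues = ev$values, basis = basis)
}

# Simulated data with a single true direction b (assumed setup for illustration)
set.seed(1)
n <- 2000
p <- 5
b <- c(1, -1, rep(0, p - 2))
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- exp(0.5 * x %*% b) + 0.1 * rnorm(n)

fit <- sir_sketch(x, y)
# Absolute cosine between the first estimated direction and the true one
cos1 <- abs(sum(fit$basis[, 1] * b)) / sqrt(sum(b^2))
cos1
```

A value of cos1 close to 1 means the first estimated direction is nearly parallel to b; the sign of an estimated direction is arbitrary, hence the absolute value.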
This vignette is written in R Markdown using the knitr package for production.
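library(msir)
n <- 200
p <- 5
# true direction: only x1 and x2 affect the response
b <- as.matrix(c(1, -1, rep(0, p - 2)))
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
# monotone (exponential) link plus Gaussian noise
y <- exp(0.5 * x %*% b) + 0.1 * rnorm(n)
MSIR <- msir(x, y)
summary(MSIR)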
## --------------------------------------------------
## Model-based SIR
## --------------------------------------------------
##
## Slices:
##             1   2    3   4   5   6
## GMM         XXI EEI  XXX XXX XXX XII
## Num.comp.   1   2    1   1   1   1
## Num.obs.    33  9|24 33  33  33  35
##
## Estimated basis vectors:
##         Dir1      Dir2     Dir3      Dir4      Dir5
## x1  0.719233  0.665189 -0.35087  0.088328  0.289256
## x2 -0.693391  0.578066 -0.33561  0.195841  0.141172
## x3  0.027810  0.215261  0.30793  0.626084 -0.655309
## x4 -0.019198 -0.013048 -0.52209 -0.425878 -0.678608
## x5 -0.027765  0.420542  0.62996 -0.616840 -0.080389
##
##                 Dir1     Dir2     Dir3     Dir4       Dir5
## Eigenvalues  0.89038  0.12676  0.04031  0.00887 1.6342e-03
## Cum. %      83.37250 95.24195 99.01642 99.84698 1.0000e+02
plot(MSIR, type = "evalues")
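One way to read the summary above is to compare the first estimated basis vector with the direction b used to simulate the data. The sketch below copies the Dir1 column from the printed output rather than extracting it from the fitted object, so it runs without msir; the names dir1, cosine, and angle_deg are introduced here for illustration only.

```r
# True direction used to simulate y in the example above
p <- 5
b <- c(1, -1, rep(0, p - 2))

# First estimated basis vector, copied from the printed summary
dir1 <- c(0.719233, -0.693391, 0.027810, -0.019198, -0.027765)

# Absolute cosine of the angle between the two directions
# (the sign of an estimated direction is arbitrary)
cosine <- abs(sum(b * dir1)) / sqrt(sum(b^2) * sum(dir1^2))
angle_deg <- acos(min(cosine, 1)) * 180 / pi
round(c(cosine = cosine, angle_deg = angle_deg), 3)
```

A cosine near 1, equivalently a small angle, indicates that Dir1 is close to the true direction up to sign and scale.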
n <- 200
p <- 5
# same true direction as before
b <- as.matrix(c(1, -1, rep(0, p - 2)))
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
# symmetric (quadratic) link plus Gaussian noise
y <- (0.5 * x %*% b)^2 + 0.1 * rnorm(n)
MSIR <- msir(x, y)
summary(MSIR)
## --------------------------------------------------
## Model-based SIR
## --------------------------------------------------
##
## Slices:
##            1   2   3   4   5     6
## GMM        XXX XXI XII XII EEV   EII
## Num.comp.  1   1   1   1   2     2
## Num.obs.   33  33  33  33  13|20 16|19
##
## Estimated basis vectors:
##          Dir1        Dir2     Dir3     Dir4     Dir5
## x1  0.7032497 -0.32265327  0.17850  0.50213  0.48315
## x2 -0.7097760 -0.20051289  0.26844  0.34153  0.51756
## x3  0.0207609 -0.64666898  0.42755 -0.59757  0.03875
## x4  0.0020883 -0.00061766  0.69106  0.35396 -0.60040
## x5 -0.0349652 -0.66144409 -0.48550  0.38580 -0.36976
##
##                 Dir1      Dir2      Dir3     Dir4       Dir5
## Eigenvalues  0.76068  0.048494  0.027907  0.01981 2.6055e-03
## Cum. %      88.50291 94.145085 97.392035 99.69686 1.0000e+02
plot(MSIR, type = "evalues")
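The Cum. % row in the summary above appears to be the running share of the eigenvalue total, which can be checked directly from the printed values. The sketch below copies both rows from the output; the variable names ev, printed, and cum_pct are introduced here for illustration, and small discrepancies are expected because the printed eigenvalues are rounded.

```r
# Eigenvalues and cumulative percentages, copied from the printed summary
ev <- c(0.76068, 0.048494, 0.027907, 0.01981, 2.6055e-03)
printed <- c(88.50291, 94.145085, 97.392035, 99.69686, 100)

# Running share of the eigenvalue total, in percent
cum_pct <- 100 * cumsum(ev) / sum(ev)
round(cum_pct, 3)

# Largest absolute difference from the printed row
max_diff <- max(abs(cum_pct - printed))
max_diff
```

Read this way, the first direction accounts for most of the eigenvalue total, which is consistent with the single direction b used in the simulation.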
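n <- 300
p <- 5
# two true directions, involving only x1, x2 and x3
b1 <- c(1, 1, 1, rep(0, p - 3))
b2 <- c(1, -1, -1, rep(0, p - 3))
b <- cbind(b1, b2)
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
# the response depends on x only through x %*% b1 and x %*% b2
y <- x %*% b1 + (x %*% b1)^3 + 4 * (x %*% b2)^2 + rnorm(n)
MSIR <- msir(x, y)
summary(MSIR)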
## --------------------------------------------------
## Model-based SIR
## --------------------------------------------------
##
## Slices:
##            1   2     3     4        5   6     7   8
## GMM        XXI VVE   EEV   EVE      XII EEV   XII XXI
## Num.comp.  1   2     2     3        1   2     1   1
## Num.obs.   42  12|30 16|26 18|12|12 42  25|17 42  6
##
## Estimated basis vectors:
##         Dir1      Dir2      Dir3      Dir4     Dir5
## x1  0.320287  0.944245 -0.019421 -0.047184 -0.14173
## x2  0.635841 -0.256139  0.464958  0.182138 -0.50389
## x3  0.695964 -0.194734 -0.298009 -0.145248  0.62064
## x4 -0.092271  0.065128  0.818368 -0.361349  0.42430
## x5  0.015541 -0.025113 -0.157804 -0.901626 -0.40099
##
##                 Dir1     Dir2      Dir3      Dir4       Dir5
## Eigenvalues  0.65327  0.36881  0.054035  0.025591   0.013548
## Cum. %      58.57625 91.64546 96.490546 98.785167 100.000000
plot(MSIR, type = "evalues")
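With two true directions, individual estimated vectors need not line up with b1 or b2; what matters is whether the span of the leading estimated directions contains them. A minimal sketch of that check follows, using the Dir1 and Dir2 columns copied from the printed summary. The helper captured and the other names are hypothetical, introduced only for this illustration.

```r
# True directions used to simulate y in the example above
p <- 5
b1 <- c(1, 1, 1, rep(0, p - 3))
b2 <- c(1, -1, -1, rep(0, p - 3))

# First two estimated basis vectors, copied from the printed summary
B <- cbind(
  Dir1 = c(0.320287, 0.635841, 0.695964, -0.092271, 0.015541),
  Dir2 = c(0.944245, -0.256139, -0.194734, 0.065128, -0.025113)
)

# Orthonormal basis for span(Dir1, Dir2) via the QR decomposition
Q <- qr.Q(qr(B))

# Share of a vector's squared length that lies inside the span
# (hypothetical helper, defined here for illustration)
captured <- function(b) sum(crossprod(Q, b)^2) / sum(b^2)

r2 <- c(b1 = captured(b1), b2 = captured(b2))
round(r2, 3)
```

Values close to 1 for both b1 and b2 indicate that the two leading estimated directions span nearly the same subspace as the true ones.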
To obtain a rotating 3D spinplot, use the spinplot() function:
spinplot(x, markby = y, pch = c(0,3,1),
col.points = c("lightcyan", "yellow", "lightgreen"),
background = "black")
Scrucca, L. (2011) Model-based SIR for dimension reduction. Computational Statistics & Data Analysis, 55(11), 3010-3026.
sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
## [4] LC_COLLATE=C LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] msir_1.3.3 knitr_1.48 rmarkdown_2.28
##
## loaded via a namespace (and not attached):
## [1] rgl_1.3.1 digest_0.6.37 R6_2.5.1 base64enc_0.1-3 fastmap_1.2.0
## [6] xfun_0.48 magrittr_2.0.3 maketools_1.3.1 mclust_6.1.1 cachem_1.1.0
## [11] htmltools_0.5.8.1 buildtools_1.0.0 lifecycle_1.0.4 cli_3.6.3 sass_0.4.9
## [16] jquerylib_0.1.4 compiler_4.4.1 highr_0.11 sys_3.4.3 tools_4.4.1
## [21] evaluate_1.0.1 bslib_0.8.0 yaml_2.3.10 htmlwidgets_1.6.4 jsonlite_1.8.9
## [26] rlang_1.1.4