glmnet                package:glmnet                R Documentation

_f_i_t _a_n _e_l_a_s_t_i_c_n_e_t _m_o_d_e_l _p_a_t_h

_D_e_s_c_r_i_p_t_i_o_n:

     Fit a regularization path for the elasticnet at a grid of values
     for the regularization parameter lambda. Can deal with all shapes
     of data, including very large sparse data matrices. Fits linear,
     logistic and multinomial regression models.

_U_s_a_g_e:

     glmnet(x, y, family=c("gaussian","binomial","multinomial"), weights, alpha = 1,
       nlambda = 100, lambda.min = ifelse(nobs<nvars,0.05,0.0001), lambda,
       standardize = TRUE, thresh = 1e-04,  dfmax = nvars + 1,
       pmax = min(dfmax * 1.2, nvars), exclude, penalty.factor = rep(1, nvars),
       maxit=100, HessianExact = FALSE, type = c("covariance", "naive"))

_A_r_g_u_m_e_n_t_s:

       x: input matrix, of dimension nobs x nvars; each row is an
          observation vector. Can be in sparse column format (class
          '"dgCMatrix"' as in package 'Matrix')

       y: response variable. Quantitative for 'family="gaussian"'. For
          'family="binomial"' should be either a factor with two
          levels, or a two-column matrix of counts or proportions. For
          'family="multinomial"', can be a 'nc>=2' level factor, or a
          matrix with 'nc' columns of counts or proportions

  family: Response type (see above)

 weights: observation weights. Can be total counts if responses are
          proportion matrices. Default is 1 for each observation

   alpha: The elasticnet mixing parameter, with 0 < alpha <= 1. The
          penalty is defined as

               (1-alpha)/2||beta||_2^2 + alpha||beta||_1.

          'alpha=1' is the lasso penalty. Currently, 'alpha<0.01' is not
          reliable unless you supply your own 'lambda' sequence.

 nlambda: The number of 'lambda' values - default is 100.

lambda.min: Smallest value for 'lambda', as a fraction of 'lambda.max',
          the (data derived) entry value, i.e. the smallest value for
          which all coefficients are zero. The default depends on the
          sample size 'nobs' relative to the number of variables
          'nvars'. If 'nobs >= nvars', the default is '0.0001', close to
          zero. If 'nobs < nvars', the default is '0.05'. A very small
          value of 'lambda.min' will lead to a saturated fit. A saturated
          fit is undefined for '"binomial"' and '"multinomial"' models,
          so in those cases 'glmnet' exits gracefully once the fraction
          of deviance explained is close to 1.

  lambda: A user supplied 'lambda' sequence. Typical usage is to have
          the program compute its own 'lambda' sequence based on
          'nlambda' and 'lambda.min'. Supplying a value of 'lambda'
          overrides this. Use with care - it is better to supply a
          decreasing sequence of 'lambda' values than a single (small)
          value.

standardize: Logical flag for variable standardization, prior to
          fitting the model sequence. The coefficients are always
          returned on the original scale. Default is
          'standardize=TRUE'.

  thresh: Convergence threshold for coordinate descent. Each inner
          coordinate-descent loop continues until the largest relative
          change in any coefficient is less than 'thresh'. Default
          value is '1e-4'.

   dfmax: Limit the maximum number of variables in the model. Useful
          for very large 'nvars', if a partial path is desired.

    pmax: Limit the maximum number of variables ever to be nonzero.

 exclude: Indices of variables to be excluded from the model. Default
          is none. Equivalent to an infinite penalty factor (next
          item).

penalty.factor: Separate penalty factors can be applied to each
          coefficient. This is a number that multiplies 'lambda' to
          allow differential shrinkage. Can be 0 for some variables,
          which implies no shrinkage, and that variable is always
          included in the model. Default is 1 for all variables (and
          implicitly infinity for variables listed in 'exclude').

   maxit: Maximum number of outer-loop iterations for '"binomial"' or
          '"multinomial"' families. Default is 100.

HessianExact: Only applies to '"binomial"' or '"multinomial"' families.
          If 'FALSE' (the default), an upper-bound approximation is
          made to the Hessian, which is not recalculated at each outer
          loop.

    type: Two algorithm types are supported for (only)
          'family="gaussian"'. The default 'type="covariance"' saves
          all inner products ever computed, and can be much faster than
          'type="naive"'. The latter can be more efficient when
          'nvars >> nobs'.
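
     The 'alpha', 'penalty.factor', 'lambda.min' and 'nlambda' entries
     can be illustrated with a small base-R sketch. The helpers
     'enet_penalty' and 'lambda_grid' are hypothetical, not part of
     glmnet. The sketch assumes unit weights, 'family="gaussian"',
     finite penalty factors, columns of 'x' standardized with divisor
     'nobs', and a grid equally spaced on the log scale; glmnet's own
     computation may differ in these details.

```r
# Hypothetical helper: elastic net penalty from the 'alpha' entry.
# pf plays the role of 'penalty.factor' (finite values only; an excluded
# variable is better handled by dropping its column).
enet_penalty <- function(beta, alpha = 1, pf = rep(1, length(beta))) {
  sum(pf * ((1 - alpha) / 2 * beta^2 + alpha * abs(beta)))
}

# Hypothetical helper: decreasing lambda grid, assuming family = "gaussian"
# and unit weights.
lambda_grid <- function(x, y, alpha = 1, nlambda = 100,
                        lambda.min = ifelse(nrow(x) < ncol(x), 0.05, 1e-4)) {
  nobs <- nrow(x)
  # Center and scale columns of x (variance with divisor nobs); center y
  xc <- sweep(x, 2, colMeans(x))
  xs <- sweep(xc, 2, sqrt(colMeans(xc^2)), "/")
  yc <- y - mean(y)
  # Entry value: all coefficients are zero while
  # |<x_j, y>| / nobs <= lambda * alpha for every j
  lambda.max <- max(abs(crossprod(xs, yc))) / (nobs * alpha)
  # nlambda values, log-spaced from lambda.max down to lambda.min * lambda.max
  exp(seq(log(lambda.max), log(lambda.min * lambda.max), length.out = nlambda))
}
```

     With the default 'lambda.min', the last grid value is 1e-4 times
     the first when 'nobs >= nvars', matching the description of
     'lambda.min' as a fraction of 'lambda.max'.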

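     The two 'type' settings differ in where the inner product of a
     variable with the current residual comes from. This hypothetical
     sketch ('grad_naive' and 'grad_cov' are illustrations, not glmnet
     code) computes it both ways: directly from the residual, and from
     cached inner products, where only variables with nonzero
     coefficients contribute. For brevity the full cross-product matrix
     is precomputed here, whereas the 'type' entry says glmnet saves the
     inner products as they are computed.

```r
# Inner product of column j with the residual r = y - x %*% beta.
# "Naive" style: use the residual directly. A real implementation keeps
# r up to date, so this costs O(nobs); it is recomputed here for clarity.
grad_naive <- function(x, y, beta, j) {
  r <- y - drop(x %*% beta)
  sum(x[, j] * r)
}

# "Covariance" style: <x_j, r> = <x_j, y> - sum_k <x_j, x_k> beta_k,
# where only the k with beta_k != 0 contribute.
grad_cov <- function(xty, xtx, beta, j) {
  active <- which(beta != 0)
  xty[j] - sum(xtx[j, active] * beta[active])
}
```

     Both return the same number. The covariance form costs work
     proportional to the number of nonzero coefficients per update,
     while the naive form avoids storing inner products between
     variables, which is one plausible reason it can win when
     'nvars >> nobs'.
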
_D_e_t_a_i_l_s:

     The sequence of models implied by 'lambda' is fit by coordinate
     descent. For 'family="gaussian"' this is the lasso sequence if
     'alpha=1', else it is the elasticnet sequence. For
     'family="binomial"' or 'family="multinomial"', this is a lasso or
     elasticnet regularization path for fitting the linear logistic or
     multinomial logistic regression paths. Sometimes the sequence is
     truncated before 'nlambda' values of 'lambda' have been used,
     because of instabilities in the logistic or multinomial models
     near a saturated fit. 'glmnet(...,family="binomial")' fits a
     traditional logistic regression model for the log-odds.
     'glmnet(...,family="multinomial")' fits a symmetric multinomial
     model, where each class is represented by a linear model (on the
     log-scale). The penalties take care of redundancies. A two-class
     '"multinomial"' model will produce the same fit as the
     corresponding '"binomial"' model, except the pair of coefficient
     matrices will be equal in magnitude and opposite in sign, and half
     the '"binomial"' values. Note that the objective function for
     '"gaussian"' is

                    RSS/(2*nobs) + lambda*penalty,

     and for the logistic models it is

                    -loglik/nobs + lambda*penalty,

     where 'penalty' is the elasticnet penalty given under 'alpha'.
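
     A minimal sketch of a coordinate descent fit for a single value of
     'lambda' with 'family="gaussian"' follows. It minimizes the
     gaussian objective with the elasticnet penalty on the standardized
     scale, then maps the coefficients back to the original scale, as
     'standardize=TRUE' does. It uses the soft-thresholding update from
     the cited paper but is not glmnet's Fortran code: it assumes unit
     weights, ignores 'penalty.factor', 'dfmax' and 'pmax', uses no
     active sets or warm starts, and stops when the largest coefficient
     change in a full sweep falls below 'thresh' (our simplification).
     The names 'soft_threshold' and 'enet_gaussian' are hypothetical.

```r
# Soft-thresholding operator S(z, g) = sign(z) * max(|z| - g, 0)
soft_threshold <- function(z, g) sign(z) * max(abs(z) - g, 0)

# Hypothetical single-lambda elastic net fit, family = "gaussian", unit weights
enet_gaussian <- function(x, y, lambda, alpha = 1, thresh = 1e-7, maxit = 1e4) {
  nobs <- nrow(x)
  # Standardize x (variance with divisor nobs) and center y
  x_mean <- colMeans(x)
  xc <- sweep(x, 2, x_mean)
  x_sd <- sqrt(colMeans(xc^2))
  xs <- sweep(xc, 2, x_sd, "/")
  y_mean <- mean(y)
  r <- y - y_mean                 # residual for beta = 0
  beta <- numeric(ncol(x))
  for (it in seq_len(maxit)) {
    max_change <- 0
    for (j in seq_along(beta)) {
      # Inner product of x_j with the partial residual (x_j's own term added back)
      z <- sum(xs[, j] * r) / nobs + beta[j]
      b_new <- soft_threshold(z, lambda * alpha) / (1 + lambda * (1 - alpha))
      delta <- b_new - beta[j]
      if (delta != 0) {
        r <- r - xs[, j] * delta  # keep the residual current
        beta[j] <- b_new
        max_change <- max(max_change, abs(delta))
      }
    }
    if (max_change < thresh) break
  }
  # Back to the original scale: slopes divided by sd, intercept from the means
  b <- beta / x_sd
  c(a0 = y_mean - sum(x_mean * b), b)
}
```

     At 'lambda = 0' this reduces to least squares, and above the entry
     value every slope stays exactly zero. Fitting a path would mean
     calling it over a decreasing 'lambda' sequence; the cited paper
     starts each fit from the previous solution (a warm start), which
     this sketch omits.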

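     The two-class statement in the details rests on a simple identity:
     if a binomial model has linear predictor 'eta', a symmetric
     two-class multinomial model with predictors 'eta/2' and '-eta/2'
     (coefficients half as large and opposite in sign) gives the same
     class probabilities. The sketch below checks only this identity; it
     fits no model, and which factor level plays the role of "class 1"
     is an assumption. The function names are hypothetical.

```r
# Binomial (logistic) model: P(class 1) = 1 / (1 + exp(-eta))
p_binomial <- function(eta) 1 / (1 + exp(-eta))

# Symmetric two-class multinomial: each class k has its own linear
# predictor eta_k, and P(class k) = exp(eta_k) / sum over classes of exp(eta)
p_multinomial <- function(eta1, eta2) exp(eta1) / (exp(eta1) + exp(eta2))
```

     Algebraically, exp(eta/2) / (exp(eta/2) + exp(-eta/2)) equals
     1 / (1 + exp(-eta)), so halving and negating the coefficients
     leaves the fitted probabilities unchanged.
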
_V_a_l_u_e:

     An object with S3 class '"glmnet","*"', where '"*"' is '"elnet"',
     '"lognet"' or '"multnet"' for the three types of models.

    call: the call that produced this object

      a0: Intercept sequence of length 'length(lambda)'

    beta: For '"elnet"' and '"lognet"' models, a 'nvars x
          length(lambda)' matrix of coefficients, stored in sparse
          column format ('"dgCMatrix"'). For '"multnet"', a list of
          'nc' such matrices, one for each class.

  lambda: The actual sequence of 'lambda' values used

     dev: The fraction of (null) deviance explained (for '"elnet"',
          this is the R-square).

 nulldev: Null deviance (per observation)

      df: The number of nonzero coefficients for each value of
          'lambda'. For '"multnet"', this is the number of variables
          with a nonzero coefficient for _any_ class.

   dfmat: For '"multnet"' only. A matrix consisting of the number of
          nonzero coefficients per class

     dim: dimension of the coefficient matrix (or matrices)

 npasses: total passes over the data summed over all lambda values

    jerr: error flag, for warnings and errors (largely for internal
          debugging).
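
     The 'dev' and 'df' components can be illustrated with plain base-R
     matrices (not '"dgCMatrix"'). The helper names below are
     hypothetical. For a gaussian fit the fraction of null deviance
     explained is the R-square, 1 - RSS/TSS; dividing both deviances by
     'nobs' (as for 'nulldev') leaves the ratio unchanged.

```r
# Fraction of null deviance explained for a gaussian fit (the R-square)
dev_ratio_gaussian <- function(y, fitted) {
  rss <- sum((y - fitted)^2)
  null_dev <- sum((y - mean(y))^2)  # deviance of the intercept-only model
  1 - rss / null_dev
}

# 'df' for an nvars x length(lambda) coefficient matrix: nonzeros per column
df_single <- function(beta) colSums(beta != 0)

# "multnet"-style 'df': variables with a nonzero coefficient for any class,
# given a list of nc coefficient matrices of the same shape
df_any_class <- function(beta_list) {
  nonzero <- Reduce(`|`, lapply(beta_list, function(b) b != 0))
  colSums(nonzero)
}
```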

_A_u_t_h_o_r(_s):

     Jerome Friedman, Trevor Hastie and Rob Tibshirani
       Maintainer: Trevor Hastie hastie@stanford.edu

_R_e_f_e_r_e_n_c_e_s:

     Friedman, J., Hastie, T. and Tibshirani, R. (2008) _Regularization
     Paths for Generalized Linear Models via Coordinate Descent_.
     <URL: http://www-stat.stanford.edu/~hastie/Papers/glmnet.pdf>

_S_e_e _A_l_s_o:

     'print', 'predict' and 'coef' methods.

_E_x_a_m_p_l_e_s:

     x=matrix(rnorm(100*20),100,20)
     y=rnorm(100)
     g2=sample(1:2,100,replace=TRUE)
     g4=sample(1:4,100,replace=TRUE)
     fit1=glmnet(x,y)
     print(fit1)
     coef(fit1,s=0.01) # extract coefficients at a single value of lambda
     predict(fit1,newx=x[1:10,],s=c(0.01,0.005)) # make predictions
     fit2=glmnet(x,g2,family="binomial")
     fit3=glmnet(x,g4,family="multinomial")

