Variable selection with Permuted Inclusion Criterion

Forward variable selection based on deviance for Bradley-Terry models fitted to pairwise ranking data. The procedure has two steps: first, permuted copies of the original predictors are created, repeated n.iterations times; then a forward selection retains the predictors that contribute most to the model. See Details.

btpermute(
  contests = NULL,
  predictors = NULL,
  n.iterations = 15,
  seed = NULL,
  ...
)

Arguments

contests: a data frame of pairwise binary contests with the variables 'id', 'player1', 'player2', 'win1', 'win2', in that order. Each id should correspond to a row index in predictors
predictors: a data frame of player-specific variables whose row indices match the ids in contests. No id column is required, only the predictor variables; the row index serves as the id
n.iterations: integer, number of permutation iterations to run at each step of the forward selection
seed: integer, the seed for random number generation. If NULL (the default), gosset will set the seed randomly
...: additional arguments passed to BradleyTerry2 methods
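
The input layout described above can be illustrated with a small toy example. This is only a sketch: the values, the predictor names (age, farm_size) and the player labels are hypothetical, storing the players as factors with shared levels is an assumption, and since this page does not state whether an id may repeat across contests, the toy uses one contest per id. The two checks at the end only verify the column order and that every id points to an existing row of predictors.

```r
# Hypothetical predictors: one row per id, no id column (the row index is the id)
predictors <- data.frame(
  age = c(34, 51, 27),
  farm_size = c(1.2, 0.8, 2.5)
)

# Hypothetical contests: columns in the documented order
lv <- c("A", "B", "C")
contests <- data.frame(
  id = c(1, 2, 3),
  player1 = factor(c("A", "B", "A"), levels = lv),
  player2 = factor(c("B", "C", "C"), levels = lv),
  win1 = c(1, 0, 1),
  win2 = c(0, 1, 0)
)

# Column names and order match the documented layout
cols_ok <- identical(names(contests),
                     c("id", "player1", "player2", "win1", "win2"))

# Every id refers to an existing row index in predictors
ids_ok <- all(contests$id %in% seq_len(nrow(predictors)))
```

Running checks like these before calling btpermute() may help catch a mismatch between ids and predictor rows early.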

Value

an object of class gosset_btpermute with the final BTm() model, selected variables, seeds (random numbers) used for permutations and deviances

Details

The selection procedure consists of two steps. In the first step, btpermute adds to the set of original (candidate) predictors an additional set of 'fake', permuted variables. This set of permuted predictors is created by assigning to each ranking the variables from another, randomly selected ranking. The permuted variables are not expected to have any predictive power for pairwise rankings. In the second step, btpermute adds predictors to the Bradley-Terry model in a forward selection procedure. Each predictor (real and permuted) is added individually to the current model, which is the null model at the first step, and btpermute records which variable reduces model deviance most strongly.

The two-step process is replicated n times, where n is set by the argument n.iterations. At each iteration, a new random permutation is generated and all variables are tested. Replicability can be controlled using the argument seed. Across the n iterations, the function identifies the predictor that most often gave the largest reduction in deviance. When this is a real variable, it is permanently added to the model and the forward selection procedure moves on, again creating new permutations, adding real and fake variables individually, and examining model deviance.

Variable selection stops when a permuted variable is most frequently the most deviance-reducing predictor across the n iterations. In other words, variable selection continues as long as any real variable has stronger explanatory power for pairwise rankings than the random variables.
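
The logic described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the implementation used by btpermute: a plain logistic regression fitted with glm() stands in for the BTm() model, the helper pic_sketch() and the toy data are hypothetical, only the variables not yet selected are permuted, and votes are counted per variable name with ties resolved by taking the first name. How btpermute handles these details internally is not stated on this page.

```r
# Toy data (hypothetical): 200 rankings, three candidate predictors,
# only x1 actually drives the binary outcome.
set.seed(42)
n <- 200
predictors <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
win1 <- rbinom(n, 1, plogis(1.5 * predictors$x1))

# Hypothetical helper, not part of gosset. glm() is a stand-in for BTm().
pic_sketch <- function(y, predictors, n.iterations = 15, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  selected <- character(0)
  repeat {
    remaining <- setdiff(names(predictors), selected)
    if (length(remaining) == 0) break
    winners <- replicate(n.iterations, {
      # Step 1: shuffle rows so each ranking receives the predictor
      # values of a randomly chosen ranking (the 'fake' variables)
      fake <- predictors[sample(nrow(predictors)), remaining, drop = FALSE]
      names(fake) <- paste0(remaining, "_perm")
      rownames(fake) <- NULL
      dat <- data.frame(y = y, predictors, fake)
      # Step 2: add each candidate individually to the current model
      # and record the resulting deviance
      candidates <- c(remaining, names(fake))
      dev <- vapply(candidates, function(v) {
        f <- reformulate(c(selected, v), response = "y")
        deviance(glm(f, family = binomial, data = dat))
      }, numeric(1))
      # Lowest deviance means the largest deviance reduction
      names(which.min(dev))
    })
    # The variable that won most often across the iterations
    top <- names(which.max(table(winners)))
    # Stop as soon as a permuted variable wins most often
    if (grepl("_perm$", top)) break
    selected <- c(selected, top)
  }
  selected
}

selected <- pic_sketch(win1, predictors, n.iterations = 15, seed = 1)
selected
```

Because every candidate adds a single term to the same current model, comparing raw deviances is equivalent to comparing deviance reductions. In this toy setting the strongly informative x1 should be picked first; whether the noise variables x2 or x3 are retained afterwards depends on the random permutations and on the vote-counting rule assumed here.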

References

Lysen, S. (2009) Permuted inclusion criterion: A variable selection technique. University of Pennsylvania

Author

Jonathan Steinke and Kauê de Sousa

Examples

if (FALSE) { # interactive()

# library() stops with an error if the package is missing; require() only warns
library("BradleyTerry2")

data("kenyachoice", package = "gosset")

mod <- btpermute(contests = kenyachoice$contests,
                 predictors = kenyachoice$predictors,
                 n.iterations = 10,
                 seed = 1)

mod
}