Standardize or normalize the numeric variables of a data set that mixes numeric and categorical variables

2

I have a data set that mixes numeric and categorical variables, and I want to apply the scale() function to the numeric ones efficiently. That is, I do not want to create a second data frame with only the numeric columns, apply scale() to it, and then reassemble the data set with cbind().

    
asked by Estrada 12.10.2018 at 03:42

1 answer

3

There are several ways; the shortest and most effective uses the dplyr package.

Solution with dplyr

library(dplyr)
datos <- data.frame (num1 = rnorm(10, 0, 1), 
                     num2 = rnorm(10, 10, 10), 
                     char1 = letters[1:10])

mutate_if(datos, is.numeric, scale)

         num1       num2 char1
1  -0.6713835 -0.2810801     a
2  -1.1075229 -0.5331723     b
3   0.4043045 -0.4210662     c
4  -0.6203845 -1.7037996     d
5  -1.0522031 -0.2737342     e
6   2.0347227  0.9929453     f
7  -0.6050569 -0.6518153     g
8   0.9539465  1.8703961     h
9   0.2077009  0.6891423     i
10  0.4558763  0.3121841     j

Explanation:

  • I create the data myself because the question does not include an example.
  • mutate_if() applies a function conditionally. Here the predicate is is.numeric, so the function is applied only to the numeric columns.
  • The function applied is scale(), which by default centers each column on its mean and divides it by its standard deviation.
  • To pass an extra argument to scale(), add it after the function name in the mutate_if() call. For example, to skip centering:

    mutate_if(datos, is.numeric, scale, center = FALSE)
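One detail worth knowing: scale() returns a one-column matrix carrying "scaled:center" and "scaled:scale" attributes, so the columns produced above are matrices rather than plain vectors. Also, in dplyr 1.0.0 and later, mutate_if() is superseded by across(). A minimal sketch of the same idea with across(), assuming dplyr >= 1.0.0 is installed; the set.seed() call and the name escalado are additions for illustration only:

```r
library(dplyr)

# Same toy data as above; set.seed() is added only so the run is reproducible
set.seed(1)
datos <- data.frame(num1 = rnorm(10, 0, 1),
                    num2 = rnorm(10, 10, 10),
                    char1 = letters[1:10])

# across(where(is.numeric), ...) selects only the numeric columns.
# as.vector() drops the matrix shape and attributes that scale() adds.
escalado <- mutate(datos, across(where(is.numeric), ~ as.vector(scale(.x))))

str(escalado)
```

The non-numeric column char1 is left untouched and the column order is preserved.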
    

Solution with base::

No need to load any packages.

  • We write a function that scales only numeric vectors. Non-numeric vectors are returned unchanged.
    • It always preserves the type.
    • It does not alter the order of the columns.
  • We apply it to every column with lapply().
  • We convert the list returned by lapply() back to a data.frame.

escalador_condicional

Arguments:

x    A vector, typically one column of the data frame.

...  Additional arguments passed on to scale().

    escalador_condicional <- function(x, ...) {
      if (is.numeric(x)) {
        # scale() returns a one-column matrix; as.vector() makes it a plain vector again
        as.vector(scale(x, ...))
      } else {
        x
      }
    }
    

Test:

    data.frame(lapply(datos, escalador_condicional))
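To check that the result behaves as expected, the following sketch verifies that the numeric columns end up with mean 0 and standard deviation 1, and that scale() with default arguments matches (x - mean(x)) / sd(x). It also shows a variant that overwrites only the numeric columns in place. The names escalado, es_num and datos2, and the set.seed() call, are not part of the original answer and are used here for illustration only:

```r
# Same toy data as in the answer; set.seed() is added only for reproducibility
set.seed(123)
datos <- data.frame(num1 = rnorm(10, 0, 1),
                    num2 = rnorm(10, 10, 10),
                    char1 = letters[1:10])

escalador_condicional <- function(x, ...) {
  if (is.numeric(x)) as.vector(scale(x, ...)) else x
}

escalado <- data.frame(lapply(datos, escalador_condicional))

# Numeric columns should now have mean ~0 and standard deviation 1
sapply(escalado[c("num1", "num2")], mean)
sapply(escalado[c("num1", "num2")], sd)

# With default arguments, scale() computes (x - mean(x)) / sd(x)
all.equal(escalado$num1, (datos$num1 - mean(datos$num1)) / sd(datos$num1))

# Variant: assign back into the numeric columns only, leaving the rest untouched
es_num <- sapply(datos, is.numeric)
datos2 <- datos
datos2[es_num] <- lapply(datos2[es_num], function(x) as.vector(scale(x)))
```

The in-place variant avoids rebuilding the data frame, so the non-numeric columns are never copied through lapply() at all.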
    
        
    answered by 12.10.2018 / 04:40