Error trying to normalize data

I'm normalizing a csv file with R.

datos <- read.csv2("Z.EMR.csv")
normalize <- function(x) { return ((x - min(x)) / (max(x) - min(x)))}

On this line I get the following error:

datos_norm <- as.data.frame(lapply(datos, normalize))
  

Error in Summary.factor(c(1793L, 1778L, 1791L, 1868L, 1794L, 1797L, 775L, : 'min' not meaningful for factors

The data in the csv has this format:

600   21.253.671   742.868.110.810.811   24.306.558  223.452.149.180.328

Any help is appreciated, thank you very much.

    
asked by Andres 27.11.2018 at 19:35

2 answers

Most likely, one of the columns is categorical. You can put an if in the function so that, when the column is a factor, it is returned unchanged:

normalize <- function(x) {
  # Leave factor (categorical) columns untouched
  if (is.factor(x)) return(x)
  (x - min(x)) / (max(x) - min(x))
}
data.frame(lapply(datos, normalize))
    
answered 27.11.2018 at 19:53

The error occurs because at least one of the columns is a factor rather than numeric data. Here is an example that reproduces it:

normalize <- function(x) {return ((x - min(x)) / (max(x) - min(x)))}
v <- factor(rnorm(10))
normalize(v)

 Error in Summary.factor(c(9L, 3L, 4L, 5L, 6L, 7L, 8L, 2L, 1L, 10L), na.rm = FALSE) : 
  ‘min’ not meaningful for factors 

The very first call, min(), already makes no sense for a factor. Here are some possibilities:

1. The data really is a factor and should not be normalized

You can either apply normalize conditionally or modify normalize so that it knows what to do when it receives a factor, but clearly the function should not be applied to factor columns. To my taste, the ideal is to apply normalize only to the relevant columns rather than to the complete data.frame. With names(datos)[!sapply(datos, is.factor)] we get the names of the columns that are not factors, so we can subset the original data.frame and apply normalize to it:

lapply(datos[, names(datos)[!sapply(datos, is.factor)], drop = FALSE], normalize)
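Since lapply returns a plain list, it may be convenient to write the normalized columns back into the data.frame so that the factor columns are kept alongside them. A minimal sketch, assuming a made-up data.frame toy standing in for datos (the names toy and num_cols are illustrative only):

```r
normalize <- function(x) (x - min(x)) / (max(x) - min(x))

# Hypothetical toy data standing in for `datos`
toy <- data.frame(
  a = c(1, 2, 3),
  b = c(10, 30, 20),
  g = factor(c("x", "y", "x"))
)

# Logical mask: TRUE for every column that is not a factor
num_cols <- !sapply(toy, is.factor)

# Overwrite only the non-factor columns; `g` is left as it was
toy[num_cols] <- lapply(toy[num_cols], normalize)

print(toy)
```

Note that this min-max formula divides by zero for a constant column and does not handle NA values, so real data may need extra checks.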

2. The column is a factor, but we still want to normalize it

Converting a factor to numeric (as long as it makes sense to do so) is done using as.numeric(as.character(x)); your lapply could be expressed like this:

lapply(datos, function(x) { if (is.factor(x)) x <- as.numeric(as.character(x)); normalize(x) })

Note: rather than converting the factor to numeric, it is better to analyze why reading the csv produces a factor in the first place and correct that.
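One way to carry out that analysis is to list the class of each column and then look at the values that cannot be parsed as numbers. In R versions before 4.0, read.csv2 turns any column containing non-numeric text into a factor by default, and read.csv2 expects a comma as the decimal mark, so a value with several dots, such as 21.253.671 in the question, is not read as a number. The sketch below uses a made-up data.frame (toy and unparsable are hypothetical names), not the actual file:

```r
# Hypothetical stand-in for what read.csv2 might return
toy <- data.frame(
  ok  = c("600", "12.5", "3"),
  bad = c("600", "21.253.671", "24.306.558"),
  stringsAsFactors = TRUE
)

# Step 1: see which columns were read as factors
print(sapply(toy, class))

# Step 2: for each column, collect the values that fail numeric conversion
unparsable <- lapply(toy, function(col) {
  txt <- as.character(col)
  txt[is.na(suppressWarnings(as.numeric(txt)))]
})

print(unparsable)
```

Columns with an empty result can be converted safely with as.numeric(as.character(x)); the others point to the values that need cleaning in the csv or different read options.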

    
answered 27.11.2018 at 20:32