Error trying to normalize data

Question


I'm normalizing a csv file with R.

datos <- read.csv2("Z.EMR.csv")
normalize <- function(x) { return ((x - min(x)) / (max(x) - min(x)))}

On this line I get the following error:

datos_norm <- as.data.frame(lapply(datos, normalize))

Error in Summary.factor(c(1793L, 1778L, 1791L, 1868L, 1794L, 1797L, 775L, : 'min' not meaningful for factors

The data in the csv has this format:

600   21.253.671   742.868.110.810.811   24.306.558  223.452.149.180.328

Any help is appreciated. Thank you very much.

r

asked by Andres 27.11.2018 at 18:35

2 answers

score 1 · Answer 1

Most likely, one of the columns is categorical. You can add an if to the function so that, when the column is a factor, it is returned unchanged:

normalize <- function(x) {
  # Return categorical columns unchanged
  if (is.factor(x)) return(x)
  (x - min(x)) / (max(x) - min(x))
}
data.frame(lapply(datos, normalize))
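
As a quick check of this approach, here is a minimal sketch on a made-up data frame. The `toy` object and its columns are invented for illustration and are not taken from the question's file:

```r
# Normalization that returns factor columns unchanged
normalize <- function(x) {
  if (is.factor(x)) return(x)
  (x - min(x)) / (max(x) - min(x))
}

# Hypothetical data: one numeric column and one categorical column
toy <- data.frame(
  valor = c(10, 20, 30),
  grupo = factor(c("a", "b", "a"))
)

toy_norm <- data.frame(lapply(toy, normalize))
toy_norm$valor   # rescaled to the range [0, 1]
toy_norm$grupo   # still a factor, unchanged
```

The numeric column is rescaled while the factor passes through, so the call no longer stops with the 'min' error.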

score 1 · Answer 2

The error occurs because at least one of the columns is a factor rather than numeric data. Here is an example:

normalize <- function(x) {return ((x - min(x)) / (max(x) - min(x)))}
v <- factor(rnorm(10))
normalize(v)

 Error in Summary.factor(c(9L, 3L, 4L, 5L, 6L, 7L, 8L, 2L, 1L, 10L), na.rm = FALSE) : 
  ‘min’ not meaningful for factors

The very first call, min(), already makes no sense for a factor. Here are some possibilities:

1. The data really is a factor and it makes no sense to normalize it

You can either apply normalize conditionally or modify normalize so that it knows what to do when it receives a factor. Clearly, though, this function should not be applied to factor columns, so in my view the ideal is to apply normalize only to the relevant columns rather than to the whole data.frame. With names(datos)[!sapply(datos, is.factor)] we get the names of the columns that are not factors, so we can subset the original data.frame and apply normalize to that subset:

lapply(datos[, names(datos)[!sapply(datos, is.factor)], drop = FALSE], normalize)
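
A minimal sketch of this subsetting idea on an invented data frame (`toy` and `not_factor` are hypothetical names). A logical index built with `!sapply(..., is.factor)` selects exactly the non-factor columns, and assigning back into a copy keeps the factor column in the result:

```r
normalize <- function(x) (x - min(x)) / (max(x) - min(x))

# Hypothetical data frame with a factor between two numeric columns
toy <- data.frame(
  a = c(1, 2, 3),
  g = factor(c("x", "y", "x")),
  b = c(10, 30, 20)
)

# TRUE for every column that is not a factor
not_factor <- !sapply(toy, is.factor)

# Normalize only those columns; the factor column stays as it was
toy_norm <- toy
toy_norm[not_factor] <- lapply(toy[not_factor], normalize)
toy_norm
```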

2. The column is a factor, but we still want to normalize it

Converting a factor to numeric (as long as it makes sense to do so) is done with as.numeric(as.character(x)); your lapply could be written like this:

lapply(datos, function(x) {if(is.factor(x)){x <- as.numeric(as.character(x))}; normalize(x)})
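
The detour through as.character() matters because calling as.numeric() directly on a factor returns the internal level codes rather than the values. A small illustration with made-up values (the factor `f` is invented for this example):

```r
# Hypothetical factor whose labels look like numbers
f <- factor(c("10.5", "3", "20"))

codes  <- as.numeric(f)                 # internal level codes, not the values
values <- as.numeric(as.character(f))   # the numbers the labels represent

codes    # 1 3 2
values   # 10.5  3.0 20.0
```

Any label that cannot be parsed as a number becomes NA with a warning, which is why the conversion only makes sense when the labels really are numeric.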

Note: ideally, rather than converting the factor to numeric, you should analyze why reading the csv produces a factor and correct that.
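
One hedged way to look for the cause of a column being read as a factor, assuming it contains entries that R cannot parse as numbers. The inline text below is invented to stand in for the file, and stringsAsFactors = TRUE is set explicitly to reproduce the behavior from the question (since R 4.0.0 the default is FALSE, so such columns arrive as character instead):

```r
# Made-up stand-in for the csv, using the read.csv2 defaults:
# semicolon as separator and comma as decimal mark
txt <- "a;b\n1,5;10\n2,5;n/a\n3,5;30"

datos <- read.csv2(text = txt, stringsAsFactors = TRUE)
sapply(datos, class)   # shows which columns were not read as numeric

# Entries of column b that cannot be converted to a number
b_chr <- as.character(datos$b)
bad <- b_chr[is.na(suppressWarnings(as.numeric(b_chr)))]
bad
```

In the question's sample, values such as 21.253.671 contain several dots, so they would possibly show up in a check like this one; whether those dots are thousands separators or something else depends on how the file was produced.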