How do I define the classification factor (class labels) for knn on a data set?


Using the well-known iris data set, I want to predict the species from the length and width of the sepals and petals.

The data set is this:

I do not know how to build the class vectors so that they have the same length as the training and test sets that knn needs. I tried:

data(iris)    
normalize <- function(x){
  return ((x-min(x))/max(x)-min(x))
}

iris_n <- as.data.frame(lapply(data[, c(1,2,3,4)],normalize))

iris_train <- c(iris_n[1:40, ], iris_n[51:90, ], iris_n[101:140, ])

iris_test <- c(iris_n[41:50, ], iris_n[91:100, ] ,iris_n[141:150, ])

iris_train_target <- c(iris_n[1:40, 5], iris_n[51:90, 5], iris_n[101:140, 5])
iris_test_target <-c(iris_n[41:50, 5], iris_n[91:100, 5], iris_n[141:150, 5])


require(class)
m1 <- knn(train = iris_train, test = iris_test, cl = iris_train_target, k=13)
table(iris_test_target, m1)

However, R reports:

Loading required package: class
Error in knn(train = iris_train, test = iris_test, cl = iris_train_target,  : 
  'train' and 'class' have different lengths
In addition: Warning message:
In is.na(cl) : is.na() applied to non-(list or vector) of type 'NULL'
    
asked by ThePassenger 21.03.2017 at 09:25

1 answer


The error message comes from the way you subset the targets:

iris_train_target <- c(iris_n[1:40, 5], iris_n[51:90, 5], iris_n[101:140, 5])
iris_test_target <- c(iris_n[41:50, 5], iris_n[91:100, 5], iris_n[141:150, 5])

You are selecting column 5 of iris_n, which has only 4 columns according to your earlier definition. Indeed, the command:

dim(iris_n)

returns:

[1] 150   4

So you just have to define the targets from the original dataset. Of course, it can be made a little more compact:

iris_train_target <- iris[c(1:40, 51:90, 101:140), 5]
iris_test_target <- iris[c(41:50, 91:100, 141:150), 5]

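With that change, cl has the expected 120 elements. The rest of the code needs a few more adjustments before knn gives sensible results: iris_n should be built from iris (not data), the training and test sets should be built by row indexing or rbind() because c() applied to data frames returns a list, and the denominator in normalize needs parentheses, (max(x) - min(x)).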
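
Putting it together, here is a minimal sketch of the whole workflow, assuming the class package is installed. Besides taking the targets from iris, it selects the training and test rows by indexing so that they stay data frames, and it parenthesizes the denominator of normalize. The names train_idx and test_idx are introduced here for illustration only.

```r
library(class)  # provides knn()

data(iris)

# Min-max scaling to [0, 1]; note the parentheses around the denominator
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Normalize the four numeric columns of iris
iris_n <- as.data.frame(lapply(iris[, 1:4], normalize))

# Illustrative names (not in the original post): row indices for the
# 40/10 split per species used in the question
train_idx <- c(1:40, 51:90, 101:140)
test_idx  <- c(41:50, 91:100, 141:150)

# Row indexing keeps these as data frames with 4 columns each
iris_train <- iris_n[train_idx, ]
iris_test  <- iris_n[test_idx, ]

# The species labels live in the original iris data set (column 5)
iris_train_target <- iris[train_idx, 5]
iris_test_target  <- iris[test_idx, 5]

m1 <- knn(train = iris_train, test = iris_test, cl = iris_train_target, k = 13)
table(iris_test_target, m1)
```

Keeping the row indices in one place means the features and the targets are always cut with the same split, which is the mismatch that triggered the original error.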

    
answered by 29.03.2017 / 00:02