Load csv spark file 2.3.0 to analyze it with k-means

1

I am new using Apache Spark, version 2.3.0. I am based on the sample code that comes in the Spark page to be able to use the k-means algorithm. I make the example shown below and run it perfectly, but when trying to use it with csv file the errors shown in the image come out.

I need to know what I'm doing wrong, if it's the way to load the file or modify the algorithm code.

import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.evaluation.ClusteringEvaluator

// Loads data
val dataset = spark.read.format("libsvm").load("data/mllib/sample_kmeans_data.txt")

// Trains a k-means model
val kmeans = new KMeans().setK(2).setSeed(1L)
val model = kmeans.fit(dataset)

// Make predictions
val predictions = model.transform(dataset)

// Evaluate clustering by computing Silhouette score
val evaluator = new ClusteringEvaluator()

val silhouette = evaluator.evaluate(predictions)
println(s"Silhouette with squared euclidean distance = $silhouette")

// Shows the result
println("Cluster Centers: ")
model.clusterCenters.foreach(println)

    
asked by senseilex 03.10.2018 в 06:07
source

0 answers