Hi, and first of all, thank you.
I run into problems when executing the following code on a Spark cluster. If I run it with master("local[*]") it works perfectly, but when I submit it like this:
/home/cluster/opt/spark-2.2.1-bin-hadoop2.7/bin/spark-submit --master spark://10.8.176.90:7077 --class mainPrincipal JARname.jar filename.csv
it fails with the following error:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: ..... java.lang.NoClassDefFoundError: Could not initialize class mainPrincipal$
at mainPrincipal$$anonfun$2.apply(mainPrincipal.scala:25)
which points at the line `val tuplas = datos.map{row => parseTupla(row)}`.
This is my code:
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.storage.StorageLevel

object mainPrincipal {

  val spark: SparkSession = SparkSession
    .builder()
    .appName("Nombre de la Aplicación")
    .getOrCreate()

  def main(args: Array[String]): Unit = {
    val nombreDelfichero = args(0)
    import spark.implicits._
    val datos = spark.read.format("csv").option("header", "false").csv(nombreDelfichero)
    val tuplas = datos.map { row => parseTupla(row) } // Dataset of Tupla(id: Long, valores: Seq[Double])
    tuplas.persist(StorageLevel.MEMORY_AND_DISK_SER)
    tuplas.foreach { element1 =>
      println("Id " + element1.id.toString)
    }
  }

  case class Tupla(id: Long, valores: Seq[Double]) extends Serializable

  def parseTupla(row: Row): Tupla = {
    var coordenadas: Seq[Double] = Seq.empty[Double]
    val cant = row.size
    // The last column of the row is the id; the preceding ones are coordinates.
    val id = row.getString(cant - 1).toLong
    for (i <- 0 until cant - 1) {
      coordenadas = coordenadas :+ row.getString(i).toDouble
    }
    Tupla(id, coordenadas)
  }
}
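For what it's worth, the per-row parsing logic itself behaves as expected when fed plain strings. Below is a minimal sketch that mimics parseTupla without the Spark Row API (parseFields is a hypothetical helper I introduce only for this illustration; it takes the column values as a Seq[String], with the last field as the id):

```scala
// Standalone re-implementation of the parseTupla logic over plain strings:
// the last field is the Long id, the preceding fields are Double coordinates.
case class Tupla(id: Long, valores: Seq[Double])

object ParseDemo {
  def parseFields(fields: Seq[String]): Tupla = {
    val id = fields.last.toLong
    val coordenadas = fields.dropRight(1).map(_.toDouble)
    Tupla(id, coordenadas)
  }

  def main(args: Array[String]): Unit = {
    println(parseFields(Seq("1.5", "2.5", "7"))) // Tupla(7,List(1.5, 2.5))
  }
}
```

So the failure does not seem to be in the parsing itself but in how the class is initialized on the executors.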
I have consulted similar questions, but I still do not have a solution. I would appreciate any help as soon as possible, and thank you again.