Questions tagged 'apache-spark'

0 answers

Loading a CSV file in Spark 2.3.0 to analyze it with k-means

I am new to Apache Spark (version 2.3.0). I am working from the sample code on the Spark website to use the k-means algorithm. The example shown below runs perfectly, but when I try to use it with a CSV file the erro...
asked by 03.10.2018 / 06:07
1 answer

How to configure a Spark cluster in ECS?

I have a multi-master Spark configuration with ZooKeeper, set up like this: two Spark masters registered in ZooKeeper and two workers that register with the leading Spark master. In addition, jobs are submitted through Livy, which is a REST API. My...
asked by 27.02.2018 / 22:11
0 answers

Connect MySQL with Apache Spark using Python? [closed]

I am starting to use Spark for some data problems we have with MySQL. I have consumed tables that I exported to JSON with phpMyAdmin, but I want to connect directly to my localhost to consume the database...
asked by 17.06.2016 / 22:29
1 answer

Making several computers work as one [closed]

I have several old computers lying around, and I wanted to know how I could set them up as a kind of cluster to run applications "faster", so to speak, with the workload shared among them, etc. Looking at the options, maybe with apach...
asked by 04.02.2017 / 04:10
0 answers

I cannot run commands in spark-shell with Scala

I use cmd on Windows 10. I call spark-shell, Spark starts, and I'm at the Scala prompt. Then, when I try to execute any command at all, I get: Exception in thread "main" java.io.FileNotFoundException: C:\Users\histe...
asked by 25.11.2018 / 01:17
1 answer

Why does the whole table load as null when I load a DataFrame in Spark?

I'm using Apache Spark 2.3.0, but when I load the CSV and then show its data with df.show, the whole table appears as null, and I don't understand why, since the file does contain the data. val schema = StructType(Array(StructField("Rank",Str...
asked by 11.10.2018 / 16:44
0 answers

Apache Spark: querying a MongoDB database

I am working with Apache Spark and MongoDB in Java. I have a database with a lot of documents, maybe more than 30,000,000. Every time I make a connection in my project, I load this entire collection into a Dataset, which is ve...
asked by 10.10.2018 / 18:34
0 answers

Auxiliary RDDs within a map function

I'm new to Apache Spark. I have a question about RDDs and transformations. I have a PairRDD with data already loaded. What I need now is to use a transformation (mapToPair) to get another PairRDD containing the tuple that meets a certa...
asked by 20.09.2018 / 00:34
0 answers

Creating one RDD from another while changing the value of a column (PySpark)

I need to change an attribute in an RDD, setting it to 1 if it is "A" and to 0 if it is "B". The new attribute should also be an integer, unlike the current one. ... StructField("key", IntegerType(), True), StructField("inf", Strin...
asked by 17.06.2018 / 15:52
0 answers

java.lang.NoClassDefFoundError: Could not initialize class mainPrincipal$ in Apache Spark

Hi, and thanks in advance. I have problems executing the following code on a Spark cluster. If I run it with master("local[*]") there are no problems and everything works perfectly, but once I run it in the following way ...
asked by 18.05.2018 / 03:22