Questions tagged as 'pyspark'

1
answer

Error in Jupyter with Spark (Python)

I use Python, Spark, and Jupyter in my notebook, and I see a failure when I try to run the following code. I give this example (text analysis), but the error also appears in many other programs. I have tried everything: configuring the clusters, reviewing the...
asked by 18.12.2018 / 12:23
0
answers

Group by 2 independent columns

I have a dataset with millions of records that I want to group by 2 independent columns using PySpark. Here is an example. I have:
ID  Col A   Col B
1   Alicia  Madrid
2   Pepe    Barcelona
3   Pepe    Madrid
4   Juan    Cadiz
5   Alicia  Sev...
asked by 19.11.2018 / 18:02
0
answers

Auxiliary RDDs within function map

I'm new to Apache Spark and have a question about RDDs and transformations. I have a PairRDD with data already loaded. What I need now is to use a transformation (mapToPair) to get another PairRDD containing the tuple that meets a certa...
asked by 20.09.2018 / 00:34
0
answers

Create one RDD from another, but changing the value of a column (pyspark)

I need to change an attribute in an RDD, setting it to 1 if it is "A" and to 0 if it is "B". The new attribute should also be an integer, unlike the current one. ... StructField("key", IntegerType(), True), StructField("inf", Strin...
asked by 17.06.2018 / 15:52
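A minimal sketch of the recoding described in this question, under stated assumptions: the schema in the excerpt is truncated, so the records are taken here to be hypothetical (key, inf) tuples with the "A"/"B" attribute in the second position. The built-in map stands in for rdd.map so the sketch runs without Spark. On an RDD the same function could be passed to rdd.map, which returns a new RDD and leaves the original unchanged.

```python
# Hypothetical records of the form (key, inf); the real schema is truncated
# in the excerpt, so this layout is an assumption made for illustration.
rows = [(1, "A"), (2, "B"), (3, "A")]

# Lookup table for the recoding: "A" becomes 1, "B" becomes 0 (both integers).
FLAG = {"A": 1, "B": 0}

def recode_row(row):
    """Return a new tuple with the string attribute replaced by an integer."""
    key, inf = row
    return (key, FLAG[inf])

# Built-in map is used in place of rdd.map so this runs without a Spark session.
recoded = list(map(recode_row, rows))
print(recoded)
```

A value other than "A" or "B" would raise a KeyError in this sketch. Whether such values should be rejected or given a default is not stated in the question.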
0
answers

Can Squarify use two columns to graph?

Given a DataFrame with columns A and B, where A is a quantity and B is the category assigned to that quantity, I would like to know whether both columns can be passed as parameters to squarify to plot it. One of the parameters of squarify is the scale o...
asked by 21.05.2018 / 05:22
0
answers

How do I convert Scala code to PySpark?

Hello, I need help with some code I'm writing. I need to define a Row object in PySpark. I have Scala code that does this, but I must do it in Python: def getRow(x : String) : Row={ val columnArray = new Array[String](95)...
asked by 15.05.2018 / 18:19
1
answer

Change values of an RDD

How can I change all the letters A to a 9 and all the letters B to an 8 in an RDD with a lambda function? I tried this, but it does not work: rdd.map(lambda a: 9 if a == "A" else a == a) rdd.map(lambda a: 8 if a == 'B' else a == a) My exampl...
asked by 22.06.2017 / 20:49
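One possible reading of why the attempt above fails: in the else branch, a == a is a comparison that evaluates to True, so every element that does not match becomes True instead of keeping its value. In addition, each rdd.map call returns a new RDD, so two separate calls do not combine unless the result of the first is passed to the second. The sketch below handles both replacements in a single lambda. The sample data is hypothetical because the original example is truncated, and the built-in map stands in for rdd.map so the code runs without Spark.

```python
# Hypothetical sample data; the asker's own example is cut off in the excerpt.
letters = ["A", "B", "C", "A"]

# One lambda covers both replacements and returns the element itself
# (not the comparison a == a) when neither letter matches.
recode = lambda a: 9 if a == "A" else (8 if a == "B" else a)

# Built-in map is used in place of rdd.map so this runs without a Spark session.
result = list(map(recode, letters))
print(result)  # prints [9, 8, 'C', 9]
```

On an actual RDD the equivalent call would be rdd.map(recode), with the returned RDD assigned to a variable.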