Create one RDD from another while changing the value of a column (PySpark)


I need to change an attribute in an RDD: set it to 1 if it is "A" and to 0 if it is "B". The new attribute should also be an integer, rather than a string as it is now.

    ...
    StructField("key", IntegerType(), True),
    StructField("inf", StringType(), True)
    ...

    print(xx_Rdd)
    [u'1::B', u'2::A', u'3::A', u'4::B', u'5::A']

And I need:

    print(new_xx_Rdd)
    [u'1::0', u'2::1', u'3::1', u'4::0', u'5::1']
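One possible way to do this is sketched below, assuming each element of `xx_Rdd` is a single `key::inf` string as in the printed output. The names `INF_CODES`, `recode_record` and `to_typed_pair` are hypothetical helpers made up for illustration, not part of PySpark. The functions are plain Python so they can be checked without a cluster and then passed to `RDD.map`.

```python
# Hypothetical mapping table (illustrative name): "A" -> 1, "B" -> 0.
INF_CODES = {"A": 1, "B": 0}


def recode_record(record, sep="::"):
    """Turn 'key::A' into 'key::1' and 'key::B' into 'key::0'."""
    key, inf = record.split(sep, 1)
    return "{}{}{}".format(key, sep, INF_CODES[inf])


def to_typed_pair(record, sep="::"):
    """Return (int key, int code) so both fields are real integers."""
    key, inf = record.split(sep, 1)
    return (int(key), INF_CODES[inf])


# Plain-Python check on the sample data from the question.
sample = [u'1::B', u'2::A', u'3::A', u'4::B', u'5::A']
recoded = [recode_record(r) for r in sample]
typed = [to_typed_pair(r) for r in sample]
print(recoded)
print(typed)

# With Spark available, the same functions would be passed to RDD.map:
# new_xx_Rdd = xx_Rdd.map(recode_record)
# typed_Rdd = xx_Rdd.map(to_typed_pair)
```

`recode_record` keeps the `key::value` string layout, so the new value is still text inside the string. `to_typed_pair` is the variant to use if the attribute has to be an actual integer, for example before applying a schema whose fields are `IntegerType()`. A value other than "A" or "B" raises a `KeyError` in this sketch.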

Thanks!

    
asked by Maria_1996 17.06.2018 at 15:52

0 answers