Replace empty or null values in one column with those in another


I have a DataFrame with two columns (a and b). In column a there are rows that are empty (they can be "nan" or " "). How can I take the values of column b and copy them into column a, filling in only the values that are empty or null? For example:

import pandas as pd
import numpy as np

iDf2 = {'a': [' ', 'juan', 'nan'],
        'b': ['2', '5', 'martina']}
iDf2 = pd.DataFrame(iDf2)

I want the result to look like this:

iDf2 = {'a': ['2', 'juan', 'martina'],
        'b': ['2', '5', 'martina']}
asked by Emaa 10.09.2018 at 20:13

1 answer


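If your values were true null values, you could simply use the fillna method and assign the result back to the column (assigning back, rather than calling fillna with inplace=True on a single column, avoids chained-assignment problems in newer pandas versions):

iDf2['a'] = iDf2['a'].fillna(iDf2['b'])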
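A minimal sketch of that case, assuming column a holds real missing values (np.nan or None) rather than placeholder strings; the name df is only for illustration. fillna with a Series argument fills each missing cell from the row with the same index in b.

```python
import numpy as np
import pandas as pd

# Column "a" holds real missing values (np.nan and None), not placeholder strings.
df = pd.DataFrame({'a': [np.nan, 'juan', None],
                   'b': ['2', '5', 'martina']})

# fillna with a Series fills each missing cell from "b", aligned on the index.
df['a'] = df['a'].fillna(df['b'])
print(df['a'].tolist())  # ['2', 'juan', 'martina']
```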

The problem is that the values you consider empty are not real null values; for example, pandas does not treat a string containing a single space as NaN by default. Given this, you have two options:
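To see the problem concretely, here is a short check on the question's data (df is an illustrative name): neither ' ' nor the string 'nan' is detected as missing, so fillna has nothing to fill.

```python
import pandas as pd

# The question's data: ' ' and 'nan' are ordinary strings, not missing values.
df = pd.DataFrame({'a': [' ', 'juan', 'nan'],
                   'b': ['2', '5', 'martina']})

print(df['a'].isna().tolist())  # [False, False, False]

# fillna therefore leaves column "a" unchanged.
filled = df['a'].fillna(df['b'])
print(filled.tolist())          # [' ', 'juan', 'nan']
```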

  • Normalize the whole column or DataFrame first, converting the values you consider "empty" to NaN, and then apply fillna. This is especially relevant if you load the DataFrame from a CSV file (pandas.read_csv) or with pandas.read_table, for example: both accept the na_values argument, which lists additional strings to treat as NaN on top of those already recognized by default:

    import io
    
    import pandas as pd
    import numpy as np
    
    file = io.StringIO('''\
    a,b
    ,13
     ,2
    juan,5
    nan,martina
    ''')
    
    >>> iDf2 = pd.read_csv(file, dtype=str, na_values=[" "])
    >>> iDf2
    
          a        b
    0   NaN       13
    1   NaN        2
    2  juan        5
    3   NaN  martina
    
    >>> iDf2['a'] = iDf2['a'].fillna(iDf2['b'])
    >>> iDf2
    
             a        b
    0       13       13
    1        2        2
    2     juan        5
    3  martina  martina
    
      

    By default, the values parsed as null include, among others, '-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN', '#N/A', 'N/A', 'NA', '#NA', 'NULL', 'NaN', '-NaN', 'nan', '-nan', plus the empty string (the exact list depends on the pandas version).

  • If that is not possible, you can simply build a boolean mask marking the values you consider "empty" (pandas.Series.isin is a good option) and assign the values of column b to those rows.

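    import pandas as pd
    import numpy as np

    data = {'a': ['', ' ', 'juan', 'nan', np.nan],
            'b': ['13', '2', '5', 'martina', 'foo']}

    iDf2 = pd.DataFrame(data)
    # isin matches the placeholder strings; isna() catches real missing values
    nans = iDf2['a'].isin(['', ' ', 'nan']) | iDf2['a'].isna()
    # .loc writes into the DataFrame itself instead of into a temporary column
    iDf2.loc[nans, 'a'] = iDf2['b']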
    

    Output:

    >>> iDf2
             a        b
    0       13       13
    1        2        2
    2     juan        5
    3  martina  martina
    4      foo      foo
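The first option also works when the DataFrame is already in memory rather than read from a file. A minimal sketch, assuming "empty" means an empty or whitespace-only string or the literal string 'nan': Series.replace with a regular expression turns those into real NaN, and fillna then copies from b. The name df is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['', ' ', 'juan', 'nan', np.nan],
                   'b': ['13', '2', '5', 'martina', 'foo']})

# ^\s*$ matches strings that are empty or contain only whitespace;
# the second replace converts the literal string 'nan' (exact match).
normalized = (df['a']
              .replace(r'^\s*$', np.nan, regex=True)
              .replace('nan', np.nan))

# Every "empty" cell is now a real NaN, so fillna can fill it from "b".
df['a'] = normalized.fillna(df['b'])
print(df)
```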
    
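A variant of the second option, assuming you also want to catch strings made of several spaces: strip the column first, build the mask from the stripped values, and let Series.mask take the values from b wherever the mask is True. Again only a sketch; df and the extra '   ' row are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['', ' ', 'juan', 'nan', np.nan, '   '],
                   'b': ['13', '2', '5', 'martina', 'foo', 'bar']})

# strip() turns '', ' ' and '   ' into ''; non-string cells come back as NaN.
stripped = df['a'].str.strip()
empty = stripped.isin(['', 'nan']) | stripped.isna()

# mask replaces values where the condition is True with the aligned values of "b".
df['a'] = df['a'].mask(empty, df['b'])
print(df)
```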
answered 10.09.2018 at 21:40