Replace empty or null values in one column with those in another

Question


I have a DataFrame with two columns (a and b). Some rows in column a are empty (the value can be "nan" or " "). How can I copy the values from column b into column a, filling in only the values that are empty or null? For example:

import pandas as pd
import numpy as np

iDf2= {'a': [' ', 'juan', 'nan'],
       'b': ['2', '5', 'martina']}
iDf2 = pd.DataFrame(iDf2)

I want it to stay like this:

iDf2 = {'a': ['2', 'juan', 'martina'],
        'b': ['2', '5', 'martina']}

python pandas

asked by Emaa 10.09.2018 at 18:13


1 answer


score 0 · Answer 1

If your values were truly null values you could simply use the fillna method:

iDf2['a'] = iDf2['a'].fillna(iDf2['b'])
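As a minimal sketch of what fillna does and does not touch, the example below uses hypothetical data (the name `df` and its values are not from the question) with one real null and one blank string in column a:

```python
import numpy as np
import pandas as pd

# Hypothetical data: column 'a' holds one real null and one blank string.
df = pd.DataFrame({'a': [np.nan, 'juan', ' '],
                   'b': ['2', '5', 'martina']})

# Fill only the real nulls in 'a' with the value from the same row of 'b'.
filled = df['a'].fillna(df['b'])

print(filled.tolist())  # ['2', 'juan', ' ']
```

The real null is replaced, while the string with a space is left as it was.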

The problem is that what you consider empty values are not really null values. For example, a string containing a single space is not treated as NaN by Pandas by default. Given this, you have two options:

Normalize the entire column or DataFrame beforehand by converting the values you consider "empty" to NaN, and then apply fillna . This is especially relevant if you load the DataFrame from a CSV with pandas.read_csv or pandas.read_table , where the na_values argument lets you specify additional strings to be treated as NaN , on top of the values already treated as such by default:
```
import io

import pandas as pd
import numpy as np

file = io.StringIO('''\
a,b
,13
 ,2
juan,5
nan,martina
''')
```
```
>>> iDf2 = pd.read_csv(file, dtype=str, na_values=[" "])
>>> iDf2

      a        b
0   NaN       13
1   NaN        2
2  juan        5
3   NaN  martina

>>> iDf2['a'] = iDf2['a'].fillna(iDf2['b'])
>>> iDf2

         a        b
0       13       13
1        2        2
2     juan        5
3  martina  martina
```
By default, the following strings are parsed as nulls (the exact list can vary with the Pandas version): '-1.#IND' , '1.#QNAN' , '1.#IND' , '-1.#QNAN' , '#N/A' , 'N/A' , 'NA' , '#NA' , 'NULL' , 'NaN' , '-NaN' , 'nan' , '-nan' , plus empty strings.
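The same normalization can be applied to a DataFrame that is already in memory, as in the question. The sketch below assumes that "empty" means an empty or whitespace-only string or the literal text 'nan'; the name `empty_pattern` and its regular expression are illustrative choices, not part of the original answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [' ', 'juan', 'nan'],
                   'b': ['2', '5', 'martina']})

# Assumption: empty or whitespace-only strings and the literal text 'nan'
# are the values that count as "empty".
empty_pattern = r'^\s*$|^nan$'

# Step 1: turn every "empty" value into a real NaN.
df['a'] = df['a'].replace(empty_pattern, np.nan, regex=True)

# Step 2: fill the NaN values with the value from the same row of 'b'.
df['a'] = df['a'].fillna(df['b'])

print(df['a'].tolist())  # ['2', 'juan', 'martina']
```

Anchoring the pattern with ^ and $ keeps values such as 'juan' from matching just because they contain the letters "an".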

If the above is not possible, you can build a boolean mask of the values that you consider "empty" ( pandas.Series.isin is a good option) and assign the values of column b to those rows.

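import pandas as pd
import numpy as np

data = {'a': ['', ' ', 'juan', 'nan', np.nan],
        'b': ['13', '2', '5', 'martina', 'foo']}

iDf2 = pd.DataFrame(data)

# Mark the rows of 'a' whose value counts as "empty"
nans = iDf2['a'].isin(['', ' ', 'nan', np.nan])
# Assign through .loc to avoid chained assignment
iDf2.loc[nans, 'a'] = iDf2.loc[nans, 'b']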

Output:

>>> iDf2
         a        b
0       13       13
1        2        2
2     juan        5
3  martina  martina
4      foo      foo
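A variant of the mask approach, sketched here with Series.mask instead of indexed assignment. It assumes you also want strings made only of whitespace to count as empty, whatever their length, which goes slightly beyond the list of values used above; the names `df` and `is_empty` are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['', ' ', 'juan', 'nan', np.nan],
                   'b': ['13', '2', '5', 'martina', 'foo']})

# Assumption: strip surrounding whitespace first so that '', ' ' and '   '
# are all treated alike; real nulls are caught separately with isna().
is_empty = df['a'].isna() | df['a'].str.strip().isin(['', 'nan'])

# mask() replaces the values where the condition is True with those of 'b'.
df['a'] = df['a'].mask(is_empty, df['b'])

print(df['a'].tolist())  # ['13', '2', 'juan', 'martina', 'foo']
```

Because mask returns a new Series, the result can be assigned back to the column or stored in a new one without modifying the original.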