Print certain pandas rows


I have a dataframe similar to the following:

I want to show the columns in which any row contains a NaN, NaT or None. If I apply the following code, I can see which ones they are:

import pandas as pd

def null(x):
    # True if any cell in column x is NaN, NaT or None
    return any(pd.isnull(x))

print(df.apply(null))

It gives me something similar to this:

col1          False
col2          True
col3          False
col4          False
col5          False
col6          True
dtype: bool

I know that the columns containing a null are the ones marked True.

But now I get a list of all the columns, some with the value True and others with False. How can I make only the columns that are True come out? I have tried the following, but then I only get the result True without the column name next to it. I feel that I am close, but I cannot get it.

def null(x):
    if any(pd.isnull(x)) == True:
        print(any(pd.isnull(x)))

print(df.apply(null))

This prints the following:

True
True

Example of what I want:

col2          True
col6          True
dtype: bool
    
asked by NEA 09.12.2018 at 19:39

2 answers


You almost had it. Your first approach returned a Series whose index holds the names of all the columns and whose values are Booleans. Just use the indexing operator ([]) to keep the rows of that Series (that is, the column names) whose value is True.

Just as df[df.x > 3], for example, selects the rows of the dataframe df whose column x has a value greater than 3, you can also write df[df.y], which selects the rows of df in which the Boolean column y has the value True.
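A small sketch of that indexing rule, using a hypothetical frame with a numeric column x and an already-Boolean column y (neither is from the question). The last lines show that the same [] mask also works on a Boolean Series, which is the situation in the question:

```python
import pandas as pd

# Hypothetical example frame: "x" is numeric, "y" is already Boolean
df = pd.DataFrame({"x": [1, 5, 2, 7], "y": [True, False, False, True]})

# A Boolean Series passed to [] keeps the rows where it is True
greater = df[df.x > 3]   # rows with index 1 and 3
flagged = df[df.y]       # rows with index 0 and 3
print(greater)
print(flagged)

# The same works on a Boolean Series indexed by labels: s[s] keeps the True entries
s = pd.Series([False, True, True], index=["a", "b", "c"])
print(s[s])              # keeps labels "b" and "c"
```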

In your case the "dataframe" is a Series, the result of the first filter. Instead of printing it, let's save it in a variable, r:

def null(x):
    return any(pd.isnull(x))

r = df.apply(null)

Now it's enough to do:

print(r[r])

and you'll get the output you expected:

Col2    True
Col6    True
dtype: bool
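A self-contained sketch of the apply approach, assuming a small hypothetical frame in which only Col2 and Col6 contain a NaN (the answer's own frame is built differently):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with missing values in Col2 and Col6 only
df = pd.DataFrame({
    "Col1": [1, 2, 3],
    "Col2": [1, np.nan, 3],
    "Col3": [1, 2, 3],
    "Col4": [1, 2, 3],
    "Col5": [1, 2, 3],
    "Col6": [np.nan, 2, 3],
})

def null(x):
    # x is one column (a Series); True if any cell in it is missing
    return any(pd.isnull(x))

r = df.apply(null)   # one Boolean per column, indexed by column name
print(r[r])          # keep only the columns whose value is True
```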

Even simpler

You do not need the null() function or apply():

r = df.isnull().any()
print(r[r])

And if what you really wanted was the names of the columns that contain a null, you can replace the final print with this one:

print(list(r[r].index))
['Col2', 'Col6']
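A sketch of the isnull().any() version on a hypothetical frame that mixes the three kinds of missing value from the question (NaN, None and NaT). The last line is an equivalent alternative, not from the answer, that masks the column Index directly:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: Col2 has a NaN, Col5 a None, Col6 a NaT
df = pd.DataFrame({
    "Col1": [1, 2, 3],
    "Col2": [1.0, np.nan, 3.0],
    "Col5": ["a", None, "c"],
    "Col6": [pd.Timestamp("2018-12-09"), pd.NaT, pd.Timestamp("2018-12-10")],
})

# isnull() flags NaN, None and NaT alike; any() reduces each column to one bool
r = df.isnull().any()

names = list(r[r].index)                   # the answer's approach
same = df.columns[r.to_numpy()].tolist()   # alternative: Boolean mask on df.columns
print(names)
print(same)
```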

Addendum

To make my results reproducible, this is how I created the dataframe df I worked with (it is not the same as yours, but it serves to illustrate the method):

import pandas as pd
import numpy as np

# Create 6 columns of random numbers, 20 values per column
# Store them in a dictionary whose keys are the column names
data = {}
for i in range(1,7):
  data["Col{}".format(i)] = np.random.randint(20,100, 20)

# Convert it into a dataframe
df = pd.DataFrame(data)

# Put some NaNs in a few cells of columns Col2 and Col6
df.loc[3, "Col2"] = np.nan
df.loc[6, "Col2"] = np.nan
df.loc[7, "Col6"] = np.nan
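An end-to-end sketch that rebuilds a frame like the one above and runs the isnull().any() filter on it. Two changes are assumptions of this sketch, not the answer's code: the generator is seeded (np.random.default_rng) so the numbers repeat between runs, and the columns are stored as floats so that assigning NaN does not need to change an integer column's dtype:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the random numbers repeat between runs

# Six columns of random integers in [20, 100), stored as floats so NaN fits directly
data = {"Col{}".format(i): rng.integers(20, 100, 20).astype(float) for i in range(1, 7)}
df = pd.DataFrame(data)

# Same missing cells as in the answer's setup
df.loc[3, "Col2"] = np.nan
df.loc[6, "Col2"] = np.nan
df.loc[7, "Col6"] = np.nan

r = df.isnull().any()
print(r[r])
print(list(r[r].index))
```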
    
answered by 10.12.2018 at 11:15

You have two prints: one inside the function, which runs when the value is True, and another outside the function, which prints whatever apply returns. If you only want the True values to be shown and nothing else, keep the if ... == True check and print inside it. If you simply do not want the False ones printed, remove the print outside the function, or wrap it in an if so that the print line only runs when the value is True.
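A minimal sketch of that advice, assuming a hypothetical frame where col2 holds a NaN and col6 a NaT. It swaps apply for a plain loop over df.items() (a substitution, not the asker's code) so that each column label is at hand; printing happens only inside the if, and nothing is printed afterwards:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: col2 has a NaN, col6 has a NaT
df = pd.DataFrame({
    "col1": [1, 2],
    "col2": [np.nan, 2.0],
    "col6": [pd.NaT, pd.Timestamp("2018-12-09")],
})

found = []
for name, column in df.items():        # one (label, Series) pair per column
    has_null = bool(pd.isnull(column).any())
    if has_null:                        # print only inside the if ...
        print(name, has_null)           # ... so columns without nulls never show up
        found.append(name)
# no extra print of the whole result here
```

This prints the column name next to True for col2 and col6 only, which is the output the question was after.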

    
answered by 10.12.2018 at 06:07