First, let's create a reproducible example based on the one you provided:
```python
import pandas as pd

data = {'X': [2, 3, 3, 4, 3, 2, 2],
        'Y': [4, 5, 2, 4, 7, 4, 3],
        'PROB': [False, False, False, True, True, False, False]}
df = pd.DataFrame(data, columns=['X', 'Y', 'PROB'])
```
A very simple way to solve this is to use `pandas.Series.shift` to compare each element with the previous one: the comparison is `True` exactly where a new run of equal values starts. Applying `pandas.Series.cumsum` to that result counts the run starts seen so far, which gives every run a sequential number. For example:
```
>>> df['Categorias'] = (df.PROB != df.PROB.shift()).cumsum()
>>> df
   X  Y   PROB  Categorias
0  2  4  False           1
1  3  5  False           1
2  3  2  False           1
3  4  4   True           2
4  3  7   True           2
5  2  4  False           3
6  2  3  False           3
```
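To make the two steps visible, here is a small sketch that prints the intermediate results for the example data. The names `run_start` and `run_id` are hypothetical, introduced only for illustration.

```python
import pandas as pd

data = {'X': [2, 3, 3, 4, 3, 2, 2],
        'Y': [4, 5, 2, 4, 7, 4, 3],
        'PROB': [False, False, False, True, True, False, False]}
df = pd.DataFrame(data)

# True where PROB differs from the row above, i.e. where a new run starts.
# The first row is compared with the NaN introduced by shift(), so it is always True.
run_start = df.PROB != df.PROB.shift()
print(run_start.tolist())  # [True, False, False, True, False, True, False]

# Each True bumps the running total by one, so every row gets the number of its run.
run_id = run_start.cumsum()
print(run_id.tolist())  # [1, 1, 1, 2, 2, 3, 3]
```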
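In your case, you want the numbering of the runs to be independent for each sub-DataFrame obtained by splitting on the `PROB` column. For this we can re-apply the same operation to each sub-DataFrame, but on the run numbers `aux` rather than on `PROB` itself: after filtering, all `PROB` values in a sub-DataFrame are equal, so comparing them would no longer detect where one run ends and the next begins, whereas `aux` still jumps at those boundaries. To get the "true" and "false" rows, just use the `PROB` column as a Boolean mask: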
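```python
# Run number of every row in the full DataFrame: [1, 1, 1, 2, 2, 3, 3]
aux = (df.PROB != df.PROB.shift()).cumsum()
# Rows where PROB is False; renumber their runs starting from 1
falsos = df[~df.PROB].copy()
falsos['PROB'] = (aux[~df.PROB] != aux[~df.PROB].shift()).cumsum()
# Rows where PROB is True; same renumbering
verdaderos = df[df.PROB].copy()
verdaderos['PROB'] = (aux[df.PROB] != aux[df.PROB].shift()).cumsum()
del aux
```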
Output:
```
>>> df
   X  Y   PROB
0  2  4  False
1  3  5  False
2  3  2  False
3  4  4   True
4  3  7   True
5  2  4  False
6  2  3  False
>>> verdaderos
   X  Y  PROB
3  4  4     1
4  3  7     1
>>> falsos
   X  Y  PROB
0  2  4     1
1  3  5     1
2  3  2     1
5  2  4     2
6  2  3     2
```
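As an alternative to the intermediate `aux` series (a different technique from the one above, offered only as a sketch), the per-value numbering can also be obtained with `groupby`: take the cumulative sum of the run starts separately within each `PROB` value. The names `run_start` and `run_number` are hypothetical.

```python
import pandas as pd

data = {'X': [2, 3, 3, 4, 3, 2, 2],
        'Y': [4, 5, 2, 4, 7, 4, 3],
        'PROB': [False, False, False, True, True, False, False]}
df = pd.DataFrame(data)

# 1 where a new run of equal PROB values begins, 0 otherwise.
run_start = (df.PROB != df.PROB.shift()).astype(int)

# Cumulative sum of run starts *within* each PROB value, so runs of False
# are numbered 1, 2, ... independently of runs of True.
run_number = run_start.groupby(df.PROB).cumsum()

# Split into the two sub-frames, replacing PROB with the run number as in the answer.
falsos = df[~df.PROB].assign(PROB=run_number[~df.PROB])
verdaderos = df[df.PROB].assign(PROB=run_number[df.PROB])

print(verdaderos)
print(falsos)
```

This should produce the same `verdaderos` and `falsos` as the code above, and because `groupby` handles any number of distinct values, the same pattern would also work for a column with more than two categories.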