If what you really want is that all the RREF columns for the same value of name
have the same value (in this case the one of the first row that is), a very simple way without having to even create the validation column is to use appl
and on each group and with loc
assign all the rows in the group the value of the first one in those columns.
Let's first create a reproducible example:
from io import StringIO
import pandas as pd
csv =StringIO(u'''\
name,ref,RREF1,RREF2,RREF3
arandela carpeta,32,1,7,8
arandela carpeta,33,1,6,3
arandela carpeta,34,5,7,8
az,6166,5,3,10
az,6541,5,3,10
Babero Terry,6089,2,6,1
Babero Terry,8089,1,7,18
''')
df = pd.read_csv(csv)
print(df)
With what we have the following DataFrame:
name ref RREF1 RREF2 RREF3
0 arandela carpeta 32 1 7 8
1 arandela carpeta 33 1 6 3
2 arandela carpeta 34 5 7 8
3 az 6166 5 3 10
4 az 6541 5 3 10
5 Babero Terry 6089 2 6 1
6 Babero Terry 8089 1 7 18
Applying the idea mentioned above:
def set_RREFs(g):
cols = ["RREF1", "RREF2", "RREF3"]
df.loc[g.index, cols] = g[cols].iloc[0].values
_ = df.groupby('name').apply(set_RREFs)
We get (the change is in-place ):
>>> df
name ref RREF1 RREF2 RREF3
0 arandela carpeta 32 1 7 8
1 arandela carpeta 33 1 7 8
2 arandela carpeta 34 1 7 8
3 az 6166 5 3 10
4 az 6541 5 3 10
5 Babero Terry 6089 2 6 1
6 Babero Terry 8089 2 6 1
-
df.loc[g.index, cols]
selects the rows that belong to the shift group in the DataFrame (those with the same value in name
) and for each row also selects the three RREF columns.
-
g[cols].iloc[0].values
allows us to obtain the values of these three columns in the first row of the group.
Edit:
If you want to create a new DataFrame you just have to return the group with the relevant changes:
def set_RREFs(group):
cols = ["RREF1", "RREF2", "RREF3"]
group[cols] = group[cols].iloc[0].values
return group
df2 = df.groupby('name').apply(set_RREFs)
>>> df
name ref RREF1 RREF2 RREF3
0 arandela carpeta 32 1 7 8
1 arandela carpeta 33 1 6 3
2 arandela carpeta 34 5 7 8
3 az 6166 5 3 10
4 az 6541 5 3 10
5 Babero Terry 6089 2 6 1
6 Babero Terry 8089 1 7 18
>>> df2
name ref RREF1 RREF2 RREF3
0 arandela carpeta 32 1 7 8
1 arandela carpeta 33 1 7 8
2 arandela carpeta 34 1 7 8
3 az 6166 5 3 10
4 az 6541 5 3 10
5 Babero Terry 6089 2 6 1
6 Babero Terry 8089 2 6 1