We will start from a DataFrame first as you show to be able to reproduce the problem:
import pandas as pd
data = {"name": ("Babero", "Babero Terry", "Babero Terry", "Baly", "Baly", "Baly", "Barilla Metalica", "Base Para Portátil", "Base Para Portátil", "Base Para Portátil"),
"ref": (8090, 6089, 8089, 3045, 3046, 3047, 141, 7188, 7190, 7191),
"RREF1": (1, 2, 1, 1, 1, 1, 1, 1, 1, 1),
"RREF2": (7, 6, 7, 7, 7, 0, 6, 7, 7, 7),
"RREF3": (18, 1, 18, 15, 1, 0, 9, 18, 18, 18)
}
mydataset_df = pd.DataFrame(data)
One option is to use transform
next to pandas.DataFrame.nunique
to get what you want:
mydataset_df["Válido"] = (mydataset_df.groupby(["name"], as_index=False)
["RREF1", "RREF2", "RREF3"]
.transform("nunique", dropna=False)
.sum(axis=1) == 3
)
The nunique
method returns the number of unique values for each RREF column within each group. The rows of the groups that have all their RREF columns equal will have 1 as a value in all the columns returned by nunique
, so the sum of their columns will be 3.
The previous code gives us the following result:
>>> mydataset_df
name ref RREF1 RREF2 RREF3 Válido
0 Babero 8090 1 7 18 True
1 Babero Terry 6089 2 6 1 False
2 Babero Terry 8089 1 7 18 False
3 Baly 3045 1 7 15 False
4 Baly 3046 1 7 1 False
5 Baly 3047 1 0 0 False
6 Barilla Metalica 141 1 6 9 True
7 Base Para Portátil 7188 1 7 18 True
8 Base Para Portátil 7190 1 7 18 True
9 Base Para Portátil 7191 1 7 18 True
Both Terry Babe and Baly are not validated since their RREF values are not equal in all rows.