Edit Dataframe pandas .groupby

I have the following code:

import dataiku
from dataiku import pandasutils as pdu
import pandas as pd
mydataset = dataiku.Dataset("PRD_edited_rejoined")
mydataset_df = mydataset.get_dataframe()
mydataset_df[u"Válido"] = (mydataset_df.groupby([u"name"])
                      [[u"RREF1", u"RREF2", u"RREF3"]]
                      .transform("nunique", dropna=False)
                      .sum(axis=1) == 3
                     )

cols_names = [u'name', u'ref', u'RREF1', u'RREF2', u'RREF3', u'Válido']
my_cols_df = mydataset_df[cols_names]

The code groups the items by name and then checks whether RREF1, RREF2 and RREF3 are equal within each group.

That part is done: it returns False when they are not all equal within a group of names.

---- Now what I need is, as shown in the image, to change the RREF values within each group of names so that they all become equal; in the image's case that would mean copying the values from row 6 to row 7.

  • Any ideas on how to do that?
asked by Yan Chirino 24.04.2018 в 16:34

1 answer

If what you really want is for all the RREF columns to have the same value for each value of name (here, the value of the first row of the group), a very simple way, without even needing to create the validation column, is to use apply on each group and, with loc, assign to all the rows in the group the values of the first row in those columns.

Let's first create a reproducible example:

from io import StringIO
import pandas as pd

csv = StringIO(u'''\
name,ref,RREF1,RREF2,RREF3
arandela carpeta,32,1,7,8
arandela carpeta,33,1,6,3
arandela carpeta,34,5,7,8
az,6166,5,3,10
az,6541,5,3,10
Babero Terry,6089,2,6,1
Babero Terry,8089,1,7,18
''')

df = pd.read_csv(csv)
print(df)

With that, we have the following DataFrame:

               name   ref  RREF1  RREF2  RREF3
0  arandela carpeta    32      1      7      8
1  arandela carpeta    33      1      6      3
2  arandela carpeta    34      5      7      8
3                az  6166      5      3     10
4                az  6541      5      3     10
5      Babero Terry  6089      2      6      1
6      Babero Terry  8089      1      7     18

Applying the idea mentioned above:

def set_RREFs(g):
    cols = ["RREF1", "RREF2", "RREF3"]
    # assign the first row's RREF values to every row of the group
    df.loc[g.index, cols] = g[cols].iloc[0].values

_ = df.groupby('name').apply(set_RREFs)

We get (the change is made in-place):

>>> df
               name   ref  RREF1  RREF2  RREF3
0  arandela carpeta    32      1      7      8
1  arandela carpeta    33      1      7      8
2  arandela carpeta    34      1      7      8
3                az  6166      5      3     10
4                az  6541      5      3     10
5      Babero Terry  6089      2      6      1
6      Babero Terry  8089      2      6      1

  • df.loc[g.index, cols] selects, in the DataFrame, the rows that belong to the current group (those that share the same value of name) and, for each of those rows, the three RREF columns.

  • g[cols].iloc[0].values gives us the values of those three columns in the first row of the group.
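A more concise alternative (not part of the original answer, but standard pandas) is groupby().transform('first'), which computes the first row per group and broadcasts it back to the original shape, so no explicit apply or loc is needed:

```python
from io import StringIO

import pandas as pd

csv = StringIO(u'''\
name,ref,RREF1,RREF2,RREF3
arandela carpeta,32,1,7,8
arandela carpeta,33,1,6,3
arandela carpeta,34,5,7,8
az,6166,5,3,10
az,6541,5,3,10
''')
df = pd.read_csv(csv)

cols = ["RREF1", "RREF2", "RREF3"]
# 'first' is evaluated once per group and broadcast to every row of it
df[cols] = df.groupby("name")[cols].transform("first")
print(df)
```

This keeps the index and all other columns untouched, exactly like the apply-based version.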

Edit:

If you want to create a new DataFrame instead of modifying the original in place, you just have to return each group with the relevant changes:

def set_RREFs(group):
    cols = ["RREF1", "RREF2", "RREF3"]
    # overwrite the RREF columns of the group (a copy) with its first row
    group[cols] = group[cols].iloc[0].values
    return group

df2 = df.groupby('name').apply(set_RREFs)
>>> df
               name   ref  RREF1  RREF2  RREF3
0  arandela carpeta    32      1      7      8
1  arandela carpeta    33      1      6      3
2  arandela carpeta    34      5      7      8
3                az  6166      5      3     10
4                az  6541      5      3     10
5      Babero Terry  6089      2      6      1
6      Babero Terry  8089      1      7     18

>>> df2
               name   ref  RREF1  RREF2  RREF3
0  arandela carpeta    32      1      7      8
1  arandela carpeta    33      1      7      8
2  arandela carpeta    34      1      7      8
3                az  6166      5      3     10
4                az  6541      5      3     10
5      Babero Terry  6089      2      6      1
6      Babero Terry  8089      2      6      1
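
As a sanity check (a sketch reusing the equalized data above, not part of the original answer), the asker's Válido test, the sum of per-column nunique within each group being exactly 3, now comes out True for every row:

```python
from io import StringIO

import pandas as pd

csv = StringIO(u'''\
name,ref,RREF1,RREF2,RREF3
arandela carpeta,32,1,7,8
arandela carpeta,33,1,7,8
arandela carpeta,34,1,7,8
az,6166,5,3,10
az,6541,5,3,10
Babero Terry,6089,2,6,1
Babero Terry,8089,2,6,1
''')
df2 = pd.read_csv(csv)

# each RREF column now has a single distinct value per group,
# so the three nunique counts sum to exactly 3 in every row
valido = (df2.groupby("name")[["RREF1", "RREF2", "RREF3"]]
              .transform("nunique")
              .sum(axis=1) == 3)
print(valido.all())  # prints True
```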
    
answered by 24.04.2018 / 19:25