What is the correct way to save the results of a function in a DataFrame with Python?

I'm working with data from a Formula 1 Grand Prix. I have a function that calculates the time each driver spends in the pits during the race, and I call it once per driver. My question is: what is the most efficient way to save the computed results in a DataFrame when the function is called in a loop like this?

So far the function does what I need, but when I try to write the results for all the drivers to a single DataFrame, I only get the result for the last driver. Thanks in advance.

EDIT: My function looks like this:

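import numpy as np
import pandas as pd

def desplazamiento(dfT, t_step, coords=['POSITION_X', 'POSITION_Y']):
    # NOTE: q and t are not defined in this snippet; they are assumed
    # to be globals defined elsewhere in the original script.
    N = len(dfT)
    max_time = float(N * q)   # np.float was removed in NumPy 1.24; use the builtin float
    frames = max_time / N
    t_step = frames           # overrides the t_step argument, as in the original

    data = pd.DataFrame({'N': [N], 'max_time': [max_time], 'frames': [frames]})
    tau = t.copy()
    shifts = np.divide(tau, t_step).astype(float)
    msds_sum = np.zeros(shifts.size)
    delta_inv = np.arange(N)
    delta = delta_inv[N-1::-1]   # N-1, N-2, ..., 0

    for i, shift in enumerate(np.round(shifts, 0)):
        # cast to int: shift() expects an integer number of periods
        diffs = dfT[coords] - dfT[coords].shift(-int(shift))
        sqdist = np.square(diffs).sum(axis=1)
        msds_sum[i] = sqdist.sum()

    # the division only needs to run once, after the loop
    msd = np.divide(msds_sum, delta)
    msds = pd.DataFrame({'msd': msd})
    return msds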

Pilots = [1, 22, 4, 7, 44, 22, 8, 10, 99, 00, 56, 77]
for j in Pilots:
    dfk = df2.loc[j]
    dfT = dfk.iloc[:5]

    msd = desplazamiento(dfT, t_step, coords=['POSITION_X', 'POSITION_Y'])
    print(msd)

The DataFrame I want to analyze contains GPS data with coordinates for each driver:

                POSITION_X  POSITION_Y  POSITION_T
    Pilots                                    
    1              1.649       0.368       0.042
    1              1.576       0.371       0.084
    1              1.651       0.313       0.126
    1              1.723       0.340       0.168
    1              1.381       0.355       0.210
    1              1.324       0.469       0.252
   44              1.202       0.540       0.294
   44              1.323       0.427       0.336
   44              1.197       0.599       0.420
   44              1.327       0.519       0.462
   44              1.450       0.595       0.504
   44              1.684       0.577       0.546
   44              1.792       0.678       0.588
    5              1.852       0.906       0.630
    5              1.762       0.827       0.672
    5              1.735       0.961       0.714
    5              1.657       1.083       0.756
    5              1.897       1.074       0.798
    5              1.961       1.126       0.840
    5              2.067       1.167       0.882
    5              2.046       1.267       0.966
    5              1.922       1.228       1.008
    5              1.992       1.230       1.050
    5              1.945       1.198       1.092
    5              2.002       1.224       1.134
    5              1.866       1.213       1.176
    5              1.851       1.482       1.218
    5              1.600       1.724       1.260
    5              1.681       2.064       1.302
    
asked by Jonathan Pacheco 25.07.2017 at 02:55

1 answer

I will give you a conceptual idea of a possible solution to your question. Since I don't have the real data, let's work with an example. Suppose we have the following DataFrame:

import pandas as pd
df = pd.DataFrame({'a': [100, 1000], 'b': [200, 2000], 'c': [300, 3000]})
df

      a     b     c
0   100   200   300
1  1000  2000  3000

As in your question, we want to apply a function to generate a new column. By way of example, we define it as: the sum of column b, minus each element of column a, plus each element of column c.

The function could be something like this:

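def funcion(row, df):
    # Access by column label; positional access such as row[0] on a
    # labeled Series is deprecated in recent pandas versions.
    return df["b"].sum() - row["a"] + row["c"]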

Applying this function on the dataframe and generating a new column can be done like this:

df['d'] = df.apply(funcion, df=df, axis=1)

The result:

      a     b     c     d
0   100   200   300  2400 # 2200 - 100 + 300
1  1000  2000  3000  4200 # 2200 - 1000 + 3000
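
For a formula this simple, the same column can also be computed without apply, using plain column arithmetic. A minimal sketch on the same example data:

```python
import pandas as pd

df = pd.DataFrame({'a': [100, 1000], 'b': [200, 2000], 'c': [300, 3000]})

# df['b'].sum() is a scalar; subtracting column a and adding column c
# is done element-wise, one value per row.
df['d'] = df['b'].sum() - df['a'] + df['c']
print(df)
```

This avoids calling a Python function once per row, which tends to matter on larger DataFrames.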

In your code, as I understand it, you should redesign desplazamiento to receive the complete DataFrame, as in my example, and have the function itself select the 5 readings per driver and perform the rest of the calculations. The important thing is that the function returns the value you want for the corresponding cell of the DataFrame.
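
An alternative that keeps a per-driver function is to collect each result and concatenate once at the end. In the question's loop, msd is rebound on every iteration, which is why only the last driver's result survives. The following is a minimal sketch, not the original computation: the sample data is made up, and per_driver_stat is a hypothetical stand-in for desplazamiento.

```python
import numpy as np
import pandas as pd

# Small made-up sample in the same layout as the question's data
df2 = pd.DataFrame(
    {
        "POSITION_X": [1.649, 1.576, 1.651, 1.202, 1.323, 1.197],
        "POSITION_Y": [0.368, 0.371, 0.313, 0.540, 0.427, 0.599],
    },
    index=pd.Index([1, 1, 1, 44, 44, 44], name="Pilots"),
)

def per_driver_stat(dfT, coords=("POSITION_X", "POSITION_Y")):
    """Hypothetical stand-in for desplazamiento: squared step lengths."""
    diffs = dfT[list(coords)].diff().dropna()
    sqdist = np.square(diffs).sum(axis=1)
    return pd.DataFrame({"sqdist": sqdist.to_numpy()})

# Keep every driver's result instead of overwriting a single variable
results = {}
for pilot, dfT in df2.groupby(level="Pilots"):
    results[pilot] = per_driver_stat(dfT)

# Concatenate once; the dict keys become the outer index level
all_results = pd.concat(results, names=["Pilots", "step"])
print(all_results)
```

Iterating over groupby only visits drivers that actually appear in the data, so it also avoids a KeyError for driver numbers missing from the index.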

    
answered by 28.07.2017 / 16:13