What is the correct way to write an iterable function?

I have been working for a few days on a function that operates on data from two .csv files. There are many steps, but basically it takes identification numbers (IDs) that were sorted from lowest to highest by duration (how long an event lasted) and then applies arithmetic operations to the coordinates that belong to each ID. In other words, the function is applied to one ID and returns a single number, then it moves on to the next ID, produces its result, and so on.

The function works, but the problem is how I am collecting the results, because I realize that what I do is:

Call the function for the first ID

Create a DataFrame (with a single number, which is the result)

Save it as a .csv file

Move on to the next ID

Repeat the function and overwrite the file by appending the DataFrames

I think the correct way is to stack all the results in a single DataFrame and save it once at the end.
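The idea in the previous paragraph could be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the code from this question: `compute_result` is a hypothetical stand-in for the real per-ID calculation, and the ID list and the output file name are made up.

```python
import os
import tempfile

import pandas as pd


def compute_result(track_id):
    """Hypothetical placeholder: returns one number for one ID."""
    return float(track_id) * 2.0


def collect_results(track_ids):
    """Run the calculation for every ID and stack the results once."""
    rows = []
    for track_id in track_ids:
        # Keep each result in memory instead of writing a file per ID
        rows.append({"TRACK_ID": track_id, "result": compute_result(track_id)})
    # Build the DataFrame a single time, after the loop has finished
    return pd.DataFrame(rows, columns=["TRACK_ID", "result"])


results = collect_results([3, 33, 53, 93])

# Save once at the end (hypothetical file name in the temp directory)
out_path = os.path.join(tempfile.gettempdir(), "results.csv")
results.to_csv(out_path, index=False)
```

Collecting plain rows in a list and building the DataFrame once avoids re-reading and rewriting the .csv file on every iteration.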

My full code looks like this:

import numpy as np
import pandas as pd

# Open the file: put the file name in the string (shown in green)
df1 = pd.read_csv('/Users/JonathanPacheco/Desktop/Spots in tracks statistics.csv')


df2 = df1.set_index(['TRACK_ID'])

q = df1.iloc[1,3]-df1.iloc[0,3]

#selection of tracks by duration


ef = pd.read_csv('/Users/JonathanPacheco/Desktop/Sort.csv')

ef1 = ef.set_index(['TRACK_DURATION'])

sets = ef1.loc[3:, 'TRACK_ID']   # Set trajectories to analyze from time X to the end <<<<<<<<<<<<<<<<<<<<<

M = sets.values.tolist()


for j in M:

    dfk = df2.loc[j] 
    dfT = dfk.iloc[:5] #clip trajectories at<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<



    # Parameter input
    N = int(len(dfT))
    max_time = float(N*(q))  # np.float was removed in NumPy 1.24; the builtin float is equivalent
    frames = float(max_time/N)
    t_step = frames

    data = pd.DataFrame({'N':[N],'max_time':[max_time],'frames':[frames]})

    #print(data)

    t=np.linspace(q, max_time, N) 

    #function to measure MSD (all displacement)
    def alldisplacement(dfT, t_step, coords=['POSITION_X', 'POSITION_Y']):


        tau = t.copy()
        shifts = np.divide(tau,t_step).astype(float)
        msds_sum = np.zeros(shifts.size)
        delta_inv = np.arange(N)
        delta = delta_inv[N-1::-1]




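        for i, shift in enumerate(np.round(shifts,0)):
            # cast to int: shift() expects a whole number of periods
            diffs = dfT[coords] - dfT[coords].shift(-int(shift))
            sqdist = np.square(diffs).sum(axis=1)
            msds_sum[i] = sqdist.sum()
        # divide once, after all the sums have been filled in
        msd = np.divide(msds_sum,delta)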


        msds = pd.DataFrame({'msd':msd})
        return msds

    msd = alldisplacement(dfT, t_step, coords=['POSITION_X', 'POSITION_Y'])

    print(msd)    


    # File-saving section

    b = msd.to_csv('/Users/JonathanPacheco/Desktop/MSD.csv', sep=',',mode='a')

    b = msd
    a = pd.read_csv('/Users/JonathanPacheco/Desktop/MSD.csv')
    c = pd.concat ([a,b],axis=1, ignore_index=True)
    c.to_csv('/Users/JonathanPacheco/Desktop/MSD.csv', sep=',', index=False)

The initial DataFrame (df1) looks roughly like this:

        TRACK_ID   POSITION_X  POSITION_Y    POSITION_T
0            3       1.649       0.368       0.042
1            3       1.576       0.371       0.084
2            3       1.651       0.313       0.126
3            3       1.723       0.340       0.168
4            3       1.381       0.355       0.210
5           33       1.324       0.469       0.252
6           33       1.202       0.540       0.294
7           33       1.323       0.427       0.336
8           33       1.197       0.599       0.420
9           33       1.327       0.519       0.462
10          33       1.450       0.595       0.504
11          33       1.684       0.577       0.546
12          33       1.792       0.678       0.588
13          53       1.852       0.906       0.630
14          53       1.762       0.827       0.672
15          53       1.735       0.961       0.714
16          53       1.657       1.083       0.756
17          53       1.897       1.074       0.798
18          93       1.961       1.126       0.840
19          93       2.067       1.167       0.882
20          93       2.046       1.267       0.966
21          93       1.922       1.228       1.008
22          93       1.992       1.230       1.050
23          93       1.945       1.198       1.092
24          93       2.002       1.224       1.134
25          93       1.866       1.213       1.176
26          93       1.851       1.482       1.218

and the ef file is like this:

Unnamed:0  TRACK_ID  TRACK_DURATION
        0         3            7652
        1        33            6676
        2        53            5828
        3        93           20008
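The code above picks the tracks with `ef1.loc[3:, 'TRACK_ID']` after indexing by `TRACK_DURATION`. One possible alternative, shown here only as a sketch, is to filter with a boolean mask. The threshold `min_duration` is a hypothetical value chosen for illustration; the data is copied from the ef table.

```python
import pandas as pd

# Values copied from the ef table
ef = pd.DataFrame({
    "TRACK_ID": [3, 33, 53, 93],
    "TRACK_DURATION": [7652, 6676, 5828, 20008],
})

# Hypothetical threshold, in the same units as TRACK_DURATION
min_duration = 6000

# Boolean mask: keep only the rows whose duration reaches the threshold
selected = ef.loc[ef["TRACK_DURATION"] >= min_duration]

# Order the remaining IDs from shortest to longest duration
track_ids = selected.sort_values("TRACK_DURATION")["TRACK_ID"].tolist()
print(track_ids)
```

A mask does not depend on the index being sorted or on the threshold being present as an index label.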

There is a lot of code here, but to be clear, my problem is only with how I call the function: I do it inside a for loop, and I don't think that is the most efficient way. Thanks in advance.
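One way the loop could be restructured is to define the function once, outside the loop, and have it return its result so the caller can stack everything. The sketch below assumes the MSD for a lag is the mean of the squared displacements over all pairs of points that far apart. The names `track_msd` and `msd_per_track` are hypothetical, and the coordinates are made-up values, not the data from this question.

```python
import numpy as np
import pandas as pd


def track_msd(track, coords=("POSITION_X", "POSITION_Y")):
    """Mean squared displacement of one track for lags 1 .. len(track) - 1."""
    xy = track[list(coords)].to_numpy(dtype=float)
    lags = np.arange(1, len(xy))
    msd = np.empty(len(lags))
    for i, lag in enumerate(lags):
        # Displacements between points that are `lag` rows apart
        diffs = xy[lag:] - xy[:-lag]
        msd[i] = np.square(diffs).sum(axis=1).mean()
    return pd.DataFrame({"lag": lags, "msd": msd})


def msd_per_track(spots, track_ids, n_points=5):
    """Apply track_msd to each ID and stack the results in one DataFrame."""
    frames = []
    for track_id in track_ids:
        # Select one track and clip it to its first n_points rows
        track = spots.loc[spots["TRACK_ID"] == track_id].iloc[:n_points]
        out = track_msd(track)
        out.insert(0, "TRACK_ID", track_id)
        frames.append(out)
    # Concatenate once, after the loop
    return pd.concat(frames, ignore_index=True)


# Made-up coordinates, chosen so the result is easy to check by hand
spots = pd.DataFrame({
    "TRACK_ID": [3, 3, 3, 33, 33, 33],
    "POSITION_X": [0.0, 1.0, 2.0, 0.0, 0.0, 0.0],
    "POSITION_Y": [0.0, 0.0, 0.0, 0.0, 2.0, 4.0],
})

all_msd = msd_per_track(spots, [3, 33])
print(all_msd)
```

The combined `all_msd` DataFrame can then be written with a single `to_csv` call instead of one write per ID.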

loops python function pandas

asked by Jonathan Pacheco 22.07.2017 at 02:49


0 answers
