What is the correct way to apply a function iteratively?

I have been working for a few days on a function that operates on data from two .csv files. There are many steps, but basically it takes identification numbers (IDs) that were sorted from lowest to highest by duration (how long an event lasted) and then applies arithmetic operations to the coordinates that belong to each ID. In other words, the function is applied to one ID and a single number is obtained as the result, then it moves on to the next ID, produces its result, and so on.

The function works, but the problem is how I am collecting the results. I realize that what I do is:

  • Call the function for the first ID
  • Create a dataframe (with a single number that is the result)
  • Save it as a .csv file
  • Move on to the next ID
  • Repeat the function and overwrite the file, appending the new dataframe to what was already saved

I think the correct way is to stack all the results in a single dataframe and save it only once at the end.
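A minimal sketch of that last idea, assuming pandas: keep each per-ID result in memory, concatenate once, and write once. Here `compute_result` is a hypothetical stand-in for the real per-ID calculation, and the IDs and values are made up for illustration.

```python
import pandas as pd


def compute_result(track_id):
    # Hypothetical placeholder for the real per-ID calculation;
    # it just returns two made-up values derived from the ID.
    return pd.Series([float(track_id), 2.0 * track_id], name="msd")


track_ids = [3, 33, 53]  # illustrative IDs only

# Collect every result in memory instead of touching the file each time.
results = {}
for track_id in track_ids:
    results[track_id] = compute_result(track_id)

# Stack all results side by side: one column per ID.
combined = pd.concat(results, axis=1)

# Serialize once at the end; pass a file path to to_csv to write to disk.
csv_text = combined.to_csv(index=False)
print(csv_text)
```

With this shape the file is written a single time, so there is no need to read it back and concatenate on every pass.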

    My full code looks like this:

    import numpy as np
    import pandas as pd

    # Open the file: replace the path with the name of your file
    df1=pd.read_csv('/Users/JonathanPacheco/Desktop/Spots in tracks statistics.csv')
    
    
    df2 = df1.set_index(['TRACK_ID'])
    
    q = df1.iloc[1,3]-df1.iloc[0,3]
    
    #selection of tracks by duration
    
    
    ef = pd.read_csv('/Users/JonathanPacheco/Desktop/Sort.csv')
    
    ef1 = ef.set_index(['TRACK_DURATION'])
    
    sets = ef1.loc[3:, 'TRACK_ID']   # Set trajectories to analyze from time X to the end <<<<<<<<<<<<<<<<<<<<<
    
    M = sets.values.tolist()
    
    
    for j in M:
    
        dfk = df2.loc[j] 
        dfT = dfk.iloc[:5] #clip trajectories at<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    
    
    
        # Parameter input
        # (np.float was removed from NumPy; the built-in float does the same job)
        N = len(dfT)
        max_time = float(N * q)
        frames = max_time / N
        t_step = frames
    
        data = pd.DataFrame({'N':[N],'max_time':[max_time],'frames':[frames]})
    
        #print(data)
    
        t=np.linspace(q, max_time, N) 
    
        #function to measure MSD (all displacement)
        def alldisplacement(dfT, t_step, coords=['POSITION_X', 'POSITION_Y']):
    
    
            tau = t.copy()
            shifts = np.divide(tau,t_step).astype(float)
            msds_sum = np.zeros(shifts.size)
            delta_inv = np.arange(N)
            delta = delta_inv[N-1::-1]
    
    
    
    
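            # shift() expects an integer number of periods
            for i, shift in enumerate(np.round(shifts).astype(int)):
                diffs = dfT[coords] - dfT[coords].shift(-shift)
                sqdist = np.square(diffs).sum(axis=1)
                msds_sum[i] = sqdist.sum()
            # divide once, after all the sums have been accumulated
            msd = np.divide(msds_sum, delta)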
    
    
            msds = pd.DataFrame({'msd':msd})
            return msds
    
        msd = alldisplacement(dfT, t_step, coords=['POSITION_X', 'POSITION_Y'])
    
        print(msd)    
    
    
        # Saving files section
    
        b = msd.to_csv('/Users/JonathanPacheco/Desktop/MSD.csv', sep=',',mode='a')
    
        b = msd
        a = pd.read_csv('/Users/JonathanPacheco/Desktop/MSD.csv')
        c = pd.concat ([a,b],axis=1, ignore_index=True)
        c.to_csv('/Users/JonathanPacheco/Desktop/MSD.csv', sep=',', index=False)
    

    The initial df1 file looks more or less like this:

            TRACK_ID   POSITION_X  POSITION_Y    POSITION_T
    0            3       1.649       0.368       0.042
    1            3       1.576       0.371       0.084
    2            3       1.651       0.313       0.126
    3            3       1.723       0.340       0.168
    4            3       1.381       0.355       0.210
    5           33       1.324       0.469       0.252
    6           33       1.202       0.540       0.294
    7           33       1.323       0.427       0.336
    8           33       1.197       0.599       0.420
    9           33       1.327       0.519       0.462
    10          33       1.450       0.595       0.504
    11          33       1.684       0.577       0.546
    12          33       1.792       0.678       0.588
    13          53       1.852       0.906       0.630
    14          53       1.762       0.827       0.672
    15          53       1.735       0.961       0.714
    16          53       1.657       1.083       0.756
    17          53       1.897       1.074       0.798
    18          93       1.961       1.126       0.840
    19          93       2.067       1.167       0.882
    20          93       2.046       1.267       0.966
    21          93       1.922       1.228       1.008
    22          93       1.992       1.230       1.050
    23          93       1.945       1.198       1.092
    24          93       2.002       1.224       1.134
    25          93       1.866       1.213       1.176
    26          93       1.851       1.482       1.218
    

    and the ef file is like this:

    Unnamed:0 TRACK_ID  TRACK_DURATION  
    0             3      7652                          
    1            33      6676                          
    2            53      5828                          
    3            93     20008                          
    

    There is a lot of code here, but to be clear, my problem is in how I call the function: I do it with a for loop, and I think that is not the most efficient way. Thanks in advance.
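One possible way to avoid redefining the function on every pass, sketched under assumptions: define the MSD function once, outside the loop, and let `groupby` hand over one track at a time. The helper name `msd_for_track`, the tiny made-up table, and the plain definition of MSD used here (mean of the squared displacements for each lag) are illustrative only and may need adjusting to match the real calculation.

```python
import numpy as np
import pandas as pd


def msd_for_track(track, coords=("POSITION_X", "POSITION_Y")):
    """Mean squared displacement of one track for lags 1..N-1 (illustrative)."""
    xy = track[list(coords)].to_numpy()
    lags = np.arange(1, len(xy))
    msd = np.empty(len(lags))
    for k, lag in enumerate(lags):
        # displacement between every pair of points that are `lag` rows apart
        diffs = xy[lag:] - xy[:-lag]
        msd[k] = np.mean(np.sum(diffs ** 2, axis=1))
    return pd.Series(msd, index=lags, name="msd")


# Made-up data with the same column names as the sample table
spots = pd.DataFrame({
    "TRACK_ID":   [3, 3, 3, 33, 33, 33],
    "POSITION_X": [0.0, 1.0, 2.0, 0.0, 0.0, 0.0],
    "POSITION_Y": [0.0, 0.0, 0.0, 0.0, 2.0, 4.0],
})
wanted_ids = [3, 33]  # e.g. the IDs selected by duration

# groupby yields (ID, rows of that ID); clip each track to its first 5 rows
per_track = {
    track_id: msd_for_track(track.iloc[:5])
    for track_id, track in spots.groupby("TRACK_ID")
    if track_id in wanted_ids
}

# One column per track, one row per lag
all_msd = pd.concat(per_track, axis=1)
print(all_msd)
```

Because the function takes the track as an argument instead of relying on variables from the enclosing loop, it can be tested on its own with a small table like the one above.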

        
    asked by Jonathan Pacheco 22.07.2017 at 04:49

    0 answers