I have been working for a few days on a function that operates on data from two .csv files. There are many steps, but basically it takes identification numbers (IDs) that were sorted from lowest to highest by duration (the time an event lasted) and then applies arithmetic operations to the coordinates that belong to each ID. In other words, the function is applied to one ID and a single number is obtained as the result, then it moves on to the next ID, produces its result, and so on.
The function works! The problem is how I collect the results: what I do now is write each result to the .csv and then append the dataframes. I think the correct way is to stack all the results in a single dataframe and save it at the end.
My full code looks like this:
import numpy as np
import pandas as pd

# Open the file: put the file name in the path string (shown in green)
df1 = pd.read_csv('/Users/JonathanPacheco/Desktop/Spots in tracks statistics.csv')
df2 = df1.set_index(['TRACK_ID'])
q = df1.iloc[1, 3] - df1.iloc[0, 3]  # time between the first two rows (POSITION_T)

# Selection of tracks by duration
ef = pd.read_csv('/Users/JonathanPacheco/Desktop/Sort.csv')
ef1 = ef.set_index(['TRACK_DURATION'])
sets = ef1.loc[3:, 'TRACK_ID']  # Set trajectories to analyze from time X to the end <<<
M = sets.values.tolist()

for j in M:
    dfk = df2.loc[j]
    dfT = dfk.iloc[:5]  # clip trajectories at <<<

    # Parameter input
    N = int(len(dfT))
    max_time = float(N * q)
    frames = float(max_time / N)
    t_step = frames
    data = pd.DataFrame({'N': [N], 'max_time': [max_time], 'frames': [frames]})
    # print(data)
    t = np.linspace(q, max_time, N)

    # Function to measure MSD (all displacement)
    def alldisplacement(dfT, t_step, coords=['POSITION_X', 'POSITION_Y']):
        tau = t.copy()
        shifts = np.divide(tau, t_step).astype(float)
        msds_sum = np.zeros(shifts.size)
        delta_inv = np.arange(N)
        delta = delta_inv[N-1::-1]  # number of valid pairs for each shift
        for i, shift in enumerate(np.round(shifts, 0).astype(int)):
            diffs = dfT[coords] - dfT[coords].shift(-shift)
            sqdist = np.square(diffs).sum(axis=1)
            msds_sum[i] = sqdist.sum()
        msd = np.divide(msds_sum, delta)
        msds = pd.DataFrame({'msd': msd})
        return msds

    msd = alldisplacement(dfT, t_step, coords=['POSITION_X', 'POSITION_Y'])
    print(msd)

    # Saving files section
    b = msd.to_csv('/Users/JonathanPacheco/Desktop/MSD.csv', sep=',', mode='a')
    b = msd
    a = pd.read_csv('/Users/JonathanPacheco/Desktop/MSD.csv')
    c = pd.concat([a, b], axis=1, ignore_index=True)
    c.to_csv('/Users/JonathanPacheco/Desktop/MSD.csv', sep=',', index=False)
The initial file (df1) looks more or less like this:
    TRACK_ID  POSITION_X  POSITION_Y  POSITION_T
0          3       1.649       0.368       0.042
1          3       1.576       0.371       0.084
2          3       1.651       0.313       0.126
3          3       1.723       0.340       0.168
4          3       1.381       0.355       0.210
5         33       1.324       0.469       0.252
6         33       1.202       0.540       0.294
7         33       1.323       0.427       0.336
8         33       1.197       0.599       0.420
9         33       1.327       0.519       0.462
10        33       1.450       0.595       0.504
11        33       1.684       0.577       0.546
12        33       1.792       0.678       0.588
13        53       1.852       0.906       0.630
14        53       1.762       0.827       0.672
15        53       1.735       0.961       0.714
16        53       1.657       1.083       0.756
17        53       1.897       1.074       0.798
18        93       1.961       1.126       0.840
19        93       2.067       1.167       0.882
20        93       2.046       1.267       0.966
21        93       1.922       1.228       1.008
22        93       1.992       1.230       1.050
23        93       1.945       1.198       1.092
24        93       2.002       1.224       1.134
25        93       1.866       1.213       1.176
26        93       1.851       1.482       1.218
and the ef file is like this:
Unnamed: 0  TRACK_ID  TRACK_DURATION
         0         3            7652
         1        33            6676
         2        53            5828
         3        93           20008
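To make the relationship between the two tables concrete, here is a minimal sketch that rebuilds a small excerpt of both (values copied from the listings above) and picks the rows of df1 whose TRACK_ID passes a duration filter. The threshold `min_duration` and the names `selected_ids` and `selected` are hypothetical, chosen only for illustration; the real script selects tracks with a label slice on TRACK_DURATION instead.

```python
import pandas as pd

# Small excerpt of the two tables shown above (values copied from the post).
df1 = pd.DataFrame({
    "TRACK_ID":   [3, 3, 33, 33, 53, 93],
    "POSITION_X": [1.649, 1.576, 1.324, 1.202, 1.852, 1.961],
    "POSITION_Y": [0.368, 0.371, 0.469, 0.540, 0.906, 1.126],
    "POSITION_T": [0.042, 0.084, 0.252, 0.294, 0.630, 0.840],
})
ef = pd.DataFrame({
    "TRACK_ID": [3, 33, 53, 93],
    "TRACK_DURATION": [7652, 6676, 5828, 20008],
})

# Hypothetical threshold: keep only tracks that lasted at least this long.
min_duration = 6000

# IDs of the tracks that pass the duration filter.
selected_ids = ef.loc[ef["TRACK_DURATION"] >= min_duration, "TRACK_ID"].tolist()

# Rows of df1 that belong to those tracks (boolean mask, no explicit loop).
selected = df1[df1["TRACK_ID"].isin(selected_ids)]
```

A boolean mask with `isin` keeps TRACK_ID as an ordinary column, which makes a later `groupby("TRACK_ID")` straightforward.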
This is a lot of code, but to clarify: my problem is only in how I call the function. I do it on df1 with a for loop, and I think that is not the most efficient way. Thanks in advance.
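The "stack everything in one dataframe and save once" idea could look roughly like the sketch below. It is only an illustration under stated assumptions: `msd_per_lag` and `collect_msd` are hypothetical helper names, and `msd_per_lag` is a simplified stand-in for `alldisplacement` (mean squared displacement per frame lag, without the final zero-pair entry), not a drop-in copy of it. The column names are the ones from the df1 listing.

```python
import numpy as np
import pandas as pd


def msd_per_lag(track, coords=("POSITION_X", "POSITION_Y")):
    """Return one MSD value per lag (1 .. n-1 frames) for a single track.

    Hypothetical stand-in for `alldisplacement`: it takes the track as an
    argument instead of reading globals such as `t` and `N`.
    """
    xy = track[list(coords)].to_numpy()
    n = len(xy)
    lags = np.arange(1, n)
    # For each lag, average the squared distance over all valid point pairs.
    msd = [np.mean(np.sum((xy[lag:] - xy[:-lag]) ** 2, axis=1)) for lag in lags]
    return pd.DataFrame({"lag": lags, "msd": msd})


def collect_msd(df, track_ids, max_points=None):
    """Compute the MSD for each selected track and stack the results."""
    results = []
    wanted = df[df["TRACK_ID"].isin(track_ids)]
    for track_id, track in wanted.groupby("TRACK_ID"):
        if max_points is not None:
            track = track.iloc[:max_points]  # clip the trajectory
        out = msd_per_lag(track)
        out.insert(0, "TRACK_ID", track_id)  # remember which track this is
        results.append(out)
    # One concat at the end instead of re-reading the CSV on every pass.
    return pd.concat(results, ignore_index=True)


# Usage with the names from the script above (output path is a placeholder):
# all_msd = collect_msd(df1, M, max_points=5)
# all_msd.to_csv("MSD.csv", index=False)  # single write at the end
```

Collecting the per-track frames in a list and calling `pd.concat` once avoids the repeated read, concat, and rewrite of MSD.csv inside the loop, and the TRACK_ID column keeps each result traceable to its track.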