A calculation is repeated in column operations in Python

Question


I'm trying to compute differences between rows for two columns. I use a function that subtracts row 2 from row 1, squares the result and takes the square root to remove the negative sign. It then moves on to row 2 minus row 3, and so on to the end of the data. After that, the function does the same for a separation of 2 rows (row 1 minus row 3), 3 rows, and so on up to rows-1. The results are saved in a DataFrame. The data I am working with is arranged like this:

df1
Out[44]: 
   TRACK_ID  POSITION_X  POSITION_Y  POSITION_T
0         0           1           1       35.36
1         0           2           2       35.52
2         0           3           3       35.68
3         0           4           4       35.84
4         0           1           1       35.36
5         0           4           3       34.88
6         0           2           3       34.40
7         0           6           4       33.92
8         0           4           2       33.44

The function seems to work, but I noticed that for some separations it repeats exactly the values of the previous column. For example:

rad
Out[28]: 
          0         1         2         3         4         5         6  \
0  1.414214  2.828427  4.242641  0.000000  3.605551  2.236068  2.236068   
1  1.414214  2.828427  1.414214  2.236068  1.000000  4.472136  4.472136   
2  1.414214  2.828427  1.000000  1.000000  3.162278  1.414214  1.414214   
3  4.242641  1.000000  2.236068  2.000000  2.000000       NaN       NaN   
4  3.605551  2.236068  5.830952  3.162278       NaN       NaN       NaN   
5  2.000000  2.236068  1.000000       NaN       NaN       NaN       NaN   
6  4.123106  2.236068       NaN       NaN       NaN       NaN       NaN   
7  2.828427       NaN       NaN       NaN       NaN       NaN       NaN   
8       NaN       NaN       NaN       NaN       NaN       NaN       NaN   

          7   8  
0  3.162278 NaN  
1       NaN NaN  
2       NaN NaN  
3       NaN NaN  
4       NaN NaN  
5       NaN NaN  
6       NaN NaN  
7       NaN NaN  
8       NaN NaN

Columns 5 and 6 are identical.

This is my complete code:

import numpy as np
import pandas as pd

df1 = df[['TRACK_ID','POSITION_X','POSITION_Y','POSITION_T']].copy()



#Parameter input

N = df1.groupby('TRACK_ID').size()          
max_time = N*(0.160)
frames = max_time/N
t_step=frames.item()


data = pd.DataFrame({'N':N,'max_time':max_time,'frames':frames})

print(data)

t=np.linspace(0.160, max_time.item(), N)



# Function to compute the differences
def radial(df1, coords=['POSITION_X', 'POSITION_Y']):


        tau = t.copy()
        shifts = np.divide(tau,t_step).astype(float) # array used to build the differences between row values
        print(shifts)
        radials = list()

        for i, shift in enumerate(shifts):
            diffs = np.array(df1[coords] - df1[coords].shift(-shift))
            sqdist = np.square(diffs).sum(axis=1)
            r = np.sqrt(sqdist)
            radials.append(r)


        radial_disp = pd.DataFrame({'radials':radials})
        return radials


radial_d = radial(df1, coords=['POSITION_X', 'POSITION_Y'])

radd = pd.DataFrame.from_records(radial_d) #horizontal
rad = radd.transpose() #vertical

I have already modified some parts of the function. I noticed that my variable shifts, which sets the separation between the rows being subtracted, gave repeated results when its values were int, so I changed them to float, but the result is still the same. Why is the calculation repeated for the same column? Thanks for reading my post.

python pandas

asked by Jonathan Pacheco 15.05.2017 at 16:24

1 answer

score 1 · Accepted Answer

The problem lies in the shifts array and the way it is computed. Because of their internal floating-point representation, floats lose precision: not every value can be represented exactly in binary. There is plenty of information about this online; a very technical and detailed reference (in English) is:

What Every Computer Scientist Should Know About Floating-Point Arithmetic

In your case you create the array t (and tau, since it is a copy) with numpy.linspace() in steps of 0.16. In theory, the array should be:

tau = [ 0.16, 0.32, 0.48, 0.64, 0.80, 0.96, 1.12, 1.28, 1.44]

shifts is created by dividing that array by 0.16. One might expect this to always give integer values, since each dividend is a multiple of the divisor. The problem is that some of these values cannot be represented exactly in floating point.

If you print t or tau with a higher print precision, you will see the problem:

>>> np.set_printoptions(precision=16)
>>> print(t)
[ 0.16                0.32                0.48                0.64                0.8
  0.9600000000000001  1.1199999999999999  1.28                1.4399999999999999 ]

As a result, the 7.0 in your shifts is actually 6.9999999999999991 (1.1199999999999999 / 0.16):

>>> print(shifts)
[ 1.                  2.                  3.                  4.                  5.
  6.                  6.9999999999999991  8.                  9. ]
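The check above can be reproduced without the rest of the code. The following is only a minimal sketch: n_rows and the fixed step of 0.16 are taken from the question's example, and which entries come out inexact may depend on the NumPy version and platform.

```python
import numpy as np

n_rows = 9       # number of rows in the example data (assumed from the question)
t_step = 0.16    # time step used in the question

# Same construction as in the question: a time axis, then division by the step.
t = np.linspace(t_step, n_rows * t_step, n_rows)
shifts = t / t_step

# Compare every shift with its nearest integer to spot inexact values.
nearest = np.rint(shifts)
inexact = shifts != nearest

for value, is_off in zip(shifts, inexact):
    note = "  <- not an exact integer" if is_off else ""
    print(repr(float(value)) + note)
```

Any line flagged here is a shift that will not behave like the integer it is supposed to be.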

The shift method uses only the integer part of its argument (it truncates the value), so df1[coords].shift(-6.9) is the same as df1[coords].shift(-6.0). In this case, if you use float32 instead of float64 for t, the problem is 'corrected':

t = np.linspace(0.160, max_time.item(), N, dtype= np.float32)

This is not a real solution, though; for other values the problem may reappear.
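The difference between keeping the integer part and rounding can be seen with the standard library alone. This is a small illustrative sketch; the literal is the value printed for the seventh shift above.

```python
import math

shift = 6.9999999999999991   # seventh shift, one step below 7.0 in float64

print(math.trunc(shift))     # 6: only the integer part is kept
print(round(shift))          # 7: nearest integer
print(math.floor(-shift))    # -7: a true floor of the negated value would not give -6
print(int(-shift))           # -6: truncation toward zero
```

Truncation is what turns the seventh shift into a repeat of the sixth; rounding to the nearest integer recovers the intended value.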

To solve this, you should pass proper integer values to shift. One way is to round to the nearest integer, for example with numpy.rint():

def radial(df1, coords=['POSITION_X', 'POSITION_Y']):
        tau = t.copy()
        # Round to the nearest integer so that 6.9999999999999991 becomes 7
        shifts = np.rint(np.divide(tau, t_step)).astype(int)
        radials = list()

        for shift in shifts:
            # Difference between each row and the row `shift` positions below it
            diffs = np.array(df1[coords] - df1[coords].shift(-shift))
            sqdist = np.square(diffs).sum(axis=1)
            r = np.sqrt(sqdist)
            radials.append(r)

        return radials

This gives:

          0         1         2         3         4         5         6  \
0  1.414214  2.828427  4.242641  0.000000  3.605551  2.236068  5.830952   
1  1.414214  2.828427  1.414214  2.236068  1.000000  4.472136  2.000000   
2  1.414214  2.828427  1.000000  1.000000  3.162278  1.414214       NaN   
3  4.242641  1.000000  2.236068  2.000000  2.000000       NaN       NaN   
4  3.605551  2.236068  5.830952  3.162278       NaN       NaN       NaN   
5  2.000000  2.236068  1.000000       NaN       NaN       NaN       NaN   
6  4.123106  2.236068       NaN       NaN       NaN       NaN       NaN   
7  2.828427       NaN       NaN       NaN       NaN       NaN       NaN   
8       NaN       NaN       NaN       NaN       NaN       NaN       NaN   

          7   8  
0  3.162278 NaN  
1       NaN NaN  
2       NaN NaN  
3       NaN NaN  
4       NaN NaN  
5       NaN NaN  
6       NaN NaN  
7       NaN NaN  
8       NaN NaN
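Since the shifts are meant to be the integers 1 to N, a possible alternative is to skip the floating-point division altogether and loop over integer offsets. The sketch below assumes that this is the intent; radial_int_shifts is a hypothetical name that does not appear in the original code, and the example frame reproduces the X and Y columns from the question.

```python
import numpy as np
import pandas as pd


def radial_int_shifts(df, coords=("POSITION_X", "POSITION_Y")):
    """Distance between each row and the row `shift` positions below it,
    for shift = 1 .. len(df). Hypothetical helper, integer shifts only."""
    cols = list(coords)
    result = {}
    for shift in range(1, len(df) + 1):
        diffs = df[cols] - df[cols].shift(-shift)
        # skipna=False keeps NaN where the shifted row does not exist
        sqdist = (diffs ** 2).sum(axis=1, skipna=False)
        result[shift - 1] = np.sqrt(sqdist)
    return pd.DataFrame(result)


# X and Y columns of the example data from the question
df1 = pd.DataFrame({
    "POSITION_X": [1, 2, 3, 4, 1, 4, 2, 6, 4],
    "POSITION_Y": [1, 2, 3, 4, 1, 3, 3, 4, 2],
})

rad = radial_int_shifts(df1)
print(rad)
```

Because the offsets never pass through a float, there is nothing to round, and the columns come out already in the vertical layout.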