I have two NumPy arrays, follow_dismiss_i and follow_dismiss_display_i, whose first column contains counters and whose second column contains indexes.
I have written a program that calculates:
- SUM_follow_dismiss and SUM_follow_dismiss_display, the sums of the first columns (the counters) of follow_dismiss_i and follow_dismiss_display_i, respectively.
- An array called m_i, obtained by dividing each counter in the first column of follow_dismiss_i by the counter in follow_dismiss_display_i that has the same index (second column). If an index exists in follow_dismiss_display_i but not in follow_dismiss_i, that index is associated with a value of 0.0 in m_i.
- The variance of the array m_i.
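A minimal sketch of the three computations listed above, assuming the same layout (counter in column 0, index in column 1). The arrays `dismiss` and `display` are made-up toy data, not the question's arrays. A dictionary keyed by index replaces the nested loop, so each row of `display` needs one dictionary lookup instead of a scan over all of `dismiss`.

```python
import numpy as np

# Hypothetical toy arrays: column 0 = counter, column 1 = index.
dismiss = np.array([[5, 10], [3, 20]])
display = np.array([[10, 10], [6, 20], [4, 30]])

# Sums of the counter columns.
sum_dismiss = dismiss[:, 0].sum()
sum_display = display[:, 0].sum()

# Build an index -> counter mapping once for `dismiss`.
dismiss_by_index = {idx: count for count, idx in dismiss}

# One ratio per row of `display`; an index missing from `dismiss` gets 0.0.
m_i = np.array([dismiss_by_index.get(idx, 0) / float(count)
                for count, idx in display])

# np.var defaults to the population variance (ddof=0); pass ddof=1 for the sample variance.
variance = np.var(m_i)

print(sum_dismiss, sum_display)
print(m_i)
print(variance)
```

Like the original loop, this ignores indexes that appear only in `dismiss`.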
I also calculate the mean, m, as SUM_follow_dismiss / SUM_follow_dismiss_display, but I get 0.517134831461, as you can see in the output of my code, while on the calculator I get 0.63567076.
I'm trying to understand why these values differ, and whether there is a much simpler way to do all of this.
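One possible explanation, assuming the calculator figure is a plain average of the m_i values: m is a ratio of sums, so each index effectively counts in proportion to its display counter, while a plain average gives every index the same weight. The toy sketch below uses made-up numbers (not from the question) to show that the ratio of sums equals `np.average` of the ratios weighted by the denominators, and differs from their unweighted mean.

```python
import numpy as np

# Hypothetical toy data: counters already aligned by index.
dismiss = np.array([1.0, 90.0])   # numerators
display = np.array([2.0, 100.0])  # denominators

ratios = dismiss / display        # per-index ratios: 0.5 and 0.9

ratio_of_sums = dismiss.sum() / display.sum()         # 91 / 102
mean_of_ratios = ratios.mean()                        # (0.5 + 0.9) / 2
weighted_mean = np.average(ratios, weights=display)   # weights = denominators

print(ratio_of_sums)
print(mean_of_ratios)
print(weighted_mean)
```

The identity holds as long as every numerator index also appears among the denominators; indexes present only in the numerators would add to the sum without any matching ratio.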
This is my code:
#!/usr/bin/python
#
# Small script for some stats
#
import traceback
import psycopg2
import numpy as np
import pandas as pd

# You may need the following arrays, which are shown in the output below:
print("follow_dismiss_i")
print(follow_dismiss_i)
print("SUM_follow_dismiss")
print(SUM_follow_dismiss)
print("follow_dismiss_display_i")
print(follow_dismiss_display_i)
print("SUM_follow_dismiss_display")
print(SUM_follow_dismiss_display)

# Mean computed as a ratio of sums
m = float(SUM_follow_dismiss) / SUM_follow_dismiss_display
print("\nmean m")
print(m)

m_i = []
print("\nvariance")
# For every index in follow_dismiss_display_i, look for the same index in follow_dismiss_i
for j in range(len(follow_dismiss_display_i)):
    new = []
    found = 0
    for i in range(len(follow_dismiss_i)):
        if follow_dismiss_display_i[j, 1] == follow_dismiss_i[i, 1]:
            new.append(follow_dismiss_display_i[j, 1])
            new.append(float(follow_dismiss_i[i, 0]) / follow_dismiss_display_i[j, 0])
            m_i.append(new)
            found = 1
            break
    if found == 0:
        # Index missing from follow_dismiss_i: its ratio is 0.0
        new.append(follow_dismiss_display_i[j, 1])
        new.append(0.0)
        m_i.append(new)

test = np.array(m_i)
print(test[:, 1])
variance_eclipse = np.var(test[:, 1])
print(variance_eclipse)
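For comparison, here is a loop-free sketch that uses only NumPy on the data copied from the output below. It assumes the indexes are unique within each array and NumPy 1.15 or later (for `return_indices` in `np.intersect1d`). It should reproduce m and the variance from the output, and it also prints the plain average of m_i, which works out to about 0.6358 for this data, close to the calculator value.

```python
import numpy as np

# Data copied from the output: column 0 = counter, column 1 = index.
follow_dismiss_i = np.array([
    [505, 13], [14, 54], [70, 68], [21, 150], [36, 152], [62, 156],
    [59, 158], [120, 160], [53, 161], [150, 162], [3, 169], [1, 171],
    [60, 172], [1, 177], [126, 179], [41, 185], [239, 189], [163, 190],
    [26, 216], [42, 223], [1, 272], [2, 286], [5, 289], [1, 292],
    [2, 294], [6, 296], [25, 306], [7, 312]])
follow_dismiss_display_i = np.array([
    [986, 13], [20, 54], [484, 68], [57, 150], [44, 152], [95, 156],
    [89, 158], [144, 160], [58, 161], [383, 162], [3, 169], [2, 171],
    [125, 172], [1, 177], [147, 179], [61, 185], [325, 189], [334, 190],
    [46, 216], [71, 223], [1, 272], [2, 276], [9, 286], [5, 289],
    [1, 292], [2, 294], [10, 296], [27, 306], [16, 312], [12, 315]])

counts_d, idx_d = follow_dismiss_i[:, 0], follow_dismiss_i[:, 1]
counts_s, idx_s = follow_dismiss_display_i[:, 0], follow_dismiss_display_i[:, 1]

# Positions of the shared indexes in each array.
_, pos_d, pos_s = np.intersect1d(idx_d, idx_s, return_indices=True)

# One numerator per display row; rows whose index is missing from follow_dismiss_i stay 0.
numerators = np.zeros(len(idx_s))
numerators[pos_s] = counts_d[pos_d]
m_i = numerators / counts_s  # same values and order as test[:, 1]

m = counts_d.sum() / float(counts_s.sum())     # ratio of sums, 1841 / 3560
mean_of_ratios = m_i.mean()                    # every index weighted equally
weighted = np.average(m_i, weights=counts_s)   # weighted by the display counters
variance = np.var(m_i)                         # population variance (ddof=0)

print(m)
print(mean_of_ratios)
print(weighted)
print(variance)
```

Because every index of follow_dismiss_i also appears in follow_dismiss_display_i, `weighted` equals m here. The plain mean differs because it gives index 272 (1/1) the same weight as index 13 (505/986).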
Here is the output, in case you need it to reproduce the results with the same data:
follow_dismiss_i
[[505 13]
[ 14 54]
[ 70 68]
[ 21 150]
[ 36 152]
[ 62 156]
[ 59 158]
[120 160]
[ 53 161]
[150 162]
[ 3 169]
[ 1 171]
[ 60 172]
[ 1 177]
[126 179]
[ 41 185]
[239 189]
[163 190]
[ 26 216]
[ 42 223]
[ 1 272]
[ 2 286]
[ 5 289]
[ 1 292]
[ 2 294]
[ 6 296]
[ 25 306]
[ 7 312]]
SUM_follow_dismiss
1841
follow_dismiss_display_i
[[986 13]
[ 20 54]
[484 68]
[ 57 150]
[ 44 152]
[ 95 156]
[ 89 158]
[144 160]
[ 58 161]
[383 162]
[ 3 169]
[ 2 171]
[125 172]
[ 1 177]
[147 179]
[ 61 185]
[325 189]
[334 190]
[ 46 216]
[ 71 223]
[ 1 272]
[ 2 276]
[ 9 286]
[ 5 289]
[ 1 292]
[ 2 294]
[ 10 296]
[ 27 306]
[ 16 312]
[ 12 315]]
SUM_follow_dismiss_display
3560
mean
0.517134831461
variance
[ 0.51217039 0.7 0.1446281 0.36842105 0.81818182 0.65263158
0.66292135 0.83333333 0.9137931 0.39164491 1. 0.5 0.48
1. 0.85714286 0.67213115 0.73538462 0.48802395 0.56521739
0.5915493 1. 0. 0.22222222 1. 1. 1.
0.6 0.92592593 0.4375 0. ]
0.0858073520518