I have a question. I have two datasets, AdultTest and AdultData.
Both datasets contain many rows like this one:
```
39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, Not-in-family, White, Female , 2174, 0, 40, United-States, >50K
```
I would like to calculate the probability that a "Female" earns more than 50K (>50K). To do this, I wrote the following:
```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Read AdultData.csv as integers so that Naive Bayes can be computed
data1 = np.genfromtxt('AdultData.csv', delimiter=',', dtype='int', skip_footer=1)
datatest = np.genfromtxt('adultTest.csv', delimiter=',', dtype='int', skip_footer=1)

# Delete the last column, because that is the target
data_new = np.delete(data1, 14, 1)
dataTest_new = np.delete(datatest, 14, 1)
# The target (income) is column 14
Class = data1[:, 14]

clf = BernoulliNB()
clf.fit(data_new, Class)
print(clf.predict_proba(data_new))
# print(clf.predict_proba(dataTest_new))
```
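One way to see what the integer loading above actually produces is to parse the sample row with the same `genfromtxt` options. The sketch below assumes the CSV files look like the sample row (comma-separated, text fields mixed with numbers); the final binarization step only mimics what `BernoulliNB` does with its default `binarize=0.0` and is not a call into scikit-learn.

```python
import io

import numpy as np

# The sample row from the question, loaded with the same genfromtxt options
# as the code above (comma delimiter, dtype='int').
row = ("39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
       "Not-in-family, White, Female , 2174, 0, 40, United-States, >50K")
parsed = np.atleast_2d(np.genfromtxt(io.StringIO(row), delimiter=',', dtype='int'))
print(parsed)

# Fields that are not numbers cannot be converted to int. genfromtxt uses
# loose conversion by default, so instead of raising an error it fills them
# with -1: every text column, including the target in column 14, becomes -1.
text_columns = np.where(parsed[0] == -1)[0]
print("columns parsed as -1:", text_columns)

# BernoulliNB binarizes features with threshold 0.0 by default:
# values > 0 become 1, everything else becomes 0. Sketch of that step:
binary = (parsed > 0).astype(int)
print(binary)
```

If the real files behave like this row, every text feature ends up as the same constant and the target column carries a single value, which would leave the classifier very little to distinguish one row from another.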
As a result, the predicted probabilities always come out as:
I don't understand why. Even if I use AdultTest instead, which contains different data, I get the same results. Why don't I get different results? And what do the two columns mean?
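As a sketch of what the columns of `predict_proba` represent: scikit-learn classifiers return one column per class, in the order stored in `clf.classes_` (sorted class labels). The example below uses hypothetical toy data, not the Adult files, and swaps in `CategoricalNB` with `OrdinalEncoder`, since the "sex" feature is categorical rather than binary; this is an illustration under those assumptions, not the method from the question.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data (not from AdultData.csv): one categorical
# feature ("sex") and the income label.
sex = np.array([["Female"], ["Female"], ["Female"], ["Male"], ["Male"], ["Male"]])
income = np.array(["<=50K", "<=50K", ">50K", ">50K", ">50K", "<=50K"])

# Naive Bayes in scikit-learn expects numbers, so encode the text
# categories as integer codes (categories are sorted: Female -> 0, Male -> 1).
enc = OrdinalEncoder()
X = enc.fit_transform(sex)

clf = CategoricalNB()  # default Laplace smoothing, alpha=1.0
clf.fit(X, income)

# Each column of predict_proba corresponds to one entry of clf.classes_.
print(clf.classes_)
proba = clf.predict_proba(enc.transform([["Female"]]))
print(proba)

# For comparison: the raw frequency of ">50K" among the "Female" rows.
raw = np.mean(income[sex[:, 0] == "Female"] == ">50K")
print(raw)
```

With a single feature, the second column is the model's estimate of P(>50K | Female). It differs slightly from the raw frequency (1/3 here) because of the smoothing term `alpha`.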
Could someone help me?
Greetings and thanks in advance!