Counting how many times a value appears in a column in R

Question

I have a variable in a data frame with 2 million rows and around 50,000 distinct values, and I want to know how many NULL values this column contains. I tried table(data$variable), but with 50,000 distinct values it shows only the first 100 and omits the rest. Besides, I only need the count of NULLs. How can I see how many NULLs there are in this column?

r

asked by googolplex 29.10.2018 at 17:17

3 answers

score 1 · Answer 1

Assuming that "NULL" is a character string, you can take advantage of the fact that

data$variable == "NULL"

returns a logical vector in which TRUE marks the rows where the condition holds. Because R coerces the logical value TRUE to 1 (and FALSE to 0) in arithmetic, a very simple way to count them is to call sum() directly:

sum(data$variable == "NULL")
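One caveat worth checking: if the column also contains real missing values (NA), the comparison yields NA for those rows and sum() returns NA as well. A minimal sketch, assuming a small made-up data frame (the sample values below are hypothetical, not the asker's data):

```r
# Hypothetical sample data: the string "NULL" twice, plus one real missing value (NA)
data <- data.frame(variable = c("a", "NULL", "b", "NULL", NA, "c"),
                   stringsAsFactors = FALSE)

# The comparison gives a logical vector; the NA row stays NA
is_null <- data$variable == "NULL"

# Without na.rm, the NA propagates into the result
naive_count <- sum(is_null)

# With na.rm = TRUE, only the TRUE entries are counted
n_null <- sum(is_null, na.rm = TRUE)
print(n_null)
```

If the column has no NA at all, both calls give the same count, so adding na.rm = TRUE is harmless.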

score 0 · Answer 2

I assume that "NULL" is a character string, since otherwise those rows would not exist. In that case, with base R you could do something like this:

length(data$variable[data$variable == "NULL"])

That gives you the length of the subset of data$variable whose elements are equal to the character string "NULL".

If it does not work, please add a sample of the data to the question, so we can see what type the variable is (character, factor, etc.) and what kind of value NULL refers to.
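To see which case applies, it can help to inspect the column first and count each possibility separately. This is only a sketch under the assumption that the column is a factor holding the text "NULL" plus some real NAs; the sample data is invented for illustration:

```r
# Hypothetical sample: a factor column with the level "NULL" and one real NA
data <- data.frame(variable = factor(c("a", "NULL", "b", "NULL", NA, "c")))

# What kind of column is it? (character, factor, numeric, ...)
col_class <- class(data$variable)

# Real missing values are counted with is.na()
n_na <- sum(is.na(data$variable))

# The literal string "NULL" is counted with a comparison;
# which() drops the NA results, so they are not counted by mistake
n_null_string <- length(which(data$variable == "NULL"))

print(col_class)
print(c(na = n_na, null_string = n_null_string))
```

Using which() here matters when NAs are present: subsetting with a logical vector that contains NA keeps an NA element for each of those rows, which would inflate the length.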

score 0 · Answer 3

You can replace NA values with 0 in R by writing a small replacement function and applying it to every column of a data.frame with sapply.

Example:

x <- c(1, 23, 5, 9, 0, NA)
y <- c(5, 45, NA, 78, NA, 34)

dataf <- data.frame(x, y)
# Means ignoring the NAs
mean(dataf$x, na.rm = TRUE)
mean(dataf$y, na.rm = TRUE)
# We might instead want the NA rows to count in the denominator
sum(dataf$x, na.rm = TRUE) / nrow(dataf)
sum(dataf$y, na.rm = TRUE) / nrow(dataf)

In this case we have a data.frame with two variables that contain missing values. We want to write a function that converts these values to 0 and apply it to the original data.frame:

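# Replace every NA in a vector with 0
haz.cero.na <- function(x) {
  ifelse(is.na(x), 0, x)
}
dataf.2 <- data.frame(sapply(dataf, haz.cero.na))
dataf    # original, still contains the NAs
dataf.2  # copy with the NAs replaced by 0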
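Since the question is about counting rather than replacing, the same example data can also be used to count the missing values per column. A rough sketch, reusing the x and y vectors from above; the names na_counts and dataf.3 are just illustrative:

```r
x <- c(1, 23, 5, 9, 0, NA)
y <- c(5, 45, NA, 78, NA, 34)
dataf <- data.frame(x, y)

# is.na() on a data.frame gives a logical matrix; colSums() counts the TRUEs per column
na_counts <- colSums(is.na(dataf))
print(na_counts)

# Alternative to the sapply approach: replace all NAs in one step via logical indexing
dataf.3 <- dataf
dataf.3[is.na(dataf.3)] <- 0
print(dataf.3)
```

The logical-indexing version keeps the result as a data.frame throughout, whereas sapply returns a matrix that then has to be converted back.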