Count values in multiple columns in R?


This is my first time asking here, because I have not been able to solve a very basic problem.

  • I have a data frame with 4 categorical variables (columns) that share identical categories.
  • I'm looking for a way to count how many times each category appears across the 4 variables at once (like a multiple-response set in SPSS).
  • Any suggestions? The count and table functions in R only allow one column at a time.

    Thank you!

        
    asked by user10187828 08.08.2018 at 22:29

    1 answer


    Your question is not clear, and I second the suggestion that you edit it to clarify it. It would be very helpful if you included an example of your data.

    As I understand it, you want the same result that table() gives for each column, but for all columns at once. That is, for each column you want the counts of each category. That's relatively easy to do.

    Create the data

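    Since you do not include data in your question, I will create some with a structure similar to what you describe: four columns of categorical variables (in R: factors) with the same categories. In this example they are dichotomous: Sí (Yes) and No. By convention I will call the data frame df. (Since R 4.0.0, data.frame() keeps strings as character vectors by default instead of converting them to factors; table() counts both the same way.)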

    df <- data.frame(columna1 = c("Sí", "No", "Sí", "Sí", "No", "Sí", "Sí", "No"), 
                     columna2 = c("Sí", "No", "Sí", "Sí", "Sí", "Sí", "Sí", "Sí"),
                     columna3 = c("No", "No", "No", "No", "No", "No", "No", "Sí"), 
                     columna4 = c("Sí", "Sí", "Sí", "Sí", "Sí", "Sí", "Sí", "No"))
    

    Apply a function to each column

    table() is R's base function for counting. Because a very common use in data analysis is building contingency tables (counts of one categorical variable conditional on one or more others), table() tries to cross-tabulate its arguments unless we pass it a single variable or column. To get what I think you want, you must pass the columns to table() one at a time. You can do this manually with a sequence of calls: table(df$columna1), table(df$columna2), and so on.
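
    To illustrate the difference between passing one column and passing several, here is a minimal sketch using two of the example columns. The names one_way and two_way are only illustrative.

    ```r
    # Two of the example columns from the data above
    df <- data.frame(columna1 = c("Sí", "No", "Sí", "Sí", "No", "Sí", "Sí", "No"),
                     columna2 = c("Sí", "No", "Sí", "Sí", "Sí", "Sí", "Sí", "Sí"))

    # One column: a plain frequency count of each category
    one_way <- table(df$columna1)

    # Two columns: table() cross-tabulates them (rows = columna1, columns = columna2)
    two_way <- table(df$columna1, df$columna2)

    print(one_way)
    print(two_way)
    ```

    The second call returns a 2 x 2 contingency table, which is why the columns have to be passed separately when a per-column count is wanted.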

    A faster way is to use a helper function; here lapply() does the job. lapply() takes a list (df is a data frame, but a data frame is also a list of columns) and applies the function we specify to each element. In this case the list is df and the function is table. Note that we write table without parentheses: lapply() expects the function itself, whereas table() would call it immediately with no arguments.

    Result

    lapply(df, table)
    
    $columna1
    
    No Sí 
    3  5 
    
    $columna2
    
    No Sí 
    1  7 
    
    $columna3
    
    No Sí 
    7  1 
    
    $columna4
    
    No Sí 
    1  7  
    

    The result is a list of tables, one per column, which is printed to the console.
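
    If a single combined result is preferred over a list, one possible approach is sketched here. It goes beyond the original answer and assumes the question is about either a category-by-column summary or totals pooled over all four columns. The names categories, per_column and pooled are illustrative.

    ```r
    # Same example data as above
    df <- data.frame(columna1 = c("Sí", "No", "Sí", "Sí", "No", "Sí", "Sí", "No"),
                     columna2 = c("Sí", "No", "Sí", "Sí", "Sí", "Sí", "Sí", "Sí"),
                     columna3 = c("No", "No", "No", "No", "No", "No", "No", "Sí"),
                     columna4 = c("Sí", "Sí", "Sí", "Sí", "Sí", "Sí", "Sí", "No"))

    # Fix a common set of levels so every column reports every category,
    # even one that never occurs in that column (it would get a count of 0)
    categories <- c("No", "Sí")

    # sapply() simplifies the per-column tables into a matrix:
    # rows are categories, columns are the variables
    per_column <- sapply(df, function(col) table(factor(col, levels = categories)))

    # Pooled counts: treat all four columns as one long vector of responses
    pooled <- table(unlist(df))

    print(per_column)
    print(pooled)  # No: 12, Sí: 20
    ```

    The pooled counts are the closest equivalent to a multiple-response frequency table; rowSums(per_column) gives the same totals.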

      

    PS: if what you are looking for is a four-dimensional contingency table, please clarify that in the question. You can always edit it.
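
    For reference, a four-dimensional contingency table could be sketched as follows. This assumes the same example data; four_way is an illustrative name, and ftable() is used only to print the result in a flatter layout.

    ```r
    # Same example data as above
    df <- data.frame(columna1 = c("Sí", "No", "Sí", "Sí", "No", "Sí", "Sí", "No"),
                     columna2 = c("Sí", "No", "Sí", "Sí", "Sí", "Sí", "Sí", "Sí"),
                     columna3 = c("No", "No", "No", "No", "No", "No", "No", "Sí"),
                     columna4 = c("Sí", "Sí", "Sí", "Sí", "Sí", "Sí", "Sí", "No"))

    # Passing the whole data frame cross-tabulates all four columns at once,
    # giving one cell per combination of categories
    four_way <- table(df)

    # ftable() flattens the four-dimensional array for easier reading
    print(ftable(four_way))
    ```

    Each cell counts the rows that share one specific combination of answers, so the cells sum to the number of rows in df.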

        
    answered by 09.08.2018 at 02:19