Flatten matrix in R

Question


I have a dataframe with the following format:

And I would like to get a dataframe with the following format:

That is, the unique names would identify the rows and the unique events would identify the columns, so that you can see, for example, how many times name1 has performed event1 without leaving that row.

I'm trying to loop over dataframe 1 with for loops to build dataframe 2, but I can't see how to do it.

Any ideas? Thanks!

r

asked by Zeta 18.09.2018 at 11:41

2 answers

score 2 · Answer 1

Perhaps the simplest way to solve this is with the tidyverse metapackage: spread() comes from tidyr and the %>% pipe from magrittr (re-exported by dplyr):

df <- data.frame(NAMES=c("name1", "name1", "name1", "name2", "name2", "name3"),
                 EVENTS=c("event1", "event2", "event3", "event2", "event1", "event3"),
                 N=c(3, 6, 8, 2, 1, 4))

library(tidyverse)

# Spread the events into columns and fill missing combinations with 0
df %>%
    spread(key = EVENTS, value = N, fill = 0)

  NAMES event1 event2 event3
1 name1      3      6      8
2 name2      1      2      0
3 name3      0      0      4
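
As a side note, recent tidyr versions document spread() as superseded by pivot_wider(). A minimal sketch of the same reshaping with that function, assuming a tidyr version where values_fill accepts a single value (1.1.0 or later, as far as I recall); the name wide is only illustrative:

```r
library(tidyr)

# Same example data as above
df <- data.frame(NAMES  = c("name1", "name1", "name1", "name2", "name2", "name3"),
                 EVENTS = c("event1", "event2", "event3", "event2", "event1", "event3"),
                 N      = c(3, 6, 8, 2, 1, 4))

# One column per distinct value of EVENTS, filled from N;
# name/event combinations that do not occur become 0 instead of NA
wide <- pivot_wider(df, names_from = EVENTS, values_from = N, values_fill = 0)
wide
```

The arguments map directly onto the spread() call: names_from plays the role of key, values_from the role of value, and values_fill the role of fill.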

Alternatively, if you would like to solve it with base R, it is also quite simple using reshape(), although the call is a little harder to read:

dfnew <- reshape(df, direction = "wide", idvar="NAMES", timevar="EVENTS")    
dfnew[,-1][is.na(dfnew[, -1])] <- 0

With reshape(df, direction = "wide", idvar = "NAMES", timevar = "EVENTS") we indicate that we want to go from long to wide format (direction = "wide"). idvar = "NAMES" is the column that identifies each row and is not expanded; timevar = "EVENTS" is the column whose values become the new columns.
dfnew[,-1][is.na(dfnew[, -1])] <- 0 replaces the NA values with 0 only in the event columns (every column except the first).
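
One detail the reshape() call leaves behind is that the new columns are named N.event1, N.event2 and N.event3 (the value column name plus the event), and the row names come from the original rows. A possible cleanup, sketched under the assumption that the value column is called N as in the example:

```r
# Same example data as in this answer
df <- data.frame(NAMES  = c("name1", "name1", "name1", "name2", "name2", "name3"),
                 EVENTS = c("event1", "event2", "event3", "event2", "event1", "event3"),
                 N      = c(3, 6, 8, 2, 1, 4))

dfnew <- reshape(df, direction = "wide", idvar = "NAMES", timevar = "EVENTS")
dfnew[, -1][is.na(dfnew[, -1])] <- 0

# Drop the "N." prefix that reshape() adds to the expanded columns
names(dfnew) <- sub("^N\\.", "", names(dfnew))

# Reset the row names inherited from the long data frame
rownames(dfnew) <- NULL
dfnew
```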

score 1 · Answer 2

This is the typical use case for the melt and dcast functions of the reshape2 package (they are inverses of each other). In your case you need dcast; if your dataframe is called df:

library(reshape2)
df <- data.frame(NAMES = c("n1", "n1", "n1", "n2", "n2", "n3"),
                 EVENTS = c("ev1", "ev2", "ev3", "ev2", "ev1", "ev3"),
                 N = c(3, 6, 8, 2, 1, 4))
df2 <- dcast(df, NAMES ~ EVENTS, value.var = "N")
df2[is.na(df2)] <- 0
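
To illustrate the "inverse" relationship mentioned above, here is a small sketch that takes the wide result back to long format with melt. The name long and the final filtering step are only illustrative; the filter assumes that a 0 always stands for a combination that was absent from the original data, which holds for this example but not in general:

```r
library(reshape2)

df <- data.frame(NAMES = c("n1", "n1", "n1", "n2", "n2", "n3"),
                 EVENTS = c("ev1", "ev2", "ev3", "ev2", "ev1", "ev3"),
                 N = c(3, 6, 8, 2, 1, 4))

# Long -> wide, as in the answer
df2 <- dcast(df, NAMES ~ EVENTS, value.var = "N")
df2[is.na(df2)] <- 0

# Wide -> long: every event column becomes (EVENTS, N) pairs again
long <- melt(df2, id.vars = "NAMES", variable.name = "EVENTS", value.name = "N")

# Drop the combinations that only exist because of the 0 fill
long <- long[long$N != 0, ]
long
```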