Nested loops in R

2

I have 3 dataframes that contain the following information:

df1

COD      LON       LAT       ALT
C037    -289.976   432.165   162
E000    -274.107   430.783   218
C068    -228.623   428.395   596

df2 # As many values per COD as there are dates (the dates are in df3)

 C037    C038    G0E7    G0E9    G0EA    G0F0    G0F4    G0E5    G0B6    G0C1    
-9999   -9999   -9999   -9999   -9999   -9999   -9999   -9999   -9999   -9999    
1.456   -9999   -9999   -9999   -9999   -9999   -9999   -9999   -9999   -9999
-9999   -9999   -9999   -9999   -9999   -9999   -9999   -9999   -9999   -9999
-9999   -9999   -9999   -9999   -9999   -9999   -9999   -9999   -9999   -9999   

df3 # Dates from 01/01/2000 until 12/31/2015 (5844 records)

Date
01/01/2000
02/01/2000
03/01/2000
04/01/2000
05/01/2000
06/01/2000
07/01/2000
08/01/2000
09/01/2000
10/01/2000
11/01/2000
12/01/2000
13/01/2000
14/01/2000

I need to write a loop that builds a final matrix containing:

FECHA        COD   LON        LAT        ALT  datosdf2
01/01/2000   C037  -289.976   432.165    162   -9999
02/01/2000   C037  -289.976   432.165    162   1.456 

And so on, filling in the df2 records for each day until 12/31/2015. Then the same for the next code:

01/01/2000   E000   -274.107   430.783   218   -9999 

Do the same for 200 more codes

I'm trying something like the code below, but honestly I do not know whether it is right; I have never done this kind of thing in R.

I think I have to do something like this:

for (i in 1:nrow(df1)) 
  for (j in 1:ncol(df2)){
   if (df1(i,1)==(df2(1,j)){ # That is, if the COD in df1 matches the COD in df2 (which would be in row 1, column 1), then take the values recorded in df2
   print ("i is 1", "1 is j")

At this point I do not know how to tell it to take the date records and add them to the matrix. I also do not know how to tell the program that, when it moves on to the next COD, it should add the dates from 01/01/2000 again without overwriting the ones already recorded.

I hope you understand what I want to say and what I should get at the end. Thanks for the help you can give me.

    
asked by Caro 15.05.2018 at 10:50

3 answers

1

Hello, the following code may help you. It uses the tidyverse instead of loops.

library(tidyverse)

dt1<-tribble(
   ~COD ,     ~LON,       ~LAT,       ~ALT,
   "C037",    -289.976,   432.165,   162,
   "E000",    -274.107,   430.783,  218,
   "C038",    -228.623,   428.395,   596 # Added C038
 )

 dt2<-tribble( # part of your example
   ~C037,    ~C038,     
   -9999,   -9999,       
   1.456,   -9999,   
   -9999,   -9999,   
   -9999,   -9999 
 )

 dt3<-tribble(
   ~Date,
   "01/01/2000",
   "02/01/2000",
   "03/01/2000",
  "04/01/2000"
 )

 # First, if there are as many dates as rows in dt2, we combine them

 dt4<-cbind(dt3,dt2)
 dt4
        Date      C037  C038
1 01/01/2000 -9999.000 -9999
2 02/01/2000     1.456 -9999
3 03/01/2000 -9999.000 -9999
4 04/01/2000 -9999.000 -9999

 n<-ncol(dt4) # Number of columns in dt4
 # The column names go into a new variable "COD"
 # The observations of each COD by date go into the variable "VALUE"
 dt4<-gather(dt4, "COD", "VALUE", 2:n)
 dt4
        Date  COD     VALUE
1 01/01/2000 C037 -9999.000
2 02/01/2000 C037     1.456
3 03/01/2000 C037 -9999.000
4 04/01/2000 C037 -9999.000
5 01/01/2000 C038 -9999.000
6 02/01/2000 C038 -9999.000
7 03/01/2000 C038 -9999.000
8 04/01/2000 C038 -9999.000

 # Add the lat, lon and alt data for each code
 # The following function adds the data only for the codes present in dt4
 dt5<-left_join(dt4,dt1,"COD")
 dt5
        Date  COD     VALUE      LON     LAT ALT
1 01/01/2000 C037 -9999.000 -289.976 432.165 162
2 02/01/2000 C037     1.456 -289.976 432.165 162
3 03/01/2000 C037 -9999.000 -289.976 432.165 162
4 04/01/2000 C037 -9999.000 -289.976 432.165 162
5 01/01/2000 C038 -9999.000 -228.623 428.395 596
6 02/01/2000 C038 -9999.000 -228.623 428.395 596
7 03/01/2000 C038 -9999.000 -228.623 428.395 596
8 04/01/2000 C038 -9999.000 -228.623 428.395 596

 # full_join can be used to keep all the data, but codes missing from dt4 are left without a date
 full_join(dt4,dt1,"COD")
        Date  COD     VALUE      LON     LAT ALT
1 01/01/2000 C037 -9999.000 -289.976 432.165 162
2 02/01/2000 C037     1.456 -289.976 432.165 162
3 03/01/2000 C037 -9999.000 -289.976 432.165 162
4 04/01/2000 C037 -9999.000 -289.976 432.165 162
5 01/01/2000 C038 -9999.000 -228.623 428.395 596
6 02/01/2000 C038 -9999.000 -228.623 428.395 596
7 03/01/2000 C038 -9999.000 -228.623 428.395 596
8 04/01/2000 C038 -9999.000 -228.623 428.395 596
9       <NA> E000        NA -274.107 430.783 218
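In current versions of tidyr, gather() is superseded by pivot_longer(). As a hedged sketch of the same steps with that function (the object name result is made up for illustration, and the data is only the small sample used in this answer):

```r
library(dplyr)
library(tidyr)

# Small stand-ins for the three tables (sample data only)
dt1 <- tibble(COD = c("C037", "E000", "C038"),
              LON = c(-289.976, -274.107, -228.623),
              LAT = c(432.165, 430.783, 428.395),
              ALT = c(162, 218, 596))
dt2 <- tibble(C037 = c(-9999, 1.456, -9999, -9999),
              C038 = c(-9999, -9999, -9999, -9999))
dt3 <- tibble(Date = c("01/01/2000", "02/01/2000", "03/01/2000", "04/01/2000"))

# Assumption: row k of dt2 belongs to the date in row k of dt3
result <- bind_cols(dt3, dt2) %>%
  pivot_longer(-Date, names_to = "COD", values_to = "VALUE") %>%  # wide to long
  left_join(dt1, by = "COD")                                      # add LON, LAT, ALT

result
```

Note that pivot_longer() returns the rows date by date rather than code by code, so the row order differs from the gather() output even though the content is the same.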
    
answered by 15.05.2018 at 16:43
1

If you insist, it can be done with loops and conditionals. I agree that a join is the best option, but explicit control structures will also do the job.

The data

  

It is very important to confirm that your real data has the same structure as the data used as examples and tests in this thread. By "the same structure" I mean the same rows and columns, with the same data type in each. Otherwise the code will not work.

I reuse the data created by Rolando, although I change the names.

    library(tidyverse)

df1 <- tribble(
   ~COD ,     ~LON,       ~LAT,       ~ALT,
   "C037",    -289.976,   432.165,   162,
   "E000",    -274.107,   430.783,  218,
   "C038",    -228.623,   428.395,   596 # Added C038
 )

 df2 <- tribble( # part of your example
   ~C037,    ~C038,     
   -9999,   -9999,       
   1.456,   -9999,   
   -9999,   -9999,   
   -9999,   -9999 
 )

 df3 <- tribble(
   ~Date,
   "01/01/2000",
   "02/01/2000",
   "03/01/2000",
  "04/01/2000"
 )

The loop you were trying

The case for preferring a join over a loop becomes clear here: one of the problems with loops is that you have to "say" explicitly what to do at each step, and then where and how to store the result. That is why loops belong to the imperative programming paradigm. A join is an operation in the declarative paradigm: you tell the function what result you are looking for, and the function takes the cumbersome work off your hands.
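A small, made-up illustration of that contrast (the tables stations and readings are hypothetical, not the data from the question): both halves look up the altitude for each reading, first step by step and then by stating the desired result.

```r
# Hypothetical toy tables, for illustration only
stations <- data.frame(COD = c("C037", "E000"), ALT = c(162, 218),
                       stringsAsFactors = FALSE)
readings <- data.frame(COD = c("E000", "C037", "E000"), VALUE = c(1.5, 2.5, 3.5),
                       stringsAsFactors = FALSE)

# Imperative: spell out how to find each match and where to store it
alt_loop <- numeric(nrow(readings))
for (i in seq_len(nrow(readings))) {
  for (j in seq_len(nrow(stations))) {
    if (readings$COD[i] == stations$COD[j]) {
      alt_loop[i] <- stations$ALT[j]
    }
  }
}

# Declarative: state the result wanted (rows matched on COD)
joined <- merge(readings, stations, by = "COD")
```

Here alt_loop holds the altitudes in the original order of readings, while merge() returns a complete data frame, sorted by COD by default.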

Now, why doesn't your loop work?

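  • It has syntax problems. Some of the symbols that structure the code are missing, so R's parser gets lost and does not know what to do. The basic syntax of a for loop is:

    for (index in iterator) {do_something_with_index}

    Measured against this syntax, your code lacks an opening brace { on the first loop and two closing braces } at the end, not counting the one that closes the if. (R does accept a for without braces when its body is a single expression, but explicit braces keep the nesting unambiguous.) The if condition also opens a parenthesis that it never closes.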

  • Subsetting data in R is done with square brackets [ ] (among other ways, but this is the most usual). In line three of your code you are using parentheses ( ). Parentheses tell R that what precedes them is a function, not a data structure. So R tries to call a function named df1 with the arguments (i, 1). It fails, of course, because there is no such function.

  • The output. With a for loop you have to specify where you are going to store what you want returned at the end. In more technical terms: you assign the output value to a new data structure as you go. Your code does none of this, so even if it ran perfectly it would return nothing at the end. It would do the processing, but the result would be lost. I see that you include a print "at the end" of your loop. Maybe that is what you want: for the loop just to show the result on screen. That would be fine for a human reading the screen, but it would not create output data that you could keep using after the loop finishes. In technical terms, it produces a side effect, not an output as such.

    In the solution below, that is what the cbind() and rbind() calls on x, y, z and out do: they build the final output step by step.

  • The print on line 4 will not give you anything useful. Even if you made it work by adding the paste function to concatenate the quoted strings, you would always get a constant: "i is 1 1 is j". There is not much point in writing a loop that always returns the same thing :). Also, as we saw in the previous point, what you want at the end is a data structure, not text printed on screen. So print is not needed, except perhaps for debugging or as an aid while you are writing the code.

    for (i in 1:nrow(df1))          # Opening brace missing; without it R does not know where the loop body starts
      for (j in 1:ncol(df2)){       # This one is "fine"
       if (df1(i,1)==(df2(1,j)){    # df1(i, 1) should be df1[i,1]
       print ("i is 1", "1 is j")   # There is no output here; in fact there is nothing.
    
How would a loop work?

    I will write the loop inside a function, so as not to modify or create temporary structures in the global environment. The function is called unir and is only good for this problem or one exactly like it. The point is to see how painful it is to solve this problem with loops, and how much more worthwhile it is to invest time in learning joins than in writing a loop that does the same thing.

    It does not solve the problem of the dates; for that I would have to open another for and another if. It is already complicated enough as it is.

    I do not doubt that there is a better way to do it with loops, but there is little point in improving it: for this problem, loops are a suboptimal solution from any point of view.

    unir <- function (df1, df2){
      out <- data.frame()
      for (i in seq_len(nrow(df1))) {
        for (j in seq_len(ncol(df2))) {
          if (df1[[1]][i] == colnames(df2)[j]) {    # COD of df1 against the column names of df2
            x <- as.data.frame(df1[i, ])            # The row of df1 for this COD
            y <- df2[[j]]                           # [[ ]] gives a plain vector; df2[ , j] would give a data.frame
            for (k in seq_along(y)) {               # Iterate over the vector
              z <- cbind(x, dato_que_interesa_unir = y[k])   # Build one output row
              out <- rbind(out, z)                  # Append it to the accumulated result
            }
          }
        }
      }
      rownames(out) <- NULL
      return(out)
    }
    
    unir(df1, df2)
    
    # Result (E000 has no column in df2, so it contributes no rows)
    
    > unir(df1, df2)
       COD      LON     LAT ALT dato_que_interesa_unir
    1 C037 -289.976 432.165 162              -9999.000
    2 C037 -289.976 432.165 162                  1.456
    3 C037 -289.976 432.165 162              -9999.000
    4 C037 -289.976 432.165 162              -9999.000
    5 C038 -228.623 428.395 596              -9999.000
    6 C038 -228.623 428.395 596              -9999.000
    7 C038 -228.623 428.395 596              -9999.000
    8 C038 -228.623 428.395 596              -9999.000
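If a loop is still wanted, one possible variant iterates over the codes only and also attaches the dates. This is just a sketch, not a definitive implementation: it assumes that row k of df2 corresponds to row k of df3 and that each code appears once in df1, and the names codes, blocks and meta are made up for the example.

```r
# Sample data with the same shape as in this thread
df1 <- data.frame(COD = c("C037", "E000", "C038"),
                  LON = c(-289.976, -274.107, -228.623),
                  LAT = c(432.165, 430.783, 428.395),
                  ALT = c(162, 218, 596),
                  stringsAsFactors = FALSE)
df2 <- data.frame(C037 = c(-9999, 1.456, -9999, -9999),
                  C038 = c(-9999, -9999, -9999, -9999))
df3 <- data.frame(Date = c("01/01/2000", "02/01/2000", "03/01/2000", "04/01/2000"),
                  stringsAsFactors = FALSE)

codes  <- intersect(df1$COD, colnames(df2))   # codes present in both tables
blocks <- vector("list", length(codes))       # one slot per code, allocated up front

for (i in seq_along(codes)) {
  cod  <- codes[i]
  meta <- df1[df1$COD == cod, ]               # the single metadata row for this code
  blocks[[i]] <- data.frame(
    FECHA = df3$Date,                         # every date, in df3 order
    COD   = cod,                              # scalars are recycled for each date
    LON   = meta$LON,
    LAT   = meta$LAT,
    ALT   = meta$ALT,
    VALOR = df2[[cod]],                       # the whole column for this code
    stringsAsFactors = FALSE
  )
}

out <- do.call(rbind, blocks)                 # bind all blocks once, at the end
out
```

Storing each block in a list and binding once at the end avoids growing a data frame row by row, which matters with 5844 dates and about 200 codes.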
    

    Conclusion

    merge and join yes, loops no.

        
    answered by 16.05.2018 at 19:55
    0

    In general, R code tries to avoid explicit loops, that is, for or while, for certain operations. In many cases it is faster, simpler and clearer to work directly with the built-in functionality of the language, which is already designed for the manipulations we need.
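    As a tiny, hypothetical illustration of that point (the vectors codes and columns are made up and merely stand in for df1$COD and colnames(df2)), the nested loop and the vectorized %in% operator below answer the same question: which codes have a column of data?

```r
codes   <- c("C037", "E000", "C038")   # stands in for df1$COD
columns <- c("C037", "C038")           # stands in for colnames(df2)

# Explicit loops: compare every code with every column name
found_loop <- logical(length(codes))
for (i in seq_along(codes)) {
  for (j in seq_along(columns)) {
    if (codes[i] == columns[j]) {
      found_loop[i] <- TRUE
    }
  }
}

# Vectorized: a single call does all the comparisons
found_vec <- codes %in% columns

identical(found_loop, found_vec)
```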

    First of all, let's import the data from your example. I shrank the sample a little so that all the data frames are consistent with each other and the example is clearer:

    df1 <- read.table(text="COD LON LAT ALT
    C037    -289.976   432.165   162
    E000    -274.107   430.783   218
    C068    -228.623   428.395   596", header= T, stringsAsFactors=F)
    
    # Shrink the sample so that it is consistent with the codes in df1
    df2 <- read.table(text="C037 E000 C068
    -1999  -9999   -9991 
    1.456  -9998   -9992
    -2999  -9997   -9993
    -3999  -9996   -9994", header= T, stringsAsFactors=F)
    
    # Keep only 4 dates so that they match the 4 rows of df2
    df3 <- read.table(text="Date
    01/01/2000
    02/01/2000
    03/01/2000
    04/01/2000", header= T, stringsAsFactors=F)
    

    The solution

    # Add the date column. This assumes that the rows of df2 are in the
    # same order as those of df3.
    df2$Fecha <- df3$Date 
    
    # Reshape df2 from "wide" to "long" format
    df4 <- reshape(df2, 
                   varying = list(setdiff(names(df2), "Fecha")), 
                   direction = "long", 
                   v.names = "VALOR")
    
    # Recover the code from the column index and keep only the columns we need
    df4$COD <- colnames(df2)[df4$time]
    df4 <- df4[, c("Fecha", "COD", "VALOR")]
    
    # Final merge of the two data frames
    df_final <- merge(df1, df4, by = "COD", all = TRUE)
    
    df_final
    

    Explanation:

    Basically, we make the structure of df2 consistent with that of df1, so that each row is identified by a COD and a Fecha. Looking at df4 after the reshape() and the remaining normalization steps, just before the merge(), makes this clearer:

             Fecha  COD     VALOR
    1.1 01/01/2000 C037 -1999.000
    2.1 02/01/2000 C037     1.456
    3.1 03/01/2000 C037 -2999.000
    4.1 04/01/2000 C037 -3999.000
    1.2 01/01/2000 E000 -9999.000
    2.2 02/01/2000 E000 -9998.000
    3.2 03/01/2000 E000 -9997.000
    4.2 04/01/2000 E000 -9996.000
    1.3 01/01/2000 C068 -9991.000
    2.3 02/01/2000 C068 -9992.000
    3.3 03/01/2000 C068 -9993.000
    4.3 04/01/2000 C068 -9994.000
    

    As you can see, we converted the data from one column per code to one row per code and date, so it is now very simple to apply a merge() to "combine" df1 and df4. The final output:

        COD      LON     LAT ALT      Fecha     VALOR
    1  C037 -289.976 432.165 162 01/01/2000 -1999.000
    2  C037 -289.976 432.165 162 02/01/2000     1.456
    3  C037 -289.976 432.165 162 03/01/2000 -2999.000
    4  C037 -289.976 432.165 162 04/01/2000 -3999.000
    5  C068 -228.623 428.395 596 01/01/2000 -9991.000
    6  C068 -228.623 428.395 596 02/01/2000 -9992.000
    7  C068 -228.623 428.395 596 03/01/2000 -9993.000
    8  C068 -228.623 428.395 596 04/01/2000 -9994.000
    9  E000 -274.107 430.783 218 01/01/2000 -9999.000
    10 E000 -274.107 430.783 218 02/01/2000 -9998.000
    11 E000 -274.107 430.783 218 03/01/2000 -9997.000
    12 E000 -274.107 430.783 218 04/01/2000 -9996.000
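    Two optional follow-up steps, sketched on a three-row stand-in for df_final. Both rest on assumptions that the question does not state: that the dates are written as day/month/year, and that -9999 is a missing-value marker. The column order is the one the question asks for.

```r
# Three-row stand-in for df_final (same columns as the merge() output)
df_final <- data.frame(
  COD   = c("C037", "C037", "E000"),
  LON   = c(-289.976, -289.976, -274.107),
  LAT   = c(432.165, 432.165, 430.783),
  ALT   = c(162, 162, 218),
  Fecha = c("01/01/2000", "02/01/2000", "01/01/2000"),
  VALOR = c(-1999, 1.456, -9999),
  stringsAsFactors = FALSE
)

# Assumption: dates are day/month/year
df_final$Fecha <- as.Date(df_final$Fecha, format = "%d/%m/%Y")

# Assumption: -9999 means "no data"
df_final$VALOR[df_final$VALOR == -9999] <- NA

# Sort by code and date, and put the date first as in the question
df_final <- df_final[order(df_final$COD, df_final$Fecha),
                     c("Fecha", "COD", "LON", "LAT", "ALT", "VALOR")]
df_final
```

    With Fecha stored as a real Date, sorting and filtering by period behave chronologically instead of alphabetically.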
    
        
    answered by 15.05.2018 at 18:04