If you insist, it is possible to do this with loops and conditionals. I agree that a join is the best option, but it can be done with explicit control structures.
The data
It is very important to confirm that your data has the same structure as the data being used for examples and tests in this thread. By "the same structure" I mean the same rows and columns, with the same type of data in each one. Otherwise the code will not work.
I reuse the data created by Rolando, although I changed the names.
library(tidyverse)
df1 <- tribble(
  ~COD , ~LON, ~LAT, ~ALT,
  "C037", -289.976, 432.165, 162,
  "E000", -274.107, 430.783, 218,
  "C038", -228.623, 428.395, 596  # I added C038
)
df2 <- tribble(  # part of your example
~C037, ~C038,
-9999, -9999,
1.456, -9999,
-9999, -9999,
-9999, -9999
)
df3 <- tribble(
~Date,
"01/01/2000",
"02/01/2000",
"03/01/2000",
"04/01/2000"
)
The loop you were trying to use
Why we keep insisting that a join is better than a loop will become clear here: one of the problems with loops is that you have to state explicitly what to do at each step, and then where and how to store the result. That is why they belong to the imperative programming paradigm. A join is an operation from the declarative paradigm: you tell the function what result you are looking for and the function takes care of the tedious work.
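For comparison, a declarative version might look like the sketch below. It is written in base R with merge() rather than the tidyverse so that it runs on its own; the data frames repeat the example data above, and the intermediate name long is only an illustration, not something from the thread.

```r
# Same example data as above, built as plain data.frames
df1 <- data.frame(
  COD = c("C037", "E000", "C038"),
  LON = c(-289.976, -274.107, -228.623),
  LAT = c(432.165, 430.783, 428.395),
  ALT = c(162, 218, 596),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  C037 = c(-9999, 1.456, -9999, -9999),
  C038 = c(-9999, -9999, -9999, -9999)
)

# Reshape df2 from wide (one column per station) to long (one row per value).
# "long" is a hypothetical helper name.
long <- data.frame(
  COD = rep(colnames(df2), each = nrow(df2)),
  dato_que_interesa_unir = unlist(df2, use.names = FALSE),
  stringsAsFactors = FALSE
)

# Inner join on COD: E000 has no column in df2, so it drops out of the result
joined <- merge(df1, long, by = "COD")
joined
```

Nothing here says how to walk the rows or where to store partial results; merge() takes care of that.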
Now, why doesn't your loop work?
It has syntax problems. Some symbols needed to structure the code are missing, so R's parser gets lost and does not know what to do. The basic syntax of a for loop is:
for (index in iterator) {do_something_with_index}
Given this syntax, your code is missing an opening brace { in the first loop and two closing braces } at the end, not counting the braces of the if.
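As a minimal sketch of that brace structure, here is a nested loop with every opening brace paired with its closing one. The vectors codes and columns are hypothetical stand-ins for the first column of df1 and the column names of df2.

```r
codes   <- c("C037", "E000", "C038")  # hypothetical stand-in for df1's first column
columns <- c("C037", "C038")          # hypothetical stand-in for colnames(df2)

matches <- character(0)
for (i in seq_along(codes)) {         # opens the outer loop
  for (j in seq_along(columns)) {     # opens the inner loop
    if (codes[i] == columns[j]) {     # opens the if
      matches <- c(matches, codes[i])
    }                                 # closes the if
  }                                   # closes the inner loop
}                                     # closes the outer loop
matches
```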
Subsetting in R is done (among other ways, but this is the most usual) with square brackets [ ]. In line three of your code you are using parentheses ( ). When you use parentheses you are telling R that what comes before them is a function, not a data structure. So R tries to call a function named df1 with the arguments (i, 1). Obviously it fails, because there is no such function.
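The difference can be checked with a small sketch. The two-row data frame is a cut-down copy of the example data, and tryCatch() is used only to capture the error message instead of stopping the script.

```r
df1 <- data.frame(
  COD = c("C037", "E000"),
  ALT = c(162, 218),
  stringsAsFactors = FALSE
)

# Square brackets subset the data structure: row 1, column 1
value <- df1[1, 1]

# Parentheses ask R to call a function named df1, which does not exist
msg <- tryCatch(
  df1(1, 1),
  error = function(e) conditionMessage(e)
)
value
msg
```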
The output. When you use a for loop, you have to specify where you are going to store what you want to return at the end. In more technical terms: you assign each output value to a new data structure as you go.
Your code does none of this, so even if it worked perfectly it would not return anything at the end. It would do the processing, but the result would be lost. I see that you include a print "at the end" of your loop. Maybe that is what you want: for the loop to show the result only on screen. That would be fine for a human reading the screen, but it would not create any output data that you could keep using after the loop has run. In technical terms it produces a side effect, not an actual output.
In the solution below, that is what the cbind() and rbind() calls on x, y, z and out do: they build the final output step by step.
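The difference between a side effect and an output can be seen in a toy loop. The names printed_only and out are illustrative only.

```r
# A loop that only prints: the values appear on screen,
# but the loop itself evaluates to NULL
printed_only <- for (i in 1:3) print(i * 2)

# A loop that accumulates: each result is stored in a structure
# that still exists after the loop ends
out <- numeric(0)
for (i in 1:3) {
  out <- c(out, i * 2)
}
out
```

After running both, printed_only holds nothing, while out can be passed on to further processing.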
The print on line 4 will not return anything. Even if you made it work by adding the paste function to concatenate the quoted strings, you would always get a constant: "i is 1 1 is j". There is not much point in writing a loop that always returns the same thing :). Also, as we saw in the previous point, what you are interested in is a data structure at the end, not output printed on screen. So print is not needed, except maybe to debug the code or to help while you are building it.
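A short sketch of the point about paste: quoted text stays constant, while the loop variables have to be passed unquoted to appear in the message. The values of i and j are arbitrary examples.

```r
# Both pieces are quoted literals, so the result never changes
constant <- paste("i is 1", "1 is j")

# Passing the variables themselves makes the message depend on them
i <- 2
j <- 3
dynamic <- paste("i is", i, "and j is", j)
constant
dynamic
```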
for (i in 1:nrow(df1)) # An opening brace is missing; without it R does not know where the loop starts
for (j in 1:ncol(df2)){ # This one is "fine"
if (df1(i,1)==(df2(1,j)){ # df1(i, 1) should be df1[i, 1]
print ("i is 1", "1 is j") # There is no output here; in fact there is nothing.
How would a loop work?
I will create the loop within a function, so as not to modify or create temporary structures in the global environment.
The function will be called unir and works only for this problem or another exactly like it. The idea is to show how painful it is to solve this problem with loops, and how much more worthwhile it is to invest time in learning joins than in writing a loop that does the same thing.
It does not solve the problem of the dates; for that you would have to open another for and another if. It is already complicated enough as it is.
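If each row of df2 corresponds to the date in the same row of df3 (an assumption; nothing in the thread confirms it), the dates can be attached without writing any loop, by repeating them once per station column. A minimal sketch in base R, where long is a hypothetical name:

```r
df2 <- data.frame(
  C037 = c(-9999, 1.456, -9999, -9999),
  C038 = c(-9999, -9999, -9999, -9999)
)
df3 <- data.frame(
  Date = c("01/01/2000", "02/01/2000", "03/01/2000", "04/01/2000"),
  stringsAsFactors = FALSE
)

# Assumption: row k of df2 was observed on the date in row k of df3
long <- data.frame(
  Date = rep(df3$Date, times = ncol(df2)),     # dates repeated once per station
  COD  = rep(colnames(df2), each = nrow(df2)), # station code for each value
  dato_que_interesa_unir = unlist(df2, use.names = FALSE),
  stringsAsFactors = FALSE
)
long
```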
I do not doubt that there is a better way to do it with loops, but there is no point in improving it: loops are a suboptimal solution from any point of view.
unir <- function(df1, df2) {
  out <- data.frame()
  for (i in seq_len(nrow(df1))) {
    for (j in seq_len(ncol(df2))) {
      if (df1[[1]][i] == colnames(df2)[j]) {
        x <- as.data.frame(df1[i, ])  # One row with the station data
        y <- df2[[j]]                 # [[ ]] gives a plain vector; [ , j] would give a data frame
        for (k in seq_along(y)) {     # Iterate within the vector
          z <- cbind(x, dato_que_interesa_unir = y[k])  # Build one output row, with a fixed column name
          # Append inside the if, so that stations with no column in df2 (E000) add nothing
          out <- rbind(out, z)
        }
      }
    }
  }
  return(out)
}
unir(df1, df2)
# Result
> unir(df1, df2)
   COD      LON     LAT ALT dato_que_interesa_unir
1 C037 -289.976 432.165 162              -9999.000
2 C037 -289.976 432.165 162                  1.456
3 C037 -289.976 432.165 162              -9999.000
4 C037 -289.976 432.165 162              -9999.000
5 C038 -228.623 428.395 596              -9999.000
6 C038 -228.623 428.395 596              -9999.000
7 C038 -228.623 428.395 596              -9999.000
8 C038 -228.623 428.395 596              -9999.000
Conclusion
merge and join yes, loops no.