Any alternatives to the ifelse?

2

I am facing a problem that I do not know how to solve. I have 2 (very large) dataframes of DIFFERENT dimensions that contain meteorological station information on a daily basis for 15 years. The problem I have is that the same station has a different name in each df and I must unify everything in a single df. I will try to explain myself with a tiny example:

``````df1 # Este df tiene "escritas" todas las fechas posibles (2000-01-01 al 2015-31-12), pero realmente sólo registró Temp y humedad del 2000-01-01 al 2008-31-12

ID       Lat      Lon         Alt     Fecha          Temp    Humedad
C065    43.2112   -3.40713    677     2000-01-01     25       50
C065    43.2112   -3.40713    677     2000-02-01     12       55
C065    43.2112   -3.40713    677     2000-03-01     18       58
C065    43.2112   -3.40713    677     2015-01-01     9999     9999 #Esto significa que no tiene registro
C054    43.0423   -2.65763    556     2000-01-01     12       35
C054    43.0423   -2.65763    556     2000-02-01     9        35
C054    43.0423   -2.65763    556     2000-03-01     10       38
C054    43.0423   -2.65763    556     2015-01-01     9999     9999
C035    42.9661   -2.86795    618     2000-12-13     2        25
C035    42.9661   -2.86795    618     2000-12-14     4        22
C035    42.9661   -2.86795    618     2000-12-14     8        27

df2 # Este df tiene "escritas" todas las fechas posibles (2000-01-01 al 2015-31-12), pero realmente sólo registró Temp y humedad del 2009-01-01 al 2015-31-12

ID       Lat      Lon         Alt     Fecha          Temp    Humedad
GH65    43.2112   -3.40713    677     2000-01-01     9999    9999
GH65    43.2112   -3.40713    677     2009-01-01     35       60
GH65    43.2112   -3.40713    677     2009-02-01     32       55
GH65    43.2112   -3.40713    677     2015-01-01     28       58
GH54    43.0423   -2.65763    556     2015-01-12     22       62
GH54    43.0423   -2.65763    556     2015-02-12     28       65
GH54    43.0423   -2.65763    556     2015-03-12     30       68
``````

As it is observed, C065 and GH65 are the same station, since they are in the same coordinates. Now, what I have to do is to unify with a single name those stations that coincide in coordinates, either by calling them C065 or GH 65. I should also be able to make them complement, that is, from C065 I only take the real records (2000 to 2008) and of the GH65 also (2009-2015) It had occurred to me to do it using ifelse

``````df1\$ID2 <- ifelse(df1\$Lat == df2\$Lat & df1\$Lon == df2\$Lon & df1\$Alt == df2\$Alt, df1\$ID ,"NO")
``````

but since the two df do not have the same number of records, it will not work. Any suggestions to solve it? I'll have to make a loop?

I hope I have understood, I know it is half complicated to explain the data I have and what I want to do.

Thanks in advance for any ideas.

Caro

asked by Caro 04.09.2018 в 10:51
source

1

``````df1 <- read.table(text= "ID       Lat      Lon         Alt     Fecha          Temp    Humedad
C065    43.2112   -3.40713    677     2000-01-01     25       50
C065    43.2112   -3.40713    677     2000-02-01     12       55
C065    43.2112   -3.40713    677     2000-03-01     18       58
C065    43.2112   -3.40713    677     2015-01-01     9999     9999
C054    43.0423   -2.65763    556     2000-01-01     12       35
C054    43.0423   -2.65763    556     2000-02-01     9        35
C054    43.0423   -2.65763    556     2000-03-01     10       38
C054    43.0423   -2.65763    556     2015-01-01     9999     9999
C035    42.9661   -2.86795    618     2000-12-13     2        25
C035    42.9661   -2.86795    618     2000-12-14     4        22
C035    42.9661   -2.86795    618     2000-12-14     8        27", header = T, stringsAsFactors = F)

GH65    43.2112   -3.40713    677     2000-01-01     9999    9999
GH65    43.2112   -3.40713    677     2009-01-01     35       60
GH65    43.2112   -3.40713    677     2009-02-01     32       55
GH65    43.2112   -3.40713    677     2015-01-01     28       58
GH54    43.0423   -2.65763    556     2015-01-12     22       62
GH54    43.0423   -2.65763    556     2015-02-12     28       65
GH54    43.0423   -2.65763    556     2015-03-12     30       68", header = T, stringsAsFactors = F)
``````

Then, using `tidyverse` the idea is:

• Combine the two `data.frame` into one only
• Build the new ID, for this we create a new `df` with `Lat` , `Lon` and `Alt` , the new ID will be the first one that appears in each group
• Finally we add the new ID to the data

Solution:

``````library(tidyverse)
library(lubridate)

# unimos ambos df en uno nuevo
# y de paso convertimos Fecha a Date
df1 %>%
union(df2) %>%
mutate(Fecha = as.Date(Fecha)) %>%
as.tibble() -> newdf

# Creamos un nuevo df para generar los grupos
# Generamos un nuevo id con el primer valor de cada grupo
newdf %>%
group_by(Lat, Lon, Alt) %>%
summarise(newid=first(ID)) -> grupos

# Genermos el df final con el nuevo id
newdf %>%
left_join(grupos, by=c("Lat", "Lon", "Alt")) %>%
select(newid, everything()) %>%
arrange(newid) -> newdf
``````

The new `data.frame` will have a new column `newid` that represents the `ID` unified, for example:

``````newdf %>%
filter(newid=='GH65')

# A tibble: 8 x 8
newid ID      Lat   Lon   Alt Fecha       Temp Humedad
<chr> <chr> <dbl> <dbl> <int> <chr>      <int>   <int>
1 GH65  GH65   43.2 -3.41   677 2015-01-01    28      58
2 GH65  GH65   43.2 -3.41   677 2009-02-01    32      55
3 GH65  GH65   43.2 -3.41   677 2009-01-01    35      60
4 GH65  GH65   43.2 -3.41   677 2000-01-01  9999    9999
5 GH65  C065   43.2 -3.41   677 2015-01-01  9999    9999
6 GH65  C065   43.2 -3.41   677 2000-03-01    18      58
7 GH65  C065   43.2 -3.41   677 2000-02-01    12      55
8 GH65  C065   43.2 -3.41   677 2000-01-01    25      50
``````

Filtering later by date is trivial using the functionality of `tidyverse/dplyr` and thanks to that we have previously converted `Fecha` to `Date` :

``````newdf %>%
filter(year(Fecha) >= 2009 & year(Fecha) <= 2015 & ID == 'GH65')
``````

``````library("sqldf")