blocks of 15 days in sql in R

2

I have the following table:

        
id money date

one fifteen 2009-02-07

one 30 2009-02-09

one Four. Five 2009-03-04

one fifty 2009-03-12

I would like to get the most money spent in every fortnight. For example, for the first fortnight of February I would have to leave 30 and for March 50. How could I do this in an sql query in R with the sqldf library?

I do not use any database, I have them in .csv files and import them into a dataframe. The problem that would have is that the period of time in which I have to do this covers several years so I would not know very well how to do it.

    
asked by adamista 17.04.2017 в 15:38
source

2 answers

1

I have not touched SQLDF, so I put this so that it can serve as a guide and / or help. Assuming the sqldf backend is SQLite, the select you are looking for could be the same or similar to the following

Select MAX(dinero) Cantidad,
   ROUND(strftime('%W', fecha)/2) Quincena,
   strftime('%Y', fecha) Year
   from quincenas
   group by ROUND(strftime('%W', fecha)/2),strftime('%Y', fecha)

About the operation, strftime we use it to take separately the week of the year with% W and the year in particular with% Y.

    
answered by 17.04.2017 / 17:21
source
1

I propose a solution using tidyverse and lubridate that contemplates returning all fortnights (although there is no data for that fortnight) and also considers fortnights ranging from 1-15 and 15-last day of the month.

I build the example:

df <- data.frame(id = c(1,2,3,4, 5, 6), 
                 dinero = c(15, 30, 45, 50, 10, 20), 
                 fecha = as.Date(c("2009-02-15", "2009-02-07", "2009-02-09",
                                   "2009-03-04", "2009-03-12", "2015-07-31")))
id dinero      fecha
 1     15 2009-02-15
 2     30 2009-02-07
 3     45 2009-02-09
 4     50 2009-03-04
 5     10 2009-03-12
 6     20 2015-07-31

Now I generate a dataframe that has all the last days of fortnight, from the date in which there starts to be data in df until the date it ends, adding a column that is called dinero , with the value 0 for all dates (which I will use later):

library(tibble)
library(dplyr)

dates <- as_tibble(seq(floor_date(as.Date(min(df$fecha)), unit = "month"), 
                       ceiling_date(as.Date(max(df$fecha)), unit = "month"), 
                       by = 'days')) %>% 
  mutate(dinero = 0) %>% 
  filter(day(value) == 15 |
         day(value) == day(ceiling_date(value, unit = "month") - 1))

# A tibble: 156 × 2
        value dinero
       <date>  <dbl>
1  2009-02-15      0
2  2009-02-28      0
3  2009-03-15      0
4  2009-03-31      0
5  2009-04-15      0
6  2009-04-30      0
7  2009-05-15      0
8  2009-05-31      0
9  2009-06-15      0
10 2009-06-30      0
# ... with 146 more rows

floor_date with unit = "month" takes the first day of the month ( 1 ) and ceiling_date() - 1 takes the last one (can be 28, 29, 30 or 31).

Then I do a full_join , and I consider dinero if df$dinero is not NA and if not dates$dinero , which I defined as 0 for all cases.

resultado <- df %>% 
  full_join(dates, by = c("fecha" = "value")) %>% 
  mutate(year = year(fecha),
         month = month(fecha),
         num_quin = ifelse(1 <= day(fecha) & day(fecha)<=15, 1, 2),
         dinero = ifelse(!is.na(dinero.x), dinero.x, dinero.y)) %>% 
  group_by(year, month, num_quin) %>% 
  summarize(max_dinero = max(dinero))

Source: local data frame [156 x 4]
Groups: year, month [?]

    year month num_quin max_dinero
   <dbl> <dbl>    <dbl>      <dbl>
1   2009     2        1         45
2   2009     2        2          0
3   2009     3        1         50
4   2009     3        2          0
5   2009     4        1          0
6   2009     4        2          0
7   2009     5        1          0
8   2009     5        2          0
9   2009     6        1          0
10  2009     6        2          0
# ... with 146 more rows

There you are ordered by year, month and fortnight 1 or 2 of the month.

    
answered by 02.05.2017 в 20:49