Annual SQL query grouped by dates

1

I need to make an annual consultation grouped by months of all registered users in the period of one year. The query would be the following:

SELECT 
    COUNT(*) AS total_users, 
    YEAR(created_at), 
    MONTH(created_at) 
FROM 
    'users' 
WHERE 
    date('created_at') >= '2016-03-01 00:00:00' 
    AND date('created_at') <= '2017-03-31 23:59:59' 
GROUP BY 
    YEAR(created_at) ASC, 
    MONTH(created_at) ASC 

Until then everything is correct, it gives the expected results, but it would need that in addition it would throw the months in which there are no records at 0 with its corresponding month and year.

Solution finally used

SELECT table_dates.year, table_dates.month, COUNT(id) AS total_registers
FROM 'users'
RIGHT JOIN (
    SELECT 2016 AS year, 3 as month, CONCAT(2016, 3) as identifier UNION ALL 
    SELECT 2016, 4, CONCAT(2016, 4) UNION ALL 
    SELECT 2016, 5, CONCAT(2016, 5) UNION ALL 
    SELECT 2016, 6, CONCAT(2016, 6) UNION ALL 
    SELECT 2016, 7, CONCAT(2016, 7) UNION ALL 
    SELECT 2016, 8, CONCAT(2016, 8) UNION ALL 
    SELECT 2016, 9, CONCAT(2016, 9) UNION ALL 
    SELECT 2016, 10, CONCAT(2016, 10) UNION ALL 
    SELECT 2016, 11, CONCAT(2016, 11) UNION ALL 
    SELECT 2016, 12, CONCAT(2016, 12) UNION ALL 
    SELECT 2017, 1, CONCAT(2017, 1) UNION ALL 
    SELECT 2017, 2, CONCAT(2017, 2) UNION ALL 
    SELECT 2017, 3, CONCAT(2017, 3)
) table_dates ON table_dates.identifier = CONCAT(DATE_FORMAT(created_at, '%Y'), DATE_FORMAT(created_at, '%c'))
GROUP BY table_dates.year ASC, table_dates.month ASC
ORDER BY 'table_dates'.'year' ASC, 'table_dates'.'month' ASC

Thank you very much everyone for your contributions.

    
asked by rafni 23.03.2017 в 13:03
source

1 answer

0

To achieve what you ask, the query must, in some way, obtain or generate a list of fixed months so that you can make a left join of the data in your table users , so that regardless of the data in this, always get records for all months.

Establishing or generating this list of months is the part that is uncomfortable with MySQL. With other databases, it is possible to do this relatively easily, either by using recursive CTEs, or as in PostgreSQL, using functions designed precisely for that ( generate_series ).

As commented by @amenadiel, with MySQL, the options are more limited.

On the one hand, you can create a table (truth or derivative) with the list of months. But if the dates of interest you vary from query to query, you have to modify this table every time.

On the other hand, there are techniques using variables and joins with "system" tables that already exist and in which it is assumed that they have a certain number of records. Personally, these techniques seem very fragile.

Although not ideal, the best option I have found, and which I propose, is based on this answer . The technique used there is a combination of cross join and union all to set a number of base records that can be used to then combine it with other join or left join .

For example, the following query generates 1000 records with numbers from 0 to 999:

select a.a + (10 * b.a) + (100 * c.a) as seq
from (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as a
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as b
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as c

If we reasonably assume that you will never need to consider more than 1000 months in a query (and if necessary, you can modify the query to generate more records), then we can use the following query to generate the list of dates from a start and end date:

select dateRange.rangeStart + interval (seq1000.seq) month
from (select date '2016-03-01' as rangeStart, date '2017-04-01' as rangeEnd) dateRange
join (
  select a.a + (10 * b.a) + (100 * c.a) as seq
  from (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as a
  cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as b
  cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as c
) seq1000
on (dateRange.rangeStart + interval (seq1000.seq) month) < dateRange.rangeEnd

Result:

2016-03-01
2016-04-01
2016-05-01
2016-06-01
2016-07-01
2016-08-01
2016-09-01
2016-10-01
2016-11-01
2016-12-01
2017-01-01
2017-02-01
2017-03-01

And as you can see, if you need to adjust the start and end date that interests you, you only need to make the modification within the derived table dateRange without changing the rest of the query.

Using the above, we can combine it with your existing query with a left join :

select count(u.created_at) as total_users,
       year(dateRange.rangeStart + interval (seq1000.seq) month) as created_at_year,
       month(dateRange.rangeStart + interval (seq1000.seq) month) as created_at_month
from (select date '2016-03-01' as rangeStart, date '2017-04-01' as rangeEnd) dateRange
join (
  select a.a + (10 * b.a) + (100 * c.a) as seq
  from (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as a
  cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as b
  cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as c
) seq1000
on (dateRange.rangeStart + interval (seq1000.seq) month) < dateRange.rangeEnd
left join users u
  on u.created_at >= dateRange.rangeStart
 and u.created_at < dateRange.rangeEnd
 and (dateRange.rangeStart + interval (seq1000.seq) month) = date(u.created_at - interval (day(u.created_at)-1) day)
group by year(dateRange.rangeStart + interval (seq1000.seq) month),
         month(dateRange.rangeStart + interval (seq1000.seq) month)
order by created_at_year, created_at_month

Demo

Although the query is not as compact or simple as you would like, or as it could be achieved using other databases, at least it fulfills the objective and it is very easy to execute it with different dates, only changing the dates in the derived table dateRange .

Additional note: You will notice that the expression date(created_at) I removed the date() . I do not think that adding the date() changes the result, but it can have a negative effect on the performance of your query, because it prevents any index defined in that column from being used.

    
answered by 23.03.2017 / 16:02
source