To achieve what you ask, the query must, in some way, obtain or generate a list of fixed months so that you can make a left join
of the data in your table users
, so that regardless of the data in this, always get records for all months.
Establishing or generating this list of months is the part that is uncomfortable with MySQL. With other databases, it is possible to do this relatively easily, either by using recursive CTEs, or as in PostgreSQL, using functions designed precisely for that ( generate_series
).
As commented by @amenadiel, with MySQL, the options are more limited.
On the one hand, you can create a table (truth or derivative) with the list of months. But if the dates of interest you vary from query to query, you have to modify this table every time.
On the other hand, there are techniques using variables and joins with "system" tables that already exist and in which it is assumed that they have a certain number of records. Personally, these techniques seem very fragile.
Although not ideal, the best option I have found, and which I propose, is based on this answer . The technique used there is a combination of cross join
and union all
to set a number of base records that can be used to then combine it with other join
or left join
.
For example, the following query generates 1000 records with numbers from 0 to 999:
select a.a + (10 * b.a) + (100 * c.a) as seq
from (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as a
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as b
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as c
If we reasonably assume that you will never need to consider more than 1000 months in a query (and if necessary, you can modify the query to generate more records), then we can use the following query to generate the list of dates from a start and end date:
select dateRange.rangeStart + interval (seq1000.seq) month
from (select date '2016-03-01' as rangeStart, date '2017-04-01' as rangeEnd) dateRange
join (
select a.a + (10 * b.a) + (100 * c.a) as seq
from (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as a
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as b
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as c
) seq1000
on (dateRange.rangeStart + interval (seq1000.seq) month) < dateRange.rangeEnd
Result:
2016-03-01
2016-04-01
2016-05-01
2016-06-01
2016-07-01
2016-08-01
2016-09-01
2016-10-01
2016-11-01
2016-12-01
2017-01-01
2017-02-01
2017-03-01
And as you can see, if you need to adjust the start and end date that interests you, you only need to make the modification within the derived table dateRange
without changing the rest of the query.
Using the above, we can combine it with your existing query with a left join
:
select count(u.created_at) as total_users,
year(dateRange.rangeStart + interval (seq1000.seq) month) as created_at_year,
month(dateRange.rangeStart + interval (seq1000.seq) month) as created_at_month
from (select date '2016-03-01' as rangeStart, date '2017-04-01' as rangeEnd) dateRange
join (
select a.a + (10 * b.a) + (100 * c.a) as seq
from (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as a
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as b
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as c
) seq1000
on (dateRange.rangeStart + interval (seq1000.seq) month) < dateRange.rangeEnd
left join users u
on u.created_at >= dateRange.rangeStart
and u.created_at < dateRange.rangeEnd
and (dateRange.rangeStart + interval (seq1000.seq) month) = date(u.created_at - interval (day(u.created_at)-1) day)
group by year(dateRange.rangeStart + interval (seq1000.seq) month),
month(dateRange.rangeStart + interval (seq1000.seq) month)
order by created_at_year, created_at_month
Demo
Although the query is not as compact or simple as you would like, or as it could be achieved using other databases, at least it fulfills the objective and it is very easy to execute it with different dates, only changing the dates in the derived table dateRange
.
Additional note: You will notice that the expression date(created_at)
I removed the date()
. I do not think that adding the date()
changes the result, but it can have a negative effect on the performance of your query, because it prevents any index defined in that column from being used.