How to verify if two data intervals have shared elements?

13

A few days ago I saw a question that among other things implied a problem similar to the one I am going to raise, I wanted to do it in a more general way, because I understand that the well-thought-out solution could serve as a reference to similar problems. Maybe for some of you the answer is trivial or obvious, but in my case, just when I chewed enough I found (at least I think so) that it was simpler than I thought. I raise it in Sql but it could be more of algorithms, the issue is that I found it more practical to be able to test the solutions.

Suppose the following example:

CREATE TABLE A (
    NRO_DESDE INT,
    NRO_HASTA INT
    )

CREATE TABLE B (
    NRO_DESDE INT,
    NRO_HASTA INT
    )

INSERT INTO A (NRO_DESDE, NRO_HASTA)
VALUES (5, 8)

INSERT INTO B (NRO_DESDE, NRO_HASTA)
VALUES (1, 2), (4, 5), (5, 8), (6, 7), (7, 9), (4, 10), (9, 11)

SELECT NRO_DESDE, NRO_HASTA FROM A;
SELECT NRO_DESDE, NRO_HASTA FROM B;

Table A has a single value:

NRO_DESDE NRO_HASTA
========= =========
5         8

Table B

NRO_DESDE NRO_HASTA
========= =========
1         2
4         5
5         8
6         7
7         9
4         10
9         11

The tables A and B represent sets of intervals, but of which we do not have all the values but we know the first and last element of each set, the idea is to compare the only set in A with all of B and determine if they share any element. As an example, the B % registration (4, 5) shares the 5 with A , the (1, 2) does not share any element, the (7, 9) share the 7, 8 . The result would then be the records of B that have elements shared with those of A , it is not important to know what they are, just to know that there are, we can also assume that the number of elements in each set is relatively manageable. Do not worry about absences of primary keys is simply a conceptual example.

Note: The code is built in SQL Server but could be resolved in any "flavor" of SQL.

    
asked by Patricio Moracho 03.08.2017 в 04:55
source

3 answers

14

I do not know if it's the most performant query on the planet ... but you have a set problem.

SELECT B.ID, *
FROM A, B
where
A.nro_desde between B.nro_desde and B.nro_hasta
or
A.nro_hasta between B.nro_desde and B.nro_hasta
or
B.nro_desde between A.nro_desde and A.nro_hasta
or
B.nro_hasta between A.nro_desde and A.nro_hasta

With a query like this you are looking for all the variants of the two sets that intersect.

Notice that we first try A against the intervals of B. And then we do the same for B. The latter, because if the interval of A completely contains B, then it would not come out in the first query.

    
answered by 03.08.2017 / 06:04
source
14

Gbianchi's answer effectively solves my question, contemplating the 4 different cases in which the intervals may be connected. Let's see:

Cases 1 and 2 are classic intersections of sets, being intervals, one can be located before the other and have contact by their ends. In the example, for example, the interval 4-5 or 7-9 that are connected to the interval A 5-8 by one of its extremes.

The other two possible cases are inclusion cases, where one set is included in the other. For example, the interval B (6-7) is included in the A (5-8) , or vice versa, the A is included in the B (4-10) . To contemplate the 4 cases and as the only data we have of the sets are their limits, the query must be resolved as Gbianchi mentions it.

SELECT *
FROM A, B
where
    A.nro_hasta between B.nro_desde and B.nro_hasta or -- Caso 1/3
    A.nro_desde between B.nro_desde and B.nro_hasta or -- Caso 2/3
    B.nro_desde between A.nro_desde and A.nro_hasta or -- Caso 1/4
    B.nro_hasta between A.nro_desde and A.nro_hasta    -- Caso 2/4

Another way of looking at this problem is to think that cases do not meet these conditions. If we go to the theory the only case is that of the disjoint sets , that is:

Sets that do not connect, when the intervals are two, when B is less than A or vice versa. If we study now how to compare the limits we will see that it is much simpler, it would be:

  • that B.NRO_HASTA < A.NRO_DESDE or
  • that A.NRO_HASTA < B.NRO_DESDE

and the interesting thing about this is that the denial of these cases is one of the previous cases, so we could simplify the original query to the following:

SELECT *
    FROM A, B
    WHERE NOT (
                B.NRO_HASTA < A.NRO_DESDE OR
                A.NRO_HASTA < B.NRO_DESDE
              )

Or better yet, with the Joins explicit:

SELECT *
        FROM A 
        INNER JOIN B
        ON NOT (
                    B.NRO_HASTA < A.NRO_DESDE OR
                    A.NRO_HASTA < B.NRO_DESDE
               )

Or as Fede H says, applying a bit of algebra to remove NOT :

SELECT *
        FROM A 
        INNER JOIN B
        ON (
                B.NRO_HASTA >= A.NRO_DESDE AND
                A.NRO_HASTA >= B.NRO_DESDE
           )

Although this is a conceptual example, in reality we can find many cases similar to this, for example, it is common to work with dates in certain problems in this way:

  • Which rooms in a hotel are unoccupied between a range of dates?
  • What people are traveling between dates?
  • What historical figures lived between certain centuries?
answered by 04.08.2017 в 05:32
5

For me the clearest solution is similar to Patrick's but with the conditions reversed. I just had to use it recently with dates and maybe in that case it is better understood. I will try to paraphrase it to see if it is understood.

The ranges I look for are what end after the start of the target range and begin before the end of the target range

SELECT *
    FROM A, B
    WHERE B.NRO_HASTA >= A.NRO_DESDE
      AND B.NRO_DESDE <= A.NRO_HASTA
    
answered by 10.08.2017 в 21:29