Remove duplicates ... under certain guidelines

5

I have a table Peticiones whose design is:

NPeticion       - nvarchar
FechaApertura   - date
TipoPeticion    - nvarchar
FechaResolucion - nvarchar
...
(más campos, irrelevantes para este asunto)

The question is that I try to make a query that eliminates duplicates of records that have the same NPeticion , FechaApertura and TipoPeticion stay with the record with FechaResolucion more recent and different from 'NULL' unless all the records are 'NULL' that I would stay with any of them.

I made this query: (possibly not the most elegant / efficient but that's what I came up with)

;with aBorrar as
(select [NPeticion], [FechaApertura], [TipoPeticion],[FechaResolucion],
    row_number() over(partition by [NPeticion], [FechaApertura] 
                      order by  [FechaResolucion] desc) rn
  from [dbo].[Peticiones]
  where NPeticion in (

          SELECT
            [NPeticion]
        FROM
            [dbo].[Peticiones]
        GROUP BY
            [NPeticion],FechaApertura, TipoPeticion
        HAVING 
            COUNT(*) > 1

            )
  and [FechaResolucion] <> 'NULL'
  Union 
  select [NPeticion], [FechaApertura], [TipoPeticion],[FechaResolucion],2
  from [dbo].[Peticiones]
  where NPeticion in (

          SELECT
            [NPeticion]
        FROM
            [dbo].[Peticiones]
        GROUP BY
            [NPeticion],FechaApertura, TipoPeticion
        HAVING 
            COUNT(*) > 1                
            )
  and [FechaResolucion] = 'NULL'
)
select * 
from aBorrar 
where rn > 1;

Explanation:

Since FechaResolucion is a nvarchar , if I order desc to keep the most recent, the values 'NULL' appear first. That's why I make a select ordering the records that are not 'NULL' and then union of those null.

But that does not solve the problem of if all the duplicates contain that column equal to 'NULL' I would erase them all ...

How do I leave a record in case all duplicates have FechaResolucion = 'NULL' ?

Sample data:

[NPeticion]       [FechaApertura] [TipoPeticion] [FechaResolucion]
'20171204000001'     12/04/2017        'A'          '12/04/2017'
'20171204000001'     12/04/2017        'A'          '10/04/2017'
'20171204000001'     12/04/2017        'A'          '10/04/2017'
'20171204000001'     12/04/2017        'A'             'NULL'
'20171204000002'     12/04/2017        'B'             'NULL'
'20171204000002'     12/04/2017        'B'             'NULL'

Current result (records that would be deleted):

[NPeticion]       [FechaApertura] [TipoPeticion] [FechaResolucion]
'20171204000001'     12/04/2017        'A'          '10/04/2017'
'20171204000001'     12/04/2017        'A'          '10/04/2017'
'20171204000001'     12/04/2017        'A'             'NULL'
'20171204000002'     12/04/2017        'B'             'NULL'
'20171204000002'     12/04/2017        'B'             'NULL'

Expected result (records that would be deleted):

[NPeticion]       [FechaApertura] [TipoPeticion] [FechaResolucion]
'20171204000001'     12/04/2017        'A'          '10/04/2017'
'20171204000001'     12/04/2017        'A'          '10/04/2017'
'20171204000001'     12/04/2017        'A'             'NULL'
'20171204000002'     12/04/2017        'B'             'NULL'

Note: I have not designed the BD. Modifying your design is not possible.

    
asked by lois6b 12.04.2017 в 13:12
source

2 answers

6

Some comments about your current code. One of your problems is that by simply doing ORDER BY FechaResolucion DESC , and being FechaResolucion a string with the date format dd/mm/aaaa , it will not sort correctly, but alphabetically.

On the other hand, it is not necessary to separate two queries to see if they contain or not NULL , but you can do something like the following:

WITH CTE1 AS
(
    SELECT  NPeticion,
            FechaApertura,
            TipoPeticion,
            NULLIF(FechaResolucion,'NULL') FechaResolucion
    FROM dbo.Peticiones
), CTE2 AS
(
    SELECT  *,
            RN = ROW_NUMBER() OVER( PARTITION BY NPeticion, FechaApertura, TipoPeticion
                                    ORDER BY CONVERT(DATETIME,FechaResolucion,103) DESC)
    FROM CTE1
)
SELECT *
FROM CTE2
WHERE RN > 1;

In this case, the first CTE is not even necessary to do it separately, and you could use the second one replacing CONVERT(DATETIME,FechaResolucion,103) with CONVERT(DATETIME,NULLIF(FechaResolucion,'NULL'),103) , but I prefer to leave it that way because I find it more readable.

    
answered by 12.04.2017 / 15:10
source
1

I would simply sort by FechaResolucion DESC with a ROW_NUMBER() in a subquery, then filter by only Number = 1 and it will bring you the most recent date or any of the null ones in case they are all null

;WITH _peticiones AS(
SELECT ROW_NUMBER() OVER(PARTITION BY NPeticion ORDER BY CAST(P.FechaResolucion AS DATE) DESC) [Number],
    P.NPeticion, P.FechaApertura, P.TipoPeticion, CAST(P.FechaResolucion AS DATE) [FechaResolucion]
  FROM @Peticiones P 
) SELECT * FROM _peticiones P WHERE P.Number = 1
    
answered by 12.04.2017 в 15:50