In these cases is where the regular expressions in SQL Server are missed, it must be said, the search of patterns in the engine is extremely limited, the only thing we can do is look for an initial pattern or chain and a pattern or chain later to be able to cut what we have among these. In short, it is what you are already doing, but what you need is the possibility of defining different criteria for trimming depending on two cases that you have defined.
What you can do is create a clause CASE WHEN ... THEN ... END
and define a search pattern in the string that defines the behavior of the clipping.
The idea would be something like this:
SELECT SUBSTRING(PARRAFO, PATINDEX('% Año [0-9][0-9][0-9][0-9]%', PARRAFO) + 5, 4) AS 'AÑO',
CASE WHEN PATINDEX('% N ° [0-9]%,%', PARRAFO) > 0 THEN
SUBSTRING(PARRAFO,
PATINDEX('% N ° [0-9]%,%', PARRAFO) + 5,
CHARINDEX(',', PARRAFO, PATINDEX('% N ° [0-9]%,%', PARRAFO)) - PATINDEX('% N ° [0-9]%,%', PARRAFO) - 5
)
WHEN PATINDEX('% Nro. [0-9]%', PARRAFO) > 0 THEN
SUBSTRING(PARRAFO,
PATINDEX('% Nro. [0-9]%', PARRAFO) + 6,
CHARINDEX(' del', PARRAFO, PATINDEX('% Nro. [0-9]%', PARRAFO)) - PATINDEX('% Nro. [0-9]%', PARRAFO) - 6
)
ELSE NULL END AS 'NUMERO'
FROM (SELECT 'a Fs. 35933 Nro. 39021 del Año 2017 en favor de Itaú Corpbanca para garantizar al acreedor el cumplimiento de las obligaciones que contraiga actualmente o en el futuro' AS PARRAFO
UNION
SELECT 'al día de hoy le afecta: una Hipoteca, inscrita a Fs. 744 N ° 698, correspondiente al año 2010, a favor del Banco del Estado de Chile.- DOY FE.-'
) T
Exit:
+---+------+--------+
| # | AÑO | NUMERO |
+---+------+--------+
| 1 | 2017 | 39021 |
+---+------+--------+
| 2 | 2010 | 698 |
+---+------+--------+
Comments:
- I recommend using
PATINDEX()
over CHARINDEX()
to find the initial pattern, it has some more flexibility and allows us more precision when finding a pattern. For example: PATINDEX('% Año [0-9][0-9][0-9][0-9]%', PARRAFO)
this will search Año
followed by 4 numbers, it will not fail if the text Año del nacimiento del General San Martín
is found, while CHARINDEX()
if it will
- The
Año
does not need two different patterns as long as the table is set to non-case sensitive.
- For the
Número
if, you can see that we use the CASE WHEN
to determine, according to a pattern, the behavior of the clipping. Importantly, whenever you use this type of structure, it is essential that the most specific patterns are first and the most general at the end.