Safe way to escape user input to be processed by regular expressions in JavaScript


The following example is published in the MDN guide to regular expressions:

function escapeRegExp(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // $& is the whole matched string
}

Is this a safe way to escape a string supplied by the end user, for example through a dialog box?

Example:

/**
 * Example. Remove every occurrence of a substring from a string.
 *
 * Requires two inputs from the user: a string and a substring.
 * We must make sure the substring is safe to be processed
 * as part of a regular expression.
 */

/**
  * Taken from
  * https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions
  * Is this safe?
  */
function escapeRegExp(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// String to process
var cadena = "Test abc test test abc test test test abc test test abc";
var entradaUsuario1 = prompt("Enter the string to process", cadena);

// Substring to remove
var subcadena = "abc";
var entradaUsuario2 = prompt("Enter the substring to remove", subcadena);

// Escape the user input before building the regular expression
var re = new RegExp(escapeRegExp(entradaUsuario2), 'g');

// Perform the replacement
var resultado = entradaUsuario1.replace(re, '');

// Print the result to the console
console.log(resultado);
    
asked by Rubén 13.02.2017 at 15:48

1 answer


It is safe, but it escapes more characters than necessary.

  • The ] only has a special meaning inside a character class (it closes the class). But if we are already escaping the [ , no character class can be opened within the regular expression.
  • The } only has a special meaning as the end of a range quantifier {m,n} . Again, if we are escaping the { , no quantifier of this kind can appear within the regex.
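The two points above can be checked with a minimal sketch. It assumes a regex built without the u flag; in Unicode mode a lone ] or } is a syntax error, so there the longer MDN version remains the safer choice. The pattern string is a hand-written illustration, not part of the original answer.

```javascript
// Only the opening delimiters are escaped; ] and } are left as they are.
const patron = '\\[a]\\{2}';            // regex source: \[a]\{2}
const reLiteral = new RegExp(patron);

// The closing ] and } are matched as ordinary characters.
console.log(reLiteral.test('[a]{2}'));  // true
console.log(reLiteral.test('aa'));      // false

// With the u flag the same source is rejected.
let errorUnicode = null;
try {
  new RegExp(patron, 'u');
} catch (e) {
  errorUnicode = e;
}
console.log(errorUnicode instanceof SyntaxError); // true
```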


Escaping metacharacters

Outside a character class, the only metacharacters (special characters) are:

\   ^   $   .   |   ?   *   +   (   )   [   {


The simplified function:

function escaparRegex(string) {
    // Prefix each metacharacter with a backslash ($& is the matched character)
    return string.replace(/[\\^$.|?*+()[{]/g, '\\$&');
}
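As a usage sketch, the simplified function can be applied to a string that contains every kind of metacharacter and then used to build a regex that matches that string literally. The sample input is made up for illustration, and the regex is assumed to be built without the u flag.

```javascript
function escaparRegex(string) {
    return string.replace(/[\\^$.|?*+()[{]/g, '\\$&');
}

// Hypothetical user input containing metacharacters
const entrada = 'a.c (x+y) [1] {2} $5 \\d';
const escapada = escaparRegex(entrada);
console.log(escapada); // a\.c \(x\+y\) \[1] \{2} \$5 \\d

// The escaped text matches the original input literally...
console.log(new RegExp(escapada).test(entrada));           // true
// ...and the dot no longer matches an arbitrary character.
console.log(new RegExp(escaparRegex('a.c')).test('abc'));  // false
```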


Escaping metacharacters in a character class

You may also want to insert user-supplied characters inside a character class (between brackets), for example:

var re = new RegExp("\\S{2,} [" + caracteres + "]{3,}")

In that case, the characters that must be escaped are:

^ (at the start)   \   ]   -


The function to escape the content of a character class:

function escaparClaseRegex(string) {
    // Escape a leading ^ and every - ] \ in the class content
    return string.replace(/^\^|[-\]\\]/g, '\\$&');
}
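A minimal sketch of how this could be used: the escaped text is placed between brackets, so each character, including ^ , - , ] and \ , is treated as a literal member of the class. The value of caracteres is a made-up example.

```javascript
function escaparClaseRegex(string) {
    return string.replace(/^\^|[-\]\\]/g, '\\$&');
}

// Hypothetical user input: the six characters ^ a - c ] \
const caracteres = '^a-c]\\';
const clase = escaparClaseRegex(caracteres);
console.log(clase); // \^a\-c\]\\

const reClase = new RegExp('^[' + clase + ']+$');

// Every listed character is accepted as a literal...
console.log(reClase.test('^-]\\ac')); // true
// ...and a-c is not read as a range, so b is rejected.
console.log(reClase.test('b'));       // false
```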


Escaping metacharacters in the replacement text

When using cadena.replace(re, reemplazo) , some patterns in the replacement text have a special meaning. To make sure the text is inserted literally, the $ must be escaped as $$ in each of these:

$$   $&   $`   $'   $n (n is a digit)

The function to escape the replacement text:

function escaparReemplazoRegex(string) {
    // '$$$$' yields a literal $$ , which replace() later reads as a single $
    return string.replace(/\$(?=[$&`'\d])/g, '$$$$');
}
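The difference can be seen in a short sketch that uses the same replacement text with and without escaping. The sample strings and the regex are illustrative assumptions.

```javascript
function escaparReemplazoRegex(string) {
    return string.replace(/\$(?=[$&`'\d])/g, '$$$$');
}

// Hypothetical replacement text that happens to contain $& and $1
const reemplazo = 'cost: $& and $1';
const reGrupo = /(pr)ice/;

// Unescaped: $& expands to the match and $1 to the first group.
const sinEscapar = 'price'.replace(reGrupo, reemplazo);
console.log(sinEscapar); // cost: price and pr

// Escaped: the text is inserted exactly as written.
const conEscapar = 'price'.replace(reGrupo, escaparReemplazoRegex(reemplazo));
console.log(conEscapar); // cost: $& and $1
```

Another option, when the replacement must always be literal, is to pass a function such as () => reemplazo as the second argument, since the return value of a replacement function is not scanned for $ patterns.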
    
answered by 13.02.2017 / 16:26