Get text in quotes with regex in JavaScript

2

I am capturing what the user enters through a textarea. My goal is to capture everything in the textarea that is inside double quotes or single quotes.
Note: It is assumed that there are no line breaks before an opened quote has been closed.

For example, if the textarea contains the following:

aaa 'bbb' ccc "ddd"

Then the regex must capture

bbb
ddd

The regex that I am using:

/((\".*?\")|(\'.*?\'))/g

It works fine for the case shown, but the console reports an Unterminated group error when the textarea contains the following:

aaa 'rgba(255,255,255,'

What I need is for any quoted string, regardless of what it contains, to be captured the same way as the strings in the first example.

    
asked by cristiandev05 15.10.2016 at 19:42

3 answers

2

Three ways to capture the text in quotation marks (single or double)

1. Simple

To get the text in single or double quotes, we use two groups. After each match, only one of the two groups holds the text we are looking for, and we use only that value. This yields only the text inside the quotation marks (not including the quotes themselves).

/"([^"]*)"|'([^']*)'/g

function obtenerTextoEnComillas() {
    const regex = /"([^"]*)"|'([^']*)'/g,
          texto = document.getElementById("ingreso").value;
    var   grupo,
          resultado = [];
    
    while ((grupo = regex.exec(texto)) !== null) {
        // if double quotes matched, the content is in grupo[1] and
        //   grupo[2] is undefined, and vice versa
        // (compare against undefined so an empty match such as "" is kept)
        resultado.push(grupo[1] !== undefined ? grupo[1] : grupo[2]);
    }
    
    // resultado is an array with all the matches;
    // show the values separated by line breaks
    document.getElementById("resultado").innerText = resultado.join("\n");
}
<textarea id="ingreso" style="width:100%" rows="4">
aaa 'bbb' ccc "ddd"a
aaa 'rgba(255,255,255,'
</textarea>
<input type="button" value="Obtener texto entre comillas" onclick="obtenerTextoEnComillas()">
<pre id="resultado"></pre>
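
The same idea can be sketched without the DOM, which makes it easier to try in a console. This is only a sketch: the helper name textoEnComillas is hypothetical (it is not part of the snippet above), and it assumes an environment that supports String.prototype.matchAll.

```javascript
// Hypothetical DOM-free helper: returns the text found inside "..." or '...'.
function textoEnComillas(texto) {
    const regex = /"([^"]*)"|'([^']*)'/g;
    const resultado = [];
    for (const grupo of texto.matchAll(regex)) {
        // Only one of the two groups takes part in each match; the other is undefined.
        // Comparing against undefined (instead of using ||) keeps empty strings.
        resultado.push(grupo[1] !== undefined ? grupo[1] : grupo[2]);
    }
    return resultado;
}

console.log(textoEnComillas("aaa 'bbb' ccc \"ddd\""));     // [ 'bbb', 'ddd' ]
console.log(textoEnComillas("aaa 'rgba(255,255,255,'"));   // [ 'rgba(255,255,255,' ]
console.log(textoEnComillas('empty: ""'));                 // [ '' ]
```

The unbalanced parenthesis in the second input is ordinary text to the regex, so it is captured like any other character.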


2. All in one

We can always get the text we are looking for in the same group ( grupo[2] ).
At the end of the expression, we use \1 , which is a backreference to group 1, to ensure that the match ends with the same character that was captured at the beginning (the quote used to open).

/(["'])(.*?)\1/g

function obtenerTextoEnComillas() {
    const regex = /(["'])(.*?)\1/g,
          texto = document.getElementById("ingreso").value;
    var   grupo,
          resultado = [];
    
    while ((grupo = regex.exec(texto)) !== null) {
        // group 1 contains the quote character that was used
        // group 2 is the text inside the quotes
        resultado.push(grupo[2]);
    }
    
    // resultado is an array with all the matches;
    // show the values separated by line breaks
    document.getElementById("resultado").innerText = resultado.join("\n");
}
Text:
<textarea id="ingreso" style="width:100%" rows="4">
aaa 'rgba(255,255,255,'
"texto con comillas 'simples' incluidas" ... 'y "viceversa"'
</textarea>
<input type="button" value="Obtener texto entre comillas" onclick="obtenerTextoEnComillas()">
<pre id="resultado"></pre>
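
As a quick console check of the backreference, the following sketch runs the pattern on text where each kind of quote contains the other kind. The helper name textoEnComillas is hypothetical, and String.prototype.matchAll is assumed to be available.

```javascript
// Hypothetical helper: \1 must repeat whichever quote character group 1 captured,
// so a different kind of quote inside the string does not end the match.
function textoEnComillas(texto) {
    const regex = /(["'])(.*?)\1/g;
    return Array.from(texto.matchAll(regex), (grupo) => grupo[2]);
}

const texto = "\"texto con comillas 'simples' incluidas\" ... 'y \"viceversa\"'";
console.log(textoEnComillas(texto));
// [ "texto con comillas 'simples' incluidas", 'y "viceversa"' ]
```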


Or, to allow line breaks between the quotes, replace the dot with [\s\S] :

/(["'])([\s\S]*?)\1/g

function obtenerTextoEnComillas() {
    const regex = /(["'])([\s\S]*?)\1/g,
          texto = document.getElementById("ingreso").value;
    var   grupo,
          resultado = [];
    
    while ((grupo = regex.exec(texto)) !== null) {
        // group 1 contains the quote character that was used
        // group 2 is the text inside the quotes
        resultado.push(grupo[2]);
    }
    
    // resultado is an array with all the matches;
    // show the values separated by blank lines
    document.getElementById("resultado").innerText = resultado.join("\n\n");
}
Text:
<textarea id="ingreso" style="width:100%" rows="4">
aaa 'rgba(255,
255,255,'
"texto con comillas 'simples' incluidas" ... 'y "viceversa"'
</textarea>
<input type="button" value="Obtener texto entre comillas" onclick="obtenerTextoEnComillas()">
<pre id="resultado"></pre>
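
To see why the dot has to be replaced, this sketch compares the two patterns on a quoted string that spans two lines. The variable names are illustrative only. The third pattern uses the s (dotAll) flag, which assumes an ES2018-capable engine and is not part of the answer above.

```javascript
const texto = "aaa 'rgba(255,\n255,255,' bbb";

const conPunto  = /(["'])(.*?)\1/g;       // . does not match line breaks
const conClase  = /(["'])([\s\S]*?)\1/g;  // [\s\S] matches any character
const conDotAll = /(["'])(.*?)\1/gs;      // assumption: engine supports the s flag

// Collect group 2 of every match for a given regex.
const extraer = (regex) => Array.from(texto.matchAll(regex), (g) => g[2]);

console.log(extraer(conPunto));   // []
console.log(extraer(conClase));   // [ 'rgba(255,\n255,255,' ]
console.log(extraer(conDotAll));  // [ 'rgba(255,\n255,255,' ]
```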


Also, you will often want to implement structures more elaborate than .*? inside the quotes. The following expression is only marginally less efficient than the previous one, but it is often more effective with more complex structures (like the regex shown later).

/(["'])([^"']*(?:(?!\1)["'][^"']*)*)\1/g
  • We define the first group to match either of the two types of quotes: (["'])
  • At the end of the expression, we use \1 , a backreference to group 1 (the quote used to open).
  • In the middle is group 2, ([^"']*(?:(?!\1)["'][^"']*)*) , which will contain the text we are looking for. It matches:

    • any text without either of the two types of quotes, [^"']* , followed (optionally) by
    • a quote other than the one captured in group 1, (?!\1)["'] , followed by more allowed text, [^"']*
      (?! .. ) is a negative lookahead.


    * This structure uses a technique known as Unrolling The Loop, which follows the format normal* (?: special normal* )* .
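
As a sanity check, the sketch below runs the lazy pattern and the unrolled pattern over the same single-line text and compares the captured strings. The variable names are illustrative. Note that the two patterns are only expected to agree on text without line breaks, because [^"'] also matches line breaks while the dot does not.

```javascript
const perezosa     = /(["'])(.*?)\1/g;
const desenrollada = /(["'])([^"']*(?:(?!\1)["'][^"']*)*)\1/g;

const texto = `aaa 'rgba(255,255,255,' "texto con comillas 'simples' incluidas" ... 'y "viceversa"'`;

// Collect group 2 of every match for a given regex.
const extraer = (regex) => Array.from(texto.matchAll(regex), (g) => g[2]);

console.log(extraer(perezosa));
console.log(extraer(desenrollada));
// Both print:
// [ 'rgba(255,255,255,', "texto con comillas 'simples' incluidas", 'y "viceversa"' ]
```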


3. "With \"escapes\""

We can also treat quotes escaped with a backslash ( \" ) as valid (as most languages do).
In this case, we use the y flag ( sticky ), which requires each match to start at the beginning of the text or at the end of the previous match, and thus ensures that the quotes are balanced. Check browser compatibility before relying on the y flag.

/[^'"\\]*(?:\\.[^'"\\]*)*(["'])([^"'\\]*(?:(?:(?!\1)["']|\\.)[^"'\\]*)*)\1/gy


Description:

/
[^'"\\]*                    # Text before the quotes
(?:                         # Non-capturing group
    \\.[^'"\\]*             #   A \escape and more text
)*                          # repeated 0 or more times
(["'])                      # Opening quote (group 1)
(                           # Group 2: text inside the quotes
    [^"'\\]*                #   Characters that are neither quotes nor \
    (?:                     #   Non-capturing group
        (?:(?!\1)["']|\\.)  #     A quote other than the opening one, or a \escape
        [^"'\\]*            #     Followed by more allowed characters
    )*                      #   repeated 0 or more times (unrolling the loop)
)                           # end of group 2
\1                          # Closing quote ( \1 is the text captured in group 1)
/gy                         # Flags: g (all matches), y (sticky, anchored)


Code:

function obtenerTextoEnComillas() {
    const regex = /[^'"\\]*(?:\\.[^'"\\]*)*(["'])([^"'\\]*(?:(?:(?!\1)["']|\\.)[^"'\\]*)*)\1/gy,
          texto = document.getElementById("ingreso").value;
    var   grupo,
          resultado = [];
    
    while ((grupo = regex.exec(texto)) !== null) {
        // group 1 contains the quote character that was used
        // group 2 is the text inside the quotes
        resultado.push(grupo[2]);
    }
    
    // resultado is an array with all the matches;
    // show the values separated by line breaks
    document.getElementById("resultado").innerText = resultado.join("\n");
}
Text:
<textarea id="ingreso" style="width:100%" rows="4">
aaa 'bbb' ccc "ddd"a
aaa 'rgba(255,255,255,'
acá "se \"permiten\" 'comillas' con escapes"
</textarea>
<input type="button" value="Obtener texto entre comillas" onclick="obtenerTextoEnComillas()">
<pre id="resultado"></pre>
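
The sticky pattern can also be exercised outside the page. This is a sketch under stated assumptions: the helper name textoEnComillas is hypothetical, the engine is assumed to support the y flag, and String.raw is used only so the backslashes in the sample text can be written literally.

```javascript
const regex = /[^'"\\]*(?:\\.[^'"\\]*)*(["'])([^"'\\]*(?:(?:(?!\1)["']|\\.)[^"'\\]*)*)\1/gy;

// Hypothetical helper: collects group 2 of each consecutive (sticky) match.
function textoEnComillas(texto) {
    regex.lastIndex = 0; // the regex object is reused, so start from the beginning
    const resultado = [];
    let grupo;
    while ((grupo = regex.exec(texto)) !== null) {
        resultado.push(grupo[2]);
    }
    return resultado;
}

// Escaped quotes inside a string stay inside the captured text.
console.log(textoEnComillas(String.raw`acá "se \"permiten\" 'comillas' con escapes" y 'it\'s'`));
// [ `se \"permiten\" 'comillas' con escapes`, `it\'s` ]

// An escaped quote outside any string is skipped as plain text.
console.log(textoEnComillas(String.raw`a \" b "c"`));
// [ 'c' ]
```

The captured text keeps its backslashes. Removing them is a separate step that this sketch does not attempt.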
    
answered by 15.10.2016 / 23:00
0

Try it this way:

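var str = 'aaa \'bbb\' ccc "ddd" aaa \'b"bb\' ccc "d\'dd"',
  re = /"[^"]*"|'[^']*'/, // no g flag: exec() always searches from the start of str
  match;
while ((match = re.exec(str)) !== null) {
  console.log(match[0]); // match[0] includes the surrounding quotes
  // remove the quoted text just found so the next exec() finds the following one
  str = str.replace(match[0], '');
}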

Console output:

'bbb'
"ddd"
'b"bb'
"d'dd"
    
answered by 15.10.2016 at 20:27
-2
(?:'|")(.+)(?:'|")

This captures any content found inside single or double quotes.


You can try it here with all your cases
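
As a quick check of how this pattern behaves on the first example from the question, the sketch below prints what group 1 captures. The variable names are illustrative. Because .+ is greedy and the closing quote is not tied to the opening one, the capture runs from the first quote to the last quote on the line.

```javascript
const texto = "aaa 'bbb' ccc \"ddd\"";

const codiciosa = /(?:'|")(.+)(?:'|")/;   // pattern from this answer
const perezosa  = /(["'])(.*?)\1/g;       // lazy pattern with a backreference, for comparison

console.log(texto.match(codiciosa)[1]);
// bbb' ccc "ddd

console.log(Array.from(texto.matchAll(perezosa), (g) => g[2]));
// [ 'bbb', 'ddd' ]
```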

    
answered by 15.10.2016 at 20:41