The key is not to split the text, but to match each element.
The trick is to prepend a comma to the text. Then every element is simply a comma followed by either a quoted string, "[^"]*", or a run of characters that are not commas, [^,]*.
This way every element is retrieved, and empty elements at the beginning or end of the text are preserved.
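To see why matching works better than splitting, here is a minimal sketch that compares String.split(",") with the match-based approach on a line containing a quoted comma and a trailing empty element. The class and method names (SplitVsMatch, matchFields) are hypothetical and only serve the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitVsMatch {
    // Same pattern as in the answer: a comma followed by a quoted or an unquoted element
    private static final Pattern FIELD = Pattern.compile(",(\"[^\"]*\"|[^,]*)");

    // Hypothetical helper: prepend a comma so the first element is also preceded by one
    static List<String> matchFields(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher("," + line);
        while (m.find()) {
            fields.add(m.group(1));
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "Berlin,\"Bank, AG\",";
        // split breaks the quoted element at its inner comma and drops the trailing empty element
        for (String f : line.split(",")) {
            System.out.println("split: [" + f + "]");
        }
        // matching keeps the quoted element whole and keeps the trailing empty element
        for (String f : matchFields(line)) {
            System.out.println("match: [" + f + "]");
        }
    }
}
```

Here split returns three pieces (Berlin, "Bank and  AG") and silently loses the empty last element, while matchFields returns Berlin, "Bank, AG" and an empty string.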
Regular Expression
,("[^"]*"|[^,]*)
- , matches a literal comma.
- ("[^"]*"|[^,]*) is a capturing group: the parentheses capture what matched, so it can be retrieved with matcher.group(1). Inside the group, | separates two alternatives, which are tried in this order:
  - "[^"]* " is not a typo-free way to write it; the alternative is "[^"]*", which matches an opening quotation mark, any number of characters that are not quotes, and a closing quotation mark.
  - [^,]* matches any number of characters that are not commas.
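The order of the two alternatives matters, because Java tries them from left to right and keeps the first one that leads to a match. The following sketch (class name AlternationOrder is hypothetical) runs the pattern from the answer and a variant with the alternatives swapped on the same text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationOrder {
    // Collects group 1 of every match, after prepending a comma as in the answer
    static List<String> match(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher("," + text);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "\"Bank,AG\",Berlin";
        // Quoted alternative first, as in the answer: the quoted element stays whole
        System.out.println(String.join(" | ", match(",(\"[^\"]*\"|[^,]*)", text)));
        // Alternatives swapped: [^,]* succeeds first and stops at the comma inside the quotes
        System.out.println(String.join(" | ", match(",([^,]*|\"[^\"]*\")", text)));
    }
}
```

With the original order there are two elements; with the swapped order the quoted element is cut into "Bank and AG", giving three.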
Code
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class CsvMatchExample {
    public static void main(String[] args) {
        final String regex = ",(\"[^\"]*\"|[^,]*)";
        final String text = ",10010222,\"The Royal Bank of Scotland, Niederlassung Deutschland\",10105,Berlin";
        final Pattern pattern = Pattern.compile(regex);
        // prepend a comma to the text so the first element is matched as well
        final Matcher matcher = pattern.matcher("," + text);
        int n = 0; // only used to print the element number (optional)
        while (matcher.find()) {
            System.out.print("Element " + ++n + ": ");
            System.out.println(matcher.group(1));
        }
    }
}
Result
Element 1:
Element 2: 10010222
Element 3: "The Royal Bank of Scotland, Niederlassung Deutschland"
Element 4: 10105
Element 5: Berlin
Option 2: omit the quotes in the result
If an element is enclosed in quotation marks and you want its text without the quotes, add one more capturing group around the text between the quotes, that is, one more pair of parentheses:
,("([^"]*)"|[^,]*)
In the code, we then check whether matcher.group(2) has a value; it is null whenever the unquoted alternative matched.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class CsvUnquotedExample {
    public static void main(String[] args) {
        final String regex = ",(\"([^\"]*)\"|[^,]*)";
        final String text = ",12070024,Deutsche Bank Privat\" und\" Geschäftskunden,16856,\"Kyrätz, Prägnitz\"";
        final Pattern pattern = Pattern.compile(regex);
        // prepend a comma to the text so the first element is matched as well
        final Matcher matcher = pattern.matcher("," + text);
        String elemento;
        int n = 0;
        while (matcher.find()) {
            System.out.print("Element " + ++n + ": ");
            if (matcher.group(2) != null) {
                // quoted element: group 2 holds the text without the quotes
                elemento = matcher.group(2);
            } else {
                elemento = matcher.group(1);
            }
            System.out.println(elemento);
        }
    }
}
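One detail of the matcher.group(2) != null check: an empty quoted element such as "" still takes the quoted alternative, so group(2) is the empty string rather than null. The sketch below wraps Option 2 in a reusable method to show this; the names UnquotedFields and fields are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnquotedFields {
    // Option 2 pattern: group 2 holds the text between the quotes
    private static final Pattern FIELD = Pattern.compile(",(\"([^\"]*)\"|[^,]*)");

    // Hypothetical helper: returns every element, without the quotes when it was quoted
    static List<String> fields(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD.matcher("," + line);
        while (m.find()) {
            // group(2) is null when the unquoted alternative matched,
            // but "" (not null) for an empty quoted element
            out.add(m.group(2) != null ? m.group(2) : m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String f : fields("\"\",16856,\"Berlin, Mitte\"")) {
            System.out.println("[" + f + "]");
        }
    }
}
```

The empty quoted element comes out as an empty string, exactly like an empty unquoted element would.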
Option 3: Allow escaped quotes inside quotes
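To allow quotes escaped with a backslash (\) inside a quoted element, the character class must exclude the backslash as well as the quote, and a backslash followed by any character must be accepted as an escape sequence. As a plain regular expression:
,("[^\\"]*(?:\\.[^\\"]*)*"|[^,]*)
In a Java string literal every backslash has to be doubled:
final String regex = ",(\"[^\\\\\"]*(?:\\\\.[^\\\\\"]*)*\"|[^,]*)";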
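A minimal runnable sketch of Option 3, assuming the same prepend-a-comma loop as before. The unescape step is a hypothetical post-processing helper, not part of the regex: it strips the surrounding quotes and turns each backslash escape \x into x. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapedQuotes {
    // Option 3: inside quotes, any character except \ and ", or a \ followed by any character
    private static final Pattern FIELD =
            Pattern.compile(",(\"[^\\\\\"]*(?:\\\\.[^\\\\\"]*)*\"|[^,]*)");

    static List<String> fields(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD.matcher("," + line);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    // Hypothetical helper: remove the surrounding quotes and resolve \x escapes to x
    static String unescape(String field) {
        if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
            return field.substring(1, field.length() - 1).replaceAll("\\\\(.)", "$1");
        }
        return field;
    }

    public static void main(String[] args) {
        // The element in the middle reads: "Bank \"Nord\", Berlin"
        String line = "10105,\"Bank \\\"Nord\\\", Berlin\",Berlin";
        for (String f : fields(line)) {
            System.out.println(f + "  ->  " + unescape(f));
        }
    }
}
```

The escaped quotes no longer end the quoted element early, so the comma after Nord stays inside it and the line yields three elements.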