Judging by the syntax used in your CSV, embedded quotes are escaped by doubling them (""), and a backslash (\) is not treated as a special character.
To match text in quotation marks while allowing "" inside it, match an opening ", then any characters other than ", optionally followed by any number of repetitions of "" and more non-quote characters, and finally the closing ". That is:
"[^"]*(?:""[^"]*)*"
Although there are shorter patterns for this, this one is the most efficient, using a technique called unrolling the loop.
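As a rough illustration of what unrolling means here, the sketch below checks the unrolled pattern against a plain alternation form, "(?:[^"]|"")*", on a few sample strings. The class name, the alternation pattern and the samples are my own assumptions for the comparison, not part of the question. Both patterns should accept and reject the same inputs; the unrolled one just consumes runs of ordinary characters in a single step instead of choosing between alternatives at every character.

```java
import java.util.regex.Pattern;

public class UnrolledQuoteDemo {
    // Unrolled form from the answer: quote, normal*, (special normal*)*, quote
    static final Pattern UNROLLED = Pattern.compile("\"[^\"]*(?:\"\"[^\"]*)*\"");
    // Plain alternation form, assumed here only for comparison
    static final Pattern ALTERNATION = Pattern.compile("\"(?:[^\"]|\"\")*\"");

    public static void main(String[] args) {
        String[] samples = {
            "\"luis\"",                 // simple quoted text
            "\"say \"\"hi\"\", ok\"",   // doubled quotes and a comma inside
            "\"\"",                     // empty quoted text
            "\"unterminated",           // missing closing quote
            "plain"                     // no quotes at all
        };
        for (String s : samples) {
            boolean unrolled = UNROLLED.matcher(s).matches();
            boolean alternation = ALTERNATION.matcher(s).matches();
            System.out.println(s + " -> unrolled=" + unrolled + ", alternation=" + alternation);
        }
    }
}
```

The first three samples match under both patterns and the last two match under neither.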
The complete regex, for elements with or without quotes, would be:
,("[^"]*(?:""[^"]*)*"|[^,]+)
Code
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        final String[] csv = new String[] { // Lines from the question
            ",codigo,nom,cognom",
            ",111,michael,salinas",
            ",222,\"luis\",\"doh, \\”jik\"",
            ",333,ram,\"Lak\"\"\\\"\"\"\"\\\"\"\"\"\\\"\" , \"\"\\\"\"“one\""
        };
        final String regex = ",(\"[^\"]*(?:\"\"[^\"]*)*\"|[^,]+)";
        final Pattern pattern = Pattern.compile(regex);

        for (String line : csv) { // loop over each line
            System.out.println("Line: " + line);
            final Matcher m = pattern.matcher(line);
            while (m.find()) { // loop over each match (each element, without the comma)
                // Print group 1 (what matched inside the parentheses)
                System.out.println(" Element: " + m.group(1));
            }
        }
    }
}
Output
Line: ,codigo,nom,cognom
 Element: codigo
 Element: nom
 Element: cognom
Line: ,111,michael,salinas
 Element: 111
 Element: michael
 Element: salinas
Line: ,222,"luis","doh, \”jik"
 Element: 222
 Element: "luis"
 Element: "doh, \”jik"
Line: ,333,ram,"Lak""\""""\""""\"" , ""\""“one"
 Element: 333
 Element: ram
 Element: "Lak""\""""\""""\"" , ""\""“one"
Demo in ideone
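The regex returns quoted elements with their surrounding quotes and with "" still doubled. If you need the decoded value instead, one possible follow-up is sketched below. The class name and the helpers unquote and values are hypothetical names of my own, and the sample line is made up for the example. It assumes the same CSV syntax as above, where every element is preceded by a comma.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CsvFieldValues {
    // Same pattern as in the answer: a comma, then a quoted or an unquoted element
    private static final Pattern ELEMENT =
            Pattern.compile(",(\"[^\"]*(?:\"\"[^\"]*)*\"|[^,]+)");

    // Hypothetical helper: strips the surrounding quotes and collapses "" into "
    static String unquote(String element) {
        if (element.length() >= 2 && element.startsWith("\"") && element.endsWith("\"")) {
            return element.substring(1, element.length() - 1).replace("\"\"", "\"");
        }
        return element; // unquoted elements are returned unchanged
    }

    // Hypothetical helper: decoded values of every element matched in one line
    static List<String> values(String line) {
        List<String> result = new ArrayList<>();
        Matcher m = ELEMENT.matcher(line);
        while (m.find()) {
            result.add(unquote(m.group(1)));
        }
        return result;
    }

    public static void main(String[] args) {
        // The line is: ,222,"luis","doh, ""jik"""
        System.out.println(values(",222,\"luis\",\"doh, \"\"jik\"\"\""));
        // prints [222, luis, doh, "jik"]
    }
}
```

Because the unquoted branch is [^,]+, empty elements (two consecutive commas) are skipped rather than returned as empty strings, so this sketch inherits that behavior.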