Split on commas while taking quoted fields into account

4

I have the following lines in Java:

1234,"Calle Jaime III, 34", 67,3,U
1235,Avenida Los Algodones, 12,1,L
1236,"Calle Principal""31234", 46,3,H
1237,"Calle Alfonso X,22", 65,2,J

I would like to split on the comma character (,), but as you can see in the example, the address can be wrapped in quotes, so when there is a comma inside a quoted field the split goes wrong.

I am trying to get the following:

1234    Calle Jaime III 34       67     3     U
1235    Avenida Los Algodones    12     1     L
1236    Calle Principal 31234    46     3     H
1237    Calle Alfonso X 22       65     2     J
    
asked by adamista 23.05.2017 at 18:15

2 answers

4

I found a solution to your problem in an answer on Stack Overflow in English.

It uses the following regular expression, which splits on a comma only if that comma is followed by zero or an even number of quotes:

,(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)

Here is a small piece of Java code to test the expression:

String line = "1234,\"Calle Jaime III, 34\", 67,3,U";
String[] tokens = line.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)", -1);
for(String t : tokens) {
    System.out.println("> "+t);
}

This prints the following on the screen:

> 1234
> "Calle Jaime III, 34"
>  67
> 3
> U

I have also tested the regular expression against the data you posted on an online regex testing page, and it works correctly, as you can see on the linked page.
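To make the "even number of quotes" rule easier to see, here is a minimal sketch that applies the same idea without a regular expression. The class and method names (QuoteAwareSplit, splitOutsideQuotes) are my own, not part of any library. It walks through the line once, flips a flag on every quote, and only splits on commas seen while the flag is off. For lines whose quotes are balanced it should give the same fields as the expression above.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteAwareSplit {

    // Hypothetical helper: splits on commas that are outside double quotes.
    // Assumes the quotes in the line are balanced.
    static List<String> splitOutsideQuotes(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;       // every quote flips the state
                current.append(c);          // keep the quote, as split() does
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString()); // separator: close the field
                current.setLength(0);
            } else {
                current.append(c);          // ordinary character or quoted comma
            }
        }
        fields.add(current.toString());     // last field (may be empty)
        return fields;
    }

    public static void main(String[] args) {
        String line = "1234,\"Calle Jaime III, 34\", 67,3,U";
        for (String field : splitOutsideQuotes(line)) {
            System.out.println("> " + field);
        }
    }
}
```

The regular expression is shorter, but it rescans the rest of the line for every comma, so a single pass like this one may be preferable for very long lines.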

If you also want to remove the quotes and the comma you can do the following:

String line = "1234,\"Calle Jaime III, 34\", 67,3,U";
String[] tokens = line.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)", -1);
for(String t : tokens) {
    t = t.replace(",","");
    t = t.replace("\"", "");
    System.out.println("> "+t);
}

so that the fields of this line look like the data you want to get.
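The snippet above handles a single line. The other sample lines contain a doubled quote ("Calle Principal""31234") and a comma with no space after it ("Calle Alfonso X,22"), so here is a sketch that cleans all four lines. The class and method names (CsvLineCleaner, cleanFields) are hypothetical, and turning a doubled quote or an embedded comma into a space is my assumption, based only on the expected output in the question.

```java
public class CsvLineCleaner {

    // Same expression as above: a comma followed by an even number of quotes.
    private static final String SEPARATOR = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    // Hypothetical helper: splits one line and tidies each field.
    static String[] cleanFields(String line) {
        String[] tokens = line.split(SEPARATOR, -1);
        for (int i = 0; i < tokens.length; i++) {
            String t = tokens[i].trim();
            // Drop the enclosing quotes, if the field has them.
            if (t.length() >= 2 && t.startsWith("\"") && t.endsWith("\"")) {
                t = t.substring(1, t.length() - 1);
            }
            t = t.replace("\"\"", " ");            // assumption: doubled quote -> space
            t = t.replace(",", " ");               // assumption: embedded comma -> space
            t = t.replaceAll("\\s+", " ").trim();  // collapse repeated whitespace
            tokens[i] = t;
        }
        return tokens;
    }

    public static void main(String[] args) {
        String[] lines = {
            "1234,\"Calle Jaime III, 34\", 67,3,U",
            "1235,Avenida Los Algodones, 12,1,L",
            "1236,\"Calle Principal\"\"31234\", 46,3,H",
            "1237,\"Calle Alfonso X,22\", 65,2,J"
        };
        for (String line : lines) {
            System.out.println(String.join("\t", cleanFields(line)));
        }
    }
}
```

If the file is real CSV, where a doubled quote stands for a literal quote character, a dedicated CSV parser is probably a safer choice than this kind of cleanup.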

    
answered by 24.05.2017 / 13:29
2

If the patterns you have are exactly those, you can call replaceAll(", ", " ") on each record. This removes only the comma that "bothers" the split or StringTokenizer in the first case of the example, on the assumption that the other commas are not followed by a space. Then you do the split or StringTokenizer as usual, and finally another replaceAll("\"", "") to remove all the quotes. Repeating this procedure on each record should give the expected end result. If you have more patterns, post all the examples and we will keep thinking about it.

    
answered by 23.05.2017 at 23:30