This question has been on my mind all day. I had run into this behaviour before and simply removed the empty strings, but today I decided to investigate what is actually happening. Note that this research was done with JDK 8 (1.8.0_144), just in case.
The first thing I found is that when the String.split method is passed a regular expression (regex) as a parameter, it delegates the work to the Pattern.split method.
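To make the delegation concrete, here is a minimal sketch that runs the same split both ways. The class and method names (SplitDelegationDemo, viaString, viaPattern) are made up for this illustration. As far as I know, JDK 8's String.split also has a fast path for plain single-character separators that skips the regex engine, but the result is the same.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the JDK.
final class SplitDelegationDemo {

    // Splits through String.split(regex), which uses limit 0.
    static String[] viaString(String input, String regex) {
        return input.split(regex);
    }

    // Splits through Pattern.split directly, with the same limit 0.
    static String[] viaPattern(String input, String regex) {
        return Pattern.compile(regex).split(input, 0);
    }

    public static void main(String[] args) {
        String input = "a1b22c";
        System.out.println(Arrays.toString(viaString(input, "\\d+")));  // [a, b, c]
        System.out.println(Arrays.toString(viaPattern(input, "\\d+"))); // [a, b, c]
    }
}
```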
The Pattern.split method creates a Matcher object from the regex and the string it is applied to.
For each match that the Matcher finds at the start of the string (or immediately after a previous match), it creates a subsequence of zero length (0) and adds it to the result array: String match = input.subSequence(indexStart, indexEnd).toString();, where indexStart equals indexEnd. For a String, subSequence is resolved by the String.substring(int beginIndex, int endIndex) method.
This is why the empty string ("") is created. The Pattern.split method does not ignore these empty strings; it adds them to the output array.
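The behaviour described above can be reproduced with a small sketch. The class EmptyTokenDemo and its helper methods are hypothetical names used only for this example; the calls themselves (Pattern.compile, Pattern.split, String.subSequence) are the standard JDK ones.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the JDK.
final class EmptyTokenDemo {

    // A subsequence whose start and end index are equal is the empty string.
    static String zeroLength(String input, int index) {
        return input.subSequence(index, index).toString();
    }

    // Plain Pattern.split with the default limit (0).
    static String[] split(String input, String regex) {
        return Pattern.compile(regex).split(input);
    }

    public static void main(String[] args) {
        System.out.println("[" + zeroLength("a,,b", 2) + "]");    // []
        // Two consecutive separators leave an empty string in the middle.
        System.out.println(Arrays.toString(split("a,,b", ",")));  // [a, , b]
        // A separator at the very start leaves a leading empty string.
        System.out.println(Arrays.toString(split(",a,b", ",")));  // [, a, b]
        // Trailing empty strings are dropped when the limit is 0.
        System.out.println(Arrays.toString(split("a,b,,", ","))); // [a, b]
    }
}
```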
I took the code of the Pattern.split method and modified it so that it ignores empty strings. The following small changes were made to the method:
a) It is a static method, only so that it can be called from the main method.
b) A second parameter, String regex, was added in order to create the Matcher.
c) The method's original Matcher is commented out so that we can create our own: //Matcher m = matcher(input);
d) We create our own Matcher: Pattern pa = Pattern.compile(regex); Matcher m = pa.matcher(input);
Finally, we add the check that keeps empty strings out of the result:
if (!match.equals("")) {
    matchList.add(match);
}
Below is the modified method, followed by the original Pattern.split() method.
// Requires: java.util.ArrayList, java.util.regex.Matcher, java.util.regex.Pattern
public static String[] myPatternSplit(CharSequence input, String regex, int limit) {
    int index = 0;
    boolean matchLimited = limit > 0;
    ArrayList<String> matchList = new ArrayList<>();
    //Matcher m = matcher(input); <-- we comment out this line
    Pattern pa = Pattern.compile(regex); // <-- we add a Pattern
    Matcher m = pa.matcher(input); // <-- we create a Matcher
    // Add segments before each match found
    while (m.find()) {
        if (!matchLimited || matchList.size() < limit - 1) {
            if (index == 0 && index == m.start() && m.start() == m.end()) {
                // no empty leading substring included for zero-width match
                // at the beginning of the input char sequence.
                continue;
            }
            String match = input.subSequence(index, m.start()).toString();
            // We add a check so that empty strings are not stored
            if (!match.equals("")) {
                matchList.add(match);
            }
            index = m.end();
        } else if (matchList.size() == limit - 1) { // last one
            String match = input.subSequence(index,
                                             input.length()).toString();
            matchList.add(match);
            index = m.end();
        }
    }
    // If no match was found, return this
    if (index == 0)
        return new String[] {input.toString()};
    // Add remaining segment
    if (!matchLimited || matchList.size() < limit)
        matchList.add(input.subSequence(index, input.length()).toString());
    // Construct result
    int resultSize = matchList.size();
    if (limit == 0)
        while (resultSize > 0 && matchList.get(resultSize-1).equals(""))
            resultSize--;
    String[] result = new String[resultSize];
    return matchList.subList(0, resultSize).toArray(result);
}
Original code
public String[] split(CharSequence input, int limit) {
    int index = 0;
    boolean matchLimited = limit > 0;
    ArrayList<String> matchList = new ArrayList<>();
    Matcher m = matcher(input);
    // Add segments before each match found
    while (m.find()) {
        if (!matchLimited || matchList.size() < limit - 1) {
            if (index == 0 && index == m.start() && m.start() == m.end()) {
                // no empty leading substring included for zero-width match
                // at the beginning of the input char sequence.
                continue;
            }
            String match = input.subSequence(index, m.start()).toString(); // <-- the empty string is generated here
            matchList.add(match); // <-- and it is added here
            index = m.end();
        } else if (matchList.size() == limit - 1) { // last one
            String match = input.subSequence(index,
                                             input.length()).toString();
            matchList.add(match);
            index = m.end();
        }
    }
    // If no match was found, return this
    if (index == 0)
        return new String[] {input.toString()};
    // Add remaining segment
    if (!matchLimited || matchList.size() < limit)
        matchList.add(input.subSequence(index, input.length()).toString());
    // Construct result
    int resultSize = matchList.size();
    if (limit == 0)
        while (resultSize > 0 && matchList.get(resultSize-1).equals(""))
            resultSize--; // <-- empty strings at the end of the array are removed here
    String[] result = new String[resultSize];
    return matchList.subList(0, resultSize).toArray(result);
}
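For comparison, a comparable effect can be had without copying JDK code, by applying the same empty-string check to the output of the standard split. This is a different technique from the modified method above (a filter after the split, not a change inside it), and the names NonEmptySplit and splitSkippingEmpty are hypothetical. It is only a sketch and assumes the default limit of 0.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the JDK.
final class NonEmptySplit {

    // Runs the standard String.split and keeps only the non-empty tokens.
    static List<String> splitSkippingEmpty(String input, String regex) {
        List<String> tokens = new ArrayList<>();
        for (String token : input.split(regex)) {
            // Same check as in the modified method: skip empty strings.
            if (!token.equals("")) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(splitSkippingEmpty(",a,,b", ",")); // [a, b]
    }
}
```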