1. With java.net.URL
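The URL class already does this for you, so there is no need to reinvent the wheel, and it even serves to validate the URL, since the constructor throws MalformedURLException when the string cannot be parsed: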
import java.net.URL;
import java.net.MalformedURLException;
String prueba = "http://www.devbg.org:8080/forum/index.php?busq=abc#sec1";
try {
    // The constructor parses the string and throws if it is malformed
    URL miUrl = new URL(prueba);
    System.out.println("protocol = " + miUrl.getProtocol());
    System.out.println("authority = " + miUrl.getAuthority());
    System.out.println("host = " + miUrl.getHost());
    System.out.println("port = " + miUrl.getPort());
    System.out.println("path = " + miUrl.getPath());
    System.out.println("query = " + miUrl.getQuery());
    System.out.println("file = " + miUrl.getFile());
    System.out.println("anchor = " + miUrl.getRef());
} catch (MalformedURLException ex) {
    System.out.println("Invalid URL");
}
Result:
protocol = http
authority = www.devbg.org:8080
host = www.devbg.org
port = 8080
path = /forum/index.php
query = busq=abc
file = /forum/index.php?busq=abc
anchor = sec1
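Two details of java.net.URL are easy to miss, so here is a minimal sketch of them: getPort() returns -1 when the URL carries no explicit port (getDefaultPort() then gives the protocol's default), and a string without a protocol is rejected. The class name UrlPortDemo and the helpers effectivePort and isValid are made up for this illustration; they are not part of the JDK.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class UrlPortDemo {

    // Hypothetical helper: the explicit port if there is one, otherwise the protocol default.
    static int effectivePort(String text) {
        try {
            URL url = new URL(text);
            int port = url.getPort();   // -1 when the URL does not specify a port
            return port != -1 ? port : url.getDefaultPort();
        } catch (MalformedURLException ex) {
            throw new IllegalArgumentException("Invalid URL: " + text, ex);
        }
    }

    // Hypothetical helper: true if the URL constructor accepts the string.
    static boolean isValid(String text) {
        try {
            new URL(text);
            return true;
        } catch (MalformedURLException ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(effectivePort("http://www.devbg.org:8080/forum/index.php")); // 8080
        System.out.println(effectivePort("http://www.devbg.org/forum/index.php"));      // 80
        System.out.println(isValid("www.devbg.org/forum/index.php"));                   // false (no protocol)
    }
}
```

Keep in mind that this is only a syntactic check: the constructor does not verify that the host exists. On recent JDKs (20 and later, as far as I know) the URL constructors are marked as deprecated in favor of going through java.net.URI, so expect a compiler warning there.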
2. With regex
But if you are interested in reinventing the wheel (although it does not make much sense), then rather than trying to split the string, it is best to write a regular expression that matches each of the parts, capturing what interests us with each pair of parentheses:
^(?:([^:]*):(?://)?)?([^/]*)(/.*)?
- ^ ≝ start of the text
- (?:([^:]*):(?://)?)? ≝ optional part:
  - ([^:]*) ≝ group 1 (protocol): all characters other than :
  - : ≝ followed by :
  - (?://)? ≝ optionally, the // at the start
- ([^/]*) ≝ group 2 (host): all characters other than /
- (/.*)? ≝ group 3, optional (path):
  - / ≝ a slash
  - .* ≝ followed by any number of characters
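The breakdown above can also be written into the pattern itself. As a sketch, assuming Java 7 or later, each capturing group can be given a name with (?<name>...) so the code reads like the list. The class name NamedGroupsDemo, the method split and the group names protocol, host and path are my own choices for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupsDemo {

    // Same pattern as above, with a name on each capturing group (names are illustrative).
    static final Pattern URL_PARTS = Pattern.compile(
            "^(?:(?<protocol>[^:]*):(?://)?)?"  // optional protocol, ":" and optional "//"
          + "(?<host>[^/]*)"                    // everything up to the first "/"
          + "(?<path>/.*)?");                   // optional path, starting at "/"

    // Returns {protocol, host, path}; an entry is null if its group took no part in the match.
    static String[] split(String text) {
        Matcher m = URL_PARTS.matcher(text);
        if (!m.find()) {
            return null; // defensive: every part is optional, so this pattern matches any input
        }
        return new String[] { m.group("protocol"), m.group("host"), m.group("path") };
    }

    public static void main(String[] args) {
        String[] parts = split("http://www.devbg.org/forum/index.php");
        System.out.println("protocol = " + parts[0]); // http
        System.out.println("host = " + parts[1]);     // www.devbg.org
        System.out.println("path = " + parts[2]);     // /forum/index.php
    }
}
```

The named groups are still numbered 1 to 3 from left to right, so Matcher.group(n) keeps working on this pattern as well.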
If the regex matches when we call Matcher.find(), the text matched by each pair of parentheses is obtained with Matcher.group(n).
Code:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
String prueba = "http://www.devbg.org/forum/index.php";
// Variables for the regex
final String regex = "^(?:([^:]*):(?://)?)?([^/]*)(/.*)?";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(prueba);
// Check whether the regex matches
if (matcher.find()) {
    // Get the text captured by each pair of parentheses
    String protocolo = matcher.group(1);
    String dominio = matcher.group(2);
    String ruta = matcher.group(3);
    System.out.println("protocol = " + protocolo);
    System.out.println("host = " + dominio);
    System.out.println("path = " + ruta);
}
Result:
protocol = http
host = www.devbg.org
path = /forum/index.php
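To show why I say the regex route is reinventing the wheel, here is a small sketch that runs the same pattern on a few other inputs. A group that takes no part in the match comes back as null, and since the pattern knows nothing about ports, a host:port without a protocol gets split in the wrong place. The class RegexEdgeCases and the helper describe are hypothetical names used only for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexEdgeCases {

    static final Pattern PATTERN = Pattern.compile("^(?:([^:]*):(?://)?)?([^/]*)(/.*)?");

    // Hypothetical helper: shows the three groups, with "null" for groups that did not participate.
    static String describe(String text) {
        Matcher m = PATTERN.matcher(text);
        if (!m.find()) {
            return "no match";
        }
        return "protocol=" + m.group(1) + ", host=" + m.group(2) + ", path=" + m.group(3);
    }

    public static void main(String[] args) {
        // No path: group 3 is null
        System.out.println(describe("http://www.devbg.org"));
        // No protocol: group 1 is null
        System.out.println(describe("www.devbg.org/forum/index.php"));
        // With protocol and port: the port stays attached to the host
        System.out.println(describe("http://www.devbg.org:8080/forum"));
        // Port but no protocol: the host is mistaken for the protocol
        System.out.println(describe("www.devbg.org:8080/forum"));
    }
}
```

The last case yields protocol=www.devbg.org and host=8080, which is exactly the kind of corner java.net.URL already handles for you.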