Separate a URL into its parts


I am trying to write a method called separarURL() that receives a string containing a URL and returns an array of three Strings holding the protocol, the server and the resource of that URL.

Example:

  

http://www.devbg.org/forum/index.php

The expected result is:

resultado[0]: "http" 
resultado[1]: "www.devbg.org" 
resultado[2]: "/forum/index.php"

I have been thinking of writing an expression that matches what separates the elements (protocol, server and resource of the URL) and passing it to the split method, so that each resulting part can be stored in the final String array, but I don't know how to put it together. The expression I had in mind is this:

String exp = "(http)|(://)|(/)";

but I'm not sure it is correct.
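To see what that expression does with split, here is a small sketch (the class and method names are hypothetical) that passes it to String.split unchanged. split throws away whatever the expression matches, so "http" itself disappears and empty strings appear where two delimiters touch:

```java
import java.util.Arrays;

public class SplitDemo {

    // The expression proposed in the question, used as a split delimiter
    static final String EXP = "(http)|(://)|(/)";

    // Splits the URL wherever EXP matches; the matched text is discarded
    public static String[] partes(String url) {
        return url.split(EXP);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(partes("http://www.devbg.org/forum/index.php")));
    }
}
```

This prints [, , www.devbg.org, forum, index.php]: the server survives, but the protocol and the leading slash of the resource would have to be rebuilt by hand.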

Could someone help me out with it?

    
asked by Thullnev on 16.01.2018 at 16:04

2 answers


1. With java.net.URL

The java.net.URL class already does this, so there is no need to reinvent the wheel, and it even validates the URL:

import java.net.MalformedURLException;
import java.net.URL;

public class Main {
    public static void main(String[] args) {
        String prueba = "http://www.devbg.org:8080/forum/index.php?busq=abc#sec1";
        try {
            // The constructor parses (and validates) the URL
            URL miUrl = new URL(prueba);
            System.out.println("protocolo = " + miUrl.getProtocol());
            System.out.println("autoridad = " + miUrl.getAuthority());
            System.out.println("dominio   = " + miUrl.getHost());
            System.out.println("puerto    = " + miUrl.getPort());
            System.out.println("ruta      = " + miUrl.getPath());
            System.out.println("búsqueda  = " + miUrl.getQuery());
            System.out.println("archivo   = " + miUrl.getFile());
            System.out.println("ancla     = " + miUrl.getRef());
        } catch (MalformedURLException ex) {
            // Thrown when the text is not a valid URL
            System.out.println("invalid URL");
        }
    }
}

Result:

protocolo = http
autoridad = www.devbg.org:8080
dominio   = www.devbg.org
puerto    = 8080
ruta      = /forum/index.php
búsqueda  = busq=abc
archivo   = /forum/index.php?busq=abc
ancla     = sec1
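The question asks for a method that returns the three parts as an array rather than printing them. A minimal sketch of such a separarURL wrapper around java.net.URL (the class name, and the choice to rethrow MalformedURLException as an unchecked IllegalArgumentException, are assumptions of this sketch):

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Arrays;

public class SepararUrlConUrl {

    // Returns {protocol, host, path}.
    // Assumption: an invalid URL is reported as IllegalArgumentException
    // instead of the checked MalformedURLException.
    public static String[] separarURL(String texto) {
        try {
            URL url = new URL(texto);
            return new String[] { url.getProtocol(), url.getHost(), url.getPath() };
        } catch (MalformedURLException ex) {
            throw new IllegalArgumentException("invalid URL: " + texto, ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(separarURL("http://www.devbg.org/forum/index.php")));
    }
}
```

getHost() leaves out the port and getPath() leaves out the query and the anchor, so even the longer test URL above would yield [http, www.devbg.org, /forum/index.php]. A string without a protocol, such as "www.devbg.org", is rejected. The URL(String) constructor is deprecated as of Java 20; on newer JDKs, URI.create(texto).toURL() is the suggested replacement.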



2. With regex

But if you want to reinvent the wheel anyway (although it makes little sense), instead of splitting the string it is better to write a regular expression that matches each of the parts, capturing what we are interested in with each set of parentheses:

^(?:([^:]*):(?://)?)?([^/]*)(/.*)?
  • ^ ≝ Start of the text
  • (?:([^:]*):(?://)?)? ≝ Optional part:
    • ([^:]*) ≝ Group 1 (protocol) - all characters other than :
    • : ≝ followed by :
    • (?://)? ≝ optionally followed by the //
  • ([^/]*) ≝ Group 2 (host) - all characters other than /
  • (/.*)? ≝ Group 3 optional (path):
    • / ≝ a slash
    • .* ≝ followed by any number of characters

If the regex matches when we call Matcher.find(), the text captured by each pair of parentheses is obtained with Matcher.group(n).


Code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String prueba = "http://www.devbg.org/forum/index.php";

        // Variables for the regex
        final String regex = "^(?:([^:]*):(?://)?)?([^/]*)(/.*)?";
        final Pattern pattern = Pattern.compile(regex);
        final Matcher matcher = pattern.matcher(prueba);

        // Check whether the regex matches
        if (matcher.find()) {
            // Get the text captured by each set of parentheses
            String protocolo = matcher.group(1);
            String dominio   = matcher.group(2);
            String ruta      = matcher.group(3);

            System.out.println("protocolo = " + protocolo);
            System.out.println("dominio   = " + dominio);
            System.out.println("ruta      = " + ruta);
        }
    }
}

Result:

protocolo = http
dominio   = www.devbg.org
ruta      = /forum/index.php
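Along the same lines, a sketch of the regex version wrapped in a separarURL method that returns the three groups as an array (the class and constant names are hypothetical). The Pattern is compiled once into a static constant, so it is not recompiled on every call:

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SepararUrlConRegex {

    // Same expression as above, compiled once and reused
    private static final Pattern PARTES_URL =
            Pattern.compile("^(?:([^:]*):(?://)?)?([^/]*)(/.*)?");

    // Returns {protocol, server, resource}; a part missing from the input is null
    public static String[] separarURL(String url) {
        Matcher m = PARTES_URL.matcher(url);
        // Every part of the pattern is optional, so find() succeeds for any input
        if (!m.find()) {
            throw new IllegalStateException("pattern did not match: " + url);
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(separarURL("http://www.devbg.org/forum/index.php")));
        System.out.println(Arrays.toString(separarURL("http://www.devbg.org:8080/forum/index.php?busq=abc#sec1")));
        System.out.println(Arrays.toString(separarURL("www.devbg.org")));
    }
}
```

Unlike java.net.URL, the port stays in the server and the query and anchor stay in the resource: the second call returns [http, www.devbg.org:8080, /forum/index.php?busq=abc#sec1]. The third returns [null, www.devbg.org, null], because groups that do not take part in the match are null. Also note that a host with a port but no protocol is misread: "www.devbg.org:8080/forum" gives protocol "www.devbg.org" and server "8080".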


    
answered on 16.01.2018 at 16:50

I recommend using the String.split method to separate them. Its second argument sets a limit on how many pieces the string is cut into at the slashes:

public class MyClass {
    public static void main(String[] args) {
        String string = "http://www.devbg.org/forum/index.php";
        String[] parts = string.split("//");   // {"http:", "www.devbg.org/forum/index.php"}

        String domain = parts[1].split("/")[0];
        // This is where you set a limit of 2, so split cuts only at the first "/"
        // and the rest of the path is kept in one piece
        String suburl = parts[1].split("/", 2)[1];

        System.out.println(domain);   // www.devbg.org
        System.out.println(suburl);   // forum/index.php (without the leading slash)
    }
}
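The snippet above prints the server and the resource but not the protocol, and the resource loses its leading "/". A sketch of a separarURL that returns all three parts with the same split-with-limit idea (the class name is hypothetical, and it assumes the input always contains "://"):

```java
import java.util.Arrays;

public class SepararUrlConSplit {

    // Returns {protocol, server, resource} using String.split with a limit.
    // Assumption: the input always contains "://" after the protocol.
    public static String[] separarURL(String url) {
        // Limit 2: cut only at the first "://"
        String[] protocoloYResto = url.split("://", 2);
        if (protocoloYResto.length < 2) {
            throw new IllegalArgumentException("missing \"://\": " + url);
        }
        // Limit 2: cut only at the first "/", keeping the rest of the path whole
        String[] servidorYRuta = protocoloYResto[1].split("/", 2);
        // split drops the delimiter, so the leading "/" is added back to the resource
        String recurso = servidorYRuta.length > 1 ? "/" + servidorYRuta[1] : "";
        return new String[] { protocoloYResto[0], servidorYRuta[0], recurso };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(separarURL("http://www.devbg.org/forum/index.php")));
    }
}
```

For "http://www.devbg.org/forum/index.php" this returns [http, www.devbg.org, /forum/index.php]; for a URL without a path, such as "http://www.devbg.org", the resource is an empty string.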


    
answered on 16.01.2018 at 16:19