Delete text up to the first occurrence in Java

7

How can I remove part of a text up to the first occurrence of a given string, removing the occurrence itself as well?

Sample text:

<div>soy la primera linea</div><div>soy la segunda linea</div>

The occurrence would be </div>, so the final text should be:

<div>soy la segunda linea</div>
    
asked by Webserveis 28.08.2016 at 10:40

5 answers

4

Use indexOf to find the first occurrence of </div>, and take everything after it with substring:

String str = "<div>soy la primera linea</div><div>soy la segunda linea</div>";
str = str.substring(str.indexOf("</div>") + 6);
System.out.println(str); // <div>soy la segunda linea</div>
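One detail worth guarding against: indexOf returns -1 when the text does not contain the delimiter, so the snippet above would evaluate substring(5) and silently drop the first five characters. A minimal sketch of a safer variant follows. The class and method names (DropThroughFirst, dropThroughFirst) are hypothetical, and returning the input unchanged when nothing matches is an assumption about the desired behavior.

```java
public class DropThroughFirst {

    // Hypothetical helper: returns the text that follows the first occurrence
    // of delimiter, or the unchanged text when the delimiter is absent.
    static String dropThroughFirst(String text, String delimiter) {
        int pos = text.indexOf(delimiter);
        if (pos < 0) {
            // Assumption: leave the input untouched if there is nothing to cut.
            return text;
        }
        // Skip the delimiter itself by adding its length instead of a literal 6.
        return text.substring(pos + delimiter.length());
    }

    public static void main(String[] args) {
        String str = "<div>soy la primera linea</div><div>soy la segunda linea</div>";
        System.out.println(dropThroughFirst(str, "</div>"));
        System.out.println(dropThroughFirst("sin etiquetas", "</div>"));
    }
}
```

Using delimiter.length() rather than a hard-coded 6 also keeps the helper correct if the delimiter changes.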
    
answered by 28.08.2016 / 14:32
6

Without using regular expressions, you could do something like this:

public class Ejemplo {

  public static void main(String[] args) {
    String cadena = "<div>soy la primera linea</div><div>soy la segunda linea</div>";
    System.out.println(cadena);

    // Locate the first occurrence of the pattern
    String patron = "</div>";
    int posicion = cadena.indexOf(patron);

    // Print everything after the end of that occurrence
    System.out.println(cadena.substring(posicion + patron.length()));
  }
}

Output:

yo@pc:/tmp⟫ javac Ejemplo.java  
yo@pc:/tmp⟫ java Ejemplo
<div>soy la primera linea</div><div>soy la segunda linea</div>
<div>soy la segunda linea</div> 

See the Java API reference for the String class.

    
answered by 28.08.2016 at 14:31
5

There are many reasons why HTML should not be manipulated with String methods or regular expressions. Plenty has been written about this elsewhere; without going into too much detail, the following HTML would make most of those attempts fail:

<!-- quiero eliminar hasta el primer </div> -->
<DIV id='elid'>soy la primera linea</DIV 
><div>soy la segunda linea</div>
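To make the failure concrete, here is a small sketch that applies the plain indexOf/substring approach to the HTML above. The class and method names (NaiveCutDemo, naiveCut) are hypothetical. Because the first lowercase </div> sits inside the comment, and the real closing tag is written as </DIV followed by whitespace, the cut happens in the wrong place and the first line is still present in the result.

```java
public class NaiveCutDemo {

    // Hypothetical helper reproducing the string-based approach:
    // cut everything up to and including the first literal "</div>".
    static String naiveCut(String html) {
        String pattern = "</div>";
        return html.substring(html.indexOf(pattern) + pattern.length());
    }

    public static void main(String[] args) {
        String html = "<!-- quiero eliminar hasta el primer </div> -->\n"
                + "<DIV id='elid'>soy la primera linea</DIV \n"
                + "><div>soy la segunda linea</div>";

        // The match is found inside the comment, so the output begins with
        // the tail of the comment and still contains the first DIV.
        System.out.println(naiveCut(html));
    }
}
```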

Using DOM

This is the right way to do it, because it represents the HTML as a document made of nodes, and it will prevent future headaches.

import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.xml.sax.InputSource;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;


To convert a String to a Document:

public static Document loadXMLFromString(String xml) throws Exception
{
    xml = "<Wrapper>" + xml + "</Wrapper>";
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    InputSource is = new InputSource(new StringReader(xml));
    return builder.parse(is);
}


Then we walk through the nodes directly under the document root until we find the first <div>. Every node after it is appended to a StringBuilder.

String texto = "<!-- quiero eliminar hasta el primer </div> -->\n<DIV id='elid'>soy la primera linea</DIV \n ><div>soy la segunda linea</div>";

// String -> Document
Document doc = loadXMLFromString(texto);

// Build the serializer and suppress the XML declaration
DOMImplementationLS lsImpl = (DOMImplementationLS)doc.getImplementation().getFeature("LS", "3.0");
LSSerializer lsSerializer = lsImpl.createLSSerializer();
lsSerializer.getDomConfig().setParameter("xml-declaration", false);

// Loop over all the child nodes of the root
Node docRoot = doc.getDocumentElement();
NodeList childNodes = docRoot.getChildNodes();
StringBuilder sb = new StringBuilder();
boolean divEncontrado = false;
for (int i = 0; i < childNodes.getLength(); i++) {
    if (!divEncontrado) {
        // Is this node the first div?
        divEncontrado = childNodes.item(i).getNodeName().equalsIgnoreCase("div");
    } else {
        // The first div was already found, so append this node to the StringBuilder
        sb.append(lsSerializer.writeToString(childNodes.item(i)));
    }
}
String resultado = sb.toString();

System.out.println(resultado);

Result:

<div>soy la segunda linea</div>

Demo on ideone.com

    
answered by 30.08.2016 at 11:20
4

You can try with the following regular expression:

<div>.*?</div>

That is:

String input = "<div>soy la primera linea</div><div>soy la segunda linea</div>";

System.out.println(
    input.replaceFirst("<div>.*?</div>", "")
);  // prints "<div>soy la segunda linea</div>"
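Two caveats apply to this pattern: by default . does not match line breaks, so a <div> whose content spans several lines would not be matched, and the pattern removes the first <div>...</div> pair wherever it appears rather than everything from the start of the text. A hedged variant that follows the wording of the question more literally is sketched below. The names RegexCut and dropThroughFirst are hypothetical; (?s) and Pattern.quote are standard java.util.regex features.

```java
import java.util.regex.Pattern;

public class RegexCut {

    // Hypothetical helper: removes everything from the start of the text up to
    // and including the first occurrence of delimiter.
    static String dropThroughFirst(String text, String delimiter) {
        // (?s) lets '.' match line breaks; the reluctant .*? stops at the first
        // delimiter; Pattern.quote treats the delimiter as a literal string.
        String regex = "(?s).*?" + Pattern.quote(delimiter);
        return text.replaceFirst(regex, "");
    }

    public static void main(String[] args) {
        String input = "<div>soy la\nprimera linea</div><div>soy la segunda linea</div>";
        System.out.println(dropThroughFirst(input, "</div>"));
    }
}
```

This only addresses line breaks and literal matching; it shares the limitations of any string-based handling of HTML.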
    
answered by 29.08.2016 at 17:01
1

I do it this way:

    String pattern = "</div>";
    String cadena = "<div>soy la primera linea</div><div>soy la segunda linea</div>";
    System.out.println(cadena.substring(cadena.indexOf(pattern) + pattern.length()));

The output is:

<div>soy la segunda linea</div>
    
answered by 29.08.2016 at 17:30