There are countless reasons why HTML should not be manipulated with String or regex methods. Plenty has been written about this elsewhere, so without going into detail, the following HTML would make most such attempts fail:
<!-- quiero eliminar hasta el primer </div> -->
<DIV id='elid'>soy la primera linea</DIV
><div>soy la segunda linea</div>
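Parsing the markup into a DOM is the right way to do it, because it represents the HTML as a document with nodes and will prevent future headaches. The DOM code in this answer uses Java's built-in XML parser, so it assumes the markup is well-formed XML.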
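For contrast, here is a minimal sketch of the string-based approach applied to the HTML above. The class and method names (`NaiveCut`, `cutAfterFirstClosingDiv`) are made up for illustration. The point is only that the first literal `</div>` sits inside the comment, so the cut lands in the wrong place, and the real closing tag (`</DIV` followed by a line break and `>`) would never match a plain search anyway.

```java
class NaiveCut {
    // Hypothetical helper: drops everything up to and including the first literal "</div>".
    static String cutAfterFirstClosingDiv(String html) {
        String marker = "</div>";
        int idx = html.indexOf(marker);
        return idx < 0 ? html : html.substring(idx + marker.length());
    }

    public static void main(String[] args) {
        String html = "<!-- quiero eliminar hasta el primer </div> -->\n"
                + "<DIV id='elid'>soy la primera linea</DIV\n"
                + "><div>soy la segunda linea</div>";
        // The match is found inside the comment, so the output starts with " -->"
        // and still contains the first line that was supposed to be removed.
        System.out.println(cutAfterFirstClosingDiv(html));
    }
}
```

A case-insensitive regex that tolerates whitespace would get past this particular input, but every such patch invites another counterexample, which is the argument for parsing instead.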
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.xml.sax.InputSource;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
To convert a String to a Document:
public static Document loadXMLFromString(String xml) throws Exception
{
    // XML allows only one root element, so wrap the fragment in an artificial one
    xml = "<Wrapper>" + xml + "</Wrapper>";
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    InputSource is = new InputSource(new StringReader(xml));
    return builder.parse(is);
}
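As a rough illustration of why the `<Wrapper>` element is there, the sketch below parses a fragment with two sibling elements with and without it. The class name `WrapperDemo`, the helper `isWellFormed`, and the shortened sample fragment are assumptions made for this example. It also shows that the XML parser treats `DIV` and `div` as different names, which is why a case-insensitive comparison is needed when looking for the first div.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class WrapperDemo {
    // Same idea as loadXMLFromString above, repeated so this sketch compiles on its own.
    static Document loadXMLFromString(String xml) throws Exception {
        xml = "<Wrapper>" + xml + "</Wrapper>";
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Hypothetical helper: true if the string parses as a complete XML document as-is.
    static boolean isWellFormed(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The default error handler may also log the parse error to stderr.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String fragment = "<!-- c --><DIV id='elid'>uno</DIV\n><div>dos</div>";

        // Two top-level elements are not a valid XML document on their own.
        System.out.println(isWellFormed(fragment)); // prints false

        Document doc = loadXMLFromString(fragment);
        System.out.println(doc.getDocumentElement().getNodeName());      // prints Wrapper
        // Element names are case-sensitive in XML: each lookup finds one element.
        System.out.println(doc.getElementsByTagName("DIV").getLength()); // prints 1
        System.out.println(doc.getElementsByTagName("div").getLength()); // prints 1
    }
}
```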
Then we loop over all the nodes at the root of the document until we find the first <div>. From that point on, every node is appended to a StringBuilder.
String texto = "<!-- quiero eliminar hasta el primer </div> -->\n<DIV id='elid'>soy la primera linea</DIV \n ><div>soy la segunda linea</div>";

// String -> Document
Document doc = loadXMLFromString(texto);

// Build the serializer and suppress the XML declaration
DOMImplementationLS lsImpl = (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
LSSerializer lsSerializer = lsImpl.createLSSerializer();
lsSerializer.getDomConfig().setParameter("xml-declaration", false);

// Loop over all the child nodes of the root (the <Wrapper> element)
Node docRoot = doc.getDocumentElement();
NodeList childNodes = docRoot.getChildNodes();
StringBuilder sb = new StringBuilder();
boolean divEncontrado = false;
for (int i = 0; i < childNodes.getLength(); i++) {
    if (!divEncontrado) {
        // Is this node the first div? (XML names are case-sensitive, so ignore case)
        divEncontrado = childNodes.item(i).getNodeName().equalsIgnoreCase("div");
    } else {
        // The first div was already found, so append this node to the StringBuilder
        sb.append(lsSerializer.writeToString(childNodes.item(i)));
    }
}

String resultado = sb.toString();
System.out.println(resultado);
Result:
<div>soy la segunda linea</div>
Demo on ideone.com