Unable to remove a space in Java


Good morning,

I'm trying to remove a space with Java. The value comes from scraping a web page, for example 11 990,00.

I've tried with:

.replace(" ", "")

.replace("\S", "")

.replace(" ", "z").replace("z", "") (this last one was only to check whether I was matching the space at all).

Nothing works. Later I insert that value into a database, and when I read it back with PHP I get the following value: 11�990.00.

I also tried to remove it in PHP, without success, using the following code:

str_replace(" ", "", $posts[0]["$pais"])

But it still doesn't remove the space, and I don't know what I'm doing wrong. The example I'm showing is the price taken from this link.

    
asked by JetLagFox 26.04.2017 at 12:58

2 answers


It looks like the page is giving you a hard (non-breaking) space rather than a regular one. Have you tried removing it with .replace("\u00a0", "")?
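A minimal sketch of that suggestion, assuming the scraped value really contains U+00A0 as the thousands separator (the sample string and the `NbspDemo` / `removeHardSpaces` names are hypothetical). It shows that replacing the ordinary space leaves the value unchanged, while targeting U+00A0 removes it:

```java
public class NbspDemo {
    // Removes every NO-BREAK SPACE (U+00A0) from the input
    static String removeHardSpaces(String s) {
        return s.replace("\u00a0", "");
    }

    public static void main(String[] args) {
        // Assumed sample: the scraped price with a hard space as thousands separator
        String scraped = "11\u00a0990,00";

        // Replacing the ordinary space (U+0020) does nothing: still 9 chars
        System.out.println(scraped.replace(" ", "").length()); // 9

        // Targeting the hard space explicitly removes it
        System.out.println(removeHardSpaces(scraped)); // 11990,00
    }
}
```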

    
answered 26.04.2017 at 13:07

The HTML of the page contains the entity &nbsp;, which represents a non-breaking space: a blank that is exempt from the usual HTML whitespace handling (where several consecutive blanks collapse into one, blanks at the start or end of a paragraph are dropped, etc.).

You need to replace it, for example with .replace("&nbsp;", " ").replace(" ", "") or directly with .replace("&nbsp;", "").
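A sketch of both options, assuming the string still holds the raw, undecoded entity text (the sample fragment and the method names are hypothetical):

```java
public class NbspEntityDemo {
    // Option 1: turn the entity into a regular space, then drop all regular spaces
    static String viaSpace(String html) {
        return html.replace("&nbsp;", " ").replace(" ", "");
    }

    // Option 2: drop the entity directly, leaving other spaces untouched
    static String direct(String html) {
        return html.replace("&nbsp;", "");
    }

    public static void main(String[] args) {
        // Hypothetical raw text as it appears in the page source
        String raw = "11&nbsp;990,00";
        System.out.println(viaSpace(raw)); // 11990,00
        System.out.println(direct(raw));   // 11990,00
    }
}
```

The difference only shows when the text also contains ordinary spaces: option 1 removes those too, option 2 keeps them.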

If you want to be thorough when parsing web pages, take a look at StringEscapeUtils.unescapeHtml4() (Apache Commons Text, formerly Commons Lang) to decode other HTML entities as well, such as &gt; (>), &lt; (<), &amp; (&), etc.

Note that this method decodes &nbsp; using the ISO-8859-1 entity set, where it is character 160 (A0 in hexadecimal). Because ISO-8859-1 coincides with the first 256 Unicode code points, the same character in Unicode is U+00A0 (00A0 in hexadecimal); only its byte representation differs, e.g. two bytes in UTF-16, which Java strings use.
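The relationship between the ISO-8859-1 byte and the Unicode character can be checked with the standard `java.nio.charset` API. This sketch also shows one plausible cause of the 11�990.00 seen in PHP, assuming the value is stored as the single ISO-8859-1 byte 0xA0 and then read back as UTF-8; that storage path is an assumption, not something the question confirms:

```java
import java.nio.charset.StandardCharsets;

public class NbspCharsetDemo {
    // The single ISO-8859-1 byte for character 160 (0xA0)
    static final byte[] LATIN1_NBSP = { (byte) 0xA0 };

    public static void main(String[] args) {
        // Decoded as ISO-8859-1, byte 0xA0 becomes the code point U+00A0
        String decoded = new String(LATIN1_NBSP, StandardCharsets.ISO_8859_1);
        System.out.println(decoded.equals("\u00a0")); // true

        // Encoded as UTF-8, the same character takes two bytes: C2 A0
        byte[] utf8 = decoded.getBytes(StandardCharsets.UTF_8);
        System.out.printf("%02X %02X%n", utf8[0], utf8[1]); // C2 A0

        // A lone 0xA0 byte read as UTF-8 is malformed and decodes to U+FFFD
        String misread = new String(LATIN1_NBSP, StandardCharsets.UTF_8);
        System.out.println(misread.equals("\uFFFD")); // true
    }
}
```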

To summarize, depending on the format or character set of the string:

  • HTML: .replace("&nbsp;", "")
  • ISO-8859-1 (character 160; Java has no \a escape, so use the octal escape): .replace("\240", "")
  • Unicode: .replace("\u00a0", "")
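A sketch that applies the cases above in one pass, with a regex alternative. The `PriceCleaner` class and its method names are hypothetical. Inside a Java String the ISO-8859-1 and Unicode forms are the same char, so a single U+00A0 replacement covers both. As an alternative, Java's regex class `\h` (horizontal whitespace, Java 8+) matches U+00A0, whereas `\s` does not by default:

```java
public class PriceCleaner {
    // Hypothetical helper: strips the entity, the hard space and ordinary spaces
    static String stripSpaces(String s) {
        return s.replace("&nbsp;", "")  // HTML entity, not yet decoded
                .replace("\u00a0", "")  // decoded hard space, char 160
                .replace(" ", "");      // ordinary space
    }

    // Alternative: the regex class \h covers ordinary and hard spaces alike
    static String stripHorizontalWhitespace(String s) {
        return s.replaceAll("\\h", "");
    }

    public static void main(String[] args) {
        System.out.println(stripSpaces("11&nbsp;990,00"));               // 11990,00
        System.out.println(stripSpaces("11\u00a0990,00"));               // 11990,00
        System.out.println(stripHorizontalWhitespace("11\u00a0990,00")); // 11990,00
    }
}
```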
answered 26.04.2017 at 13:02