BufferedReader class in Java


I wrote a program that takes either a String or a .txt text file and should count the occurrences of each word in the text, as well as the occurrences of punctuation: question marks, exclamation marks and parentheses. The problem is that when the text is converted into a String, the BufferedReader does not recognize the signs "¿" and "¡". Something else ends up in the string, and when I print it I get □ instead of the corresponding sign; in the debugger the character shows up as ' '. Is there a solution?
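For reference, the counting part described above can be sketched with a regular expression, assuming the text has already been decoded correctly into a String. The class name and the exact set of signs are illustrative assumptions. The signs are written as Unicode escapes (\u00BF is ¿, \u00A1 is ¡) so the source compiles regardless of the editor's file encoding.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OccurrenceCounter {
    // A token is either a run of Unicode letters/digits (a word) or one of the signs
    // inverted question mark, ?, inverted exclamation mark, !, ( and ) (assumed set).
    private static final Pattern TOKEN =
            Pattern.compile("[\\p{L}\\p{N}]+|[\u00BF?\u00A1!()]");

    // Counts each token; words are lower-cased so "Hola" and "hola" are the same word.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>(); // keeps first-seen order
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            // merge inserts 1 for a new token or adds 1 to the existing count
            counts.merge(m.group().toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("\u00BFHola? \u00A1Hola! (adi\u00F3s)"));
    }
}
```

A regex keeps word splitting and sign counting in one pass; if the input String already contains □ instead of ¿, the count is wrong no matter how the tokens are split, which is why the decoding step matters.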

    
asked by J.zer 02.04.2017 at 21:43

3 answers


Well, all I can say is THANK YOU to those who took an interest in answering, and to let you know that I found a solution to my problem: working directly with the raw byte codes (the "ASCII" values) instead of going through the BufferedReader. This is the code fragment that solves my problem:

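import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public String AbrirATexto(File archivo) {
    // Each byte (0-255) is cast straight to a char, i.e. the file is decoded as ISO-8859-1
    StringBuilder contenido = new StringBuilder();
    try (InputStream entrada = new BufferedInputStream(new FileInputStream(archivo))) {
        int ascci;
        while ((ascci = entrada.read()) != -1) {
            char caracter = (char) ascci;
            contenido.append(caracter); // StringBuilder avoids quadratic String concatenation
        }
    } catch (IOException e) {
        e.printStackTrace(); // report the error instead of silently returning partial text
    }
    return contenido.toString();
}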
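Why this works: casting a byte value (0-255) straight to char maps it to the Unicode code point with the same number, which is exactly ISO-8859-1 (Latin-1) decoding, and that is where ¿ (0xBF) and ¡ (0xA1) live. A small sketch comparing the cast with the library decoder, assuming Latin-1 input (class and method names are illustrative):

```java
import java.nio.charset.StandardCharsets;

class ByteCastDemo {
    // Same idea as the loop in the answer: widen every unsigned byte value (0-255) to a char.
    static String castBytes(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length);
        for (byte b : data) {
            // b & 0xFF gives the same 0-255 value that InputStream.read() returns
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 0xBF and 0xA1 are the inverted question and exclamation marks in ISO-8859-1
        byte[] data = {(byte) 0xBF, 'H', 'o', 'l', 'a', '?', ' ', (byte) 0xA1, 'S', 'i', '!'};
        String viaCast = castBytes(data);
        String viaCharset = new String(data, StandardCharsets.ISO_8859_1);
        System.out.println(viaCast.equals(viaCharset)); // true
    }
}
```

So the same result can be obtained without decoding byte by byte, by passing StandardCharsets.ISO_8859_1 to an InputStreamReader. It only works while the files really are Latin-1: a UTF-8 file read this way turns every accented letter into two characters.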
    
answered by 04.04.2017 / 00:55

The problem may have to do with the encoding. Look at the following line:

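// fileDir is the File to read; StandardCharsets is java.nio.charset.StandardCharsets
BufferedReader in = new BufferedReader(
    new InputStreamReader(new FileInputStream(fileDir), StandardCharsets.UTF_8));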
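A fuller sketch of this approach, reading a whole file line by line with the charset passed explicitly. It assumes the file really is UTF-8; the class name, method name and temp file are illustrative only. StandardCharsets.UTF_8 is used so no UnsupportedEncodingException has to be handled.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ReadWithCharset {
    // Reads the whole file through a BufferedReader, decoding bytes with the given charset.
    // Every line is terminated with '\n' in the result (readLine() strips the original terminator).
    static String readAll(Path file, Charset cs) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(file.toFile()), cs))) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Demo only: write a temporary UTF-8 file containing the inverted signs, then read it back
        Path tmp = Files.createTempFile("demo", ".txt");
        Files.write(tmp, "\u00BFHola? \u00A1Hola!".getBytes(StandardCharsets.UTF_8));
        System.out.print(readAll(tmp, StandardCharsets.UTF_8));
        Files.delete(tmp);
    }
}
```

Even when the String is decoded correctly, printing it can still show □ if the console itself uses a different encoding, so the output side has to match as well.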
    
answered by 02.04.2017 at 23:15

Check the charset of your file and the charset of the output where you need to display it.

If you use the constructor InputStreamReader(InputStream in, String charsetName), try these first:

  • "ISO-8859-1" (most likely)
  • "US-ASCII"
  • "UTF-8" (less likely)
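A minimal sketch of trying the candidates above with that constructor. The sample bytes are a hypothetical file containing "¿Hola?" saved as ISO-8859-1; class and method names are illustrative. InputStreamReader does not throw on bytes it cannot decode: it substitutes the replacement character U+FFFD, which is what later shows up as □.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

class TryCharsets {
    // Decodes the bytes through InputStreamReader(InputStream, String charsetName).
    // Undecodable bytes become U+FFFD instead of raising an exception.
    static String decodeWith(byte[] data, String charsetName) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(data), charsetName)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file content: 0xBF (inverted question mark in ISO-8859-1), then "Hola?"
        byte[] data = {(byte) 0xBF, 'H', 'o', 'l', 'a', '?'};
        for (String name : new String[] {"ISO-8859-1", "US-ASCII", "UTF-8"}) {
            String text = decodeWith(data, name);
            boolean replaced = text.indexOf('\uFFFD') >= 0;
            System.out.println(name + ": " + (replaced ? "contains U+FFFD" : "decoded cleanly"));
        }
    }
}
```

With these bytes only ISO-8859-1 decodes cleanly; US-ASCII and UTF-8 both turn 0xBF into U+FFFD.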

You can leave the default charset out of these tests, since that is presumably the one already failing; you can get it with InputStreamReader#getEncoding().
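A minimal sketch of checking the default, with an illustrative class name. A reader created without a charset argument uses the JVM default charset, and getEncoding() reports it under its historical name (for example "UTF8" rather than "UTF-8"). Since JDK 18 the default is UTF-8 on every platform unless it is overridden; older JDKs used the platform encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

class DefaultCharsetCheck {
    // Returns the encoding an InputStreamReader uses when no charset is given.
    static String defaultReaderEncoding() {
        // ByteArrayInputStream holds no OS resources, so the reader is not closed here
        InputStreamReader r = new InputStreamReader(new ByteArrayInputStream(new byte[0]));
        return r.getEncoding();
    }

    public static void main(String[] args) {
        System.out.println("reader encoding: " + defaultReaderEncoding());
        System.out.println("default charset: " + Charset.defaultCharset().name());
    }
}
```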

Alternatively, you can use Charset#availableCharsets() to get a map of the charsets available for reading the file.
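A small sketch of searching that map for candidates; the class and method names are illustrative. Charset.availableCharsets() returns a sorted map from canonical name to Charset, and its contents depend on the JRE: only a few charsets (US-ASCII, ISO-8859-1, UTF-8 and the UTF-16 variants) are guaranteed on every Java platform.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class ListCharsets {
    // Canonical names of the charsets this JVM supports whose name contains the fragment
    static List<String> namesContaining(String fragment) {
        List<String> result = new ArrayList<>();
        String needle = fragment.toUpperCase(Locale.ROOT);
        for (String name : Charset.availableCharsets().keySet()) {
            if (name.toUpperCase(Locale.ROOT).contains(needle)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // e.g. all ISO-8859 variants installed in this JRE
        System.out.println(namesContaining("8859"));
    }
}
```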

Unfortunately, you cannot determine the encoding of a text file with certainty; you can only check for decoding errors and thereby rule out a suspected encoding.
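Checking for errors can be done with a strict CharsetDecoder, which throws instead of inserting replacement characters. A minimal sketch (class and method names are illustrative): a false result rules the charset out, while true only means "not ruled out". ISO-8859-1, for instance, accepts any byte sequence.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class EncodingCheck {
    // Returns false if the bytes are not valid in the given charset.
    static boolean couldBe(byte[] data, Charset cs) {
        CharsetDecoder decoder = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)       // throw on invalid sequences
                .onUnmappableCharacter(CodingErrorAction.REPORT); // throw on unmappable bytes
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // ISO-8859-1 bytes: 0xBF is the inverted question mark, followed by "Hola?"
        byte[] latin1 = {(byte) 0xBF, 'H', 'o', 'l', 'a', '?'};
        System.out.println(couldBe(latin1, StandardCharsets.UTF_8));      // false: lone 0xBF is invalid UTF-8
        System.out.println(couldBe(latin1, StandardCharsets.ISO_8859_1)); // true
    }
}
```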

If it were a UTF-8 file displayed as ISO-8859-1 or ASCII, you would get two-character combinations (one character per byte) instead of a single unknown character, so you are probably trying to display an ISO-8859 file through UTF-8.
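Both symptoms can be reproduced in a few lines. This sketch (class and method names are illustrative) encodes "¿Hola?" once as UTF-8 and once as ISO-8859-1, then decodes each with the other charset: the UTF-8 bytes C2 BF turn into two characters ("Â¿"), while the lone Latin-1 byte BF turns into a single replacement character U+FFFD, the kind of single unknown symbol the question shows as □.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // UTF-8 file shown as ISO-8859-1: each byte of a multi-byte sequence becomes its own char
    static String utf8ShownAsLatin1(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // ISO-8859-1 file read as UTF-8: bytes above 0x7F that do not form valid UTF-8 become U+FFFD
    static String latin1ReadAsUtf8(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00BFHola?"; // inverted question mark + "Hola?"
        System.out.println(utf8ShownAsLatin1(original).length()); // 7
        System.out.println(latin1ReadAsUtf8(original).length());  // 6
        System.out.println(latin1ReadAsUtf8(original).charAt(0) == '\uFFFD'); // true
    }
}
```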

    
answered by 03.04.2017 at 04:52