Decode accented characters in an HTML directory listing and convert it into a list


I have a Python 'SimpleHTTPServer' server running on my machine. It works as it should, but there is a problem on the client side.

An error appears when the client tries to read a directory that has an accent in its name.

This has only happened now because I have always used English on my machines. I made a new installation based on Arch Linux, and the installer set the system language based on the time zone, which is when this error appeared.

A normal folder name in English would be: Videos

But in Spanish (Latin America) it is: Vídeos

This is an example of what the client receives with:

WebServerResponse = urllib.urlopen(Url).read()

Output:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"><html>
<body>
<h2>Directory listing for /home/user/</h2>
<hr>
<ul>
<li><a href="Descargas/">Descargas/</a>
<li><a href="Documentos/">Documentos/</a>
<li><a href="Escritorio/">Escritorio/</a>
<li><a href="Im%C3%A1genes/">Imágenes/</a>
<li><a href="Matrix.txt">Matrix.txt</a>
<li><a href="M%C3%BAsica/">Música/</a>
<li><a href="Plantillas/">Plantillas/</a>
<li><a href="P%C3%BAblico/">Público/</a>
<li><a href="V%C3%ADdeos/">Vídeos/</a>
</ul>
<hr>
</body>

As you can see, the folder Vídeos appears in the href as V%C3%ADdeos.
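The %C3%AD sequence is the percent-encoding of the two UTF-8 bytes of "í" (U+00ED). A minimal sketch of both views of that href, written in Python 3 (the question uses Python 2's urllib; in Python 3 these functions live in urllib.parse):

```python
from urllib.parse import unquote, unquote_to_bytes

href = "V%C3%ADdeos/"  # href value copied from the directory listing

# Percent-decoding to raw bytes: each %XX becomes exactly one byte
raw = unquote_to_bytes(href)  # b'V\xc3\xaddeos/'

# unquote() additionally decodes those bytes as UTF-8 (its default encoding)
name = unquote(href)  # 'Vídeos/'

# C3 AD is the UTF-8 encoding of the single character "í"
same_bytes = "í".encode("utf-8") == b"\xc3\xad"  # True
```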

Running URLReturn = urllib.urlopen(RemoteDevice).read() and then urllib.unquote(URLReturn) gives each character its correct value. The problem is that when I split the result into pieces with split('\n'), it appears to be re-encoded, but this time the accented characters are replaced with other characters.

For example:

'href="V\xc3\xaddeos/">V\xc3\xaddeos/</a>'
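A likely explanation is that nothing is actually re-encoded: in Python 2, urllib.unquote returns a byte string, and printing a list shows the repr of each element, which writes every non-ASCII byte as a \xNN escape. The pair \xc3\xad is simply the UTF-8 encoding of "í". A Python 3 sketch of the same effect, with bytes and text kept explicitly apart (the sample line is copied from the listing above):

```python
# One listing line after percent-decoding, as raw bytes (what Python 2's
# urllib.unquote returns: a byte string that happens to contain UTF-8 data)
raw_line = b'<li><a href="V\xc3\xaddeos/">V\xc3\xaddeos/</a>'

# Splitting bytes yields bytes; their repr shows non-ASCII bytes as \xNN
byte_parts = raw_line.split()
print(byte_parts)  # [b'<li><a', b'href="V\xc3\xaddeos/">V\xc3\xaddeos/</a>']

# Decoding to text first gives the readable characters back
text_parts = raw_line.decode("utf-8").split()
```

In the question's Python 2 code, the corresponding step would be calling .decode('utf-8') on the unquoted string, which yields unicode objects; printing an individual element rather than the whole list then shows the accented text.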

Locale: es_PR.UTF-8

What should I do to change this behavior?
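One way to turn the listing into a list of entries, sketched in Python 3 with the standard-library html.parser: decode the response bytes as UTF-8 (an assumption; the charset in the server's Content-Type header is the authoritative one), collect every href, and percent-decode only the href values. ListingParser and parse_listing are hypothetical names used only for this sketch.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class ListingParser(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag in the page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def parse_listing(body: bytes, charset: str = "utf-8") -> list:
    """Return the entry names of a SimpleHTTPServer-style listing as text."""
    parser = ListingParser()
    parser.feed(body.decode(charset))  # bytes -> text before any parsing
    parser.close()
    # Percent-decode each href on its own (unquote decodes as UTF-8 by default)
    return [unquote(href) for href in parser.hrefs]


# Abbreviated listing from the question, as raw bytes off the wire
page = (b'<ul>\n'
        b'<li><a href="Descargas/">Descargas/</a>\n'
        b'<li><a href="Im%C3%A1genes/">Im\xc3\xa1genes/</a>\n'
        b'<li><a href="Matrix.txt">Matrix.txt</a>\n'
        b'<li><a href="V%C3%ADdeos/">V\xc3\xaddeos/</a>\n'
        b'</ul>\n')

entries = parse_listing(page)
# entries == ['Descargas/', 'Imágenes/', 'Matrix.txt', 'Vídeos/']
```

The trailing "/" that SimpleHTTPServer appends to directory names is kept, so directories can still be told apart from files.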

Edit: This is the server part

def HTTPServerStart(Secure=False):

   # Generate Certificate
   # sudo openssl req -new -x509 -keyout /etc/ssl/certs/LocalHTTPSSever.pem -out /etc/ssl/certs/LocalHTTPSSever.pem -days 365 -nodes
   # https://letsencrypt.org/

   # Needed in both modes (os, SimpleHTTPServer) and by the error handler below (socket)
   import os, socket, SimpleHTTPServer

   if Secure:
      import BaseHTTPServer, ssl
      ServerType = "HTTPS"
      print "Local Secure HTTP Server Enabled"
      print "You May Need To Add A Certificate Exception In Your Browser To Access The Server"
   else:
      import SocketServer
      ServerType = "HTTP"

   Port = 8000

   if not Secure:
      # Register an extra MIME type for .webapp manifests
      Handler = SimpleHTTPServer.SimpleHTTPRequestHandler
      Handler.extensions_map.update({'.webapp': 'application/x-web-app-manifest+json'})

   # Serve the user's home directory
   ServerPath = os.environ['HOME']
   os.chdir(ServerPath)

   try:
      if Secure:
         try:
            httpd = BaseHTTPServer.HTTPServer(("", Port), SimpleHTTPServer.SimpleHTTPRequestHandler)
            httpd.socket = ssl.wrap_socket(httpd.socket, certfile='/etc/ssl/certs/LocalHTTPSSever.pem', server_side=True)
         except ssl.SSLError:
            print "Error With SSL Certificate. Maybe You Will Need To Generate Another One"
            # Without a working certificate there is no server to start
            return False
      else:
         httpd = SocketServer.TCPServer(("", Port), Handler)
   except socket.error:
      print "%s Server Already Running On Selected Port %s" % (ServerType, Port)
      return False

   # Address() is defined elsewhere in the program (not shown in the question)
   print "Serving %s Server On Address %s://%s:%s/" % (ServerType, ServerType.lower(), Address()['NetworkIP'], Port)

   try:
      httpd.serve_forever()
   except KeyboardInterrupt:
      return "Exit"
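For comparison, a minimal Python 3 sketch of a similar server. It is not a port of the code above: it binds to 127.0.0.1 on a free port, serves a temporary directory containing a hypothetical Vídeos folder, and needs Python 3.7+ for the handler's directory argument. Fetching the listing with urllib.request and decoding it with the charset from the Content-Type header gives ordinary text, while the href stays percent-encoded.

```python
import functools
import http.server
import os
import tempfile
import threading
import urllib.request


class QuietHandler(http.server.SimpleHTTPRequestHandler):
    """SimpleHTTPRequestHandler that does not log each request to stderr."""

    def log_message(self, format, *args):
        pass


def fetch_listing_from_temp_server() -> str:
    """Serve a temp dir with an accented folder and return its listing as text."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "Vídeos"))  # hypothetical accented folder name
        handler = functools.partial(QuietHandler, directory=root)  # Python 3.7+
        # Port 0 asks the OS for any free port instead of a fixed 8000
        httpd = http.server.HTTPServer(("127.0.0.1", 0), handler)
        threading.Thread(target=httpd.serve_forever, daemon=True).start()
        try:
            url = "http://127.0.0.1:%d/" % httpd.server_address[1]
            # Empty ProxyHandler: connect to localhost directly, ignoring *_proxy variables
            opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
            with opener.open(url) as resp:
                # Decode with the charset the server declares in Content-Type
                charset = resp.headers.get_content_charset() or "utf-8"
                return resp.read().decode(charset)
        finally:
            httpd.shutdown()
            httpd.server_close()


listing = fetch_listing_from_temp_server()
```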

Edit 2: System information

uname -a

Linux User 4.10.8-1-ARCH #1 SMP PREEMPT Fri Mar 31 16:50:19 CEST 2017 x86_64 GNU/Linux

cat /etc/arch-release

Antergos Linux release 17.4 (ISO-Rolling)

cat /etc/os-release

NAME="Antergos Linux"
VERSION="17.4-ISO-Rolling"
ID="antergos"
ID_LIKE="arch"
PRETTY_NAME="Antergos Linux"

env

LC_COLLATE=en_PR.UTF-8
LANG=es_PR.UTF-8
GDMSESSION=xfce
TERM=xterm-256color
SHELL=/bin/bash
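To check which encodings Python derives from this environment, a small Python 3 sketch (both calls also exist in Python 2). With LANG=es_PR.UTF-8 both values would typically be a UTF-8 variant, but the exact spelling of the name varies by Python version, so the sketch only prints them.

```python
import locale
import sys

# Encoding Python uses to decode file names (e.g. in os.listdir)
fs_encoding = sys.getfilesystemencoding()

# Encoding Python derives from the LANG / LC_* environment variables
preferred = locale.getpreferredencoding(False)

print("filesystem encoding:", fs_encoding)
print("preferred locale encoding:", preferred)
```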

Operating System: Antergos

During installation, I specified Puerto Rico as both the country and the time zone.

The desktop environment is: XFCE

Edit 3:

HTTP Client

import urllib 
WebServerResponse = urllib.urlopen(RemoteFile).read()

So far everything works more or less as expected: this returns the HTML shown above.

If I call WebServerResponse.split(), the Vídeos entry becomes 'href="V%C3%ADdeos/">V\xc3\xaddeos/</a>'

But if I call urllib.unquote(WebServerResponse).split(), the output is 'href="V\xc3\xaddeos/">V\xc3\xaddeos/</a>'.
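Unquoting the whole page before splitting also decodes percent signs that were never escapes. A Python 3 sketch, assuming a hypothetical file literally named a%41.txt: its href is percent-encoded as a%2541.txt, but its link text is the raw name, so a blanket unquote changes the visible name, whereas extracting the href first and unquoting only that value keeps the name intact. The string split used to pull out the href is a simplification that is only adequate for this single line.

```python
from urllib.parse import quote, unquote

name = "a%41.txt"  # hypothetical file whose real name contains a percent sign
line = '<li><a href="%s">%s</a>' % (quote(name), name)
# line == '<li><a href="a%2541.txt">a%41.txt</a>'

# Blanket unquote over the whole line: the link text "a%41" turns into "aA"
blanket = unquote(line)

# Extract the href value first, then unquote only that value
href = line.split('href="', 1)[1].split('"', 1)[0]
decoded_href = unquote(href)
```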
asked by DarkXDroid 18.04.2017 at 23:04
