Create files whose names include unusual characters


I've searched a lot but have not found the answer. The case is this:

  • With PHP I have to get the list of files in a directory.
  • I have to save the list in a file in JSON format, for which I use json_encode().
  • I retrieve the saved file from a client running Python 2.7.3.
  • The file must be read, and files must be created from the names saved in it.
  • Here's how I do it:

    header('Content-Type: application/json; charset=UTF-8');

    function listar_directorios_ruta($repository) {
        $response = array();
        try {
            if (is_dir($repository)) {
                $files = new FilesystemIterator($repository);
                foreach ($files as $file) {
                    // Skip hidden files (names starting with a dot)
                    if ($file->getFilename()[0] === '.') continue;
                    array_push($response, array(
                        'name' => $file->getFilename()
                    ));
                }
            }
        } catch (Exception $ex) {
            // Handler body is not shown in the original post
        }
        return $response;
    }

    $response['files'] = listar_directorios_ruta('directorio');
    $responseJSON = json_encode($response);
    file_put_contents('listado_archivos.json', $responseJSON);

    The directory contains a single file called ñ.txt. When the JSON file is created, ñ.txt does appear in it, so up to here "everything is fine".

    I do the following in Python (command line):

    >>> test = open('listado_archivos.json').read()
    >>> print test
    >>> newFile = open(test, 'w')
    >>> newFile.write('Contenido para el archivo')
    >>> newFile.close()

    I try to show the contents of the file:

    # cat ñ.txt
    cat: can't open 'ñ.txt': No such file or directory

    But if I do the following directly in Python, it does work:

    >>> test = 'ñ.txt'
    >>> test
    >>> print test
    >>> newFile = open(test, 'w')
    >>> newFile.write('Contenido para el archivo')
    >>> newFile.close()
    >>> quit()
    # cat ñ.txt
    Contenido para el archivo

    The encoding of the ñ when it comes from PHP is different from the one Python uses.

    How can I make PHP encode the ñ so that Python can create the file correctly?

    Greetings ...

    asked by Rodolfo 12.01.2018 at 18:56

    1 answer


    What seems to be happening is that both languages are working with Unicode (UTF-8 in particular) but are using different, canonically equivalent character sequences.

    In Unicode, many accented characters can be represented either as a single precomposed code point or as a sequence of a base character followed by a combining mark (ó = o + ´, ñ = n + ~, à = a + `, etc.).

    PHP is sending ñ as the sequence U+006E + U+0303 (n + combining tilde). Python, on the other hand, uses the single code point U+00F1 (ñ), which is a precomposed character.

    The two forms are considered canonically equivalent but they are not identical: they have the same appearance and meaning when printed, but they are made of different code points, so the file names do not match byte for byte.
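To make the difference concrete, here is a minimal sketch. It assumes Python 3 (not the 2.7.3 used in the question), where str literals are already Unicode; the variable names are only illustrative.

```python
import unicodedata

decomposed = "n\u0303.txt"   # n + COMBINING TILDE (U+006E U+0303)
precomposed = "\u00f1.txt"   # LATIN SMALL LETTER N WITH TILDE (U+00F1)

# Different code points, hence different UTF-8 bytes in a file name
different = decomposed != precomposed
decomposed_bytes = decomposed.encode("utf-8")    # b'n\xcc\x83.txt'
precomposed_bytes = precomposed.encode("utf-8")  # b'\xc3\xb1.txt'

# After NFC normalization both forms collapse to the same string
same_after_nfc = unicodedata.normalize("NFC", decomposed) == precomposed
```

Both names look the same on screen, which is why cat ñ.txt can fail even though a file that appears to be called ñ.txt exists.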

    On the Python side, you can normalize the UTF-8 encoded string you receive to obtain its composed (NFC) form. For this we can use unicodedata.normalize from the standard library:

    >>> import unicodedata
    >>> test = 'n\xcc\x83.txt'
    >>> test = unicodedata.normalize('NFC', test.decode("UTF-8")).encode("UTF-8")
    >>> test
    '\xc3\xb1.txt'

    In PHP you can use Normalizer::normalize (provided by the intl extension).
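Putting the pieces together on the client side could look roughly like the sketch below. It assumes Python 3 rather than 2.7.3, and it assumes the JSON has the shape produced by the PHP code in the question ({"files": [{"name": ...}]}). The helper names normalized_names and create_files are hypothetical, not part of any library.

```python
import json
import os
import unicodedata


def normalized_names(json_text):
    """Return the file names from the listing, normalized to NFC."""
    listing = json.loads(json_text)
    return [unicodedata.normalize("NFC", entry["name"])
            for entry in listing["files"]]


def create_files(json_text, target_dir, content="Contenido para el archivo"):
    """Create one file per listed name inside target_dir (hypothetical helper)."""
    created = []
    for name in normalized_names(json_text):
        path = os.path.join(target_dir, name)
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(content)
        created.append(path)
    return created
```

Normalizing on the client keeps it robust even if the server later sends precomposed names, since NFC leaves already composed strings unchanged.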

    answered by 13.01.2018 / 01:19