Select and format a Text from a Tkinter Text

2

Inside a Tkinter Text widget, in Python, I'm uploading a document.
After analyzing the content and locating certain words of it, I must select them to give them a certain format, specifically, give them a background color and put them in bold. For example, if I have this content, I should select the words that I already put in italics (and give it a background color and put them in bold):

This is a any text where I have to select certain words and give them some format type.

So, could someone tell me, with an example, how to select a specific chunk or chunks (word or words) from the content of a Text, then give them the aforementioned format? I guess you can select more than one word in different separate places of the same content.
I have seen some of the documentation but I have not understood how it is done.

Work environment: Python2.7.x - Ubuntu 16.04

Edited

I have two Tkinter Text widgets in the main window:

  • In the first Text element, called, for example txt_ini , I load the text of a document, for example, the one I have already put twice before to have more repetitions and, in addition, consider changes or line breaks :
  

This is any text where I have to select certain words and give them some kind of format.

     

This is any text where I have to select certain words and give them some kind of format.

I have to locate certain words in that loaded content. The words to search are the keys of the following dictionary. And that will be replaced by other words that are the values of the same dictionary. This is the dictionary in question:

dicc_palabras = {
    1: ['cualquiera', 'al azar'],
    2: ['seleccionar', 'elegir'],
    3: ['palabras', 'partes'],
    4: ['algún', 'este']
}

Well, once the content has been processed and certain words have been found, the same content will be loaded, but now with those words located to which a specific style format will have been applied (in this editor, only bold but it could be to change the background color or other options). Something similar to this:

  

This is a text where I have to select certain words and give them some type of format.

     

This is a text where I have to select certain words and give them some type of format.

  • In the second Text, called, for example, txt_fin , the content will appear with the words replaced. Substitute words to which another style format will have been applied:
  

This is a random text where I have to choose certain parts and give them this type of format.

     

This is a random text where I have to choose certain parts and give them this type of format.

This whole process so that the comparison of the text before and after processing is clearer, to see the original content highlighting the localized words and the already processed content, also highlighting the > substitute words .

    
asked by zacktagnan 21.05.2018 в 07:44
source

2 answers

1

Ok, I have the result I wanted. At least, in this test I am doing well. We will see later when it happens to the final prototype.

Starting with the code, directly.

# encoding: utf-8


# Solución de importación para que las palabras con tildes
# u otros caracteres raros sean bien consideradas como en Python 3.x
#from __future__ import unicode_literals
# Se opta por usar el decode('utf-8') cuando haga falta (ver más abajo)


from Tkinter import *

import os


class MiTkinter(Tk):

    def __init__(self, *args, **kwargs):
        Tk.__init__(self, *args, **kwargs)

        frame_btn = Frame(self, bg='black')
        frame_btn.pack(fill='both')
        frame_btn.config(padx=10, pady=10)

        self.btn_toggle_01 = Button(frame_btn, text='Cambio a BLANCO', command=self.toggle_accion)
        self.btn_toggle_01.grid(row=0, column=0, sticky=E)
        #self.btn_toggle_01.config(padx=6, pady=4)

        self.btn_toggle_02 = Button(frame_btn, text='Fondo en NEGRO', relief='sunken', command=self.toggle_accion)
        self.btn_toggle_02.grid(row=0, column=1, sticky=W)
        #self.btn_toggle_02.config(padx=6, pady=4)

        frame_txt = Frame(self, bg='grey')
        frame_txt.pack()
        frame_txt.config(padx=10, pady=10)

        # Valor(es) Predeterminado(s)
        self.bg_color = 'black'
        self.fg_color = 'white'

        self.txt_ini = Text(frame_txt, wrap=WORD, bd=0)
        self.txt_ini.grid(row=0, column=0, padx=15, pady=15, sticky=N+S+E+W)
        self.txt_ini.config(bd=0, padx=6, pady=4, font=('Consolas', 12), selectbackground='lightblue', width=22, height=16, bg=self.bg_color, fg=self.fg_color, insertbackground=self.fg_color, highlightbackground=self.bg_color, highlightcolor=self.fg_color)

        self.txt_fin = Text(frame_txt, wrap=WORD, bd=0)
        self.txt_fin.grid(row=0, column=1, padx=15, pady=15, sticky=N+S+E+W)
        self.txt_fin.config(bd=0, padx=6, pady=4, font=('Consolas', 12), selectbackground='lightblue', width=22, height=16, bg=self.bg_color, fg=self.fg_color, insertbackground=self.fg_color, highlightbackground=self.bg_color, highlightcolor=self.fg_color)

        dicc_palabras = {
            1: ['cualquiera', 'al azar'],
            2: ['seleccionar', 'elegir'],
            3: ['palabras', 'partes'],
            4: ['algún', 'este']
        }

        txt_ini_contenido = '''Esto es un texto cualquiera dónde tengo que seleccionar ciertas palabras y darles algún tipo de formato.

Esto es un texto cualquiera dónde tengo que seleccionar ciertas palabras y darles algún tipo de formato.'''
        txt_fin_contenido = '''Esto es un texto al azar dónde tengo que elegir ciertas partes y darles este tipo de formato.

Esto es un texto al azar dónde tengo que elegir ciertas partes y darles este tipo de formato.'''

        self.txt_ini.insert(1.0, txt_ini_contenido)
        self.txt_fin.insert(1.0, txt_fin_contenido)

        # Número de líneas del contenido de un determinado Text
        txt_ini_num_lin = int(self.txt_ini.index('end-1c').split('.')[0])
        txt_fin_num_lin = int(self.txt_fin.index('end-1c').split('.')[0])

        self.resaltar_palabra(self.txt_ini, txt_ini_num_lin, dicc_palabras)
        self.resaltar_palabra(self.txt_fin, txt_fin_num_lin, dicc_palabras)


    def resaltar_palabra(self, _Text, _num_lin, _dicc_palabras):

        if(_Text == self.txt_ini):
            #print('Formateando el "txt_ini"...')
            _dicc_i = 0
            bg_color = 'red'

        elif(_Text == self.txt_fin):
            #print('Formateando el "txt_fin"...')
            _dicc_i = 1
            bg_color = 'green'

        for lin in range(1, _num_lin+1):
            lin_str = '{}.0'.format(lin)

            for _k_ in _dicc_palabras:
                palabra = _dicc_palabras[_k_][_dicc_i]
                # Al querer calcular la longitud de cada palabra, se observa
                # que, al menos, por cada carácter con tilde, se suma 1
                # a la longitud final de la palabra, como, por ejemplo,
                # en el caso de "algún".
                # haciendo un simple len('algún'), en vez de 5, sale 6.
                # Pero si, primero, se pasa la palabra a unicode el
                # resultado sale correcto, osea,
                #     len('algún'.decode('utf-8')) >> 5  
                # Otra opción, en vez del decode() es descomentar
                # el import referido al "unicode_literals"
                palabra_len = len(palabra.decode('utf-8'))

                # search()
                # Recogiendo la posición en la que se sitúa la palabra
                # en el contenido.
                #   -> se busca una cadena (o una expresión regular)
                #   desde la posición '1.0' (línea 1, carácter 0).
                #   -> el índice devuelto es del estilo X.Y
                #       >> X, el número de línea contando desde 1.
                #       >> Y, el número de carácter contando desde 0.
                plbra_i_ini = _Text.search(palabra, lin_str)
                # Índice de la línea en la que está situada la palabra
                plbra_i_linea = plbra_i_ini.split('.')[0]
                plbra_i_ini = plbra_i_ini.split('.')[1]
                # Calculando el índice final tras el fin de la palabra
                plbra_i_fin = int(plbra_i_ini) + palabra_len
                # construyendo índices válidos para el tag_add
                plbra_i_ini = '{}.{}'.format(plbra_i_linea, plbra_i_ini)
                plbra_i_fin = '{}.{}'.format(plbra_i_linea, plbra_i_fin)

                # Configurando y añadiendo la TAG en el lugar adecuado, según
                # los índices configurados, para formatear la palabra deseada
                _Text.tag_configure('highlightline', background=bg_color, font='helvetica 11 bold')
                _Text.tag_add('highlightline', plbra_i_ini, plbra_i_fin)


    '''
    Otra función EXTRA de regalo para el que le interese:
    Cambio de estilo en los Text según se pulse un botón u otro
    '''
    def toggle_accion(self):

        if self.btn_toggle_01.config('relief')[-1] == 'sunken':
            self.btn_toggle_01.config(relief='raised', text='Cambio a BLANCO')
        else:
            self.btn_toggle_01.config(relief='sunken', text='Fondo en BLANCO')

        if self.btn_toggle_02.config('relief')[-1] == 'sunken':
            self.btn_toggle_02.config(relief='raised', text='Cambio a NEGRO')
        else:
            self.btn_toggle_02.config(relief='sunken', text='Fondo en NEGRO')

        if self.btn_toggle_01.config('relief')[-1] == 'sunken':
            self.bg_color = 'white'
            self.fg_color = 'black'
        else:
            self.bg_color = 'black'
            self.fg_color = 'white'

        self.txt_ini.config(bg=self.bg_color, fg=self.fg_color, insertbackground=self.fg_color, highlightbackground=self.bg_color, highlightcolor=self.fg_color)
        self.txt_fin.config(bg=self.bg_color, fg=self.fg_color, insertbackground=self.fg_color, highlightbackground=self.bg_color, highlightcolor=self.fg_color)


if __name__ == '__main__':
    # Limpia consola antes de empezar ejecución
    os.system('clear')

    # Tk (Raíz) objeto raíz por defecto
    # ==========================================================
    root = MiTkinter()

    root.title('Probando ~ Acción Toggle con Botones y Selección de Palabras')
    root.geometry('{}x{}+{}+{}'.format(548, 412, 400, 100))

    root.mainloop()

Here is an image of how I am left with the final result of the test:

Well, that goes there for every person that you can use. Also, experts to see what they say about the solution, if they want to make improvements.

Well, in the end looking for something else online or reviewing some more documentation about it I have come to get the result exposed. Also, any collaboration or help has been good to find the desired result.

Some links about it:

And, by the way, an EXTRA on button actions, a kind of Toggle.

Greetings.

Edited

Well, in the end, as it was read in one of the comments of this answer, the code exposed in the top part of it does not consider the repetition of the word sought if it is on the same line.

So, based on the other suggested solution, after trying to understand it and navigating a little more through the extensive Internet, I added a solution that, now, considers the aforementioned repetitions.

Among other things, I have left the for that ran through each line of the content of the Text to be treated because, in the end, it is not something useful for the objective to be achieved.

I have added more content with more repeated words to each of the Text.

Well this is the code:

# encoding: utf-8


# Solución de importación para que las palabras con tildes
# u otros caracteres raros sean bien consideradas como en Python 3.x
#from __future__ import unicode_literals
# Se opta por usar el decode('utf-8') cuando haga falta (ver más abajo)


from Tkinter import *

import os


class MiTkinter(Tk):

    def __init__(self, *args, **kwargs):
        Tk.__init__(self, *args, **kwargs)

        frame_btn = Frame(self, bg='black')
        frame_btn.pack(fill='both')
        frame_btn.config(padx=10, pady=10)

        self.btn_toggle_01 = Button(frame_btn, text='Cambio a BLANCO', command=self.toggle_accion)
        self.btn_toggle_01.grid(row=0, column=0, sticky=E)

        self.btn_toggle_02 = Button(frame_btn, text='Fondo en NEGRO', relief='sunken', command=self.toggle_accion)
        self.btn_toggle_02.grid(row=0, column=1, sticky=W)

        frame_txt = Frame(self, bg='grey')
        frame_txt.pack()
        frame_txt.config(padx=10, pady=10)

        # Valor(es) Predeterminado(s)
        self.bg_color = 'black'
        self.fg_color = 'white'

        self.txt_ini = Text(frame_txt, wrap=WORD, bd=0)
        self.txt_ini.grid(row=0, column=0, padx=10, pady=10, sticky=N+S+E+W)
        self.txt_ini.config(bd=0, padx=6, pady=4, font=('Consolas', 11), selectbackground='lightblue', width=24, height=16, bg=self.bg_color, fg=self.fg_color, insertbackground=self.fg_color, highlightbackground=self.bg_color, highlightcolor=self.fg_color)

        self.txt_fin = Text(frame_txt, wrap=WORD, bd=0)
        self.txt_fin.grid(row=0, column=1, padx=10, pady=10, sticky=N+S+E+W)
        self.txt_fin.config(bd=0, padx=6, pady=4, font=('Consolas', 11), selectbackground='lightblue', width=24, height=16, bg=self.bg_color, fg=self.fg_color, insertbackground=self.fg_color, highlightbackground=self.bg_color, highlightcolor=self.fg_color)

        dicc_palabras = {
            1: ['cualquiera', 'al azar'],
            2: ['seleccionar', 'elegir'],
            3: ['palabras', 'partes'],
            4: ['algún', 'cierto']
        }

        txt_ini_contenido = '''Esto es un texto cualquiera, bien digo cualquiera, dónde tengo que seleccionar diversas palabras y darles algún tipo de formato. Algún tipo de resultado es el que da. Esperando que "álgún" resultado, sea el mejor.

De algún modo, en cierto lugar del texto cualquiera, voy a seleccionar varias palabras y darles algún tipo de formato. Deseando que las palabras a seleccionar sean las adecuadas.'''
        txt_fin_contenido = '''Esto es un texto al azar, bien digo al azar, dónde tengo que elegir diversas partes y darles cierto tipo de formato. Cierto tipo de resultado es el que da. Esperando que "ciérto" resultado sea el mejor.

De cierto modo, en algún lugar del texto al azar, voy a elegir varias partes y darles este tipo de formato. Deseando que las partes a elegir sean las adecuadas.'''
        self.txt_ini.insert(1.0, txt_ini_contenido)
        self.txt_fin.insert(1.0, txt_fin_contenido)

        self.resaltar_palabra(self.txt_ini, dicc_palabras)
        self.resaltar_palabra(self.txt_fin, dicc_palabras)


    def resaltar_palabra(self, _Text, _dicc_palabras):

        if(_Text == self.txt_ini):
            print('\nFormateando el "txt_ini"...')
            print('=================================================')
            _dicc_i = 0
            bg_color = 'red'
            tag_nom = 'orig'

        elif(_Text == self.txt_fin):
            print('\nFormateando el "txt_fin"...')
            print('=================================================')
            _dicc_i = 1
            bg_color = 'green'
            tag_nom = 'anon'

        for _k_ in _dicc_palabras:
            palabra = _dicc_palabras[_k_][_dicc_i]
            # Al querer calcular la longitud de cada palabra, se observa
            # que, al menos, por cada carácter con tilde, se suma 1
            # a la longitud final de la palabra, como, por ejemplo,
            # en el caso de "algún".
            # haciendo un simple len('algún'), en vez de 5, sale 6.
            # Pero si, primero, se pasa la palabra a unicode el
            # resultado sale correcto, osea,
            #     len('algún'.decode('utf-8')) >> 5  
            # Otra opción, en vez del decode() es descomentar
            # el import referido al "unicode_literals"
            palabra_len = len(palabra.decode('utf-8'))

            # search()
            # Recogiendo la posición en la que se sitúa la palabra
            #   -> se busca una cadena (o una expresión regular)
            #   desde la posición '1.0' (línea 1, carácter 0).
            #   -> el índice devuelto es del estilo X.Y
            #       >> X, el número de línea contando desde 1.
            #       >> Y, el número de carácter contando desde 0.

            '''
            Podría darse el caso de haber más de una repetición de palabra
            en la misma línea analizada:
                Por eso:
                    -> Hay que considerar las posibles repeticiones de la palabra en la misma línea recorrida.
            '''

            index = '1.0'
            while True:
                plbra_i = _Text.search(palabra, index, stopindex=END)
                if not plbra_i:
                    break
                plbra_i_ini = int(plbra_i.split('.')[0])
                plbra_i_fin  = int(plbra_i.split('.')[1]) + palabra_len
                coords = '{}.{}'.format(plbra_i_ini, plbra_i_fin)
                _Text.tag_add(tag_nom, plbra_i, coords)
                _Text.tag_configure(tag_nom, background=bg_color, font='helvetica 11 bold')
                index = coords


    def toggle_accion(self):

        if self.btn_toggle_01.config('relief')[-1] == 'sunken':
            self.btn_toggle_01.config(relief='raised', text='Cambio a BLANCO')
        else:
            self.btn_toggle_01.config(relief='sunken', text='Fondo en BLANCO')

        if self.btn_toggle_02.config('relief')[-1] == 'sunken':
            self.btn_toggle_02.config(relief='raised', text='Cambio a NEGRO')
        else:
            self.btn_toggle_02.config(relief='sunken', text='Fondo en NEGRO')

        if self.btn_toggle_01.config('relief')[-1] == 'sunken':
            self.bg_color = 'white'
            self.fg_color = 'black'
        else:
            self.bg_color = 'black'
            self.fg_color = 'white'

        self.txt_ini.config(bg=self.bg_color, fg=self.fg_color, insertbackground=self.fg_color, highlightbackground=self.bg_color, highlightcolor=self.fg_color)
        self.txt_fin.config(bg=self.bg_color, fg=self.fg_color, insertbackground=self.fg_color, highlightbackground=self.bg_color, highlightcolor=self.fg_color)


if __name__ == '__main__':
    # Limpia consola antes de empezar ejecución
    os.system('clear')

    # Tk (Raíz) objeto raíz por defecto
    # ==========================================================
    root = MiTkinter()

    root.title('Probando ~ Acción Toggle con Botones y Selección de Palabras')
    root.geometry('{}x{}+{}+{}'.format(568, 402, 400, 100))

    root.mainloop()

And the result in the sample image is this:

As you can see, every repetition of a word is considered in the same line, as in other lines before or after. Of course, as you can see, too, what is considered is the exact word, that is, if you search for the word some , you will search, exactly, the word written in that way. If the word is capitalized, without a tilde, ..., it will not be considered. For that, it would be necessary to consider more parameters and, I suppose, that for those cases, it will be better to use a Regular Expression (Regex).

Well, that, for anyone who can serve you. Thanks for the collaborations.

    
answered by 22.05.2018 / 16:25
source
1

To replace a certain part of the text simply use the delete and insert methods together:

text_widget.delete(indice1, indice2)
text_widget.insert(indice1, "nuevo texto")

In order for a part of the text to present a certain format, labels are used, for example:

# Creamos una etiqueta
widget_text.tag_config('nombre_etiqueta',
                       background='#ff4d4d',
                       font=("Helvetica", 12, "bold")
                      )

# Aplicamos la etiqueta a un fragmento del texto usando indices de nuevo
widget_text.tag_add("nombre_etiqueta", indice_inicial, indice_final)

With the above we have the base, now it only remains to know how to obtain the indexes. To search for text in a Text widget we have at our disposal the method search .

Since you intend to find and replace a list of words, I think the best option is to use a regular expression:

regex = "\y(?:Hola|StackOverflow)\y)"

The previous expression searches for the words 'Hello' and 'StackOverflow'. Keep in mind that there are differences between the syntax of regular expressions in Tk-Tcl and Python (module re ), one of them affects the specifier for word boundaries , in Python and most modern engines use \b / \B while Tk-Tcl uses \y / \Y .

When searching and replacing words, it is easier to go through the content from the end to the beginning and not the other way around. The reason is that by replacing we can alter the indexes (by increasing or reducing the number of characters in the text). If we do it from back to front we never alter the indexes in the text that remains to be traversed and we will not have to make calculations to see the correspondence between the indexes of both widgets Text .

Let's see an example:

# -*- coding: utf-8 -*-

import Tkinter as tk
from tkFont import Font




class App(tk.Tk):
    def __init__(self, *args, **kwargs):
        tk.Tk.__init__(self, *args, **kwargs)
        self.text1 = tk.Text(self)
        self.text2 = tk.Text(self)
        self.search_and_replace_btn = tk.Button(self, text="Buscar y sustituir",
                                                command=self.search_and_replace
                                               )

        self.normal_font = Font(family="Helvetica", size=12)
        self.select_font = Font(family="Helvetica", size=12, weight="bold" )
        self.text1.configure(font=self.normal_font)
        self.text2.configure(font=self.normal_font)

        self.text1.tag_config('found', background='#ff4d4d', font=self.select_font)
        self.text2.tag_config('replaced', background='#d2ff4d', font=self.select_font)


        # Insertamos texto de ejemplo en el primer Text
        text = (u"Hola StackOverflow, tkinter es un binding de la biblioteca gráfica Tcl/Tk para Python.\n"
                u"Este ejemplo busca palabras en un array y las sustituye por otras."
                )
        print(text)
        self.text1.insert('1.0', text)

        self.text1.pack()
        self.text2.pack()
        self.search_and_replace_btn.pack()




    def search_and_replace(self):
        sustituciones = {u"tkinter":       u"Tkinter",
                         u"StackOverflow": u"StackOverflow en español",
                         u"biblioteca":    u"librería",
                         u"array":         u"lista",
                         u"sustituye":     u"cambia"}

        regex = r'\y(?:{})\y'.format('|'.join(sustituciones.keys()))

        self.text2.delete('1.0', tk.END)
        self.text2.insert('1.0', self.text1.get('1.0', tk.END))
        self.text1.tag_remove("failed", '1.0', tk.END)
        self.text2.tag_remove("passed", '1.0', tk.END)

        count = tk.IntVar(self)
        pos = self.text1.index("end")

        while True:
            pos = self.text1.search(regex,  pos, "1.0",  backwards=True, regexp=True, count=count)
            if not pos:
                break

            idx2  ='{}+{}c'.format(pos, count.get())

            self.text1.tag_add("found", pos, idx2)

            new = sustituciones[self.text1.get(pos, idx2)]
            self.text2.delete(pos, idx2)
            self.text2.insert(pos, new)
            self.text2.tag_add("replaced", pos, '{}+{}c'.format(pos, len(new)))



if __name__ == "__main__":
    app = App()
    app.mainloop()

This is just a simplified example, playing with the expression we can modify the search criteria to our liking.

  

Note: To make the code compatible with Python 3, simply change the imports by:

import tkinter as tk    
from tkinter.font import Font

For more information on the operation of the indexes, the cursor and the labels can be helpful:

Documentation of effbot on the Text widget

    
answered by 22.05.2018 в 13:51