How to use Tkinter ProgressBar


Hi, I have a very simple application that loads an .xlsx file into a DataFrame with pandas and then builds a Treeview to display it as a table. With a file of 100 or 200 rows this takes very little time; the problem comes when I load a file with 2000 or 3000 records, which takes about 1 minute. Is this normal? Is there any way to optimize the operation? In any case, I would like to add a Tkinter progress bar, ttk.Progressbar(parent, option=value, ...), to show the user how much time is left. Is it possible to synchronize the loading time with the progress of the bar?

Thanks, regards!

EDIT: adding code to my question

import pandas as pd
import tkinter as tk
from tkinter import ttk
def getTreeViewUser(df, frame):
    tv = ttk.Treeview(frame, columns=("#1", "#2", "#3",
                                      "#4", '#5', '#6', '#7', '#8', '#9',
                                      "#10", "#11", "#12", "#13", "#14", "#15", "#16", "#17", "#18"))
    tv.heading('#0', text="Col0")
    tv.heading('#1', text="Col1")
    tv.heading('#2', text="Col2")
    tv.heading('#3', text="Col3")
    tv.heading('#4', text="Col4")
    tv.heading('#5', text="Col5")
    tv.heading('#6', text="Col6")
    tv.heading('#7', text="Col7")
    tv.heading('#8', text="Col8")
    tv.heading('#9', text="Col9")
    tv.heading('#10', text="Col10")
    tv.heading('#11', text="Col11")
    tv.heading('#12', text="Col12")
    tv.heading('#13', text="Col13")
    tv.heading('#14', text="Col14")
    tv.heading('#15', text="Col15")
    tv.heading('#16', text="Col16")
    tv.heading('#17', text="Col17")
    tv.heading('#18', text="Col18")

    for ind in df.index:
        # rojo = df.values[ind][17]
        # tag=""
        # if(rojo==1):
        #     tag="col18"

        tv.insert("", tk.END, text=ind+1,
                  values=(df.values[ind][0], df.values[ind][1],
                          df.values[ind][2], df.values[ind][3],
                          df.values[ind][4], df.values[ind][5],
                          df.values[ind][6], df.values[ind][7],
                          df.values[ind][8], df.values[ind][9],
                          df.values[ind][10], df.values[ind][11],
                          df.values[ind][12], df.values[ind][13],
                          df.values[ind][14], df.values[ind][15],
                          df.values[ind][16],
                          df.values[ind][17]))

    # tv.tag_configure('rojo', background='#F6CECE')

    scrollbar_vertical = ttk.Scrollbar(frame, orient='vertical', command=tv.yview)
    scrollbar_vertical.pack(side='right', fill=tk.Y)

    scrollbar_horizontal = ttk.Scrollbar(frame, orient='horizontal', command=tv.xview)
    scrollbar_horizontal.pack(side='bottom', fill=tk.X)

    tv.configure(yscrollcommand=scrollbar_vertical.set)
    tv.configure(xscrollcommand=scrollbar_horizontal.set)

    return tv


def clickbutton():
    file = pd.read_excel('Ejemplo.xlsx')
    df = pd.DataFrame(file)
    getTreeViewUser(df, main_window).pack(expand=True, fill='both')


class Application(ttk.Frame):
    def __init__(self, main_window):
        super().__init__(main_window)
        main_window.geometry("600x500")
        self.button = tk.Button(main_window, text="Button", command=clickbutton).pack()
        self.progressbar = ttk.Progressbar(main_window)
        self.progressbar.pack()


main_window = tk.Tk()
app = Application(main_window)
app.mainloop()

In this example I have changed the text of the cells for privacy reasons, and this version loads right away. The problem appears with the same number of rows but different text in each one, with dates and booleans in some cells. I do not know how to reproduce the problem exactly, but the load takes almost a minute. The idea of the progress bar would be to show it in that window and synchronize it with the loading of the Treeview. Thanks in advance.

Example file: link

I have modified the file so that it simulates my problem a little better by filling some columns with dates and booleans as in the original file.

link

    
asked by Alfredo Lopez Rodes 05.09.2018 at 20:30

1 answer


To create a progress bar while adding each item (row) to the Treeview, the following points should be taken into account:

  • We can use the loop that iterates over the rows of the DataFrame to measure progress, which makes a determinate progress bar possible.

  • However, that loop is itself blocking, so the app freezes while the tree is being built: the mainloop is blocked and cannot respond to events or redraw the window. This is a problem because nothing in the interface can be updated while the app is unresponsive.

  • You should only interact with widgets (and therefore add items to the Treeview) from the main thread, so using threads is not a solution.

The solution is to call the update_idletasks method every x iterations of the loop that creates the rows. This gives Tk the chance to process its pending idle tasks, such as redrawing the window, and we take that opportunity to update the bar as well.
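As a rough illustration of the bookkeeping behind such a bar, the sketch below computes, for a given number of rows and chunk size, the points at which the interface would be refreshed and the percentage to show at each one. The helper name `progress_checkpoints` is hypothetical and the snippet deliberately leaves Tkinter out; in the real loop each percentage would be assigned to the progress bar's value right before calling update_idletasks.

```python
def progress_checkpoints(total_rows, chunk_size):
    """Yield (rows_done, percent) after each chunk of rows.

    Hypothetical helper, not part of Tkinter: it only does the arithmetic
    needed to drive a determinate progress bar.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, total_rows, chunk_size):
        rows_done = min(start + chunk_size, total_rows)
        yield rows_done, 100 * rows_done / total_rows


# 250 rows refreshed every 100 rows give three updates, the last one at 100 %.
for rows_done, percent in progress_checkpoints(250, 100):
    print(f"{rows_done} rows inserted, bar at {percent:.0f}%")
```

Computing the percentage from the rows already inserted, rather than accumulating a fixed step, means the last update always lands exactly on 100 even when the row count is not a multiple of the chunk size.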

It is important to avoid any superfluous operation inside that loop as far as possible: when iterating thousands of times, a small overhead per iteration can become very significant. The way you iterate over the DataFrame is also key. For example, avoid indexing each column individually:

values = [df.values[ind][0], df.values[ind][2], ..., df.values[ind][17]]

use loc (the columns are selected by label here) and get everything you need at once:

columns = ["A", "B", "C", ...]
values = df.loc[ind, columns]
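The idea can be tried on a small made-up DataFrame. The sketch below fetches the wanted columns of each row with a single label-based lookup and, as a further suggestion that is not part of this answer's code, with `itertuples`, which usually avoids most of the per-row overhead. The data and column names are invented for the illustration.

```python
import pandas as pd

# Made-up data standing in for the spreadsheet.
df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": ["x", "y", "z"],
    "C": [True, False, True],
})
columns = ["A", "C"]

# One label-based lookup per row: fetches all wanted columns at once.
rows_loc = [tuple(str(v) for v in df.loc[ind, columns]) for ind in df.index]

# Alternative: select the columns once, then iterate over plain tuples.
rows_tuples = [
    tuple(str(v) for v in row)
    for row in df[columns].itertuples(index=False, name=None)
]

print(rows_tuples)
```

Both lists hold the same strings, so either could be passed as the values of each Treeview row; which one is faster for a given file is worth measuring rather than assuming.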

Here is a simplified example of how all this can be implemented:

import pandas as pd
import tkinter as tk
from tkinter import ttk



class DataFrameTreeView(tk.Frame):

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.tree_view = None
        self.hscrollbar = None
        self.vscrollbar = None

    def load_table(self, df, columns=None, columns_headers=None, chunk_size=100):
        """
        Args:
            df: DataFrame -> data to show in the table
            columns: list -> columns to show in the table; if None, all are shown
            columns_headers: list -> names for the column headers;
                                     if None, the DataFrame headers are used

            chunk_size: int -> number of rows created per iteration
                               (between interface updates)
        """

        if columns is not None:
            dif = set(columns) - set(df.columns)
            if dif:
                raise ValueError(f"Columns: {tuple(dif)} are not in DataFrame") 
        else:
            columns = df.columns

        if columns_headers is not None:
            # The headers must match the columns actually shown
            if len(columns_headers) != len(columns):
                raise ValueError("headers length does not match the number of columns")
        else:
            columns_headers = columns
        tk_col_names = [f"#{name}" for name in columns_headers]

        # Treeview and scrollbars
        if self.tree_view is not None:
            self.tree_view.destroy()
            self.hscrollbar.destroy()
            self.vscrollbar.destroy()

        self.tree_view = ttk.Treeview(self, columns=tk_col_names)
        self.vscrollbar = ttk.Scrollbar(self, orient='vertical', command = self.tree_view.yview)
        self.vscrollbar.pack(side='right', fill=tk.Y)
        self.hscrollbar = ttk.Scrollbar(self, orient='horizontal', command = self.tree_view.xview)
        self.hscrollbar.pack(side='bottom', fill=tk.X)
        self.tree_view.configure(yscrollcommand=self.vscrollbar.set)
        self.tree_view.configure(xscrollcommand=self.hscrollbar.set)

        # Configure columns and headers
        for name, header in zip(tk_col_names, columns_headers):
            self.tree_view.column(name, anchor=tk.W)
            self.tree_view.heading(name, text=header, anchor=tk.W)

        # Load the items
        rows = df.shape[0]
        chunks = rows / chunk_size
        progress = 0
        step = 100 / chunks

        progress_bar = ttk.Progressbar(self, orient="horizontal",
                                        length=100, mode="determinate")
        progress_bar["value"] = progress 
        label = tk.Label(self, text="Cargando filas")
        label.place(relx=0.50, rely=0.45, anchor=tk.CENTER)
        progress_bar.place(relx=0.5, rely=0.5, relwidth=0.80,  anchor=tk.CENTER)

        for ind in df.index:
            values = [str(v) for v in df.loc[ind, columns].values]
            self.tree_view.insert("", tk.END, text=ind+1, values=values)
            if ind % chunk_size == 0:
                self.update_idletasks()
                progress += step
                progress_bar["value"] = progress 

        progress_bar["value"] = progress
        self.update_idletasks()

        progress_bar.destroy()
        label.destroy()
        self.tree_view.pack(expand=True, fill='both')
        #self.tree_view['show'] = 'headings'


class Application(ttk.Frame):
    def __init__(self, main_window):
        super().__init__(main_window)
        main_window.geometry("600x500")
        self.button = tk.Button(self, text="Cargar datos", command=self.on_button_clicked)
        self.button.pack()
        self.treeview = DataFrameTreeView(self)
        self.treeview.pack(expand=True, fill='both')


    def on_button_clicked(self):
        self.button.configure(state=tk.DISABLED)
        columns_headers = [f"Columna {n}" for n in range(1, 19)]
        dataframe = pd.read_excel("EJEMPLO.xlsx")
        self.treeview.load_table(dataframe, columns_headers=columns_headers)
        self.button.configure(state=tk.NORMAL)


if __name__ == "__main__":
    root = tk.Tk()
    app = Application(root)
    app.pack(expand=True, fill='both')
    root.mainloop()

This produces something like the following:

In my case, loading 5000 rows of 18 columns takes approximately what is shown in the GIF, about 3-4 seconds.

If inserting 100 rows between updates causes noticeable freezes while the bar is displayed, the number of rows loaded per update can be lowered (the chunk_size parameter of the load_table method). As mentioned above, though, the priority must always be to make each iteration of the loop as efficient as possible.

Using update_idletasks means that pending callbacks are not processed, so the user cannot interact with the app while it is loading. This is the safest approach. To allow switching tabs, maximizing the window, clicking on widgets, etc. during the load, you can use the update method instead of update_idletasks, since it also processes pending callbacks. You have to be very careful in that case, because callbacks processed while the Treeview is loading can lead to nested or even infinite loops. The creation of the Treeview is not asynchronous in any case: if another callback runs during the load (when using update), the load pauses until that callback finishes and then resumes. As mentioned, you can only interact with widgets from the main thread (this is not exclusive to Tk; it also happens with any framework that uses OpenGL), which prevents widgets from being created asynchronously. This is one of the reasons to be careful when letting the user interact with the app while the widget is loading.

If there are blocking operations on the DataFrame data itself that take a long time, those can be moved to another thread or process; this applies to anything that does not interact with a widget directly.
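A minimal sketch of that idea, assuming the slow part is reading and preparing the data: a worker thread does the work and puts the result in a `queue.Queue`, while the main thread polls the queue with `after` and only then touches widgets. The names `run_in_background`, `poll_result` and `slow_load` are hypothetical, and `slow_load` merely simulates a long read.

```python
import queue
import threading
import time


def run_in_background(func, result_queue):
    """Run func() in a daemon thread and put its return value in the queue."""
    def worker():
        result_queue.put(func())

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def poll_result(widget, result_queue, on_ready, interval_ms=100):
    """Check the queue from the main thread; reschedule until a result arrives.

    `widget` is any Tkinter widget (only its `after` method is used), and
    `on_ready` is called in the main thread, so it may update widgets.
    """
    try:
        result = result_queue.get_nowait()
    except queue.Empty:
        widget.after(interval_ms, poll_result, widget, result_queue,
                     on_ready, interval_ms)
    else:
        on_ready(result)


def slow_load():
    """Stand-in for a slow operation such as reading a large spreadsheet."""
    time.sleep(0.2)
    return ["row 1", "row 2", "row 3"]
```

In an application, the button callback would create the queue, call `run_in_background(slow_load, q)` and then `poll_result(root, q, fill_treeview)`, where `root` is the main window and `fill_treeview` is a hypothetical function that inserts the rows into the Treeview from the main thread.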

    
answered by 06.09.2018 / 17:20