How to load a large volume of data with Django Querysets?

3

I have a loading process in a Django project which from an excel file stores information in a Business model. My loading process works, the problem is that the execution time is too high due to the data volume of the excel input.

Model.py

class Negocio(models.Model):
    fecha_carga=models.DateField(auto_now_add=True)
    fecha_actividad=models.DateField()
    plataforma=models.CharField(max_length=50)
    tecnologia=models.CharField(max_length=50)
    terminal=models.CharField(max_length=50)
    cantidad=models.FloatField()
    plan_key=models.ForeignKey(Produccion,on_delete=models.CASCADE,null=True)
    local_key=models.ForeignKey(Local,on_delete=models.CASCADE,null=True)
    def __str__(self):
        return self.plataforma

Views.py

def Carga(dataframe):
    for row,col in dataframe.iterrows():
        fecha=col["Fecha"]
        Negocio(fecha_actividad=fecha,
                    plataforma = col['Plataforma'],
                    tecnologia = col['Tecnologia'],
                    terminal = col['Tipo Equipo'],
                    cantidad = col['Actividad'],
                    plan_key = ProduccionPlan.objects.get(fecha_produccion__year=fecha.year, fecha_produccion__month=fecha.month,plan__codigo_plan__iexact=col['Codigo Plan']),
                    local_key = Local.objects.get(codigo__iexact=col['Codigo Vendedor'])    
          ).save()

This is what at the time I came up with to store the information, it works but it is not optimal since the loading time ranges between 40 and 45 minutes for excel files of approximately 80,000 rows.

I am trying to use the F () expressions but until now I have not been successful

To read the excel feed, I'm using pandas and my DB is in SQL Server.

Any help or idea?

    
asked by Javier Valero 14.05.2018 в 17:29
source

1 answer

0

Apparently the great part of the load is taken to consult the database that you perform in each insertion. One solution could be to order the entire dataframe by date, plan code and vendor code, then you get the entities of ProduccionPlan and Local only when they change:

def Carga(dataframe):
    dataframe = dataframe.sort_values(['Fecha', 'Codigo Plan', 'Codigo Vendedor'], ascending=[False])  # Ordeno por Fecha, Codigo Plan y Codigo Vendedor
    fecha_anterior = codigo_plan_anterior = codigo_vendedor_anterior = produccion_plan_obj = local = None
    for row,col in dataframe.iterrows():
        # Me fijo si la fecha o codigo de plan cambio
        fecha = col["Fecha"]
        codigo_plan = col['Codigo Plan']
        if fecha != fecha_anterior or codigo_plan_anterior != codigo_plan:
            fecha_anterior = fecha
            codigo_plan_anterior = codigo_plan
            produccion_plan_obj = ProduccionPlan.objects.get(fecha_produccion__year=fecha.year, fecha_produccion__month=fecha.month,plan__codigo_plan__iexact=codigo_plan)

        # Me fijo si el codigo de vendedor cambia
        codigo_vendedor = col['Codigo Vendedor']
        if codigo_vendedor_anterior != codigo_vendedor:
            codigo_vendedor_anterior = codigo_vendedor
            local_obj = Local.objects.get(codigo__iexact=codigo_vendedor)

        # Aporte de @F Delgado, create es mas rapido
        Negocio.objects.create(fecha_actividad=fecha,
                    plataforma = col['Plataforma'],
                    tecnologia = col['Tecnologia'],
                    terminal = col['Tipo Equipo'],
                    cantidad = col['Actividad'],
                    plan_key = produccion_plan_obj,
                    local_key = local_obj  
        )

We also add the Negocio.objects.create as F Delgado said it is more performant than instantiating the object and saving it (we avoid the load in memory).

I hope I have been helpful!

Greetings!

    
answered by 28.05.2018 в 02:45