Mutable fields in dataclasses: how to prevent them from being shared by all instances

1

As an exercise, I designed a database with two types of objects. Each has several fields, and some of those fields are instances of classes I wrote for the purpose. As I add objects to the database, I find that these fields are shared between instances instead of belonging to each one.

Specifically, the field is an instance of a history class ( ObjHistorial ) that accumulates notes and some changes, yet all instances of the same type ( ObjVs ) share the same notes. I am having trouble finding and understanding the problem.

The following is a small version of the program that includes the classes involved.

import time
from dataclasses import dataclass


class ObjNota:
    '''Objeto básico que contiene una nota.'''
    def __init__(self, text:str, tag:str):
        self.nota={"text":text, "tag":tag, "fecha":time.asctime()}


class ObjNotas:
    '''Objeto de control para las notas.'''
    def __init__(self):
        self.notas=[]

    # Omito métodos que gestionan las notas.

class ObjHistorial(ObjNotas):
    '''Objeto que gestiona la antiguedad, las notas y las vacaciones.'''
    def __init__(self):
        super().__init__() #self.notas = []
        self.antiguedad = None


@dataclass
class ObjVs:
    ''' objeto operarios '''
    nombre:str
    id:int = None           # Nº identificación de empresa
    telf:str = None
    movilidad:str = None    # Fijo en servicio, correturnos o sin servicio.
    tip:int = None          # Nº tarjeta interprofesional
    servicio_asignado:str = None
    historial:ObjHistorial = ObjHistorial()

The idea is to add operators to the database with most of their fields empty and edit them later. That is why I want each history to be an initialized object that starts out without data.

    
asked by BigfooTsp 04.08.2018 at 12:50

2 answers

2

Indeed, if we create two objects we can easily see what you mean:

>>> a = ObjVs("Pepe")    
>>> b = ObjVs("Maria")  

>>> id(a.historial.notas)
139639690892168
>>> id(b.historial.notas)
139639690892168

>>> a.historial.notas.append("Hola")
>>> a.historial.notas
['Hola']
>>> b.historial.notas
['Hola']

But the problem is not ObjVs.historial.notas; it is ObjVs.historial, the instance of ObjHistorial:

>>> id(a.historial)
139639701356160
>>> id(b.historial)
139639701356160

The cause is one of the most common Python anti-patterns, which you have fallen into inadvertently: using a mutable object as a default argument.

Python evaluates a default value only once, when the class (or function) is defined. A dataclass keeps that value as a class attribute and reuses it as the default argument of the generated __init__, so every instance of ObjVs created without an explicit historial receives the same ObjHistorial instance. The __init__ generated for your dataclass is roughly equivalent to this:

from typing import Optional

class ObjVs:

    def __init__(self,
                 nombre: str,
                 id: Optional[int] = None,
                 telf: Optional[str] = None,
                 movilidad: Optional[str] = None,
                 tip: Optional[int] = None,
                 servicio_asignado: Optional[str] = None,
                 historial: ObjHistorial = ObjHistorial()) -> None:  # evaluated only once

        self.nombre = nombre
        self.id = id
        self.telf = telf
        self.movilidad = movilidad
        self.tip = tip
        self.servicio_asignado = servicio_asignado
        self.historial = historial  # same object for every call that omits it

This is very different from creating the object inside __init__:

self.historial = ObjHistorial() 

or from a correct implementation of a mutable default parameter in a function or method:

from typing import Optional

class ObjVs:
    def __init__(self, historial: Optional[ObjHistorial] = None) -> None:
        self.historial = ObjHistorial() if historial is None else historial

in which case each instance of ObjVs gets its own instance of ObjHistorial.
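
To make the difference concrete, here is a minimal sketch. The names Registro, Ficha, agregar_mal and agregar_bien are hypothetical and not part of the original program. It only illustrates that a default expression is evaluated once, so every call or instance that relies on it receives the same object.

```python
from dataclasses import dataclass
from typing import List, Optional


class Registro:
    """Hypothetical stand-in for a mutable helper such as ObjHistorial."""
    def __init__(self) -> None:
        self.notas: List[str] = []


def agregar_mal(nota: str, destino: List[str] = []) -> List[str]:
    # The [] is created once, when the function is defined.
    destino.append(nota)
    return destino


def agregar_bien(nota: str, destino: Optional[List[str]] = None) -> List[str]:
    # None is a sentinel; a new list is built on every call that omits destino.
    destino = [] if destino is None else destino
    destino.append(nota)
    return destino


@dataclass
class Ficha:
    nombre: str
    registro: Registro = Registro()  # evaluated once, kept as a class attribute


primera = agregar_mal("a")
segunda = agregar_mal("b")
print(primera is segunda)            # True: both names point at the same list

a, b = Ficha("Pepe"), Ficha("Maria")
print(a.registro is b.registro)      # True: one Registro shared by both
print(a.registro is Ficha.registro)  # True: it is the class attribute itself
```

With agregar_bien, each call that omits destino gets its own list, which is the same idea as the Optional[ObjHistorial] = None version above.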

Dataclasses offer a way to handle mutable default values: field, which lets you customize each field of a dataclass individually. It accepts the following parameters:

  • default: the default value of the field.
  • default_factory: a zero-argument callable (use functools.partial if it needs arguments) that returns the initial value of the field. Specifying both default and default_factory is an error. If the field is declared with init=False, the factory is still called from the generated __init__, because there is no other way to give the field an initial value.
  • init: include the field as a parameter of the generated __init__() method (the default is True).
  • repr: include the field in the string returned by the generated __repr__ method (the default is True).
  • compare: include the field in the generated equality and comparison methods, such as __eq__ (the default is True).
  • hash: include the field when computing hash(). The default is None, which means the value of compare is used.
  • metadata: a mapping (or None) with extra information about the field; dataclasses itself does not use it.
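
As a rough illustration of several of these parameters together, the following sketch uses a hypothetical Ejemplo class and a hypothetical crear_etiquetas helper, neither of which belongs to the original program.

```python
from dataclasses import dataclass, field
from functools import partial
from typing import Dict, List


def crear_etiquetas(prefijo: str) -> List[str]:
    """Hypothetical helper that needs an argument to build its value."""
    return [prefijo]


@dataclass
class Ejemplo:
    nombre: str
    # A new list per instance, built by a zero-argument callable.
    notas: List[str] = field(default_factory=list)
    # partial turns a function that needs arguments into a zero-argument callable.
    etiquetas: List[str] = field(default_factory=partial(crear_etiquetas, "nuevo"))
    # Not an __init__ parameter, hidden from repr, ignored by == comparisons.
    cache: Dict[str, int] = field(default_factory=dict, init=False,
                                  repr=False, compare=False)


x = Ejemplo("Pepe")
y = Ejemplo("Pepe")
x.cache["visitas"] = 1

print(x)                   # Ejemplo(nombre='Pepe', notas=[], etiquetas=['nuevo'])
print(x == y)              # True: cache is excluded from the comparison
print(x.notas is y.notas)  # False: each instance has its own list
```

Note that cache still gets a fresh dict on every instance even though it cannot be passed to the constructor.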

We are interested in the default_factory parameter:

import time
from dataclasses import dataclass, field
from typing import Optional


class ObjNota:
    '''Objeto básico que contiene una nota.'''
    def __init__(self, text: str, tag: str):
        self.nota={"text":text, "tag":tag, "fecha":time.asctime()}


class ObjNotas:
    '''Objeto de control para las notas.'''
    def __init__(self):
        self.notas = []

    # Omito métodos que gestionan las notas.

class ObjHistorial(ObjNotas):
    '''Objeto que gestiona la antiguedad, las notas y las vacaciones.'''
    def __init__(self):
        super().__init__() #self.notas = []
        self.antiguedad = None


@dataclass
class ObjVs:
    ''' objeto operarios '''
    nombre: str
    id_: Optional[int] = None           # Nº identificación de empresa
    telf: Optional[str] = None
    movilidad: Optional[str] = None     # Fijo en servicio, correturnos o sin servicio.
    tip: Optional[int] = None           # Nº tarjeta interprofesional
    servicio_asignado: Optional[str] = None
    historial: ObjHistorial = field(default_factory=ObjHistorial)

Now everything works as it should:

>>> id(a.historial)
139892733289248
>>> id(b.historial)
139892733289360

>>> a.historial.notas.append("Hola")
>>> a.historial.notas
['Hola']
>>> b.historial.notas
[]

The field remains an __init__ parameter, so you can still pass your own ObjHistorial instance when creating an ObjVs ( a = ObjVs("Juan", historial=ObjHistorial()) ); the factory is called only when the argument is omitted. If you do not want it to be a parameter at all, you can assign it in the __post_init__ method, which runs immediately after __init__:

@dataclass
class ObjVs:
    ''' objeto operarios '''
    nombre: str
    id_: Optional[int] = None           # Nº identificación de empresa
    telf: Optional[str] = None
    movilidad: Optional[str] = None     # Fijo en servicio, correturnos o sin servicio.
    tip: Optional[int] = None           # Nº tarjeta interprofesional
    servicio_asignado: Optional[str] = None

    def __post_init__(self):
        self.historial: ObjHistorial = ObjHistorial()
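
Another option, sketched here as an alternative rather than as what this answer uses, is to declare the field with init=False together with default_factory. The classes Historial and Operario below are hypothetical stand-ins for ObjHistorial and ObjVs.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


class Historial:
    """Hypothetical stand-in for ObjHistorial."""
    def __init__(self) -> None:
        self.notas: List[str] = []
        self.antiguedad: Optional[int] = None


@dataclass
class Operario:
    nombre: str
    # Excluded from __init__, but still a real dataclass field.
    historial: Historial = field(init=False, default_factory=Historial)


a = Operario("Pepe")
b = Operario("Maria")
a.historial.notas.append("Hola")

print(b.historial.notas)                   # []
print([f.name for f in fields(Operario)])  # ['nombre', 'historial']
```

The practical difference is that an attribute assigned only inside __post_init__, with no annotation in the class body, is not a dataclass field, so it does not appear in the generated __repr__, in comparisons, or in fields().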
  

Note: I have changed the type annotations in the dataclass so that they pass MyPy; the rest I left as it was because I lack the information to annotate it properly.

    
answered by 05.08.2018 / 04:13
0

Thanks for the answer, Jose, very useful as always. In the end, for the case I had asked about, historial: ObjHistorial = field(default_factory=ObjHistorial) solved it perfectly.

I had omitted it from the question, but ObjVs also has a direccion field, which is a very simple object with a few attributes. Unlike ObjHistorial, sometimes I want it initialized 'empty' when I instantiate ObjVs, and other times with a string as an argument, but default_factory does not allow that. In the end I solved it in __post_init__, which adapts to each case and works well, although there may be a better way to do it.

from dataclasses import dataclass


class ObjDireccion:
    def __init__(self, direccion=None):
        self.dir = direccion
        self.coordenadas = None

@dataclass
class ObjVs:
    nombre:str
    direccion:ObjDireccion = None

    def __post_init__(self):
        if self.direccion:
            self.direccion = ObjDireccion(self.direccion)
        else:
            self.direccion = ObjDireccion()


v1 = ObjVs(nombre = 'Pedro', direccion='Barcelona')
v2 = ObjVs(nombre = 'Maria')
print(id(v1.direccion), v1.direccion.dir)
print(id(v2.direccion), v2.direccion.dir)

# 7281072 Barcelona
# 7280720 None
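
One possible variation, offered only as a sketch: check the type with isinstance so that an ObjDireccion passed in directly is kept as it is, while a string or None is wrapped. The Union annotation and the v3 example are assumptions added for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Union


class ObjDireccion:
    def __init__(self, direccion: Optional[str] = None) -> None:
        self.dir = direccion
        self.coordenadas = None


@dataclass
class ObjVs:
    nombre: str
    # Accepts a ready-made object, a plain string, or nothing at all.
    direccion: Union[ObjDireccion, str, None] = None

    def __post_init__(self) -> None:
        # Leave an existing ObjDireccion untouched; wrap a string or None.
        if not isinstance(self.direccion, ObjDireccion):
            self.direccion = ObjDireccion(self.direccion)


v1 = ObjVs("Pedro", direccion="Barcelona")
v2 = ObjVs("Maria")
v3 = ObjVs("Juan", direccion=ObjDireccion("Madrid"))

print(v1.direccion.dir)  # Barcelona
print(v2.direccion.dir)  # None
print(v3.direccion.dir)  # Madrid
```

Unlike the truthiness test above, this version stores an empty string as given instead of turning it into None, so which one fits depends on how empty addresses should be treated.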
    
answered by 05.08.2018 at 19:57