In python 3 and Pandas, how to add data from a dictionary to a dataFrame, only if they do not exist?

0

In my Python script I have generated a dictionary with several data, which I want to add to a CSV file, only if the dictionary keys do not exist inside the file, or if they exist then they are overwritten so they do not duplicate .

How can I do this using pandas?

My script is already able to add the data to the dataFrame, but I duplicate the data.

data_dict = {'2018mbusasalesjan': [0, '2018mbusasalesjan', 2018, 'January', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], '2018mbusasalesfeb': [1, '2018mbusasalesfeb', 2018, 'February', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], '2018mbusasalesmar': [2, '2018mbusasalesmar', 2018, 'March', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], '2018mbusasalesapr': [3, '2018mbusasalesapr', 2018, 'April', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], '2018mbusasalesmay': [4, '2018mbusasalesmay', 2018, 'May', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], '2018mbusasalesjun': [5, '2018mbusasalesjun', 2018, 'June', 0, 0, 0, 0, 0, 0, 0, 1000, 0, 0, 0], '2018mbusasalesjul': [6, '2018mbusasalesjul', 2018, 'July', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], '2018mbusasalesaug': [7, '2018mbusasalesaug', 2018, 'August', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], '2018mbusasalessep': [8, '2018mbusasalessep', 2018, 'September', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], '2018mbusasalesoct': [9, '2018mbusasalesoct', 2018, 'October', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], '2018mbusasalesnov': [10, '2018mbusasalesnov', 2018, 'November', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], '2018mbusasalesdec': [11, '2018mbusasalesdec', 2018, 'December', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}

datos_Sales_Mbusa =pd.read_csv('mbusasales.csv', index_col = 0, encoding = 'utf-8')
datos_Sales_Mbusa.apply(lambda x: pd.api.types.infer_dtype(x.values))
num_datos = int(datos_Sales_Mbusa['indice'].count())
datos_Sales_Mbusa.index = range(datos_Sales_Mbusa.shape[0])
indice_archivo = list(datos_Sales_Mbusa.indice)
llaves = data_dict.keys()
nuevo_Valor = num_datos+1
valores = list(data_dict.values())
for i, valor in enumerate(valores):
        nuevo_valor = int(nuevo_Valor)+i
        datos_Sales_Mbusa.loc[int(nuevo_valor)]= [valor[1], valor[2], valor[3], valor[4], valor[5], valor[6], valor[7], valor[8], valor[9],
                  valor[10], valor[11], valor[12], valor[13],valor[14] ]
datos_Sales_Mbusa.to_csv('mbusasales.csv',  encoding = 'utf-8',)

The file 'mbusasales.csv contains the following data:

,indice,year,month,brocc conv,caulif conv,br spr,Sugar Snap Peas,Yellow Sq,Green Zucch,Brocc Org,Caulif Org,Carrots Org,Corn Org,Edamame Org
0,2018mbusasalesjan,2018,January,0,0,0,0,0,0,0,0,0,0,0
1,2018mbusasalesfeb,2018,February,0,0,0,0,0,0,0,0,0,0,0
2,2018mbusasalesmar,2018,March,0,0,0,0,0,0,0,0,0,0,0
3,2018mbusasalesapr,2018,April,0,0,0,0,0,0,0,0,0,0,0
4,2018mbusasalesmay,2018,May,0,0,0,0,0,0,0,0,0,0,0
5,2018mbusasalesjun,2018,June,0,0,0,0,0,0,0,0,0,0,0
6,2018mbusasalesjul,2018,July,0,0,0,0,0,0,0,0,0,0,0
7,2018mbusasalesaug,2018,August,0,0,0,0,0,0,0,0,0,0,0
8,2018mbusasalessep,2018,August,350,0,0,0,0,0,0,0,0,0,0
9,2018mbusasalesoct,2018,October,0,0,0,0,0,0,0,0,0,0,0
10,2018mbusasalesnov,2018,November,0,0,0,0,0,0,0,0,0,0,0
11,2018mbusasalesdec,2018,December,0,0,0,0,0,0,0,0,0,0,0
12,2017mbusasalesjan,2017,January,0,0,0,0,0,0,0,0,0,0,0
13,2017mbusasalesfeb,2017,February,0,0,0,0,0,0,0,0,0,0,0
14,2017mbusasalesmar,2017,March,0,0,0,0,0,0,0,0,0,0,0
15,2017mbusasalesapr,2017,April,0,0,0,0,0,0,0,0,0,0,0
16,2017mbusasalesmay,2017,May,0,0,0,0,0,0,0,0,0,0,0
17,2017mbusasalesjun,2017,June,0,0,0,0,0,0,0,0,0,0,0
18,2017mbusasalesjul,2017,July,0,0,0,0,0,0,0,0,0,0,0
19,2017mbusasalesaug,2017,August,0,0,0,0,0,0,0,0,0,0,0
20,2017mbusasalessep,2017,August,350,0,0,0,0,0,0,0,0,0,0
21,2017mbusasalesoct,2017,October,0,0,0,0,0,0,0,0,0,0,0
22,2017mbusasalesnov,2017,November,0,0,0,0,0,0,0,0,0,0,0

The column that indicates whether the data is already in the csv file is 'index' and in the dictionary is the key. If you compare the data of the dictionary with those of the file, you can see that these indexes are already in the file, therefore they should be written.

    
asked by Alejandro Gomez 31.01.2018 в 21:45
source

1 answer

1

Based on the fact that as mentioned what you do is probably not the most efficient, you can update the data by loading both the csv and the dictionary in two DataFrames, apply pandas.concat and then eliminate the duplicates keeping the rows from the dictionary:

import pandas as pd

data_dict = {'2018mbusasalesjan': [0, '2018mbusasalesjan', 2018, 'January', 125, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
             '2018mbusasalesfeb': [1, '2018mbusasalesfeb', 2018, 'February', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
             '2018mbusasalesmar': [2, '2018mbusasalesmar', 2018, 'March', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
             '2018mbusasalesapr': [3, '2018mbusasalesapr', 2018, 'April', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
             '2018mbusasalesmay': [4, '2018mbusasalesmay', 2018, 'May', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
             '2018mbusasalesjun': [5, '2018mbusasalesjun', 2018, 'June', 0, 0, 0, 0, 0, 0, 0, 1000, 0, 0, 0],
             '2018mbusasalesjul': [6, '2018mbusasalesjul', 2018, 'July', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
             '2018mbusasalesaug': [7, '2018mbusasalesaug', 2018, 'August', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
             '2018mbusasalessep': [8, '2018mbusasalessep', 2018, 'September', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
             '2018mbusasalesoct': [9, '2018mbusasalesoct', 2018, 'October', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
             '2018mbusasalesnov': [10, '2018mbusasalesnov', 2018, 'November', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
             '2018mbusasalesdec': [11, '2018mbusasalesdec', 2018, 'December', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}

datos_sales_mbusa = pd.read_csv('mbusasales.csv', index_col=0, encoding='utf-8')
cols = [''] + list(datos_sales_mbusa)
update = pd.DataFrame(list(data_dict.values()), columns=cols).set_index('')
res = pd.concat((datos_sales_mbusa, update))
res.drop_duplicates(subset="indice", keep='last', inplace=True)
res.reset_index(drop=True, inplace=True)
res.to_csv('mbusasales.csv',  encoding='utf-8')
    
answered by 01.02.2018 / 01:33
source