Group strings by date using Pandas Python


I'm working on a program, and I need to group strings that are generated on a certain date.

The form of grouping that I want is to group the same chain every time it is repeated, as long as a next repetition occurs in less than 30 days. In the case that after 30 days it is repeated again (it is a mobile account).

Until now I am reading from a CSV and I have the strings separated and the dates converted into date, as follows:

    import pandas as pd
    import numpy as np
    import numpy
    import csv
    from numpy import genfromtxt
    import numpy as np
    from datetime import datetime, timedelta
    from time import time

    df = pd.read_csv('Export.csv', header=0, sep=';')


    FDE=df["#"] +"-"+df["Text"]                          #Crear lista
    df[" Date"] = pd.to_datetime(df[" Date"])            #Convertir fechas a date


FDE is the chain that I should look for, whenever FDE is repeated and the date associated with the repeated FDE is less than 30 days, it should be counted. In cases where FDE is greater than 30 days, the count must be restarted. This way there are different stories for a type of FDE.

The file I'm reading about is the following:


Any idea or help is appreciated.

Example of output:

    CBH-G AIR = 2
    CBD-FD AIL = 2
    CBD-FFAIL = 2


asked by Jorge Ponti 10.07.2017 в 03:21

1 answer


Next I think it does what you want. I have not found a magic functionality of that allows you to do it more direct The same exists but I do not know:

import datetime as dt

import pandas as pd

# funciones de ayuda
dateparser = lambda d: dt.datetime.strptime(d, "%d-%m-%Y %H:%M")

def agrupa(data):
    groups = [1]
    delta = pd.Timedelta(days=30)
    for i in range(1,len(data)):
        d1 = data['FDE Date'].iloc[i]
        d0 = data['FDE Date'].iloc[i-1]
        print(i, d1, d0)
        if d1 - d0 > delta:
            groups.append(groups[-1] + 1)
    return groups

# Leemos los datos y los organizamos como queremos
df = pd.read_csv("kk.csv",
                 parse_dates=[6, 8, 21],

df['FDE'] = df["Tail #"] +"-"+df["FDE Text"]

# Agrupamos por 'FDE'
groups = df.groupby('FDE')

# Bucle sobre los grupos y extraemos grupos de fechas con la fn agrupa
for group in groups:
    df_tmp = group[1].sort_values('FDE Date')
    group = group[0]
    df_tmp['grupos'] = agrupa(df_tmp)

    # Esto lo tendrás que retocar para extraer la info como consideres
    print("Grupos para ", group)
    print(df_tmp.groupby('grupos')['FDE Date'].count().values)

By parts, dateparser is a function that allows you to read the csv and transform the columns of dates into dates. agrupa is used to create groups by dates within each group (see code). We read the file and create groups using groupby . Within each group we iterate over the column FDE Date to create a column of grupos that determines whether an event should be associated with one group or another.

answered by 10.07.2017 / 12:00