Sort a CSV matrix by date in Python

I am working on a small project in which I read a CSV file like this:

    import numpy as np

    csv = np.genfromtxt('MMRExport.csv', delimiter=",", dtype=str)

Then I build lists, since I only need two columns of the file: "date" and "message". I leave the message list unchanged, but because the dates are strings, I convert them to dates as follows:

    from datetime import datetime

    Date[i] = datetime.strptime(Date[i], '%d-%b-%Y')

This way, each "message" and its "date" sit at the same index, but in two separate lists.
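As a sketch of the conversion step described above, the index-based loop can be replaced by a single list comprehension. The sample values below are hypothetical; note that `%b` matches the abbreviated month names of the current locale, so `Jan` parses under an English or C locale but may not under others.

```python
from datetime import datetime

# Hypothetical sample data standing in for the two CSV columns
messages = ["mensaje 1", "mensaje 2", "mensaje 3"]
date_strings = ["21-Jan-2017", "01-Jan-2017", "31-Jan-2017"]

# Convert every date string at once instead of assigning Date[i] in a loop;
# %b expects the locale's abbreviated month name (e.g. "Jan" in an English/C locale)
dates = [datetime.strptime(s, "%d-%b-%Y") for s in date_strings]

# messages[i] and dates[i] still describe the same CSV row
print(dates[1])  # 2017-01-01 00:00:00
```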

But today I realized that I need to sort the records by date in ascending order (from oldest to newest). My question is: can I build a matrix that holds each "message" together with its "date", so that sorting by date also sorts the messages? Or is there another way to sort the dates without losing their correspondence with the messages?

    
asked by Jorge Ponti 05.07.2017 at 15:08

From your example I understand that you read the CSV into a `numpy.array`, but then you seem to build two separate lists, one for the message and one for the date. Indeed, sorting one of the lists does not automatically sort the other, so it is best to keep both pieces of data together in a list of lists. Here is an example based on your question.
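If you want to keep the two separate lists, one option (not the approach used in the rest of this answer) is to pair them with the built-in `zip`, sort the pairs by date, and unzip them again. A minimal sketch with hypothetical sample data:

```python
from datetime import datetime

# Hypothetical parallel lists: messages[i] belongs to dates[i]
messages = ["mensaje 1", "mensaje 2", "mensaje 3"]
dates = [datetime(2017, 1, 21), datetime(2017, 1, 1), datetime(2017, 1, 31)]

# Pair each date with its message and sort the pairs by the date only
pairs = sorted(zip(dates, messages), key=lambda pair: pair[0])

# Unzip back into two lists that are now both in chronological order
dates, messages = (list(column) for column in zip(*pairs))

print(messages)  # ['mensaje 2', 'mensaje 1', 'mensaje 3']
```

Sorting on the date alone (rather than on the whole tuple) avoids comparing messages when two dates are equal; since Python's sort is stable, such ties keep their original order. If the lists can be empty, the unpacking step needs a guard, because `zip(*[])` yields nothing to unpack.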

First, let's build an example `numpy.array` with data similar to what you describe, roughly what you would get from `csv = np.genfromtxt('MMRExport.csv', delimiter=",", dtype=str)` once the dates are converted:

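    import numpy as np
    import datetime
    import pprint

    def dt(s):
        # Parse a 'YYYYMMDD' string into a datetime.datetime
        return datetime.datetime.strptime(s, '%Y%m%d')

    # Each row is [message, date]; mixing str and datetime yields dtype=object
    x = np.array([
        ["mensaje 1", dt('20170121')],
        ["mensaje 2", dt('20170101')],
        ["mensaje 3", dt('20170131')],
    ])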

You mention that at some point you work with two Python lists built from the data you just read. That is not necessary: you can handle a single list where each element is another list holding both fields. With numpy you can do the following:

    lista = x.tolist()
    pprint.pprint(lista)

This gives:

    [['mensaje 1', datetime.datetime(2017, 1, 21, 0, 0)],
     ['mensaje 2', datetime.datetime(2017, 1, 1, 0, 0)],
     ['mensaje 3', datetime.datetime(2017, 1, 31, 0, 0)]]

Note: I use `pprint` because it formats lists and similar objects much more readably.
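If the dates are still strings, as in the array returned by `np.genfromtxt(..., dtype=str)`, the same list of lists can be built in one pass over the rows, without the two intermediate lists. A minimal sketch, assuming a hypothetical layout in which column 0 holds the date (`%d-%b-%Y`) and column 1 the message; `raw`, `DATE_COL` and `MESSAGE_COL` are illustrative names, so adjust the indices to your file:

```python
from datetime import datetime
import numpy as np

# Hypothetical stand-in for np.genfromtxt('MMRExport.csv', delimiter=",", dtype=str)
raw = np.array([
    ["21-Jan-2017", "mensaje 1", "other"],
    ["01-Jan-2017", "mensaje 2", "other"],
    ["31-Jan-2017", "mensaje 3", "other"],
])

DATE_COL, MESSAGE_COL = 0, 1  # hypothetical column positions

# One [message, date] pair per CSV row, so the two values cannot drift apart;
# str() turns numpy's np.str_ elements into plain Python strings
lista = [
    [str(row[MESSAGE_COL]), datetime.strptime(row[DATE_COL], "%d-%b-%Y")]
    for row in raw
]
```

If the file has a header row, `np.genfromtxt` returns it as the first row of strings; its `skip_header=1` parameter drops it before this step.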

This is very easy to sort, and Jose Hermosilla Rodrigo already pointed you in the right direction in a comment:

    # Simple sort on a list of lists, using the date (index 1) as the key
    lista.sort(key=lambda row: row[1])
    pprint.pprint(lista)

`sort` receives an anonymous function (the `key`) that returns the value to sort by, in our example the date column (index 1). Note that `sort` works in place, that is, it reorders the list itself; if you need to keep the original order, make a copy before sorting or use `sorted`, which returns a new list. The final result is the complete list sorted by date:
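A minimal sketch of the non-destructive variant with the same sample data; `operator.itemgetter(1)` is a standard-library equivalent of `lambda row: row[1]`, and `by_date` / `newest_first` are illustrative names:

```python
from datetime import datetime
from operator import itemgetter

lista = [
    ["mensaje 1", datetime(2017, 1, 21)],
    ["mensaje 2", datetime(2017, 1, 1)],
    ["mensaje 3", datetime(2017, 1, 31)],
]

# sorted() returns a new list and leaves `lista` in its original order;
# itemgetter(1) picks the date column of each row
by_date = sorted(lista, key=itemgetter(1))

# reverse=True gives newest first instead of oldest first
newest_first = sorted(lista, key=itemgetter(1), reverse=True)

print(by_date[0][0])  # mensaje 2
print(lista[0][0])    # mensaje 1 (unchanged)
```

`sorted` copies only the outer list: the inner `[message, date]` lists are shared, so modifying one of them is visible through both `lista` and `by_date`.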

    [['mensaje 2', datetime.datetime(2017, 1, 1, 0, 0)],
     ['mensaje 1', datetime.datetime(2017, 1, 21, 0, 0)],
     ['mensaje 3', datetime.datetime(2017, 1, 31, 0, 0)]]

Finally, if you prefer to do everything with the numpy object, you can also do this:

    # argsort returns the row indices that would sort column 1 (the dates)
    x = x[x[:, 1].argsort()]
    pprint.pprint(x)

In this case the sort is not in place, so we assign the result back to `x`. The output looks slightly different, but the order is the same:

    array([['mensaje 2', datetime.datetime(2017, 1, 1, 0, 0)],
           ['mensaje 1', datetime.datetime(2017, 1, 21, 0, 0)],
           ['mensaje 3', datetime.datetime(2017, 1, 31, 0, 0)]], dtype=object)
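The same `argsort` idea also works if you keep separate arrays: `argsort` returns the permutation that would sort the dates, and that single index array can reorder any number of parallel arrays. A minimal sketch, assuming the dates are stored as NumPy `datetime64` values rather than Python `datetime` objects (a choice made here for illustration, not part of the answer above):

```python
import numpy as np

# Hypothetical parallel arrays: messages[i] belongs to dates[i]
messages = np.array(["mensaje 1", "mensaje 2", "mensaje 3"])
dates = np.array(["2017-01-21", "2017-01-01", "2017-01-31"], dtype="datetime64[D]")

# Indices that would sort the dates; kind="stable" keeps equal dates in file order
order = np.argsort(dates, kind="stable")

# Apply the same permutation to both arrays so the rows stay aligned
dates = dates[order]
messages = messages[order]

print(order)     # [1 0 2]
print(messages)  # ['mensaje 2' 'mensaje 1' 'mensaje 3']
```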
    
answered 05.07.2017 at 18:36