Sort CSV values in Python


I have the following code that creates a CSV from data read from another CSV. This image shows my results:

Now, instead of writing all the data to the new CSV, I want to first sort the rows from highest to lowest by impressions and output only the top 20. Something like this:

Code:

import csv
input_file = 'report_2017_12_11_12_31_19UTC.csv'
output_file = "All_Data_Tags.csv"

with open(input_file, newline='') as csvfile, open(output_file, "w", newline='') as output:
    reader = csv.DictReader(csvfile)
    cols = ("domain","ddomain","opportunities", "impressions", "fillRate", "DATA")
    writer = csv.DictWriter(output, fieldnames=cols, extrasaction='ignore')

    writer.writeheader()
    for row in reader:
        row['fillRate'] = '{:.2f}'.format(float(row['fillRate']) * 100)
        if row['ddomain'] == "":
            if row['domain'] == "":
                row['ddomain'] = "App"
                row['domain'] = " "
        if row['domain'] == row['ddomain']:
            row['domain'] = "Real Site"
        if row['domain'] == "":
            row['domain'] = "Detected Only"
        if row['ddomain'] == "":
            row['ddomain'] = "Vast Media"
        if row['ddomain'] != row['domain']:
            if row['ddomain'] != "Vast Media":
                if row['domain'] != "Real Site":
                    if row['domain'] != "Detected Only":
                        if row['ddomain'] != "App":
                            row['DATA'] = "FAKE"
                        else:
                            row['DATA'] = "OK"
                    else:
                        row['DATA'] = "OK"
                else:
                    row['DATA'] = "OK"
            else:
                row['DATA'] = "OK"

        writer.writerow(row)
asked by Martin Bouhier 11.12.2017 at 14:07

2 answers


I got what I was looking for using pandas, like this. Cheers.

import pandas as pd

movies = pd.read_csv('Top20_Media_Yesterday.csv')

# Sort by impressions, highest first, and keep only the first 20 rows
top20 = movies.sort_values('impressions', ascending=False).head(20)

# index=False avoids writing the DataFrame index as an extra column
top20.to_csv('Top20_Media_Yesterday.csv', index=False)
answered 11.12.2017 at 15:51

You first need to load the CSV into memory, preferably into a list (ordered and mutable), then sort it by the column you want (with sorted() or list.sort()) and finally iterate over only the first 20 rows of the list.
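A minimal standalone sketch of those three steps, assuming a small invented CSV held in a string (io.StringIO stands in for the real report file; the helper name top_rows and the sample domains and numbers are hypothetical):

```python
import csv
import io

# Hypothetical sample data standing in for the real report file
SAMPLE = """domain,impressions
a.com,120
b.com,9
c.com,3400
d.com,57
"""

def top_rows(text, column, n):
    """Return the n rows with the largest numeric value in `column`."""
    # 1. Load every row into memory as a list of dicts
    rows = list(csv.DictReader(io.StringIO(text)))
    # 2. Sort in place, highest first; float() because csv yields strings
    rows.sort(key=lambda row: float(row[column]), reverse=True)
    # 3. Keep only the first n rows
    return rows[:n]

for row in top_rows(SAMPLE, "impressions", 2):
    print(row["domain"], row["impressions"])
```

list.sort() sorts the list in place, while sorted() returns a new list; either works here because the list is only used once.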

import csv

input_file = 'report_2017_12_11_12_31_19UTC.csv'
output_file = "All_Data_Tags.csv"

with open(input_file, newline='') as csvfile, open(output_file, "w", newline='') as output:
    reader = csv.DictReader(csvfile)
    cols = ("domain", "ddomain", "opportunities", "impressions", "fillRate", "DATA")
    writer = csv.DictWriter(output, fieldnames=cols, extrasaction='ignore')
    # Load all rows, sort by impressions as a number (highest first), keep the top 20
    rows = sorted(reader, reverse=True, key=lambda row: float(row['impressions']))[:20]

    writer.writeheader()
    for row in rows:
        # Rest of the loop body unchanged from the question
        writer.writerow(row)

The key is in:

sorted(reader, reverse=True, key=lambda row: float(row['impressions']))[:20]

That returns a list sorted in descending order (reverse=True) by the impressions value of each row dictionary, and the [:20] slice keeps only the first 20 rows. The float() conversion is needed because csv.DictReader returns every field as a string.
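Sorting on the raw string values from csv.DictReader (for example with operator.itemgetter('impressions')) compares text, not numbers. A quick check with a few invented values shows the difference:

```python
import operator

# Hypothetical rows as csv.DictReader would return them: all values are strings
rows = [{"impressions": "9"}, {"impressions": "100"}, {"impressions": "25"}]

# Sorting on the raw string compares character by character ("9" > "2" > "1")
as_text = sorted(rows, reverse=True, key=operator.itemgetter("impressions"))

# Converting to a number gives the intended numeric order
as_number = sorted(rows, reverse=True, key=lambda row: float(row["impressions"]))

print([r["impressions"] for r in as_text])    # ['9', '25', '100']
print([r["impressions"] for r in as_number])  # ['100', '25', '9']
```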

If you need better performance, consider moving the code to pandas.
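Without switching to pandas, the standard library's heapq.nlargest is another option when only the top N rows are needed: it keeps a heap of N rows instead of sorting the whole list, which may be cheaper when N is much smaller than the file. A sketch under the same assumptions as before (the helper name top_n_rows and the sample data are hypothetical):

```python
import csv
import heapq
import io

# Hypothetical sample data standing in for the real report file
SAMPLE = """domain,impressions
a.com,120
b.com,9
c.com,3400
d.com,57
"""

def top_n_rows(lines, column, n=20):
    """Return the n rows with the largest numeric `column`, highest first."""
    reader = csv.DictReader(lines)
    # nlargest iterates over the reader and keeps at most n rows in its heap
    return heapq.nlargest(n, reader, key=lambda row: float(row[column]))

top = top_n_rows(io.StringIO(SAMPLE), "impressions", n=3)
print([row["domain"] for row in top])  # ['c.com', 'a.com', 'd.com']
```

heapq.nlargest(n, iterable, key=...) is documented as equivalent to sorted(iterable, key=key, reverse=True)[:n], so it can replace the sorted(...)[:20] call directly.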

answered 11.12.2017 at 15:51