Working with parameters of a table using Pandas

1

I have a .csv file that contains this type of data:

Nombre   Curso     Nota  
Miguel   Primero   8
Juan     Primer    2
Pedro    Segundo   6
Luisa    Primero   6
Teresa   Primero   3
Pepe     Segundo   5
Ana      Segundo   6
Natalia  Segundo   4
Maria    Primero   3

How do I get, for example, the average grade (Nota) for the first course (Primero)? And how do I get the number of students in the second course (Segundo)?

To get the count I tried something like the line below, but it gives me an error. For the average, I think I could divide the sum of the grades by the count.

print len(archivo.Curso=='Segundo')

Greetings and thanks

    
asked by NEA 14.11.2018 at 14:58

3 answers

1

I do not know what your code looks like at the moment, but the steps you should follow are:

  • Import the file with Pandas: this gives you a DataFrame object, which makes indexing and filtering easy.
  • Use numpy operators on the arrays: once you have filtered the part of the table you need, you can operate on the values with numpy.mean(), numpy.sum(), etc.
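The two steps above could look roughly like this. It is only a sketch: I am assuming the file is comma-separated (the question shows the data aligned with spaces, so adjust `sep` as needed), and the variable names are made up for the example.

```python
from io import StringIO

import numpy as np
import pandas as pd

# Sample data copied from the question. Assumption: comma-separated values.
# In practice, pass the path of your own .csv file instead of a StringIO.
datos = StringIO("""Nombre,Curso,Nota
Miguel,Primero,8
Juan,Primer,2
Pedro,Segundo,6
Luisa,Primero,6
Teresa,Primero,3
Pepe,Segundo,5
Ana,Segundo,6
Natalia,Segundo,4
Maria,Primero,3
""")
df = pd.read_csv(datos)

# Step 1: filter the rows of one course and take the grades as a numpy array.
notas_primero = df.loc[df["Curso"] == "Primero", "Nota"].to_numpy()

# Step 2: operate on the filtered values with numpy.
media_primero = np.mean(notas_primero)   # average grade
suma_primero = np.sum(notas_primero)     # sum of the grades
cantidad_primero = notas_primero.size    # number of students

print(media_primero, suma_primero, cantidad_primero)
```

Dividing `suma_primero` by `cantidad_primero` gives the same value as `np.mean`, which is the idea you describe in the question.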
answered by 14.11.2018 at 15:47
1

First of all, let's reproduce your example in a pandas DataFrame:

import pandas as pd
from io import StringIO

txt = """
Nombre   Curso     Nota  
Miguel   Primero   8
Juan     Primer    2
Pedro    Segundo   6
Luisa    Primero   6
Teresa   Primero   3
Pepe     Segundo   5
Ana      Segundo   6
Natalia  Segundo   4
Maria    Primero   3
"""

# Raw string so that "\s" is not treated as an invalid escape sequence
df = pd.read_table(StringIO(txt), sep=r"\s+")

One possibility is to group by course and calculate both metrics at once: the mean and the count. This is easy with groupby(): .agg() applies the aggregation functions (in our case mean and count), and .reset_index() turns Curso back into a regular column instead of the index:

totales = df.groupby(['Curso'])['Nota'].agg(['mean', 'count']).reset_index()
print(totales)

     Curso  mean  count
0   Primer  2.00      1
1  Primero  5.00      4
2  Segundo  5.25      4

To access the individual value of each cell, you can then do the following:

print("La media de Primero es   : {0}".format(totales.loc[totales['Curso'] == 'Primero', 'mean'].values[0]))
print("La cantidad de Segundo es: {0}".format(totales.loc[totales['Curso'] == 'Segundo', 'count'].values[0]))

La media de Primero es   : 5.0
La cantidad de Segundo es: 4
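Note that 'Primer' shows up as its own group in the table above. If that value is just a typo for 'Primero' (this is my assumption, the question does not say so), you could normalize the column before grouping. A minimal sketch, building the DataFrame by hand so it runs on its own:

```python
import pandas as pd

# Same data as in the question, built directly as a DataFrame.
df = pd.DataFrame({
    "Nombre": ["Miguel", "Juan", "Pedro", "Luisa", "Teresa",
               "Pepe", "Ana", "Natalia", "Maria"],
    "Curso": ["Primero", "Primer", "Segundo", "Primero", "Primero",
              "Segundo", "Segundo", "Segundo", "Primero"],
    "Nota": [8, 2, 6, 6, 3, 5, 6, 4, 3],
})

# Assumption: 'Primer' is a typo for 'Primero'.
# Series.replace with a dict swaps exact matches only.
df["Curso"] = df["Curso"].replace({"Primer": "Primero"})

# Curso stays as the index here, so values can be read with .loc[row, column].
totales = df.groupby("Curso")["Nota"].agg(["mean", "count"])
print(totales)
```

With that change Juan's grade is counted in 'Primero', so its mean and count will differ from the output shown above.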
    
answered by 14.11.2018 at 16:03
0

As @gustavovelascoh said, reading the csv with pandas makes it easier to operate on your columns.

Below is an example of what you can do with it, but it is best to keep the Pandas documentation at hand:

link

It also includes a 'Cookbook' with very useful short examples:

link

Example:

import pandas as pd
# You can pass a specific separator for your data, e.g. pd.read_csv('notas.csv', sep=';')
df = pd.read_csv('notas.csv')
notas = df['Nota']  # Extract the 'Nota' column
mean = notas.mean()  # Mean of all the grades (every course together)
num_segundo = len(df[df.Curso == 'Segundo'])  # Number of students in 'Segundo'
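About the attempt in the question: the comparison `archivo.Curso == 'Segundo'` returns a boolean Series with one entry per row, so `len()` of it measures the whole table, not the matches. A small sketch of the difference, plus the mean for a single course (the DataFrame is built by hand here and the variable names are only illustrative):

```python
import pandas as pd

# Same data as in the question, built directly as a DataFrame.
df = pd.DataFrame({
    "Nombre": ["Miguel", "Juan", "Pedro", "Luisa", "Teresa",
               "Pepe", "Ana", "Natalia", "Maria"],
    "Curso": ["Primero", "Primer", "Segundo", "Primero", "Primero",
              "Segundo", "Segundo", "Segundo", "Primero"],
    "Nota": [8, 2, 6, 6, 3, 5, 6, 4, 3],
})

# Boolean mask: True where the row belongs to 'Segundo'.
es_segundo = df["Curso"] == "Segundo"

filas_totales = len(es_segundo)      # 9: length of the mask = all rows
num_segundo = int(es_segundo.sum())  # 4: each True counts as 1
num_segundo_len = len(df[es_segundo])  # 4: rows left after filtering

# Mean of the grades for one course only.
media_primero = df.loc[df["Curso"] == "Primero", "Nota"].mean()  # 5.0

print(filas_totales, num_segundo, num_segundo_len, media_primero)
```

Summing the mask and taking `len()` of the filtered DataFrame give the same count; which one you use is mostly a matter of taste.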
    
answered by 14.11.2018 at 16:21