A question recently , that did not specify with what language / technology I wanted to solve it, it impelled me to think how I would do it with Pandas.
The question can be rephrased in the Pandas context as follows.
Given a dataframe such as the one I build here:
import pandas as pd
datos = [["jose", "manzana"],
["andres","pera"],
["luis", "pera"],
["jose", "manzana"]]
df = pd.DataFrame(datos, columns=["Nombre", "Fruta"])
print(df)
What looks like this:
Nombre Fruta
0 jose manzana
1 andres pera
2 luis pera
3 jose manzana
I want to get another one in which the index is the names (without repetitions), the columns the fruits, and the cells the number of times each name appears with each fruit. That is:
Fruta manzana pera
Nombre
andres 0 1
jose 2 0
luis 0 1
My attempt
It seems that this is asking for pivot_table()
, but in order to solve it I have not come up with another way to add an extra column filled with 1
to the original dataframe:
>>> print(df.assign(Cuantas=1))
Nombre Fruta Cuantas
0 jose manzana 1
1 andres pera 1
2 luis pera 1
3 jose manzana 1
To be able to use that column as a value to add in pivot_table
. My solution therefore remains as follows:
sol = df.assign(Cuantas=1).pivot_table(
index="Nombre",
columns="Fruta",
values="Cuantas",
aggfunc="sum",
fill_value=0)
and this effectively gives the desired result, but I find it a bit cumbersome.
The question is Is there another easier way? .