From the example you give, it is not clear what you are trying to do. You start by showing a dictionary whose values are strings, years or durations, but then you post pseudocode that iterates by rows (rows of which dataframe? I assume the one created from the dictionary) and performs numerical operations, comparing against 20, 10 and similar values that do not appear anywhere in your data.
In any case, note that Pandas is designed so that you almost never have to iterate yourself, because it provides very high-level methods that operate "at the same time" on all the elements of a column or a dataframe. Internally, of course, pandas still has to loop over the elements of the dataframe to provide that functionality, but it saves you from writing the loop. On many occasions a single line of code does what you need.
Since your example is hard to follow, I propose another one. I will create a dataframe with random numbers that follow a normal distribution:
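As a quick illustration of operating on a whole column at once, here is a minimal sketch. The column names and values are hypothetical, made up for this example and not taken from your question:

```python
import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame({"price": [5.0, 12.5, 30.0], "quantity": [2, 4, 1]})

# Arithmetic between two columns is applied to every row in one step.
df["total"] = df["price"] * df["quantity"]

# A comparison likewise yields a whole column of booleans, with no explicit loop.
df["expensive"] = df["total"] > 20

print(df)
```

Neither line contains a loop; pandas handles the traversal of the rows.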
import pandas as pd
import numpy as np
def numeros():
    return np.random.randn(10)
df = pd.DataFrame({"c1": numeros(), "c2": numeros(), "c3": numeros()})
print(df)
This would print something like:
         c1        c2        c3
0  0.933062 -0.331660  0.570088
1 -1.056521  0.653821 -0.715223
2  0.415285  0.580467  0.275368
3  0.603351  1.259974  1.532510
4 -1.494285 -1.446740  1.590340
5 -0.462880  0.657413 -0.086055
6 -1.243113 -0.016631 -0.451884
7  0.968619 -0.729009  0.176846
8 -0.756221  0.502987  0.573067
9  1.079186 -1.599314  1.275140
Now suppose I want to replace every negative number with the word "no" and, among the non-negative ones, replace those that are less than 1 with the word "si" ("yes"), leaving the rest untouched.
I start by writing a function that applies that "transformation" to a single value received as a parameter and returns the result. As simple as this:
def transformar(n):
    if n < 0:
        return "no"
    if n < 1:
        return "si"
    return n
And this is where the power of Pandas comes into play. Using its applymap() method you can apply that function to every element of the dataframe, in a single line and without writing loops:
resultado = df.applymap(transformar)
print(resultado)
And the output is:
        c1       c2       c3
0       si       no       si
1       no       si       no
2       si       si       si
3       si  1.25997  1.53251
4       no       no  1.59034
5       no       si       no
6       no       no       no
7       si       no       si
8       no       si       si
9  1.07919       no  1.27514
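One caveat that depends on your installed version: from pandas 2.1 onwards, DataFrame.applymap is deprecated in favor of DataFrame.map, which does the same element-wise job. The following is a minimal sketch that picks whichever method is available. The small fixed dataframe is hypothetical, chosen only so that the result is reproducible:

```python
import pandas as pd

def transformar(n):
    if n < 0:
        return "no"
    if n < 1:
        return "si"
    return n

# Small fixed dataframe (hypothetical values) instead of random numbers.
df = pd.DataFrame({"c1": [-0.5, 0.25, 1.5], "c2": [0.75, -2.0, 3.0]})

# Prefer DataFrame.map where it exists (pandas >= 2.1), else fall back to applymap.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
resultado = elementwise(transformar)
print(resultado)
```

If you know which pandas version you target, you can of course call the corresponding method directly.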
This way of working not only lets you write less code, it is also usually more efficient, because the loop that Pandas uses internally to traverse the data is implemented in C and runs faster than the Python loops you intended to write (although the function transformar is still called once per element).
If you use a lambda function you can save yourself writing the function transformar, but I think an external function is more readable in this case. For reference, with lambda it would look like this:
df.applymap(lambda n: "no" if n < 0 else "si" if n < 1 else n)
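Coming back to efficiency: since applymap() still calls a Python function for each element, an alternative worth knowing is to avoid the per-element function entirely and assign through boolean masks. This is a different technique from the one above, shown here only as a sketch on a small hypothetical dataframe. Whether it is actually faster for your data is something you would need to measure:

```python
import pandas as pd

# Small fixed dataframe (hypothetical values) so the result is reproducible.
df = pd.DataFrame({"c1": [-0.5, 0.25, 1.5], "c2": [0.75, -2.0, 3.0]})

# Work on an object-dtype copy so strings and numbers can share a column.
resultado = df.astype(object)

# Each comparison builds a boolean mask for the whole dataframe at once.
resultado[df < 0] = "no"
resultado[(df >= 0) & (df < 1)] = "si"

print(resultado)
```

The comparisons are made against the original numeric df, not against resultado, so the second assignment is not affected by the strings written by the first one.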