How to iterate row by row in pandas

Question

How can I iterate row by row over a dataframe in Python? For example, I have the following data:

canciones={'albun':['cacho','beto','pedrito','loshermanos'],
       'ano':[1992,1998,1994,1993],
       'tiempo':['00:22:04','00:42:02','00:23:33','00:44:33']}

Now I want to know how to iterate over row 1, or row 2, or both rows 1 and 2:

  for i in row 1:
     if row 1 element 1 (that is, the one in column 1, and then on through column n) is > 20:
            store 'no' in that position
     elif it is < 20 and > 10:
            store 'si' in that position
     else:
            leave it as it is

And maybe not just a single row, but rows 10, 20 and 40. Thank you, I hope you can help me.

python pandas

asked by Emanuel Lemos 03.11.2018 at 20:00

1 answer

score 0 · Answer 1

From the example you give, it is not clear what you are trying to do. You start by showing a dictionary whose values are strings, years, or durations, but then you present pseudocode in which you iterate by rows (of which dataframe? I assume the one created from the dictionary) and make numerical comparisons with 20, 10 and so on, values that do not appear anywhere in your data.

In any case, note that pandas is designed so that you almost never have to iterate yourself, because it provides very high-level methods that operate on all the elements of a column or a dataframe "at once". Internally, of course, pandas must loop over the elements of the dataframe to provide that functionality, but it saves you from writing the loop. In many cases a single line of code does what you need.
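As a small illustration of operating on a whole column at once, here is a sketch that uses the dictionary from the question. The names df_canciones, es_reciente and minutos, and the threshold 1993, are assumptions made up for this example; they do not come from the question.

```python
import pandas as pd

# Data from the question (column names kept exactly as the asker wrote them).
canciones = {
    "albun": ["cacho", "beto", "pedrito", "loshermanos"],
    "ano": [1992, 1998, 1994, 1993],
    "tiempo": ["00:22:04", "00:42:02", "00:23:33", "00:44:33"],
}
df_canciones = pd.DataFrame(canciones)  # hypothetical name

# Compare every value of a column in one expression; the result is a
# boolean Series with one entry per row. The threshold is arbitrary.
es_reciente = df_canciones["ano"] > 1993

# Parse the HH:MM:SS strings as durations and convert them to minutes,
# again without writing an explicit loop.
minutos = pd.to_timedelta(df_canciones["tiempo"]).dt.total_seconds() / 60

print(es_reciente)
print(minutos)
```

Neither expression mentions individual rows: the comparison and the conversion are applied to the whole column.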

Since your example is unclear, I will propose another one. I will create a dataframe with random numbers drawn from a normal distribution:

import pandas as pd
import numpy as np

def numeros():
  return np.random.randn(10)

df = pd.DataFrame({"c1": numeros(), "c2": numeros(), "c3": numeros()})
print(df)

This prints something like the following (the exact values change on every run, since they are random):

         c1        c2        c3
0  0.933062 -0.331660  0.570088
1 -1.056521  0.653821 -0.715223
2  0.415285  0.580467  0.275368
3  0.603351  1.259974  1.532510
4 -1.494285 -1.446740  1.590340
5 -0.462880  0.657413 -0.086055
6 -1.243113 -0.016631 -0.451884
7  0.968619 -0.729009  0.176846
8 -0.756221  0.502987  0.573067
9  1.079186 -1.599314  1.275140

Now suppose I want to replace every negative value with the word "no" and every non-negative value smaller than 1 with the word "si" (yes), leaving the rest unchanged.

I start by writing a function that applies that "transformation" to a single value received as a parameter and returns the result. As simple as this:

def transformar(n):
  if n<0:
    return "no"
  if n<1:
    return "si"
  return n

And this is where the power of pandas comes into play. With the dataframe's applymap() method you can apply that function to every element of the dataframe, in a single line and without writing loops:

resultado = df.applymap(transformar)
print(resultado)

The result is:

        c1       c2       c3
0       si       no       si
1       no       si       no
2       si       si       si
3       si  1.25997  1.53251
4       no       no  1.59034
5       no       si       no
6       no       no       no
7       si       no       si
8       no       si       si
9  1.07919       no  1.27514
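The question also asks about handling only certain rows (row 1, row 2, or rows 10, 20 and 40). A minimal sketch of one way to do that, assuming positional selection with iloc; the list filas, the fixed seed and the write-back step are assumptions for illustration, not part of the original answer. It uses Series.map on each column rather than applymap(), because pandas 2.1 deprecated applymap() in favor of DataFrame.map().

```python
import numpy as np
import pandas as pd

def transformar(n):
    # Same rule as in the answer: negatives -> "no", [0, 1) -> "si".
    if n < 0:
        return "no"
    if n < 1:
        return "si"
    return n

# Random data as in the answer; the seed is arbitrary and only makes
# the example reproducible.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 3)), columns=["c1", "c2", "c3"])

filas = [1, 2]  # hypothetical selection: positions of the rows to change

# Transform only the selected rows, column by column.
transformadas = df.iloc[filas].apply(lambda col: col.map(transformar))

# Cast to object first so strings and floats can share a column,
# then write the transformed rows back into their positions.
resultado = df.astype(object)
resultado.iloc[filas] = transformadas.to_numpy()

print(resultado)
```

Changing filas to another list of positions selects other rows, as long as the dataframe has that many rows.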

This way of working not only lets you write less code, it is also more efficient, because the loops pandas uses internally to traverse the data are implemented in C and run much faster than the Python loops you had in mind.
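One caveat on efficiency: applymap() still calls the Python function once per element. As a hedged alternative, the same rule can be expressed with whole-array comparisons and DataFrame.mask(), so no Python function runs per element. This is a sketch of a different technique from the one in the answer; the fixed seed and the variable names are assumptions.

```python
import numpy as np
import pandas as pd

# Random data as in the answer; the seed is arbitrary.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 3)), columns=["c1", "c2", "c3"])

# Boolean dataframes computed for all elements at once.
negativos = df < 0
pequenos = (df >= 0) & (df < 1)

# mask() replaces the values where the condition is True.
# Casting to object lets strings and floats coexist in one column.
resultado = df.astype(object).mask(negativos, "no").mask(pequenos, "si")

print(resultado)
```

For a dataframe of this size the difference is negligible; the approach only matters with many rows.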

If you use a lambda function you can avoid writing the function transformar, but I think an external function is more readable in this case. For reference, with a lambda it would look like this:

df.applymap(lambda n: "no" if n<0 else "si" if n<1 else n)