Regenerate indexes in a pandas dataframe after deleting rows

I have a dataframe in Pandas with 30,000 records.

len(data) 

returns 30000. From this df I delete certain rows and then regenerate the index (or so I think) with:

data.drop(icond1,inplace=True)
data.reset_index(drop=True)
len(data)

returns 28070. So I would expect that

data.index.values > 28070

always returns [False, False, ... False], and yet it returns [False, False, ..., True, True]. What am I doing wrong? Are the old index labels not deleted?

I want consecutive indexes, not a df that still has the same index labels as before, just with the rows matching icond1 made inaccessible.

    
asked by isg 01.03.2018 at 16:32

1 answer

An important detail that you do not show is how you build the icond1 that you use to select the rows to drop.

Assuming it is built correctly, the most likely source of the error is the line in which you reset the index: reset_index() is not an in-place operation. It returns a new dataframe with the rebuilt index and leaves the original dataframe unmodified. You can fix it with data = data.reset_index(...), or by adding the inplace=True parameter.
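The difference is easy to see on a small frame. The following is only a minimal sketch: the 10-row dataframe, the "value" column and the label list used as icond1 are made up for illustration, since the question does not show how icond1 is built.

```python
import pandas as pd

# Hypothetical small frame standing in for the 30,000-row one.
data = pd.DataFrame({"value": range(10)})
icond1 = [2, 3, 4]  # assumption: a list of index labels to drop

data.drop(icond1, inplace=True)

# reset_index() returns a new dataframe; the result is discarded here,
# so `data` keeps its old labels, with a gap where rows were dropped.
data.reset_index(drop=True)
before = list(data.index)
print(before)

# Assigning the result back (or passing inplace=True) fixes it.
data = data.reset_index(drop=True)
after = list(data.index)
print(after)
```

The first print still shows labels up to 9 even though only 7 rows remain, which is the same symptom as in the question.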

To show that this works, here is an example in which I create a dataframe with a single column ("data") containing the consecutive numbers from 0 to 2999. Then I use drop() to remove the rows whose values are between 50 (inclusive) and 100 (exclusive). The resulting dataframe has 2950 rows. Finally I reset the index.

>>> import pandas as pd
>>> df = pd.DataFrame(list(range(3000)), columns=["data"])
>>> df.drop(df[(df.data >= 50) & (df.data < 100)].index, inplace=True)
>>> df.reset_index(drop=True, inplace=True)
>>> len(df)
2950

And what you expected does hold:

>>> any(df.index.values > 2949)
False
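A stricter check than comparing against the maximum is to test whether the index is exactly 0, 1, ..., len(df) - 1. This is a sketch of one possible way to do that, comparing against a pd.RangeIndex; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(list(range(3000)), columns=["data"])
df.drop(df[(df.data >= 50) & (df.data < 100)].index, inplace=True)

# Before resetting, the labels jump from 49 to 100, so this is False.
consecutive_before = df.index.equals(pd.RangeIndex(len(df)))

df.reset_index(drop=True, inplace=True)

# After resetting, the labels are exactly 0 .. len(df) - 1, so this is True.
consecutive_after = df.index.equals(pd.RangeIndex(len(df)))

print(consecutive_before, consecutive_after)
```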

Note: in response to a follow-up question from another user, I'll clarify the following. The function reset_index() discards the values of the current index and replaces them with integers that start at 0 and increase consecutively. By default it creates a new dataframe (it does not alter the original) and returns it. Also by default, it adds a new column named "index" to the resulting dataframe, holding the values that the index had before.

The option inplace=True makes the changes on the dataframe itself instead of returning a new one, and the option drop=True prevents the creation of the new "index" column holding the old index values.
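The effect of both options can be seen side by side in a short sketch. The 4-row dataframe and the variable names kept, dropped and result are invented for the illustration.

```python
import pandas as pd

df = pd.DataFrame({"data": [10, 20, 30, 40]})
df = df.drop([1])                      # remaining labels: 0, 2, 3

kept = df.reset_index()                # old labels copied into an "index" column
dropped = df.reset_index(drop=True)    # old labels discarded

print(list(kept.columns))     # ['index', 'data']
print(list(kept["index"]))    # [0, 2, 3]
print(list(dropped.columns))  # ['data']
print(list(df.index))         # [0, 2, 3], df itself is still unchanged

# With inplace=True the dataframe is modified and the call returns None.
result = df.reset_index(drop=True, inplace=True)
print(result, list(df.index))
```

Because the in-place call returns None, writing data = data.reset_index(drop=True, inplace=True) would replace the dataframe with None; use either the assignment or inplace=True, not both.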

    
answered by 01.03.2018 / 16:53