I am creating a DataFrame with the following code:
import numpy as np
import pandas as pd
from time import time
start_time = time()
columns = 60
Data = pd.DataFrame(np.random.randint(low=0, high=10, size=(700000, 3)), columns=['a', 'b', 'c'])
Data['f'] = (Data.index % 60) + 1
Data['column_-1'] = 100
for i in range(columns):
    Data['column_' + str(i)] = np.where(  # Condition 1
        Data['f'] == 1,
        1000 + i,
        np.where(  # Condition 2
            i < Data['f'],
            0,
            np.where(  # Condition 3
                Data['a'] > Data['b'],
                Data['column_' + str(-1)] * Data['c'],
                Data['column_' + str(-1)]
            )
        )
    )
elapsed_time = time() - start_time
print("Elapsed time: %.10f seconds." % elapsed_time)
Elapsed time: 1.0710000992 seconds.
Is there a better way to do this, one that still generates the columns dynamically but makes the script faster? Thanks.
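One direction that might be worth trying, as a rough sketch only: compute all of the new columns in a single NumPy broadcast and attach them with one `pd.concat`, rather than inserting them one at a time in a loop. The helper name `build_columns` and its parameters are hypothetical, not part of the script above. It assumes, as in that script, that every new column reads only the fixed `column_-1` column. It has not been timed here, so any speedup would need to be measured.

```python
import numpy as np
import pandas as pd


def build_columns(data, n_columns, base_col="column_-1"):
    """Hypothetical helper: build column_0..column_{n-1} in one broadcast."""
    f = data["f"].to_numpy()[:, None]       # shape (rows, 1)
    i = np.arange(n_columns)[None, :]       # shape (1, n_columns)
    base = data[base_col].to_numpy()

    # Condition 3 depends only on the row, so compute it once per row.
    row_value = np.where(
        data["a"].to_numpy() > data["b"].to_numpy(),
        base * data["c"].to_numpy(),
        base,
    )[:, None]                              # shape (rows, 1)

    # Conditions 1 and 2 broadcast to shape (rows, n_columns).
    block = np.where(f == 1, 1000 + i, np.where(i < f, 0, row_value))

    names = ["column_" + str(k) for k in range(n_columns)]
    new_cols = pd.DataFrame(block, index=data.index, columns=names)
    # Attach all new columns at once instead of one insert per column.
    return pd.concat([data, new_cols], axis=1)
```

With the frame from the script above, this would be called as `Data = build_columns(Data, columns)` in place of the loop.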