Since what you want to do is a bit complex, I can not figure out how to achieve it through "vectorized" operations (so to speak), that is, functional programming like map()
, assign()
, etc. that operates with all the rows of the dataframe at the same time.
It only occurs to me to iterate through the rows and do the conversion "by hand", which is not very efficient and may take time if your dataframe is very large.
Based on the example you provide and assuming that all rows meet the requirement that they contain as many columns with non-zero value (from the fourth column, which is "C5") as comma-separated elements exist in the column "PERCENTAGE", the following code would do what you ask.
I have commented, because it is not obvious to see how it works:
from collections import OrderedDict
# Crearemos una lista en la que cada elemento será una fila del
# dataframe que queremos obtener como resultado. Cada una de esas filas
# será de momento un diccionario (OrderedDict)cuyas claves son los nombres
# de las columnas y los valores los de las correspondientes celdas
new_rows = []
# Iteramos sobre cada fila del dataframe original
for indice, data in df.iterrows():
# Del campo PERCENTAGE separamos por las comas y convertimos
# a float cada trozo. El resultado es una tupla con tantos elementos
# como porcentajes hallados
percts = tuple(float(p) for p in data.PERCENTAGE.split(","))
# El diccionario es de tipo OrderedDict() para que las columnas
# salgan al final en el orden que me interesa
extraido = OrderedDict()
# Insertamos las "columnas" en el orden deseado
extraido["total"] = data["total"]
extraido["run"] = data["run"]
# Contador de "zonas" a crear
contador = 1
for c in data.index[4:]:
# Para cada columna a partir de la cuarta...
if data[c]>0:
# Si esa columna tiene un dato distinto de cero, hemos encontrado
# una zona. Añadimos al diccionario los datos para esa zona
extraido["Zone{}".format(contador)] = c
extraido["Area{}".format(contador)] = data[c]
extraido["Pct{}".format(contador)] = percts[contador-1]
contador += 1
# Añadimos este diccionario como nueva "fila" al resultado
new_rows.append(extraido)
When executing the previous code (being df
your input dataframe), we will obtain a list of dictionaries like this:
>>> new_rows
[OrderedDict([('total', 0.59),
('run', 0.4),
('Zone1', 'M1'),
('Area1', 0.401),
('Pct1', 100.0)]),
OrderedDict([('total', 0.88),
('run', 0.6),
('Zone1', 'M1'),
('Area1', 0.602),
('Pct1', 100.0)]),
...
This will be converted to the final dataframe, with the precaution of using fillna(0)
to fill in zero all non-existent values, and changing the name of the index to be "OBJ" as in the input dataframe:
resultado = pd.DataFrame(new_rows).fillna(0)
resultado.index.name = "OBJ"
The result is:
total run Zone1 Area1 Pct1 Zone2 Area2 Pct2 Zone3 Area3 Pct3
OBJ
0 0.59 0.40 M1 0.401 100.000 0 0.00000 0.00 0 0.00000 0.0
1 0.88 0.60 M1 0.602 100.000 0 0.00000 0.00 0 0.00000 0.0
2 0.52 0.51 M1 0.511 100.000 0 0.00000 0.00 0 0.00000 0.0
3 1.05 0.98 M1 0.979 100.000 0 0.00000 0.00 0 0.00000 0.0
4 0.58 0.66 C5 0.001 0.088 M1 0.65600 0.12 R7A 0.00096 99.0
5 0.55 0.88 M1 0.878 0.620 R7A 0.00556 99.37 0 0.00000 0.0
6 1.04 0.86 M1 0.851 0.460 R7A 0.00400 99.53 0 0.00000 0.0