I assume you want each row to be a territorial unit (district, neighborhood, etc.) and each column an attribute or variable measured in that territorial unit. This would be (approximately) tidydata, in Wickham's sense. If you are looking for another structure for the data you should edit your question to make it explicit.
There is a first approach to the problem, which at least advances the work somewhat by producing a transposed data.frame and with reasonably appropriate names. Definitely requires more cleaning, better names, etc. Clearing Excel files is very complicated because each one has the structure given by who did it and an ad hoc solution is always necessary, so you have a few hours ahead;)
Clarifications
I had to transform the file out of R to xlsx, I could not read it as .xls
The script works with the first excel sheet (CENTER). However if works well and -mainly- if all the sheets have the same structure, you could reproduce it on each sheet and then merge each part into a complete data.frame.
Code
The most important function is transpose
of data.table
. The rest is handling row and column names.
library(readxl)
library(data.table)
datos <- read_excel("excel_maldito/IndicadoresDistritos2017_2.xlsx", sheet = "CENTRO", skip = 1) #Importante skip = 1 para mantener nombres de columna.
datos <- rbind(names(datos), datos) #Paso los nombres de columna como primera fila, si no transpose los pierde.
datos_t <- data.table::transpose(datos) #Transpongo los datos: ahora las columnas son filas.
colnames(datos_t) <- datos_t[1, ] #Paso primera fila a nombres de columna, de este modo los conservé
datos_t <- datos_t[2:nrow(datos_t),] #Y elimino esa primera fila, ahora es redundante
datos_t_limpio <- datos_t[ ,!sapply(lapply(datos_t, is.na), all)] #Elimino todas las columnas que solo contienen NA.
names(datos_t_limpio)[307] <- "Capítulo Presupuestario" #En en los datos produce un NA
datos_t_limpio <- datos_t_limpio[1:16, ] #ATENCION: podría fallar en otras horas con más o menos unidades territoriales
Earrings
-
Excel has absolute values and proportions. A decision should be made about what to do with it. If the proportions can be calculated from the absolute values they could be eliminated.
-
I'm not sure that the budget data has the same structure as the rest. It should be verified and deleted / reformatted depending on the case.
-
I'm not sure that they paint the rows X__10
to X__12
. I think they are repeated or have irrelevant information, that's why I eliminate them, but I would have to verify it carefully and see what happens in the remaining sheets that may have more or less neighborhoods .
Then you tell me if you approach the solution. Clearing excels is painful, luck and patience.