Error when loading a file with fread


There is extra text after the last variable on one line, so the file fails to load and the following error occurs:

  

Expecting 26 cols, but line 15475 contains text after processing all cols. Try again with fill=TRUE. Another reason could be that fread's logic in distinguishing one or more fields having embedded sep='' and/or (unescaped) '\n' characters within unbalanced unescaped quotes has failed. If quote='' does not help, please file an issue to figure out if the logic could be improved.

As an example, here is a CSV file that produces a similar error:

Name,Company,Serving,Calories,Fat,Sodium,Carbs,Fiber,Sugars,Protein
AppleJacks,K,1,117,0.6,143,27,0.5,15,1
Boo Berry,G,1,118,0.8,211,27,0.1,14,1
Cap'n Crunch,Q,0.75,144,2.1,269,31,1.1,16,1.3
Cinnamon Toast, Crunch,G,0.75,169,4.4,408,32,1.7,13.3,2.7

We load it with:

library(data.table)
df <- fread(file="test.csv")
    
asked by Diego Vargas 11.07.2017 at 18:05

1 answer


The error is quite clear: you are trying to load a comma-separated file in which at least one record does not have the same number of columns as the rest. This can happen when a text value itself contains a comma, which is then interpreted as the start of a new field. The comma is often a poor separator for free text; when we control how the file is written, it is preferable to use another character, such as the pipe |.
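To find the offending records before calling fread, one option is to count the fields on each line. The following is a minimal sketch, assuming the sample data from the question; it writes that sample to a temporary file (the path and the variable names are only illustrative) and uses base R's count.fields:

```r
# Recreate the sample file from the question in a temporary directory.
csv_lines <- c(
  "Name,Company,Serving,Calories,Fat,Sodium,Carbs,Fiber,Sugars,Protein",
  "AppleJacks,K,1,117,0.6,143,27,0.5,15,1",
  "Boo Berry,G,1,118,0.8,211,27,0.1,14,1",
  "Cap'n Crunch,Q,0.75,144,2.1,269,31,1.1,16,1.3",
  "Cinnamon Toast, Crunch,G,0.75,169,4.4,408,32,1.7,13.3,2.7"
)
path <- file.path(tempdir(), "test.csv")
writeLines(csv_lines, path)

# Count the comma-separated fields on every line.
# Only the double quote is treated as a quote character, so the apostrophe
# in "Cap'n Crunch" is not mistaken for an opening quote; comment handling
# is switched off so that a "#" in the data would not truncate a line.
n_fields <- count.fields(path, sep = ",", quote = "\"", comment.char = "")

# Lines whose field count differs from the header are the suspects.
expected  <- n_fields[1]
bad_lines <- which(n_fields != expected)
bad_lines
```

For this sample, bad_lines points at line 5, the "Cinnamon Toast, Crunch" record. Note that count.fields skips blank lines by default, so in a file with empty lines the positions may not match the physical line numbers.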

fread is a very flexible and fast function for importing data from a delimited file. In your case, the error message already suggests a useful way to at least see where the problem is: the parameter fill=TRUE. Let's see:

library("data.table")
df <- fread(file="test.csv", fill=TRUE)
df

And now we can see the problem better:

             Name Company Serving Calories   Fat Sodium Carbs Fiber Sugars Protein V11
1:     AppleJacks       K       1   117.00   0.6  143.0    27   0.5   15.0     1.0  NA
2:      Boo Berry       G       1   118.00   0.8  211.0    27   0.1   14.0     1.0  NA
3:   Cap'n Crunch       Q    0.75   144.00   2.1  269.0    31   1.1   16.0     1.3  NA
4: Cinnamon Toast  Crunch       G     0.75 169.0    4.4   408  32.0    1.7    13.3 2.7
5:                                      NA    NA     NA    NA    NA     NA      NA  NA

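Looking at row 4, we can deduce that the "Name" column should contain "Cinnamon Toast, Crunch". Instead, the unquoted comma pushed "Crunch" into the second column and shifted every later value one column to the right, which is why the extra column V11 appears. Seeing the shifted data makes the problem much easier to spot.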
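If the source file cannot be regenerated, the shifted rows can be repaired before parsing. This is only a sketch under a strong assumption taken from the deduction above: any surplus comma belongs to the first column ("Name"). The helper repair_line and the other variable names are hypothetical, and in practice csv_lines would come from readLines("test.csv"):

```r
library(data.table)

# Sample lines from the question; normally: csv_lines <- readLines("test.csv")
csv_lines <- c(
  "Name,Company,Serving,Calories,Fat,Sodium,Carbs,Fiber,Sugars,Protein",
  "AppleJacks,K,1,117,0.6,143,27,0.5,15,1",
  "Boo Berry,G,1,118,0.8,211,27,0.1,14,1",
  "Cap'n Crunch,Q,0.75,144,2.1,269,31,1.1,16,1.3",
  "Cinnamon Toast, Crunch,G,0.75,169,4.4,408,32,1.7,13.3,2.7"
)

# Split every line on commas and take the header width as the target.
fields     <- strsplit(csv_lines, ",", fixed = TRUE)
n_expected <- length(fields[[1]])

# Hypothetical helper: glue the surplus leading pieces back into one
# double-quoted Name field. ASSUMPTION: extra commas only occur in Name.
repair_line <- function(f) {
  extra <- length(f) - n_expected
  if (extra > 0) {
    head_idx <- seq_len(extra + 1)
    name <- paste(f[head_idx], collapse = ",")
    f <- c(paste0("\"", name, "\""), f[-head_idx])
  }
  paste(f, collapse = ",")
}

repaired <- vapply(fields, repair_line, character(1))

# fread accepts the repaired lines directly through its text argument.
df <- fread(text = repaired)
df
```

With the name quoted, fread reads ten columns and row 4 keeps "Cinnamon Toast, Crunch" as a single value. The sketch does not handle names that already contain double quotes or lines with trailing empty fields.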

As a general rule, to avoid problems like this later, when generating delimited files we should:

  • Use an uncommon character as the separator
  • Quote text fields, for example: fwrite(df, file="test.csv", quote=TRUE)
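
Both recommendations can be tried on a small table. This is a minimal sketch with made-up sample rows and temporary file paths (all names here are illustrative); it writes the same data once with a pipe separator and once with quoted fields, then reads both files back:

```r
library(data.table)

# Small illustrative table with a comma inside one text value.
df <- data.table(
  Name     = c("Boo Berry", "Cinnamon Toast, Crunch"),
  Company  = c("G", "G"),
  Calories = c(118L, 169L)
)

# Option 1: a separator that does not occur in the text.
pipe_path <- tempfile(fileext = ".csv")
fwrite(df, pipe_path, sep = "|")

# Option 2: keep the comma but wrap text fields in double quotes.
quoted_path <- tempfile(fileext = ".csv")
fwrite(df, quoted_path, quote = TRUE)

# Both files read back with the comma preserved inside Name.
from_pipe   <- fread(pipe_path, sep = "|")
from_quoted <- fread(quoted_path)
from_quoted
```

As far as I know, fwrite's default quote="auto" already quotes a field when it contains the separator, so quote=TRUE mainly makes the quoting explicit and uniform.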
answered 15.07.2017 at 22:23