Read a CSV with thousands of records and insert it into the database in batches


I need to read a CSV of more than 100,000 records and insert it into a database. With a small number of rows it works as intended, but with a file of the size I mention, the page freezes.

My idea was to do this in batches, for example 500 records at a time. How could I do it? What should I modify?

What I have so far:

jQuery.ajax({
    url: 'procesarcsv.php',
    type: 'POST',
    dataType: 'html',
    data: formdata,
    async: true,
    success: function(response){
        alert(response);
    },
    error: function(jqXHR, textStatus, errorThrown){
        alert("error " + errorThrown);
    }
});

This AJAX call invokes a PHP script. I imagine I should repeat the call once per batch.
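The chained calls imagined above could be sketched like this. Everything here is a sketch, not working code from the question: `batchOffsets`, `procesarTodo` and `totalRows` are hypothetical names, and it assumes the PHP side accepts `offset` and `batchsize` POST parameters (you would get `totalRows` from a separate request that counts the CSV lines).

```javascript
// Hypothetical helper: offsets 0, batchSize, 2*batchSize, ...
// covering every row of the file exactly once.
function batchOffsets(totalRows, batchSize) {
  const offsets = [];
  for (let offset = 0; offset < totalRows; offset += batchSize) {
    offsets.push(offset);
  }
  return offsets;
}

// Hypothetical driver: chain one AJAX call per batch, so each request
// starts only after the previous batch has been inserted. This keeps
// the server from processing the whole file in a single request.
function procesarTodo(totalRows, batchSize) {
  return batchOffsets(totalRows, batchSize).reduce(
    (chain, offset) =>
      chain.then(() =>
        jQuery.ajax({
          url: 'procesarcsv.php',
          type: 'POST',
          data: { offset: offset, batchsize: batchSize },
        })
      ),
    Promise.resolve()
  );
}
```

For example, `batchOffsets(1050, 500)` yields `[0, 500, 1000]`, so `procesarTodo` would issue three sequential requests.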

public function procesarcsv(){

    $batchsize = 10;
    $handle = fopen("/carpeta/micsv.csv", "r");

    for ($i = 0; $i < $batchsize; $i++) {

        $line = fgetcsv($handle);
        $col1 = $line[0];
        $col2 = $line[1];
        $col3 = $line[2];
        $col4 = $line[3];
        $col5 = $line[4];
        $col6 = $line[5];
        $col7 = $line[6];
        $json = "'$col1', '$col2', '$col3', '$col4', '$col5', '$col6', '$col7'";
        echo "JSON = " . $json . "<br>";
        $this->insertIntoDDBB($json); // inserts each line into the database

    }
}

How can I read and batch insert a CSV with thousands of records?

    
asked by Norak 18.09.2018 в 18:13

1 answer


The most basic way to implement it is to have the program read the first N rows "blind", only to advance the file pointer. Then we have to handle three variables:

  • Variable $offset: how many rows to skip
  • Variable $batchsize: how many rows to process
  • Variable $ultima_fila: the last row to read, inferred from the other two.

And 3 states:

  • Status "skip the line" while I have not yet reached $offset
  • Status "process the line" once I have reached $offset and have processed fewer than $batchsize rows
  • Status "leave the loop" when I reach $ultima_fila

Let's say that your class has a procesaFila method that generates the $json variable, to simplify the example.

public function procesarcsv($offset = 0, $batchsize = 10){
  $position = 0;
  $ultima_fila = $offset + $batchsize;

  $handle = fopen("/carpeta/micsv.csv", "r");
  while (($line = fgetcsv($handle)) !== FALSE) {
     $position++; // count the row before deciding what to do with it
     if ($position <= $offset) {
       continue; // the first $offset iterations skip the rest of the loop
     }
     $json = $this->procesaFila($line);
     echo "JSON = " . $json . "<br>";
     $this->insertIntoDDBB($json);
     if ($position >= $ultima_fila) {
       break; // the batch is complete
     }
  }
  fclose($handle);
}

The method is not very efficient. I'm going to try something better and edit the answer.

Edit: Here a more efficient way

I'm going to use Linux commands. If you are on Windows, this alternative will not work directly.

Basically, in Linux we can take a file of 100,000 rows and get rows 10,001 to 10,100 by doing

head -10100 archivo.csv | tail -100

Basically: take the first 10,100 lines and pipe them to tail, which keeps only the last 100.
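Scaled down, the same pattern is easy to check by hand. The file name below is illustrative; with offset = 2 and batchsize = 2, $ultima_fila = 4, so head keeps the first 4 rows and tail returns rows 3 and 4:

```shell
# Build a 5-row sample CSV.
printf 'r1,a\nr2,b\nr3,c\nr4,d\nr5,e\n' > /tmp/micsv_demo.csv
# head -$ultima_fila | tail -$batchsize extracts one batch.
head -4 /tmp/micsv_demo.csv | tail -2
# prints:
# r3,c
# r4,d
```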

Then, keeping the same method signature as before:

public function procesarcsv($offset = 0, $batchsize = 10){
  $ultima_fila = $offset + $batchsize;

  $cmd = sprintf('head -%d /carpeta/micsv.csv | tail -%d', $ultima_fila, $batchsize);

  $descriptorspec = [
    0 => ['pipe', 'r'], // stdin
    1 => ['pipe', 'w'], // stdout
    2 => ['pipe', 'w'], // stderr
  ];

  $process = proc_open($cmd, $descriptorspec, $pipes);

  if (is_resource($process)) {
      while ($s = fgets($pipes[1])) {
        $line = str_getcsv($s);
        $json = $this->procesaFila($line);
        echo "JSON = " . $json . "<br>";
        $this->insertIntoDDBB($json);
      }
      // Close the pipes only if the process actually started.
      fclose($pipes[0]);
      fclose($pipes[1]);
      fclose($pipes[2]);
      proc_close($process);
  }
}
    
answered by 18.09.2018 в 22:18