Optimize a MySQL query that puts the server at 100% CPU

1

I have a query that makes the MariaDB server occupy 100% CPU for more than 10 minutes and finally the page gives timeout.

The query in question is the following:

SELECT 
  formasdepago.nombre as 'Forma de pago', 
  SUM(transaccion) AS Total 
FROM contabilidad 
INNER JOIN formasdepago ON formasdepago.id = contabilidad.formadepago 
WHERE contabilidad.anulado = 0  
Group By formadepago 

The corresponding tables are the following:

Accounting :

CREATE TABLE 'contabilidad' (
    'anulado' TINYINT(1) NOT NULL DEFAULT '0',
    'cuentacorriente' INT(11) NOT NULL DEFAULT '0',
    'cuotas' DECIMAL(10,2) NOT NULL DEFAULT '0',
    'empleado' INT(11) NOT NULL DEFAULT '0',
    'fecha' DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    'formadepago' INT(11) NOT NULL DEFAULT '1',
    'id' INT(11) UNSIGNED ZEROFILL NOT NULL AUTO_INCREMENT,
    'idgasto' INT(11) NOT NULL DEFAULT '0',
    'interesfinanciero' DECIMAL(10,2) NOT NULL DEFAULT '0',
    'interesformadepago' DECIMAL(10,2) NOT NULL DEFAULT '0',
    'ticketid' INT(11) NOT NULL DEFAULT '0',
    'tipodetransaccion' VARCHAR(50) NOT NULL,
    'transaccion' DECIMAL(10,2) NOT NULL DEFAULT '0',
    'valordecuota' DECIMAL(10,2) NOT NULL DEFAULT '0',
    'valorfinal' DECIMAL(10,2) NOT NULL DEFAULT '0',
    PRIMARY KEY ('id'),
    INDEX 'fecha' ('fecha'),
    INDEX 'ticketid' ('ticketid'),
    INDEX 'idgasto' ('idgasto'),
    INDEX 'formadepago' ('formadepago'),
    FULLTEXT INDEX 'tipodetransaccion' ('tipodetransaccion')
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
AUTO_INCREMENT=13975947
;

Payment method :

CREATE TABLE 'formasdepago' (
    'id' INT(11) NOT NULL AUTO_INCREMENT,
    'habilitado' INT(1) NOT NULL DEFAULT '1',
    'eliminado' INT(1) NOT NULL DEFAULT '0',
    'simbolo' VARCHAR(5) NOT NULL,
    'diferencia' DECIMAL(11,2) NOT NULL DEFAULT '0',
    'imagen' MEDIUMBLOB NULL DEFAULT NULL,
    'nombre' VARCHAR(50) NOT NULL,
    'preciodelista' INT(1) NOT NULL DEFAULT '0',
    'numeropreciodelista' INT(5) NOT NULL DEFAULT '1',
    'financiacion' INT(5) NOT NULL DEFAULT '0',
    'imagenmini' BLOB NULL DEFAULT NULL,
    PRIMARY KEY ('id'),
    INDEX 'id' ('id'),
    INDEX 'eliminado' ('eliminado'),
    INDEX 'habilitado' ('habilitado'),
    FULLTEXT INDEX 'nombre' ('nombre')
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
AUTO_INCREMENT=8
;

Accounting has 1.5 GB of information, while payment only takes 3 records.

With such a large amount of data, I separate them by date to make a smaller query, but I do not understand why the delay is.

Could someone give me some indication of what the server is doing so that it takes so long? I do not understand, really. I hope someone helps me, thank you very much.

    
asked by Bernardo Harreguy 30.06.2018 в 22:46
source

1 answer

1

Your query can not be optimized without reducing the number of records added per date, using parallelism as I will expose at the end or another similar system.

The problem you have is that you are going through millions of records to make the sum of each and every one of them (except those that have anulado to 0, which will be few), so the time it takes the server in showing you the results will depend on the available hardware.

As a general rule you will be limited by the speed of a single core of the CPU because the query will be resolved in a single thread / process above the read speed of the hard disk or the amount of RAM.

In an environment where I have generated 8 million random records I have executed the query before and after adding anulado as an index:

SELECT
  SUM(transaccion)
FROM contabilidad
WHERE
  anulado = 0

The execution time of the query has been, in both cases, 32 seconds.

If we make a DESCRIBE it appears:

id select_type table        type possible_keys key     key_len ref   rows    Extra
1  SIMPLE      contabilidad ref  anulado       anulado 1       const 4025130

Which means that you have to travel more than 4 million records (field rows ). Approximately 126k sums of records per second.

Your query, unmodified, in my environment takes 36 seconds. It supposes, approximately, 12% more than doing the sum of all the values without relating them to the table of payment methods.

The only way to improve response times, and only if your server has multiple cores, is to run in parallel the sums of each of the payment methods:

# Prueba base:
$ time mysql pruebas -e 'SELECT SUM(transaccion) Total FROM contabilidad WHERE contabilidad.anulado = 0'
+---------------+
| Total         |
+---------------+
| 1147206229.96 |
+---------------+

real    0m32.477s
user    0m0.000s
sys 0m0.000s

# Prueba completa:
$ time mysql pruebas -e "SELECT 
>   formasdepago.nombre as 'Forma de pago', 
>   SUM(transaccion) AS Total 
> FROM contabilidad 
> INNER JOIN formasdepago ON formasdepago.id = contabilidad.formadepago 
> WHERE contabilidad.anulado = 0  
> Group By formadepago"
+---------------+--------------+
| Forma de pago | Total        |
+---------------+--------------+
| VISA          | 382738111.90 |
| PAYPAL        | 382286232.86 |
| METÁLICO      | 382181885.20 |
+---------------+--------------+

real    0m36.378s
user    0m0.000s
sys 0m0.000s

# Paralelismo:
$ time ( for i in 1 2 3 ; do time mysql pruebas -e 'SELECT formadepago, SUM(transaccion) Total FROM contabilidad WHERE contabilidad.anulado = 0 AND formadepago = '$i & done ; for pid in $(jobs -p); do wait $pid; done )
formadepago Total
2   382286232.86
real    0m9.414s
user    0m0.000s
sys 0m0.000s

formadepago Total
1   382738111.90
real    0m9.414s
user    0m0.000s
sys 0m0.000s

formadepago Total
3   382181885.20
real    0m9.415s
user    0m0.000s
sys 0m0.000s

real    0m9.416s
user    0m0.000s
sys 0m0.000s

As you can see, the time has been reduced to only 9 seconds.

In PHP the parallelism could be obtained using Thread .

    
answered by 03.07.2018 / 12:30
source