Seconds_Behind_Master grows constantly on MariaDB / MySQL slave server

1

I have a slave server configured running on Debian 9 in MariaDB 10.1. I configured it and started to work and everything was fine, what worries me is that every time the variable Seconds_Behind_Maste is delayed. He started with 1 second delay and is already 16 and each time he delays more. This is the consecutive reading of the log, as you will see more and more delays. Can somebody help me ? Greetings ..

  • READING 1.
  •     show slave status\G
        *************************** 1. row ***************************
                       Slave_IO_State: Waiting for master to send event
                          Master_Host: oculto por seguridad
                          Master_User: oculto por seguridad
                          Master_Port: 3306
                        Connect_Retry: 60
                      Master_Log_File: log-bin.005272
                  Read_Master_Log_Pos: 135257692
                       Relay_Log_File: mysql-relay-bin.000008
                        Relay_Log_Pos: 57338823
                Relay_Master_Log_File: log-bin.005270
                     Slave_IO_Running: Yes
                    Slave_SQL_Running: Yes
                      Replicate_Do_DB: BdProduccion1
                  Replicate_Ignore_DB: 
                   Replicate_Do_Table: 
               Replicate_Ignore_Table: 
              Replicate_Wild_Do_Table: 
          Replicate_Wild_Ignore_Table: 
                           Last_Errno: 0
                           Last_Error: 
                         Skip_Counter: 0
                  Exec_Master_Log_Pos: 415674922
                      Relay_Log_Space: 850673341
                      Until_Condition: None
                       Until_Log_File: 
                        Until_Log_Pos: 0
                   Master_SSL_Allowed: No
                   Master_SSL_CA_File: 
                   Master_SSL_CA_Path: 
                      Master_SSL_Cert: 
                    Master_SSL_Cipher: 
                       Master_SSL_Key: 
                **Seconds_Behind_Master: 16912**
        Master_SSL_Verify_Server_Cert: No
                        Last_IO_Errno: 0
                        Last_IO_Error: 
                       Last_SQL_Errno: 0
                       Last_SQL_Error: 
          Replicate_Ignore_Server_Ids: 
                     Master_Server_Id: 50
                       Master_SSL_Crl: 
                   Master_SSL_Crlpath: 
                           Using_Gtid: No
                          Gtid_IO_Pos: 
              Replicate_Do_Domain_Ids: 
          Replicate_Ignore_Domain_Ids: 
                        Parallel_Mode: conservative
                            SQL_Delay: 0
                  SQL_Remaining_Delay: NULL
              Slave_SQL_Running_State: Unlocking tables
    

    In the second reading I was already on Seconds_Behind_Master: 187234, can someone help me and tell me what's up with this?

        
    asked by Alberto Blanco Cala 05.10.2018 в 20:42
    source

    2 answers

    1

    Thanks for the reply. The problem is that the disk of the slave is HDD and the Master is SSD so the writing in one is very superior to the other. Setting innodb_flush_log_at_trx_commit = 2 solved the problem.

    Controls the balance between strict ACID compliance for commit operations and higher performance that is possible when commit-related I / O operations are rearranged and done in batches. You can achieve better performance by changing the default value but then you can lose transactions in a crash.

    The default setting of 1 is required for full ACID compliance. Logs are written and flushed to disk at each transaction commit.

    With a setting of 0, logs are written and flushed to disk once per second. Transactions for which logs have not been flushed can be lost in a crash.

    With a setting of 2, logs are written after each transaction commit and flushed to disk once per second. Transactions for which logs have not been flushed can be lost in a crash.

    For settings 0 and 2, once-per-second flushing is not 100% guaranteed. Flushing may occur more frequently due to DDL changes and other internal InnoDB activities that cause logs to be flushed independently of the innodb_flush_log_at_trx_commit setting, and sometimes less due to scheduling issues. If logs are flushed once per second, up to one second of transactions can be lost in a crash. If logs are flushed more or less frequently than once per second, the amount of transactions that can be lost varies accordingly.

    DDL changes and other internal InnoDB activities flush the log independently of the innodb_flush_log_at_trx_commit setting.

    InnoDB crash recovery works regardless of the innodb_flush_log_at_trx_commit setting. Transactions are either applied entirely or erased entirely.

    For the greatest possible durability and consistency in a replication setup using InnoDB with transactions, use innodb_flush_log_at_trx_commit = 1 and sync_binlog = 1 in your master server my.cnf file. innodb_flush_log_at_trx_commit Another detail Sometimes the latency is caused by the speed of the link and other issues related to the network. In this case try to set: slave_compressed_protocol = 1 in my.cnf Description: If set to 1 (0 is the default), will use compression for the slave / master protocol if both master and slave support this. Commandline: --slave-compressed-protocol Scope: Global Dynamic: Yes Data Type: boolean Default Value: 0 slave_compressed_protocol

        
    answered by 08.10.2018 в 18:40
    0

    It indicates that there is a delay in the Slave and it is not up to date, the value indicates the number of seconds that is delayed and this term we say lag , ideally it is always 0, If you start to increase this number it can be for the following reasons:

  • Slave hardware is inferior to the Master and is not capable of processing all incoming traffic. It is recommended that Master and Slave have in principle the same hardware and then you can adjust it. Check the following sub-points:

    1.1. The disk should not be saturated IOPS , you can use this percona tool pt-diskstats to find out.

    1.2. Make sure the Slave does not have the CPU saturated.

    1.3. Make sure Slave is only doing his Slave work and there are no other services running and resources are being subtracted.

  • There is a process that is blocking some table, for example:

    2.1. Slow queries that may be blocking the table, you can use slow_log to identify them. To see the blocked tables you can use SHOW OPEN TABLES .

    2.2. You may be doing an ALTER TABLE or using a tool pt-online-schema-change

  • There may be more additional writing activity such as a General Log, or an activated BinLog, may not be necessary.

  • You may have a lot of writing activity in the Master and the Slave can not process it in time, this may be due to a periodic process that is controlled. When this is the case I recommend you to do INSERT / UPDATE / DELETE controlled blocks and with spaces of time between them, for example: INSERT of 100 in 100 and with a waiting time of 1 second.

  • Once you know where the bottleneck generated by the lag is, we can know if we should increase hardware resources, add indexes, refactor some part of the application code or change some database configuration variable .

        
    answered by 05.10.2018 в 23:09