RRD miscalculates the averages (AVERAGE)


On an RRD file (Round Robin Database), using Perl and RRDtool, I export the data to XML (DUMP), remove everything from a given date onward, restore it (RESTORE), resize it back to its original size (RESIZE) and then add my own data (UPDATE).
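
A minimal sketch of that dump → trim → restore → update cycle, driving the rrdtool command line from Perl (the file names, the cut-off timestamp and the sample values here are placeholders, not the real ones):

    #!/usr/bin/perl
    # Sketch only: hypothetical file names, cut-off epoch and samples.
    use strict;
    use warnings;
    use RRDs;

    my $rrd     = 'traffic.rrd';     # hypothetical source RRD
    my $xml     = 'traffic.xml';
    my $trimmed = 'trimmed.xml';
    my $cutoff  = 1488240000;        # drop rows at or after this epoch

    # 1. DUMP the RRD to XML
    system('rrdtool', 'dump', $rrd, $xml) == 0 or die "dump failed: $?";

    # 2. Remove every <row> whose epoch (in the XML comment) is >= the cut-off
    open my $in,  '<', $xml     or die $!;
    open my $out, '>', $trimmed or die $!;
    while (my $line = <$in>) {
        next if $line =~ m{/\s*(\d+)\s*-->\s*<row>} && $1 >= $cutoff;
        print {$out} $line;
    }
    close $in;
    close $out;

    # 3. RESTORE the trimmed XML into a fresh RRD
    unlink 'new.rrd';
    system('rrdtool', 'restore', $trimmed, 'new.rrd') == 0 or die "restore failed: $?";

    # (RESIZE would go here to grow each RRA back to its original row count,
    #  e.g. "rrdtool resize new.rrd 0 GROW <rows_removed>", repeated per RRA.)

    # 4. UPDATE with the replacement samples (hypothetical values)
    for my $sample ([1488240300, 42], [1488240600, 40]) {
        RRDs::update('new.rrd', "$sample->[0]:$sample->[1]");
        my $err = RRDs::error;
        die "update failed: $err" if $err;
    }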

At the end I compare both files and they are identical except for the modified data. If I dump the result again, the XML is identical too.

Four aggregations are configured: 5 minutes, 30 minutes, 2 hours and one day. The first two are correct for both the modified and the unmodified data, but surprisingly the 2-hour and daily averages come out wrong in the modified ranges, higher than any of the values that go into the average.
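
For reference, those four aggregations correspond to RRA definitions along these lines (a sketch assuming a 300-second step, an xff of 0.5, and arbitrary row counts and data-source name; the real file may differ):

    use strict;
    use warnings;
    use RRDs;

    RRDs::create(
        'traffic.rrd',                 # hypothetical file and DS names
        '--step=300',
        'DS:value:GAUGE:600:U:U',
        'RRA:AVERAGE:0.5:1:2016',      # 5-minute averages  (1 PDP per row)
        'RRA:AVERAGE:0.5:6:1488',      # 30-minute averages (6 PDPs per row)
        'RRA:AVERAGE:0.5:24:744',      # 2-hour averages    (24 PDPs per row)
        'RRA:AVERAGE:0.5:288:366',     # daily averages     (288 PDPs per row)
    );
    my $err = RRDs::error;
    die "create failed: $err" if $err;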

It is as if, when computing the mean, it were not dividing by the number of addends, or were leaving some of them out; no matter how much I turn it over, I cannot find which combination of addends and divisor it is actually using.

Of course, the rest of the file computes the remaining averages correctly in the data areas I have not modified (although perhaps also in areas that were cut and replaced, that is, re-added as a block of data that was modified but is identical to the original).

Has something similar happened to anyone? Is there any configuration value I should take into account?

Thank you very much.

    
asked by Malyssia on 28.02.2017 at 09:29

1 answer


One possible cause is loss of information due to unknown (UNKNOWN) data in the shorter time periods. If there is not enough known data, rrdtool discards the entire consolidated period.

By default, in many setups, the period is discarded when fewer than 50% of its data points are known. Example taken from an rrdtool dump:

    <rra>
        <cf>AVERAGE</cf>
        <pdp_per_row>1</pdp_per_row> <!-- 300 seconds -->

        <params>
        <xff>5.000000000e-01</xff>
        </params>

The xff factor (xfiles factor) defines what part of a consolidation interval may be made up of unknown (UNKNOWN) data while the consolidated value is still regarded as known. In other words, it is the minimum share of known data a period needs for the consolidation into the larger interval to be performed. If it is set to 0.5, then at most 50% of the primary data points may be unknown; once that threshold is exceeded, the whole period is marked as unknown in the higher-level consolidation.
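
As a sketch of that rule (with made-up numbers): for an AVERAGE RRA the consolidated value is the average of the known PDPs only, and the whole row becomes UNKNOWN once the unknown share exceeds xff. Because the divisor is the number of known values, not the full interval, a consolidated average over a partly unknown interval can come out higher than a naive "sum divided by interval length" would suggest:

    use strict;
    use warnings;
    use List::Util qw(sum);

    # Average consolidation: ignore UNKNOWN PDPs, but give up (return undef)
    # when the unknown share exceeds xff.
    sub consolidate_average {
        my ($xff, @pdps) = @_;                 # undef means UNKNOWN
        my @known = grep { defined } @pdps;
        return undef unless @known;
        my $unknown = (@pdps - @known) / @pdps;
        return undef if $unknown > $xff;       # too many gaps: row is UNKNOWN
        return sum(@known) / scalar @known;    # divides by the KNOWN count only
    }

    # 24 five-minute PDPs make up one 2-hour row; here 10 of them are unknown.
    my @pdps = ((10) x 14, (undef) x 10);
    my $avg  = consolidate_average(0.5, @pdps);
    print defined $avg ? "2-hour average: $avg\n" : "2-hour average: UNKNOWN\n";

In this example a naive division by all 24 slots would give about 5.8, while the rule above reports 10, the average of the 14 known PDPs.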

More information in man rrdcreate

    
answered on 01.03.2017 at 10:26