Aggregation

Functions for aggregating data during preprocessing.

The motivation is to control the flow of incoming metrics and to have a tool for aggregating them to a reasonable level. This is especially useful for data that the server accepts passively.

For example, when processing logs, visualizing data once a minute might be enough regardless of the incoming log rate. Aggregation can be performed with one of five functions: min, max, avg, sum, count.

For example: "calculate the sum of a field over 1 minute", "count the number of data items over 30 seconds", "find the maximum / minimum / average". All functions except count require numeric input. Count works with any input data, which makes it useful for measuring the frequency of incoming data.
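The behavior of the five functions can be sketched in a few lines of Python. This is an illustrative sketch, not Zabbix code; the function name and structure are assumptions made for demonstration.

```python
def aggregate(values, func):
    """Apply one of the five aggregation functions to collected values."""
    if func == "count":
        return len(values)                 # count accepts any input data
    nums = [float(v) for v in values]      # the other four need numeric input
    if func == "min":
        return min(nums)
    if func == "max":
        return max(nums)
    if func == "sum":
        return sum(nums)
    if func == "avg":
        return sum(nums) / len(nums)
    raise ValueError(f"unknown aggregation function: {func}")
```

Note that count is the only function that never converts its input, which is why it works on arbitrary strings.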

Setting up preprocessing

A new group of functions, "Aggregation", is available in preprocessing steps. It contains a single function, "Aggregate over time", which takes two parameters: the first is the time window in seconds over which aggregation is performed, the second is one of the aggregation functions.
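Conceptually, the step is a time window paired with a function: values accumulate until the window elapses, then one aggregated result is emitted. A minimal sketch under that assumption (class and method names are hypothetical, not Zabbix internals):

```python
class AggregateOverTime:
    """Sketch of an "Aggregate over time" step: collect values for
    `interval` seconds, then emit a single aggregated result."""

    def __init__(self, interval, func):
        self.interval = interval   # first parameter: window length in seconds
        self.func = func           # second parameter: min/max/avg/sum/count
        self.values = []
        self.window_start = None

    def feed(self, value, now):
        """Accept a raw value; return an aggregate when the window closes."""
        if self.window_start is None:
            self.window_start = now
        self.values.append(value)
        if now - self.window_start < self.interval:
            return None            # window still open, nothing to emit yet
        result = self._apply()
        self.values = []
        self.window_start = now
        return result

    def _apply(self):
        if self.func == "count":
            return len(self.values)
        nums = [float(v) for v in self.values]
        return {"min": min, "max": max, "sum": sum,
                "avg": lambda n: sum(n) / len(n)}[self.func](nums)
```

For example, `AggregateOverTime(60, "sum")` would swallow values for a minute and then return their sum.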

(screenshot: preprocessing_agg)

How it works, features

When configuring aggregation, you must make sure that processing always reaches the aggregation rule, even if filtering in the previous steps did not return any matches.

This is due to the internal implementation of preprocessing: if a rule is not executed, the history of values accumulated for it is lost, and in that case aggregation cannot work properly. Therefore, the aggregation rules support a special string value, "NaN". Accordingly, in the filtering rules you need to specify that a predefined value should be returned when there are no matches, and set that value to "NaN".

When "NaN" arrives at the input of the aggregation rule, the function does nothing with it, but the accumulated history is preserved.
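The "NaN" pass-through can be sketched as follows. This is illustrative code, not the actual implementation: the key point is that the aggregation rule still executes on every value, so its history survives, while the marker value itself is skipped.

```python
import re

def filter_step(line, pattern):
    """Filtering step that always returns something, so the
    aggregation rule downstream is always reached."""
    m = re.search(pattern, line)
    return m.group(0) if m else "NaN"   # predefined value on no match

def feed_aggregation(history, value):
    """Aggregation rule input handling: the special string "NaN"
    is ignored, but the accumulated history stays intact."""
    if value == "NaN":
        return history                  # do nothing, keep history
    return history + [float(value)]
```

If the filter instead produced no value at all, the aggregation rule would not run and its history would be discarded.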

How to use: examples

By creating a dependent log item that validates the value via JSONPath or regexp filtering, and enabling the count function in the second step, you can count the number of HTTP responses with status code 200.

Or, for example, the number of requests for a specific UserAgent.
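The two-step pipeline from these examples can be sketched in Python. The log lines and the regexp are illustrative assumptions, not values from a real configuration:

```python
import re

# Hypothetical access-log lines fed to the dependent item.
ACCESS_LOG = [
    "GET /index.html 200",
    "GET /missing 404",
    "POST /api/login 200",
]

def filter_200(line):
    """Step 1: regexp validation, returning "NaN" when there is no match,
    so the aggregation step is always reached."""
    return line if re.search(r"\s200\b", line) else "NaN"

def count_window(lines):
    """Step 2: the count function - counts everything except "NaN"."""
    return sum(1 for line in map(filter_200, lines) if line != "NaN")

print(count_window(ACCESS_LOG))  # → 2 (responses with code 200)
```

Counting requests for a specific User-Agent would only differ in the regexp used in step 1.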

Aggregation can be used to reduce the rate of the incoming metric flow, or to make it predictable and manageable.

The sum function is suitable for byte counts, avg for average backend response time, and count for frequency; minimum and maximum are most useful for defining limits and for use in triggers.