What is Glaber?

General information. History. Similarities to and differences from Zabbix

General information

Glaber is a fork of the Zabbix source code that we develop and maintain. The project scales well and is well suited to monitoring relatively static infrastructures. Most of the installations I know of monitor the availability of large networks and large amounts of hardware, and recent changes to Glaber allow it to be used for highly dynamic monitoring as well.

History

At the beginning of 2018, a large IT company where I headed the IT department was facing yet another upgrade of its monitoring infrastructure.

At that time we had Zabbix running on 21 servers, most of them proxies. It all worked unsatisfactorily: during massive outages the monitoring itself typically went down under the load, and the speed and frequency of equipment polling were inadequate.

As part of an in-company hackathon, I made a prototype over the weekend which used ClickHouse to store history.

At the time it was just a fix, but it performed well as a proof of concept.

In those days, I believed that all I had to do was tell the world about these improvements, and the world would build such an integration and upgrade by itself.

We spoke at conferences, and we even went to the Zabbix summit and spoke there. But the world didn't hear us. Well, as they say:

"If you want something done well, do it yourself."

In 2019, the set of accumulated changes was named the "Glaber project". At that point several people joined the project: some helped with CI/CD, some started using it, and some told others about it.

By the beginning of 2021, a community had formed around the project and about 20 working installations had been launched. Since then I have been able to devote most of my time to the project, and I have begun to assemble a team and implement what I had long thought about:

[Figure: Glaber activity]

I have heard of other metrics collection and visualization systems. There is reason to believe that some of them are better than Zabbix. But there are several reasons why Zabbix and Glaber are very relevant:

All-in-one system (from collection to visualization)
Well-designed triggers, alerting and escalation system
Convenient API for external integration

Similarities between Glaber and Zabbix

Visually, the systems are very similar. Glaber's interface is the same as Zabbix's, so there is no new interface to get used to.

They share the same SQL database. The monitoring configuration database is identical in Zabbix and Glaber in terms of tables and fields.

There is forward and backward compatibility, except for history data (the history* tables), which Glaber doesn't use because it works with external history storages.

You can run Glaber on the same database as Zabbix, and vice versa. Configurations and settings will be preserved.

Architecture and philosophy differences between Zabbix and Glaber

Zabbix behaves like a stateless application. The server projects all changes in the monitoring structure into the SQL database. Zabbix is very much like a financial application with transactions: everything collected will be processed and written. It's clear, it's reliable, but it requires a lot of resources. Everything is stored in the database. Everything is done to make sure that the collected data gets into the database.

Glaber prioritizes operational monitoring. Everything is done to ensure that the state of the infrastructure is reflected in the API/UI as quickly and accurately as possible, while the database is secondary. Glaber may postpone tasks known to be heavy for the sake of overall speed. For example, to start quickly, the computation of some triggers may be delayed.

Key features of Glaber (technical differences from Zabbix)

Glaber works with modern history storages:

[Figure: Glaber writes to the database]

ClickHouse, VictoriaMetrics, and InfluxDB are currently supported. To support a new database, you can write an adapter plugin in any language; no changes to the server code itself are required.

More advanced metrics storage API

External modules (workers) are supported
Several history storages can be used at the same time
Filtering by value type and by kind of data (history or trends) is possible. For example, you can write to two different ClickHouse databases at once (if you don't want to build a full-fledged cluster but need a backup):

[Figure: Glaber duplicates data into ClickHouse]

There are various scenarios you can configure for working with history. For example, you can store history in VictoriaMetrics and trends and string data in ClickHouse.
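The scenario above can be pictured with a small routing sketch. The `Route` structure, the storage names, and the value-type labels are hypothetical and are not Glaber's actual configuration format; the sketch only illustrates filtering by value type and by kind of data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """One history storage and the data it accepts (hypothetical structure)."""
    storage: str            # illustrative storage name
    value_types: frozenset  # value types this route accepts
    kinds: frozenset        # "history", "trends", or both


NUMERIC = frozenset({"float", "uint"})
TEXTUAL = frozenset({"str", "text", "log"})

# Assumed layout for the example in the text: numeric history goes to
# VictoriaMetrics, trends and string data go to ClickHouse.
ROUTES = [
    Route("victoriametrics", NUMERIC, frozenset({"history"})),
    Route("clickhouse", NUMERIC, frozenset({"trends"})),
    Route("clickhouse", TEXTUAL, frozenset({"history"})),
]


def storages_for(value_type, kind):
    """Return every storage that should receive a value, in route order."""
    return [r.storage for r in ROUTES
            if value_type in r.value_types and kind in r.kinds]
```

Duplicating data into two ClickHouse databases would, in this sketch, simply mean two routes with the same filters and different storage names.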

Full integration of external data stores:

Data is not only uploaded to external systems but also read back from them, so graphs in the native UI and metrics in the API can be served from any supported storage system, not only viewed in external systems.

Fast operation with operational data

Glaber serves operational data about the state of objects from server memory and does not write it to SQL. This has two important advantages:

Web interfaces respond quickly
The SQL database is not slowed down by this load

This is important because we often don't find out that SQL is the bottleneck until a mass outage happens.

Modularity

Using workers (adapters) makes it easy to extend the functionality, the supported protocols, and the methods of collecting metrics.

[Figure: modularity]

Glaber implements the idea of long-running external services that are started by the server. Such services are called workers and can be written in any language. Glaber itself includes workers written as separate applications in Go and C.

The main function of workers is to adapt one or another Glaber API to a particular task, format, or system.

Currently, workers:

Implement interfaces to different history storages
Work as external scripts
Work as servers to receive new types of data, such as logs
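To make the idea of a long-running worker concrete, here is a minimal sketch. It assumes a line-oriented exchange in which the server sends one JSON request per line and the worker answers with one JSON line; that wire format, the `serve` function, and the `parse_latency` handler are illustrative assumptions and not Glaber's documented worker protocol.

```python
import json


def serve(lines, handler):
    """Yield one JSON reply line for every non-empty JSON request line.

    The process stays alive between requests, so there is no per-request
    start-up cost as there would be with a script launched for each check.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            reply = handler(json.loads(line))
        except (ValueError, KeyError) as exc:
            # Bad input must not kill a long-running worker.
            reply = {"error": str(exc)}
        yield json.dumps(reply)


def parse_latency(request):
    """Hypothetical adapter: turn a raw string like '12.5 ms' into a number."""
    number, _unit = request["raw"].split()
    return {"key": request["key"], "value": float(number)}
```

A real worker built this way would loop over standard input, for example `for reply in serve(sys.stdin, parse_latency): print(reply, flush=True)`.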

For example, workers are used in the "effective ping" task. In this case the worker is the glbmap utility, which uses elevated privileges to work with raw sockets.

Can collect, digitize and store logs

And not only logs.

Glaber can be a "server" for any type of data.

By using server workers, you can receive data over any protocol. The server takes control of running the workers, which reduces the effort needed to create and maintain external metrics converters.

Advanced preprocessing and automatic creation of hosts using LLD and templates let you set up systems where you simply "pour" data into the monitoring server, and the server creates all the necessary objects itself from templates.

[Figure: logs into graphs]

In particular, starting with release 2.4.0, Glaber ships a worker that accepts data via the Syslog protocol and thus implements log reception. Advanced preprocessing functions allow you to "digitize" logs into graphs.
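As an illustration of the first step in digitizing a log line, the sketch below splits the priority field of a BSD-style syslog message into facility and severity (the priority value is facility * 8 + severity). The function name and the returned dictionary are assumptions made for this example; this is not the code of Glaber's Syslog worker.

```python
import re

# A syslog line starts with "<PRI>", where PRI = facility * 8 + severity.
PRI_PATTERN = re.compile(r"^<(\d{1,3})>(.*)$")


def parse_syslog_line(line):
    """Return facility, severity and the rest of the message, or None."""
    match = PRI_PATTERN.match(line.strip())
    if match is None:
        return None
    priority = int(match.group(1))
    return {
        "facility": priority // 8,
        "severity": priority % 8,
        "message": match.group(2),
    }
```

Once severity is a number, it can be counted or aggregated like any other metric.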

Has advanced data preprocessing capabilities

[Figure: Glaber preprocessing]

In Glaber, new functions have been added to the preprocessing:

Multiple preprocessors can be run, which removes the overall limit of 70-90 thousand new metrics per second per system
Preprocessors prioritize tasks depending on the load. If the processing queue becomes too large, data processing is suspended, preventing uncontrolled memory consumption

We have also added time-based aggregation functions to preprocessing: max, min, sum, count, and avg. Aggregation with the count type lets you count how often values arrive, including non-numeric values, which, for example, makes it possible to visualize logs.
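The aggregation described above can be sketched as follows. This is a generic illustration of bucketing samples into fixed time windows, with hypothetical names; it is not Glaber's preprocessing code.

```python
from collections import defaultdict
from statistics import mean

# count only needs the number of samples, so it also works for strings.
AGGREGATES = {"max": max, "min": min, "sum": sum, "count": len, "avg": mean}


def aggregate(samples, window, func):
    """Group (timestamp, value) pairs into windows of `window` seconds.

    Returns a dict mapping each window's start time to the aggregated value.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        buckets[timestamp - timestamp % window].append(value)
    reduce = AGGREGATES[func]
    return {start: reduce(values) for start, values in sorted(buckets.items())}
```

For instance, counting log lines per minute turns a stream of text messages into a numeric series that can be drawn as a graph.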

It starts quickly

Glaber periodically writes a single-file dump of the history cache and the state of objects.

[Figure: quick start]

On startup, this file is read quickly and efficiently, avoiding the expense of reading historical data from the history repository.

Together with a time limit on reads from the history API, this gives a very quick start and a fast transition to steady-state operation with a warmed-up cache.
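The dump-and-reload idea can be sketched like this. The JSON file format, the function names, and the maximum-age check are assumptions made for illustration; Glaber's real dump format is not described here.

```python
import json
import os
import tempfile
import time


def dump_state(path, state):
    """Write the state to a single file atomically (temp file, then rename)."""
    directory = os.path.dirname(os.path.abspath(path))
    with tempfile.NamedTemporaryFile("w", dir=directory, delete=False) as tmp:
        json.dump({"saved_at": time.time(), "state": state}, tmp)
        tmp_name = tmp.name
    # A crash in the middle of a dump leaves the previous file intact.
    os.replace(tmp_name, path)


def load_state(path, max_age_seconds):
    """Return the dumped state, or None if the dump is missing or too old."""
    try:
        with open(path) as dump_file:
            dump = json.load(dump_file)
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    if time.time() - dump["saved_at"] > max_age_seconds:
        return None
    return dump["state"]
```

When `load_state` returns None, a server built this way would fall back to the slower path of reading from the history storage.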

High-performance pollers

Glaber uses rewritten pollers for AGENT, SNMP, and ICMP, as well as for external workers and workers in server mode.

These pollers are built for efficiency. They:

Run in asynchronous mode
Use their own configuration caches
Do not use SQL
Are designed for heavy loads

Reduced overhead and lower resource use allow metrics collection rates several hundred times higher than those of the equivalent synchronous pollers in Zabbix, and configuration work causes minimal blocking of the overall monitoring operation.
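The benefit of asynchronous polling is easy to see in a toy sketch. Here `poll_host` is a stand-in that only sleeps instead of doing real SNMP, ICMP, or agent I/O, and the names and the `max_in_flight` limit are hypothetical; Glaber's pollers are not written in Python.

```python
import asyncio


async def poll_host(host, delay=0.01):
    """Stand-in for a network request; it only waits and reports 'up'."""
    await asyncio.sleep(delay)
    return host, "up"


async def poll_all(hosts, max_in_flight=1000):
    """Poll all hosts concurrently, with a cap on simultaneous requests."""
    limit = asyncio.Semaphore(max_in_flight)

    async def guarded(host):
        async with limit:
            return await poll_host(host)

    results = await asyncio.gather(*(guarded(host) for host in hosts))
    return dict(results)
```

With a synchronous poller the total time grows with the sum of all response times, while here one slow or unreachable host does not hold up the others.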

Glaber cluster

[Figure: Glaber cluster]

Glaber can operate in cluster mode, distributing the load among the servers. If one of the servers fails, the remaining servers redistribute its load. For cluster operation, I came up with the concept of "monitoring domains", a new way to configure the cluster and to define how its servers interact.
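One generic way to get this kind of redistribution is rendezvous hashing, sketched below. This is a swapped-in textbook technique used purely for illustration; it is not a description of how Glaber's monitoring domains work, and all names are hypothetical.

```python
import hashlib


def owner(host, servers):
    """Pick the live server with the highest hash score for this host."""
    def score(server):
        return hashlib.sha256(f"{server}|{host}".encode()).digest()
    return max(servers, key=score)


def assign(hosts, servers):
    """Map every monitored host to one of the live servers."""
    return {host: owner(host, servers) for host in hosts}
```

When a server disappears from the list, only the hosts it owned are reassigned; every other host stays on the server it already had.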

Remark: complex clusters are the first to "crash". So even though it is possible to run in cluster mode, I recommend just running two independent servers.

A significant limitation is still the presence of the SQL database with a large number of updates and the overall complexity of the system. For the cluster to work correctly, it is necessary to ensure synchronization of SQL databases, which is sometimes a non-trivial task.

Scales better

Most of the measures described in the previous sections lead to a significant reduction in resource use and to increased performance. That is how we were once able to turn 21 monitoring servers into 2. Here are some specific numbers.

One of the installations I support collects about 40,000 new values per second. That's 6 million unique metrics and about 100 thousand devices. The configuration runs on a single server, a single-processor machine with Xeon1280 and 32 GB of memory, which it shares with the SQL server. History is stored outside of this server, but it is potentially possible to store history locally as well. For ClickHouse and VictoriaMetrics such workloads are minuscule.
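A quick back-of-the-envelope check of these figures, using only the numbers quoted above (the averages are derived here and are not stated in the original):

```python
metrics = 6_000_000         # unique metrics
values_per_second = 40_000  # new values per second
devices = 100_000           # monitored devices

# Average time between two values of the same metric.
average_interval_seconds = metrics / values_per_second  # 150.0

# Average number of metrics collected per device.
metrics_per_device = metrics / devices  # 60.0
```

In other words, on average each metric is refreshed about every two and a half minutes, and each device contributes about 60 metrics.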

Tests show plenty of room for growth. On a server with 16 inline Xeon, Glaber was able to process more than 250 thousand metrics per second (from reception to upload into history storage).

Q&A: About stability, support, project survivability, sources

Q1: A question I often get from Glaber users is "If you get googled and quit the project, what do we do?"

I really like the project, and for me Glaber's development is more a question of meaning and motivation. I don't want Glaber's development to depend on me alone, and I understand that there must be a community and a team. That's why I am actively looking for people to join the team. Glaber is now supported by several companies.

Q2: This is open source, right?

Yes, you can download the source code from GitLab: https://gitlab.com/mikler/glaber. There are also installation instructions for Debian-based and RedHat-based systems.

Q3: Is it hard to maintain Glaber on your own?

No, it's not. A specialist who is familiar with Zabbix installation and knows how to use Google and StackOverflow will be able to solve most issues. As a rule, few issues come up after the initial installation, because the main source of problems is often an overloaded SQL database, and in Glaber the database is loaded orders of magnitude less.

Q4: What if Zabbix implements a super feature that Glaber doesn't have right now?

Glaber updates the Zabbix source code on which it is based about every six months. Right now it's 5.2.4. If a fresh release contains something useful that is in high demand, we will make an unscheduled update. Modularity keeps changes to the system core to a minimum, so the merge is usually fairly simple.

Q5: What if something comes out in Zabbix that duplicates Glaber functionality?

All things being equal, I will give preference to mainline, i.e. Zabbix code, providing a migration mechanism.

But so far this is only theory: in three years there hasn't been a single overlap in new functionality between Glaber and Zabbix. By the way, I think this is related to what I wrote in the "philosophy" section: we're following different development paths and therefore doing different things.

Some links:

Project on gitlab: https://gitlab.com/mikler/glaber
Telegram group: @glaber_group

Or you can write to me directly on Telegram (@makurov) or by email at makurov@gmail.com