What is Glaber?
General information. History. Similarities and differences from Zabbix
General information
Glaber is a fork of the Zabbix source code that we develop and maintain. The project scales well and is great for monitoring relatively static infrastructures. Most of the installations I know of monitor the availability of large networks and large amounts of hardware, and recent changes to Glaber allow it to be used for highly dynamic monitoring as well.
History
At the beginning of 2018, a large IT company where I headed the IT department was on the verge of yet another upgrade of its monitoring infrastructure.
At that time we had Zabbix running on 21 servers, mostly proxy servers. It all worked unsatisfactorily: during massive outages the monitoring typically went down under the load, and the speed and frequency of equipment polling were inadequate.
As part of an in-company hackathon, I made a prototype over the weekend which used ClickHouse to store history.
At the time it was just a quick fix, but it worked well as a proof of concept.
Back then I believed that all I had to do was tell the world about these improvements, and the world would build such a cool integration and upgrade by itself.
We spoke at conferences; we even went to the Zabbix Summit and spoke there. But the world didn't hear us. Well, as they say:
"if you want something done well, do it yourself"
In 2019, the set of accumulated changes was named the "Glaber project". At this point, several people joined the project: some helped with CI/CD, some started using it, and some told others about it.
By the beginning of 2021, a definite community had formed around the project, and about 20 working installations had been launched. From the start of 2021 I was also able to devote most of my time to the project, and I began to assemble a team and implement what I had long been thinking about.
I am aware of other metrics collection and visualization systems, and there is reason to believe that some of them are better than Zabbix. But there are several reasons why Zabbix and Glaber remain very relevant:
- All-in-one system (from collection to visualization)
- Well-designed triggers, alerting and escalation system
- A convenient API for external integration
Similarities between Glaber and Zabbix
Visually, the systems are very similar. Glaber's interface is the same as Zabbix's, so there is no new interface to get used to.
They share the same SQL schema. The monitoring configuration database is the same for Zabbix and Glaber in terms of tables and fields.
There is forward and backward compatibility, except for history data (the history* tables), which Glaber doesn't use because it works with external history storages.
You can run Glaber on the same database as Zabbix, and vice versa. Configurations and settings will be preserved.
Architecture and philosophy differences between Zabbix and Glaber
Zabbix behaves like a stateless application. The server writes all changes in the monitoring structure to the SQL database. Zabbix is very much like a financial application with transactions: everything collected will be processed and written. It's clear and reliable, but it requires a lot of resources. Everything is stored in the database, and everything is done to make sure that the collected data gets there.
Glaber prioritizes operational monitoring. All measures are taken to ensure that the state of the infrastructure is reflected in the API/UI as quickly and accurately as possible, while the database is secondary. Glaber may deliberately postpone heavy tasks for the sake of overall speed. For example, to start up quickly, the computation of some triggers may be delayed.
Key features of Glaber (technical differences from Zabbix)
Glaber works with modern history storages:
ClickHouse, VictoriaMetrics and InfluxDB are currently supported. To support a new database, you can write an adapter plugin in any language; no changes to the server code itself are required.
More advanced metrics storage API
- External modules (workers) are supported
- Several history storages can be used at the same time
- Data can be filtered by value type and by kind of data (history or trends). For example, you can write to two different ClickHouse databases at once (if you don't want to build a full-fledged cluster but need a backup).
There are various scenarios you can configure for working with history. For example, you can store history in VictoriaMetrics, and trends and string data in ClickHouse.
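The routing idea behind such scenarios can be illustrated with a minimal sketch. This is not Glaber's configuration format or API: the `Storage` class, the `rules` field, the `route` function and the value type names are all hypothetical, chosen only to show how filters by value type and by kind of data (history or trends) can decide which storage receives a sample.

```python
from dataclasses import dataclass, field

# Hypothetical value type groups, used only for this illustration.
NUMERIC = frozenset({"float", "uint"})
TEXTUAL = frozenset({"str", "text", "log"})


@dataclass
class Storage:
    """Stand-in for one history backend (not a real Glaber object)."""
    name: str
    rules: list  # (value_types, kinds) pairs; a sample matches if any pair matches
    written: list = field(default_factory=list)

    def accepts(self, value_type, kind):
        return any(value_type in types and kind in kinds
                   for types, kinds in self.rules)


def route(storages, value_type, kind, sample):
    """Write the sample to every matching storage and return their names."""
    targets = [s for s in storages if s.accepts(value_type, kind)]
    for storage in targets:
        storage.written.append(sample)
    return [storage.name for storage in targets]


# Numeric history goes to VictoriaMetrics; trends and string data go to ClickHouse.
storages = [
    Storage("victoriametrics", [(NUMERIC, {"history"})]),
    Storage("clickhouse", [(NUMERIC, {"trends"}), (TEXTUAL, {"history"})]),
]

# Two storages with identical rules both receive every matching sample.
mirror = [
    Storage("clickhouse-a", [(NUMERIC, {"history"})]),
    Storage("clickhouse-b", [(NUMERIC, {"history"})]),
]
```

With these rules, `route(storages, "float", "history", sample)` writes only to the first storage, while the `mirror` list models the "two ClickHouse databases at once" case.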
Full integration of external data stores:
Data is not only uploaded to external systems but also read back from them, so graphs in the native UI and metrics in the API can come from any supported storage system rather than being available only in external tools.
Fast operation with operational data
Glaber serves operational data about the state of objects from server memory and does not write it to SQL. This has two important advantages:
- Web interfaces work quickly
- The SQL database is not slowed down by the load
This is important because we often don't find out that SQL is the bottleneck until a mass outage hits.
Modularity
Workers (adapters) make it easy to extend functionality and add supported protocols and methods of collecting metrics.
Glaber implements the idea of constantly running external services that are started by the server. Such services are called workers and can be written in any language. Glaber itself ships workers written as separate applications in Go and C.
The main function of workers is to adapt one or another Glaber API to a particular task, format, or system.
Currently, workers:
- Implement interfaces to different history repositories
- Work as external scripts
- Work as servers to receive new types of data, such as logs
For example, workers are used in the "effective ping" task. In this case the glbmap utility acts as the worker; it uses elevated permissions to work with RAW sockets.
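To give a feel for what a constantly running worker looks like, here is a minimal sketch of a request/response loop. The framing is an assumption made for illustration: one JSON object per line in, one JSON object per line out. The actual exchange format between the Glaber server and its workers is not described here, and the `handle` and `serve` names are hypothetical.

```python
import io
import json


def handle(request):
    """Hypothetical handler: return the requested value in upper case."""
    return {"id": request.get("id"),
            "value": str(request.get("value", "")).upper()}


def serve(inp, out, handler=handle):
    """Answer one JSON request per input line until the stream is closed."""
    for line in inp:
        line = line.strip()
        if not line:
            continue
        try:
            response = handler(json.loads(line))
        except (ValueError, AttributeError) as exc:
            # Malformed input must not kill a long-running service.
            response = {"error": str(exc)}
        out.write(json.dumps(response) + "\n")
        out.flush()  # the parent process is waiting for each reply


# Demonstration with in-memory streams; a real worker would be started
# by the server and would call serve(sys.stdin, sys.stdout) instead.
demo_out = io.StringIO()
serve(io.StringIO('{"id": 1, "value": "ping"}\nnot json\n'), demo_out)
replies = [json.loads(reply) for reply in demo_out.getvalue().splitlines()]
```

The point of the design is that the process stays alive between requests, so there is no fork-and-exec cost per metric as there would be with a classic external script.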
Can collect, digitize and store logs
And not only logs.
Glaber can be a "server" for any type of data.
By using server workers, you can receive data over any protocol. The server manages the workers itself, which reduces the effort needed to create and maintain external metrics converters.
Advanced preprocessing and automatic host creation using LLD and templates let you set up systems where you simply "pour" data into the monitoring server, and the server creates all the necessary objects from templates by itself.
In particular, starting with release 2.4.0, Glaber includes a worker that accepts data via the Syslog protocol and thus implements log reception. Advanced preprocessing functions allow you to "digitize" logs into graphs.
Has advanced data preprocessing capabilities
In Glaber, new capabilities have been added to preprocessing:
- Multiple preprocessors can be run, which removes the overall limit of 70-90 thousand new metrics per second per system
- Preprocessors prioritize tasks depending on the load. If the processing queue becomes too large, data processing is suspended, preventing uncontrolled memory consumption
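The second point, bounding the queue so that memory cannot grow without limit, can be sketched as follows. This only illustrates the general back-pressure technique with Python's standard `queue` module; the `Intake` class and its methods are hypothetical, and Glaber's own preprocessing manager is not implemented this way in any documented sense.

```python
import queue


class Intake:
    """Hypothetical intake stage in front of a bounded preprocessing queue."""

    def __init__(self, max_pending):
        self.pending = queue.Queue(maxsize=max_pending)
        self.refused = 0

    def offer(self, item):
        """Try to enqueue; return False when full so the caller can pause."""
        try:
            self.pending.put_nowait(item)
            return True
        except queue.Full:
            self.refused += 1
            return False

    def drain(self, limit):
        """Process up to `limit` queued items and return them."""
        done = []
        for _ in range(limit):
            try:
                done.append(self.pending.get_nowait())
            except queue.Empty:
                break
        return done


intake = Intake(max_pending=2)
accepted = [intake.offer(value) for value in ("a", "b", "c")]  # third is refused
processed = intake.drain(1)          # frees one slot
accepted_later = intake.offer("c")   # now fits again
```

Because the queue has a hard upper bound, a burst of incoming data slows the producers down instead of inflating the server's memory.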
We have also introduced time-based data aggregation functions in preprocessing: max, min, sum, count and avg. Aggregation of the count type lets you count how often data arrives, including non-numeric data; this, for example, allows you to visualize logs.
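As a rough illustration of time-based aggregation, the sketch below groups timestamped values into fixed windows and reduces each window with one of the five functions. The `aggregate` function, the `AGGREGATORS` table and the sample log lines are made up for this example and say nothing about how Glaber implements the step internally.

```python
from collections import defaultdict

AGGREGATORS = {
    "min": min,
    "max": max,
    "sum": sum,
    "count": len,  # works for non-numeric values too
    "avg": lambda values: sum(values) / len(values),
}


def aggregate(samples, window, func):
    """Group (timestamp, value) pairs into fixed windows and reduce each one."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % window].append(value)  # key is the window start
    reducer = AGGREGATORS[func]
    return {start: reducer(values) for start, values in sorted(buckets.items())}


# Counting log lines per minute turns text into a number that can be graphed.
log_lines = [(0, "link down"), (12, "link up"), (59, "auth failed"), (61, "link down")]
per_minute = aggregate(log_lines, 60, "count")

numeric = [(0, 1.0), (30, 3.0), (60, 5.0)]
averages = aggregate(numeric, 60, "avg")
```

Here `per_minute` is `{0: 3, 60: 1}`: three messages in the first minute and one in the second, which is exactly the kind of series a log-frequency graph needs.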
It starts quickly
Glaber periodically dumps the history cache and the state of objects into a single file.
On startup, this file is read quickly and efficiently, avoiding the expense of reading historical data from the history storage.
Together with the time limit on reads from the history API, this gives a very quick start and a fast transition to steady-state operation with a warmed-up cache.
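The dump-and-restore idea can be sketched in a few lines. Everything specific here is an assumption for illustration: the JSON format, the `dump_state` and `load_state` names and the shape of the state dictionary are hypothetical, and Glaber's real dump file format is not described in this article.

```python
import json
import os
import tempfile


def dump_state(path, state):
    """Write the state to a temporary file, then atomically swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    # A crash during the dump leaves the previous file intact.
    os.replace(tmp_path, path)


def load_state(path):
    """Return the dumped state, or an empty state on the very first start."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"items": {}}


# Round trip in a scratch directory.
state_file = os.path.join(tempfile.mkdtemp(), "state.json")
first_start = load_state(state_file)
dump_state(state_file, {"items": {"cpu.load": [[1700000000, 0.5]]}})
warm_start = load_state(state_file)
```

Reading one local file on startup is cheap compared with querying the history storage for every item, which is where the quick warm-up comes from.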
High-performance pollers
Glaber uses rewritten pollers for AGENT, SNMP, ICMP, external workers, and workers in server mode.
These protocols and services are handled efficiently. The pollers:
- Work in asynchronous mode
- Use their own configuration caches
- Do not use SQL
- Are designed for heavy loads
Lower overhead and resource usage allow metrics collection rates several hundred times higher than those of the comparable synchronous pollers in Zabbix, with minimal configuration work and without blocking the rest of the monitoring.
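Why asynchronous polling helps can be shown with a small simulation. This is a sketch using Python's `asyncio`, not Glaber's poller code (which is part of the server): the `poll` coroutine only sleeps to imitate network latency, and the host names and delays are invented.

```python
import asyncio


async def poll(host, delay):
    """Stand-in for one network request; `delay` simulates response time."""
    await asyncio.sleep(delay)
    return host, "ok"


async def poll_all(hosts, timeout):
    """Poll all hosts concurrently; a slow host times out on its own."""

    async def guarded(host, delay):
        try:
            return await asyncio.wait_for(poll(host, delay), timeout)
        except asyncio.TimeoutError:
            return host, "timeout"

    # gather() returns results in the order the hosts were given.
    return await asyncio.gather(*(guarded(h, d) for h, d in hosts))


hosts = [("router-1", 0.01), ("switch-7", 0.02), ("dead-host", 5.0)]
results = asyncio.run(poll_all(hosts, timeout=0.2))
```

A synchronous poller would spend the full timeout waiting on the dead host before moving on; here all requests are in flight at once, so the whole round takes about as long as the single timeout.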
Glaber cluster
Glaber can operate in cluster mode, distributing the load among the servers. If one of the servers fails, the remaining servers take over its load. For cluster operation, I came up with the concept of "monitoring domains": a new way to configure the servers in a cluster and define how they interact.
Remark: complex clusters are the first to "crash". So even though it is possible to run in cluster mode, I recommend simply running two independent servers.
A significant limitation remains the SQL database with its large number of updates, along with the overall complexity of the system. For the cluster to work correctly, the SQL databases must be kept synchronized, which is sometimes a non-trivial task.
Scales better
Most of the measures described in the previous sections significantly reduce resource usage and increase performance. That is how we were once able to turn 21 monitoring servers into 2. Here are some specific numbers.
One of the installations I support collects about 40,000 new values per second. That's 6 million unique metrics and about 100 thousand devices. The configuration runs on a single server, a single-processor machine with a Xeon1280 and 32 GB of memory, which also hosts the SQL server. History is stored outside of this server, although it is potentially possible to store it locally as well. For ClickHouse and VictoriaMetrics such workloads are minuscule.
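As a quick sanity check, these figures can be related to each other with simple arithmetic. The calculation assumes the load is spread evenly across metrics and devices, which a real installation will only approximate.

```python
metrics = 6_000_000          # unique metrics
values_per_second = 40_000   # new values per second
devices = 100_000            # monitored devices

# If every metric is polled equally often, each one is refreshed once per:
avg_interval_s = metrics / values_per_second   # 150.0 seconds

# Average number of metrics collected from one device:
metrics_per_device = metrics / devices         # 60.0
```

In other words, the numbers correspond to roughly 60 metrics per device, each updated about every two and a half minutes on average.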
Tests show plenty of room for growth. On a server with a 16-core Xeon, Glaber was able to process more than 250 thousand metrics per second (from reception to uploading to history storage).
Q&A: About stability, support, project survivability, sources
Q1: A frequent question I get from Glaber users is "if you get hired by Google and quit the project, what do we do?"
I really like the project, and for me the question of Glaber development is mostly a question of meaning and motivation. I don't want Glaber's development to depend on me alone, and I understand that there must be a community, a team. That's why I am actively looking for people to join the team. Glaber is now supported by several companies.
Q2: This is open source, right?
Yes, you can download the source code from GitLab: https://gitlab.com/mikler/glaber. There are also installation instructions for Debian and RedHat-based systems.
Q3: Is it hard to maintain Glaber on your own?
No, it's not. A specialist who is familiar with Zabbix installation and knows how to use Google and StackOverflow will be able to solve most issues. As a rule, few issues arise after the initial installation, because the main source of problems is often an overloaded SQL database, while in Glaber the database is loaded orders of magnitude less.
Q4: What if Zabbix implements a super feature that Glaber doesn't have right now?
Glaber updates the Zabbix source code it is based on about every six months. Right now it is based on 5.2.4. If a fresh release contains something useful that is in high demand, we will do an unscheduled update. Modularity keeps changes to the system core to a minimum, so the merge is usually fairly simple.
Q5: What if something comes out in Zabbix that duplicates Glaber functionality?
All other things being equal, I will give preference to the mainline, i.e. the Zabbix code, and provide a migration mechanism.
But so far this is theoretical: in three years there hasn't been a single overlap in new functionality between Glaber and Zabbix. By the way, I think this is related to what I wrote about philosophy: we are following different development paths and therefore doing different things.
Some links:
- Project on GitLab: https://gitlab.com/mikler/glaber
- Telegram group: @glaber_group
Or you can write to me personally on Telegram at @makurov or by email at makurov@gmail.com