Using the ELK stack for log review and visualizations

by Mike Sweetman
20 May 2020

In cloud systems, when your project runs on many cloud instances, you very often need to analyze the behavior of the whole system during deployment and maintenance.

Imagine a situation where the system is scaling in the cloud and you suddenly need to search through a hundred thousand lines of logs across a hundred instances to find an error or evaluate the statistics of the cloud application, and to do it quickly. A daunting task, one might even say impossible, to do by hand. This is where log analysis automation tools such as the ELK stack come to our aid.

What is a log? It is a record created while a particular software module or function is running: a journal entry containing service information such as a timestamp, severity level, and message. Given enough of these entries, you can analyze how the software behaves and localize an error or failure to a specific module.
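For instance, here is a minimal sketch (plain Python, with a hypothetical "payment" module) of how such journal entries are produced with the standard logging module:

    import logging

    # Write journal entries with service information: timestamp, severity,
    # module name, and the message itself.
    logging.basicConfig(
        filename="app.log",
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s %(message)s",
    )

    log = logging.getLogger("payment")
    log.info("payment accepted, order_id=1042")
    log.error("gateway timeout, order_id=1043")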

What is the ELK stack?

ELK stands for Elasticsearch, Logstash, and Kibana. These were originally three independent products, but at some point they came under one company, Elastic, and began to develop in one direction. Each of these tools is a full-fledged, independent open-source product, and together they form a powerful solution for a wide range of data collection, storage, and analysis tasks. The combination is now called the Elastic Stack and ships with Beats.

Why do we need these logs at all? In fact, they allow you to solve very important tasks, especially if the logs are accumulated and processed by tools such as the ELK stack:

  • Make life easier for developers and system administrators during log analysis
  • Avoid writing ad-hoc grep pipelines and parsers for each individual case
  • Widen the circle of users who can analyze logs, giving access to trained managers and technical support
  • Observe the dynamics and trends of recorded events in the cloud system

Components of the Elastic Stack

In the modern ELK stack, a fourth component is added as needed, so the Elastic Stack complex has four main components:

  • Elasticsearch: a RESTful distributed search engine that stores all the collected data.
  • Logstash: the data processing component of the Elastic Stack that sends incoming data to Elasticsearch.
  • Kibana: a web interface for searching and visualizing logs.
  • Beats: lightweight, single-purpose data shippers that can send data from hundreds or thousands of machines to Logstash or Elasticsearch.

Elasticsearch as a log search tool

Elasticsearch is a free, open-source search engine with built-in replication. It provides horizontally scalable search and supports multithreading.

Search indexes can be divided into shards, and each shard can have several replicas. Several shards can be placed on each node, and each node acts as a coordinator, delegating operations to the correct shard; rebalancing and routing are performed automatically. Related data is often stored in the same index, which consists of one or more primary shards and possibly several replica shards.
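A hedged sketch of how this looks in practice: creating an index with explicit primary-shard and replica counts through the REST API, assuming an unsecured node on localhost:9200 and the Python requests library (the index name "logs" is an example):

    import requests

    # Create a "logs" index with 3 primary shards, each with 1 replica.
    resp = requests.put(
        "http://localhost:9200/logs",
        json={
            "settings": {
                "number_of_shards": 3,
                "number_of_replicas": 1,
            }
        },
    )
    print(resp.json())  # {"acknowledged": true, ...} on success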

All functions of the Lucene library, which is at the core of Elasticsearch, are available through JSON and Java APIs. Another feature of Elasticsearch is the so-called “gateway”, which provides long-term index persistence; for example, an index can be restored from the gateway in the event of a server failure. The system also supports real-time HTTP GET requests.
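For example, a real-time GET fetches a single document by ID over plain HTTP (same local-node assumption as above; the index name and document ID are illustrative):

    import requests

    # Real-time GET: fetch the document with ID "1" from the "logs" index.
    resp = requests.get("http://localhost:9200/logs/_doc/1")
    if resp.ok:
        print(resp.json()["_source"])  # the original JSON document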

As indicated on the official Elasticsearch website:  “What exactly can I use Elasticsearch for? Numbers, text, geo, structured, unstructured. All data types are welcome. Full-text search just scratches the surface of how companies around the world are relying on Elasticsearch to solve a variety of challenges. See a full list of solutions built directly on the Elastic Stack.”
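A minimal full-text search over log data might look like this (again a sketch under the same local-cluster assumptions; the "message" field is hypothetical):

    import requests

    # Full-text search: log entries whose "message" field matches "timeout".
    resp = requests.post(
        "http://localhost:9200/logs/_search",
        json={"query": {"match": {"message": "timeout"}}},
    )
    for hit in resp.json()["hits"]["hits"]:
        print(hit["_score"], hit["_source"])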


Logstash for processing log data

Logstash is free, open-source data pipeline software that handles large amounts of data, including event logs. It lets you collect, parse, filter, and normalize data, and it has over 200 plugins for connecting a large number of different types of sources and data streams.

Logstash is commonly used in conjunction with Elasticsearch and Kibana: Logstash collects data (such as logs), processes it, converts it to JSON, and saves it to Elasticsearch, while Kibana serves as the front end to Elasticsearch.
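As a rough illustration of what such a pipeline does (not Logstash itself, just the idea sketched in Python; the log line format, index name, and local endpoint are all assumptions):

    import json
    import re
    import requests

    # A toy pipeline: read raw log lines, parse them into JSON documents,
    # and ship them to Elasticsearch in one request via the bulk API.
    LINE = re.compile(r"(?P<ts>\S+) (?P<level>\w+) (?P<message>.*)")

    body = ""
    with open("app.log") as f:
        for line in f:
            m = LINE.match(line)
            if not m:
                continue  # a real pipeline would route parse failures elsewhere
            body += json.dumps({"index": {"_index": "logs"}}) + "\n"
            body += json.dumps(m.groupdict()) + "\n"

    requests.post(
        "http://localhost:9200/_bulk",
        data=body,
        headers={"Content-Type": "application/x-ndjson"},
    )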

As mentioned here: “Logstash dynamically ingests data of all shapes, sizes, and sources. Data is often scattered or siloed across many systems in many formats. Logstash supports a variety of inputs that pull in events from a multitude of common sources, all at the same time. Easily ingest from your logs, metrics, web applications, data stores, and various AWS services, all in continuous, streaming fashion.”

Logstash dynamically transforms and prepares data regardless of format or complexity:

  • Derive structure from unstructured data with grok (see the sketch after this list).
  • Decipher geo coordinates from IP addresses.
  • Anonymize PII data, exclude sensitive fields completely.
  • Ease overall processing, independent of the data source, format, or schema.
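Grok itself is a Logstash filter, but the first item is easy to illustrate in plain Python: a grok pattern is essentially a library of named regular expressions. A hedged sketch, where the log line format is an assumption:

    import re

    # A grok expression such as "%{IP:client} %{WORD:method} %{URIPATH:path}"
    # boils down to a regex with named capture groups:
    pattern = re.compile(
        r"(?P<client>\d{1,3}(?:\.\d{1,3}){3}) (?P<method>\w+) (?P<path>/\S*)"
    )

    match = pattern.match("192.168.0.7 GET /api/orders")
    if match:
        print(match.groupdict())
        # {'client': '192.168.0.7', 'method': 'GET', 'path': '/api/orders'}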

Visualizing log data with Kibana

Kibana is an open-source data visualization tool for Elasticsearch. It provides visualization of content indexed in an Elasticsearch cluster: users can build bar charts, line and scatter charts, pie charts, and maps on top of large volumes of data.

From the ELK platform, Kibana receives the data it visualizes, including the data displayed on dashboards.

A tool like Kibana is key to making sense of logs: it presents large amounts of data in a graphical form, which cannot be done with a textual representation of the information.

As mentioned on the Kibana website: “Kibana core ships with the classics: histograms, line graphs, pie charts, sunbursts, and more. And, of course, you can search across all of your documents.”
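Under the hood, such charts are powered by Elasticsearch aggregations. A hedged sketch of the kind of date-histogram query that backs a typical Kibana histogram (same local-cluster assumption; the "logs" index and "ts" field are examples):

    import requests

    # Count log entries per hour: the aggregation behind a simple histogram.
    resp = requests.post(
        "http://localhost:9200/logs/_search",
        json={
            "size": 0,  # skip the hits, return only the aggregation
            "aggs": {
                "per_hour": {
                    "date_histogram": {"field": "ts", "calendar_interval": "hour"}
                }
            },
        },
    )
    for bucket in resp.json()["aggregations"]["per_hour"]["buckets"]:
        print(bucket["key_as_string"], bucket["doc_count"])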


Beats

Beats are lightweight, specialized agents that collect data and forward it to Elasticsearch or Logstash. The best thing about Beats is the libbeat framework, which makes it easy to create a custom beat for any type of data that needs to be sent to Elasticsearch. Thanks to this flexibility, the number of available beats is growing rapidly. Even users of earlier versions of the ELK Stack find that connecting Beats improves their ability to collect data, in our case logs.

Putting Beats on the logging stack offers a number of useful benefits: you no longer depend on the type of input to the logging system, and you can collect data from almost anywhere.
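Real beats are written in Go on top of libbeat, but the collect-and-forward role is easy to sketch. A purely conceptual Python version (file path, endpoint, and index are assumptions; no batching, retries, or backpressure):

    import time
    import requests

    # Follow a log file and forward each new line to Elasticsearch,
    # the way a lightweight shipper does, without heavy processing.
    def ship(path, url="http://localhost:9200/logs/_doc"):
        with open(path) as f:
            f.seek(0, 2)  # start at the end of the file, like tail -f
            while True:
                line = f.readline()
                if not line:
                    time.sleep(0.5)
                    continue
                requests.post(url, json={"message": line.rstrip()})

    # ship("app.log")  # runs until interrupted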


Main problems when logging data with ELK

The main problem with the Elastic Stack is downtime: it can take a long time to restart after a reboot, and the process is even slower under Docker. The Logstash indexer may also hang at times, so that nothing appears in the logs, or you observe failed attempts to send data to Elasticsearch.

When designing the system, you need to pay attention to every message that ends up in Kibana. If a field's name or type mapping (number, string, array, object, etc.) is violated, Kibana may display it incorrectly in graphs.
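A hedged illustration of such a type violation, assuming dynamic mapping on a fresh index (the index name is an example): once a field has been mapped as a number, a non-numeric string in the same field is rejected.

    import requests

    base = "http://localhost:9200/logs-demo/_doc"

    # First document: dynamic mapping types "code" as a number.
    requests.post(base, json={"code": 200})

    # Second document: "code" arrives as text and violates the mapping.
    resp = requests.post(base, json={"code": "connection refused"})
    print(resp.status_code)              # 400
    print(resp.json()["error"]["type"])  # mapper_parsing_exception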

“Robust, Reliable, Predictable” is quite difficult to achieve with ELK. The system requires attention and careful tuning if you want all the logs to be displayed as required.

Elastic Cloud

In the modern world, everything tends to move to the cloud, and Elastic Cloud allows us to deploy hosted Elasticsearch and Kibana on AWS, GCP, and Azure.

You can run a fully loaded deployment on the cloud provider of your choice. Many companies are switching to Elasticsearch in the cloud: it is convenient, you can build a cluster for the search engine and its logs, and maintaining a logging system in the cloud is now easier than maintaining it on-premises.

In addition to the cloud solution, it is of course worth remembering containers. If necessary, you can place all the components of the ELK stack in one container or in separate containers. The overhead is not very large and you will not lose performance, while you gain a great deal from a container that can be scaled up, scaled down, or transferred to new servers at any time. You can read about setting up Elasticsearch in a container here.

Alternatively, you can skip the setup entirely and pull a ready-made container image with the ELK stack already installed. Here's an example of one of them, or read an article about it here.

Conclusion

The ELK stack is simply irreplaceable in many projects. Managers can monitor bursts of frontend errors after the next release and come to the developers with a concrete set of information. Developers and DevOps engineers, in turn, find correlations with errors in the application and see the most important data they need for debugging and precise localization of errors. It is also possible to build various reports instantly.

Our experts will set up an Elastic Stack on your system. We can make logging for your system reliable and informative. Nowadays, when most specialists work with cloud systems in their projects and the maintenance of information systems is increasingly done remotely, it is very important to have timely and adequate information about the state of the system. This became especially clear during the quarantine, when most workers were forced to work from home. We provide services for the development, configuration, and maintenance of ELK and other cloud solutions.
