Merikanto

A flute and a sword, the ambition of my life; fifteen years of a wild reputation, all come to nothing

Centralize Logs with rsyslog & Logstash

Making sense of the millions of log lines your organization generates can be a daunting challenge. On one hand, these log lines provide a view into application performance, server performance metrics, and security. On the other hand, log management and analysis can be very time consuming, which may hinder adoption of these increasingly necessary services.

Open-source software such as rsyslog, Elasticsearch, and Logstash provides the tools to transmit, transform, and store log data. In this post, we will see how to create a centralized rsyslog server to store log files from multiple systems, and then use Logstash to send them to an Elasticsearch server.



Summary

We will see how to centralize logs generated or received by syslog, specifically the variant known as rsyslog. Syslog, and syslog-based tools like rsyslog, collect important information from the kernel and many of the programs that run to keep UNIX-like servers running.

As syslog is a standard, and not just a program, many software projects support sending data to syslog. By centralizing this data, you can more easily audit security, monitor application behavior, and keep track of other vital server information.

From a centralized, or aggregating, rsyslog server, you can then forward the data to Logstash, which can further parse and enrich your log data before sending it on to Elasticsearch. We will set up:

  • A single client (or forwarding) rsyslog server
  • A single server (or collecting) rsyslog server, to receive logs from the rsyslog client
  • A Logstash instance to receive the messages from the rsyslog collecting server
  • An Elasticsearch server to receive the data from Logstash

We will also use DigitalOcean’s services. Create the following Droplets with private networking enabled:

  • Ubuntu Droplet named rsyslog-client
  • Ubuntu Droplet (1 GB or greater) named rsyslog-server where centralized logs will be stored and Logstash will be installed
  • Ubuntu Droplet with Elasticsearch installed

Also remember to complete the initial server setup on each Ubuntu Droplet.



Determine Private IP

In this section, you will determine which private IP addresses are assigned to each Droplet. This information will be needed throughout the post.

On each Droplet, find its IP addresses with the ifconfig command:

sudo ifconfig -a

The -a option is used to show all interfaces. The primary Ethernet interface is usually called eth0. In this case, however, we want the IP from eth1, the private IP address. These private IP addresses are not routable over the Internet and are used to communicate in private LANs — in this case, between servers in the same data center over secondary interfaces.

The output will look similar to:

# Output from ifconfig -a

eth0 Link encap:Ethernet HWaddr 04:01:06:a7:6f:01
inet addr:123.456.78.90 Bcast:123.456.78.255 Mask:255.255.255.0
inet6 addr: fe80::601:6ff:fea7:6f01/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:168 errors:0 dropped:0 overruns:0 frame:0
TX packets:137 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:18903 (18.9 KB) TX bytes:15024 (15.0 KB)

eth1 Link encap:Ethernet HWaddr 04:01:06:a7:6f:02
inet addr:10.128.2.25 Bcast:10.128.255.255 Mask:255.255.0.0
inet6 addr: fe80::601:6ff:fea7:6f02/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:6 errors:0 dropped:0 overruns:0 frame:0
TX packets:5 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:468 (468.0 B) TX bytes:398 (398.0 B)

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

The section to note here is eth1 and within that inet addr. In this case, the private network address is 10.128.2.25. This address is only accessible from other servers, within the same region, that have private networking enabled.

Be sure to repeat this step for all 3 Droplets. Save these private IP addresses somewhere secure. They will be used throughout this post.



Set BindIP for Elasticsearch

The post on ELK shows you how to set the bind address to localhost so that other servers can’t access the service. However, we need to change this so Logstash can send it data over its private network address.

We will bind Elasticsearch to its private IP address, so it only accepts requests sent to that address.

On the Elasticsearch server, edit the configuration file:

sudo vim /etc/elasticsearch/elasticsearch.yml

Find the line that contains network.bind_host. If it is commented out, uncomment it by removing the # character at the beginning of the line. Change the value to the private IP address for the Elasticsearch server so it looks like this:

network.bind_host: private_ip_address

Finally, restart Elasticsearch to enable the change.

sudo service elasticsearch restart

Warning: It is very important that you only allow servers you trust to connect to Elasticsearch. Using iptables is highly recommended. For this post, you only want to trust the private IP address of the rsyslog-server Droplet, which has Logstash running on it.
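
After the restart, a quick TCP probe can confirm that the bind address took effect. This is a small Python sketch under the assumption that Elasticsearch uses its default HTTP port 9200; is_port_open is a hypothetical helper name, not something Elasticsearch provides.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat all of these as "not open".
        return False
```

Run on the Elasticsearch server, is_port_open("private_ip_address", 9200) should return True, while is_port_open("127.0.0.1", 9200) should return False once the service is bound only to the private address. From a host that your firewall rules do not trust, the probe should also return False.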



Centralized Server to Receive Data

In this section, we will configure the rsyslog-server Droplet to be the centralized server able to receive data from other syslog servers on port 514.

To configure the rsyslog-server to receive data from other syslog servers, edit /etc/rsyslog.conf on the rsyslog-server Droplet:

sudo vim /etc/rsyslog.conf

Find these lines already commented out in your rsyslog.conf:

# provides UDP syslog reception
#$ModLoad imudp
#$UDPServerRun 514

# provides TCP syslog reception
#$ModLoad imtcp
#$InputTCPServerRun 514

The first lines of each section ($ModLoad imudp and $ModLoad imtcp) load the imudp and imtcp modules, respectively. imudp stands for input module UDP, and imtcp stands for input module TCP. These modules listen for incoming data from other syslog servers.

The second lines of each section ($UDPServerRun 514 and $InputTCPServerRun 514) indicate that rsyslog should start the respective UDP and TCP servers listening on port 514 (the default syslog port).

To enable these modules and servers, uncomment the lines so the file now contains:

# provides UDP syslog reception
$ModLoad imudp
$UDPServerRun 514

# provides TCP syslog reception
$ModLoad imtcp
$InputTCPServerRun 514

Save and close the rsyslog configuration file.

Restart rsyslog by running:

sudo service rsyslog restart

Your centralized rsyslog server is now configured to listen for messages from remote syslog (including rsyslog) instances.

Tip: To validate your rsyslog configuration file, you can run the sudo rsyslogd -N1 command.
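
To check from another machine that the server really accepts messages on port 514, you can send a test line with Python's standard library. This is a sketch rather than part of the original setup: make_syslog_logger and the rsyslog-test program name are made up for illustration, and the host you pass in is whatever private IP you noted for rsyslog-server.

```python
import logging
import logging.handlers
import socket


def make_syslog_logger(host: str, port: int = 514, use_tcp: bool = False) -> logging.Logger:
    """Build a logger whose records are forwarded to a remote syslog listener."""
    handler = logging.handlers.SysLogHandler(
        address=(host, port),
        # UDP is received by imudp, TCP by imtcp on the rsyslog side.
        socktype=socket.SOCK_STREAM if use_tcp else socket.SOCK_DGRAM,
    )
    # The "name: " prefix is what syslog daemons usually read as the program name.
    handler.setFormatter(logging.Formatter("rsyslog-test: %(message)s"))

    logger = logging.getLogger("rsyslog-test")
    logger.handlers.clear()      # avoid duplicate handlers on repeated calls
    logger.propagate = False     # keep test lines out of the root logger
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

For example, make_syslog_logger("private_ip_of_rsyslog_server").info("hello from Python") sends one message over UDP. With the default Ubuntu rules it should then show up in /var/log/syslog on rsyslog-server.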


Send Data Remotely with rsyslog

In this section, we will configure the rsyslog-client to send log data to the rsyslog-server Droplet we configured in the last step.

In a default rsyslog setup on Ubuntu, you’ll find two files in /etc/rsyslog.d:

  • 20-ufw.conf
  • 50-default.conf

On the rsyslog-client, edit the default configuration file:

sudo vim /etc/rsyslog.d/50-default.conf

Add the following line at the top of the file, before the log by facility section, replacing private_ip_of_rsyslog_server with the private IP of your centralized server:

*.*                    @private_ip_of_rsyslog_server:514

The first part of the line (*.*) means we want to send all messages: the first asterisk matches every facility and the second matches every severity. While it is outside the scope of this post, you can configure rsyslog to send only certain messages. The remainder of the line explains how and where to send the data. In our case, the @ symbol before the IP address tells rsyslog to use UDP to send the messages. Change this to @@ to use TCP. This is followed by the private IP address of rsyslog-server, the Droplet that runs the centralized rsyslog instance and Logstash. The number after the colon is the port number to use.

Restart rsyslog to enable the changes:

sudo service rsyslog restart

Congratulations! You are now sending your syslog messages to a centralized server!

Tip: To validate your rsyslog configuration file, you can run the sudo rsyslogd -N1 command.



Format Log Data to JSON

Elasticsearch requires that all documents it receives be in JSON format, and rsyslog provides a way to accomplish this by way of a template.

In this step, we will configure our centralized rsyslog server to use a JSON template to format the log data before sending it to Logstash, which will then send it to Elasticsearch on a different server.

Back on the rsyslog-server Droplet, create a new configuration file to format the messages as JSON before sending them to Logstash:

sudo vim /etc/rsyslog.d/01-json-template.conf

Copy the following contents to the file exactly as shown:

template(name="json-template"
  type="list") {
    constant(value="{")
      constant(value="\"@timestamp\":\"")     property(name="timereported" dateFormat="rfc3339")
      constant(value="\",\"@version\":\"1")
      constant(value="\",\"message\":\"")     property(name="msg" format="json")
      constant(value="\",\"sysloghost\":\"")  property(name="hostname")
      constant(value="\",\"severity\":\"")    property(name="syslogseverity-text")
      constant(value="\",\"facility\":\"")    property(name="syslogfacility-text")
      constant(value="\",\"programname\":\"") property(name="programname")
      constant(value="\",\"procid\":\"")      property(name="procid")
    constant(value="\"}\n")
}

Notice that each constant from @version onward begins with \", which closes the previous value, followed by a comma that separates it from the next field. This maintains the JSON structure and keeps the file readable by lining everything up neatly. The template formats your messages the way Elasticsearch and Logstash expect to receive them. rsyslog emits each message as a single line; pretty-printed for readability, it looks like this:

# Example JSON message

{
"@timestamp" : "2015-11-18T18:45:00Z",
"@version" : "1",
"message" : "Your syslog message here",
"sysloghost" : "hostname.example.com",
"severity" : "info",
"facility" : "daemon",
"programname" : "my_program",
"procid" : "1234"
}
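
To see why the template is built from alternating constants and properties, it can help to mimic it outside rsyslog. The Python sketch below concatenates the same pieces in the same order and then checks that the result parses as JSON. It is an illustration only: render_json_template and json_escape are made-up names, and json.dumps is used as a stand-in for rsyslog's format="json" escaping, which may differ in detail.

```python
import json


def json_escape(text: str) -> str:
    """Escape text for use inside a JSON string literal (surrounding quotes removed)."""
    return json.dumps(text)[1:-1]


def render_json_template(timestamp, message, hostname, severity,
                         facility, programname, procid):
    """Join constants and property values in the same order as the rsyslog template."""
    parts = [
        "{",
        '"@timestamp":"', timestamp,
        '","@version":"1',
        '","message":"', json_escape(message),  # only msg is escaped, as in the template
        '","sysloghost":"', hostname,
        '","severity":"', severity,
        '","facility":"', facility,
        '","programname":"', programname,
        '","procid":"', procid,
        '"}\n',
    ]
    return "".join(parts)


line = render_json_template(
    "2015-11-18T18:45:00Z", 'user said "hello"', "hostname.example.com",
    "info", "daemon", "my_program", "1234",
)
event = json.loads(line)  # raises ValueError if the line is not valid JSON
```

The quoted message is the interesting case: without the escaping step, the embedded double quotes would end the JSON string early and the line would fail to parse.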

Tip: The rsyslog.com docs show the variables available from rsyslog if you would like to customize the log data. However, you must send it in JSON format to Logstash and then to Elasticsearch.

The data being sent is not using this format yet. The next step shows how to configure the server to use this template file.


Centralized Server & Logstash


Now that we have the template file that defines the proper JSON format, let’s configure the centralized rsyslog server to send the data to Logstash, which is on the same Droplet for this post.

At startup, rsyslog will look through the files in /etc/rsyslog.d and create its configuration from them. Let’s add our own configuration file to extend the configuration.

On the rsyslog-server, create /etc/rsyslog.d/60-output.conf:

sudo vim /etc/rsyslog.d/60-output.conf

Copy the following lines to this file:

# This line sends all lines to defined IP address at port 10514,
# using the "json-template" format template

*.* @private_ip_logstash:10514;json-template

The *.* at the beginning means to process the remainder of the line for all log messages. The @ symbol means to use UDP (use @@ instead for TCP). The IP address or hostname after the @ is where to forward the messages. In our case, we are using the private IP address of rsyslog-server, since the centralized rsyslog server and Logstash are installed on the same Droplet. This must match the private IP address you configure Logstash to listen on in the next step.

The port number is next. This post uses port 10514. Note that the Logstash server must listen on the same port using the same protocol. The last part, json-template, names the template that defines how to format the data before passing it along.
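
The pieces of a forwarding rule can be made explicit with a small parser. This Python sketch splits a rule into selector, protocol, host, port, and template. It is only a reading aid for the simple form used in this post: parse_forward_rule is a hypothetical helper, it does not handle bracketed IPv6 addresses or other rsyslog options, and the fallback to port 514 when none is given is an assumption based on the syslog default port mentioned earlier.

```python
import re

RULE = re.compile(
    r"^(?P<selector>\S+)\s+"      # facility.severity selector, e.g. *.*
    r"(?P<at>@{1,2})"             # @ = UDP, @@ = TCP
    r"(?P<host>[^:;\s]+)"         # IP address or hostname
    r"(?::(?P<port>\d+))?"        # optional :port
    r"(?:;(?P<template>\S+))?$"   # optional ;template-name
)


def parse_forward_rule(line: str) -> dict:
    """Split a simple rsyslog forwarding rule into its parts."""
    match = RULE.match(line.strip())
    if match is None:
        raise ValueError(f"not a simple forwarding rule: {line!r}")
    return {
        "selector": match["selector"],
        "protocol": "tcp" if match["at"] == "@@" else "udp",
        "host": match["host"],
        "port": int(match["port"] or 514),
        "template": match["template"],
    }


rule = parse_forward_rule("*.* @private_ip_logstash:10514;json-template")
```

Here rule["protocol"] and rule["port"] are the two values the Logstash input has to agree with, and rule["template"] is the name defined in 01-json-template.conf.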

Do not restart rsyslog yet. First, we have to configure Logstash to receive the messages.



Receive JSON Messages in Logstash

In this step you will install Logstash, configure it to receive JSON messages from rsyslog, and configure it to send the JSON messages on to Elasticsearch.

Logstash requires Java 7 or later, so make sure a suitable Java runtime is installed on rsyslog-server before continuing. Then install the security key for the Logstash repository:

wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

Add the repository definition to your /etc/apt/sources.list file:

echo "deb http://packages.elastic.co/logstash/2.3/debian stable main" \
| sudo tee -a /etc/apt/sources.list

Note: Use the echo method described above to add the Logstash repository. Do not use add-apt-repository as it will add a deb-src entry as well, but Elastic does not provide a source package. This will result in an error when you attempt to run apt-get update.

Update your package lists to include the Logstash repository:

sudo apt-get update

Finally, install Logstash:

sudo apt-get install logstash

Now that Logstash is installed, let’s configure it to listen for messages from rsyslog.

The default installation of Logstash looks for configuration files in /etc/logstash/conf.d. Edit the main configuration file:

sudo vim /etc/logstash/conf.d/logstash.conf

Then, add these lines to /etc/logstash/conf.d/logstash.conf:

# This input block will listen on port 10514 for logs to come in.
# host should be an IP on the Logstash server.
# codec => "json" indicates that we expect the lines we're receiving to be in JSON format
# type => "rsyslog" is an optional identifier to help identify messaging streams in the pipeline.

input {
  udp {
    host => "logstash_private_ip"
    port => 10514
    codec => "json"
    type => "rsyslog"
  }
}

# This is an empty filter block. You can later add other filters here to further process
# your log lines

filter { }

# This output block will send all events of type "rsyslog" to Elasticsearch at the configured
# host and port. No index is set, so events go into the default daily indices of the
# pattern "logstash-YYYY.MM.DD"

output {
  if [type] == "rsyslog" {
    elasticsearch {
      hosts => [ "elasticsearch_private_ip:9200" ]
    }
  }
}

Syslog has traditionally been carried over UDP, so this configuration mirrors that convention.

In the input block, set the Logstash host address by replacing logstash_private_ip with the private IP address of rsyslog-server, which also has Logstash installed on it.

The input block configures Logstash to listen on port 10514 so it won’t compete with syslog instances on the same machine. A port lower than 1024 would require Logstash to run as root, which is not a good security practice.

Be sure to replace elasticsearch_private_ip with the private IP address of your Elasticsearch Droplet. The output block shows a simple conditional configuration. Its purpose is to only allow matching events through. In this case, that is only events with a “type” of “rsyslog”.

Test your Logstash configuration changes:

sudo service logstash configtest

It should display Configuration OK if there are no syntax errors. Otherwise, read the error output to see what’s wrong with your Logstash configuration.

When all these steps are completed, you can start your Logstash instance by running:

sudo service logstash start

Also restart rsyslog on the same server since it has a Logstash instance to forward to now:

sudo service rsyslog restart

To verify that Logstash is listening on port 10514:

netstat -na | grep 10514

You should see something like this:

udp6       0      0 10.128.33.68:10514     :::*  

You will see the private IP address of rsyslog-server and the 10514 port number we are using to listen for rsyslog data.
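
Once the listener is up, you can exercise it directly, without going through rsyslog, by sending one hand-made JSON event over UDP. The Python sketch below does that with the standard library. build_test_event and send_event are hypothetical helpers, the field values (python-test, user, info) are placeholders chosen for this example, and the field names simply copy those of the JSON template.

```python
import json
import socket
from datetime import datetime, timezone


def build_test_event(message: str, hostname: str = "python-test") -> dict:
    """Create an event with the same field names the JSON template produces."""
    return {
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "@version": "1",
        "message": message,
        "sysloghost": hostname,
        "severity": "info",
        "facility": "user",
        "programname": "python-test",
        "procid": "-",
    }


def send_event(event: dict, host: str, port: int = 10514) -> int:
    """Send the event as one newline-terminated JSON datagram; return bytes sent."""
    payload = (json.dumps(event) + "\n").encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))
```

Calling send_event(build_test_event("ping"), "logstash_private_ip") from a Droplet on the private network should produce one document in Elasticsearch if the input, the conditional, and the output are all wired correctly. Because UDP is connectionless, a successful send does not prove the event was received.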

Tip: To troubleshoot Logstash, stop the service with sudo service logstash stop and run it in the foreground with verbose messages:

/opt/logstash/bin/logstash -f /etc/logstash/conf.d/logstash.conf --verbose

The output will contain useful information, such as which IP address and UDP port Logstash is using:

Starting UDP listener {:address=>"10.128.33.68:10514", :level=>:info}


Verify Elasticsearch Input

Earlier, we configured Elasticsearch to listen on its private IP address. It should now be receiving messages from Logstash. In this step, we will verify that Elasticsearch is receiving the log data.

The rsyslog-client and rsyslog-server Droplets should be sending all their log data to Logstash, which is then passed along to Elasticsearch. Let’s generate a security message to verify that Elasticsearch is indeed receiving these messages.

On rsyslog-client, execute the following command:

sudo tail /var/log/auth.log

You will see the security log on the local system at the end of the output. It will look similar to:

May  2 16:43:15 rsyslog-client sudo:    merikanto : TTY=pts/0 ; PWD=/etc/rsyslog.d ; USER=root ; COMMAND=/usr/bin/tail /var/log/auth.log
May 2 16:43:15 rsyslog-client sudo: pam_unix(sudo:session): session opened for user root by merikanto(uid=0)

With a simple query, you can check Elasticsearch:

Run the following command on the Elasticsearch server or any system that is allowed to access it. Replace elasticsearch_ip with the private IP address of the Elasticsearch server. This IP address must also be the one you configured Elasticsearch to listen on earlier in this post.

curl -XGET 'http://elasticsearch_ip:9200/_all/_search?q=*&pretty'

In the output you will see something similar to the following:

{
"_index" : "logstash-2016.05.04",
"_type" : "rsyslog",
"_id" : "AVR8fpR-e6FP4Elp89Ww",
"_score" : 1.0,
"_source":{"@timestamp":"2016-05-04T15:59:10.000Z","@version":"1","message":" merikanto : TTY=pts/0 ; PWD=/home/merikanto ; USER=root ; COMMAND=/usr/bin/tail /var/log/auth.log","sysloghost":"rsyslog-client","severity":"notice","facility":"authpriv","programname":"sudo","procid":"-","type":"rsyslog","host":"10.128.33.68"}
},

Notice that the name of the Droplet that generated the rsyslog message is in the log (rsyslog-client).
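
The same check can be narrowed to a single host and scripted. The following Python sketch builds a query-string search on the sysloghost field and pulls the original events out of the response. The helper names (search_url, source_documents, search) are made up for illustration, and the code assumes the response layout shown above, where each hit carries the event under _source.

```python
import json
import urllib.parse
import urllib.request


def search_url(es_host: str, field: str, value: str, port: int = 9200) -> str:
    """Build a query-string search URL that matches `value` in `field`."""
    query = urllib.parse.urlencode({"q": f'{field}:"{value}"', "pretty": "true"})
    return f"http://{es_host}:{port}/_all/_search?{query}"


def source_documents(response: dict) -> list:
    """Return the original events from a parsed search response."""
    return [hit["_source"] for hit in response["hits"]["hits"]]


def search(es_host: str, field: str, value: str) -> list:
    """Run the search against Elasticsearch and return the matching events."""
    with urllib.request.urlopen(search_url(es_host, field, value), timeout=5) as resp:
        return source_documents(json.load(resp))
```

For example, search("elasticsearch_ip", "sysloghost", "rsyslog-client") should return only the events that rsyslog-client generated, which makes it easier to confirm that each forwarding Droplet is represented.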

With this simple verification step, our centralized rsyslog setup is complete and fully operational.