In this post ...
Retrieve and query the message logs of a remote Linux machine
GOTO
Previous post: Set up Virtual Machine
Next post: Create a monitoring system for a remote Linux machine and show the information with a dashboard
Github repository
Note:
$ --> execute the command as cloudera user
# --> execute the command as root user
Remote server = Big Data server = localhost. In this example the remote machine to monitor is the Big Data server itself, in order to simplify
the architecture.
Setting up Flume Syslog source - Step 1
Objective
Send logging messages, then intercept and print them with a Flume agent.
Instruction
Set up rsyslog on the sender machine so that all events are redirected to the receiver host. Add the following lines at the end of /etc/rsyslog.conf.
To edit this root-owned file you can execute the following command:
$ sudo gedit /etc/rsyslog.conf
The default root password for the Cloudera VM is "cloudera".
# Max disk space for queued messages
$ActionQueueMaxDiskSpace 2g
# Save in-memory data if rsyslog shuts down
$ActionQueueSaveOnShutdown on
# Declare the queue as an in-memory linked list. There is some processing
# overhead, but ensures that memory is only allocated in cases where
# it is needed
$ActionQueueType LinkedList
# Infinite retries on insert failure
$ActionResumeRetryCount -1
# Forward all logging messages to the Big Data server
*.* @@localhost:5140
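Before restarting rsyslog you can optionally check the new configuration for syntax errors; most rsyslog versions support a config-check run with the -N flag:
$ sudo rsyslogd -N1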
Create a working directory under the cloudera user's home directory and change into it
$ mkdir --parent ~cloudera/monitoring-with-big-data-technologies-part-two/step-1
$ cd ~cloudera/monitoring-with-big-data-technologies-part-two/step-1
Create flume-env.sh to pre-allocate memory and give the Flume agent a larger heap
# Flume environment configuration.
# Copyright (C) 2015 Fabio Pirola <fabio@pirola.org>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
# Give Flume more memory and pre-allocate
export JAVA_OPTS="-Xms50m -Xmx512m"
Create flume.properties, which describes the Flume agent that receives syslog events
# Flume agent configuration.
# Copyright (C) 2015 Fabio Pirola <fabio@pirola.org>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
# Name the components on this agent
syslogAgentStep1.sources = r1
syslogAgentStep1.sinks = s1
syslogAgentStep1.channels = c1
# Describe/configure the source
# I'll be using TCP based Syslog source
syslogAgentStep1.sources.r1.type = syslogtcp
# Host name or IP address to bind to
syslogAgentStep1.sources.r1.host = localhost
# Port # to bind to
syslogAgentStep1.sources.r1.port = 5140
# Describe/configure the channel
# Use a channel which buffers events in memory
syslogAgentStep1.channels.c1.type = memory
# The maximum number of events stored in the channel
syslogAgentStep1.channels.c1.capacity = 1000
# The maximum number of events the channel will take from a source
# or give to a sink per transaction
syslogAgentStep1.channels.c1.transactionCapacity = 100
# Describe the sink
syslogAgentStep1.sinks.s1.type = logger
# Maximum number of bytes of the Event body to log
syslogAgentStep1.sinks.s1.maxBytesToLog = 500
# Binding source/channel/sink
syslogAgentStep1.sources.r1.channels = c1
syslogAgentStep1.sinks.s1.channel = c1
Start Flume agent with the following command
$ flume-ng agent --conf "./" --conf-file "./flume.properties" -n syslogAgentStep1 -Dflume.root.logger=INFO,console
Wait until you see the message "Syslog TCP Source starting" in the console
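From another tab you can also confirm that the source is actually listening on TCP port 5140 (a quick check, assuming netstat is available on the VM):
$ netstat -tln | grep 5140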
Restart rsyslog log agent
$ sudo service rsyslog restart
Send a test message using the logger command
$ logger -t test 'Testing Flume with rsyslog!'
Final result
In the console tab you will see something like this
15/11/09 14:30:46 INFO source.SyslogTcpSource: Syslog TCP Source starting...
15/11/09 14:31:08 INFO sink.LoggerSink: Event: { headers:{timestamp=1447108264000, Severity=6, host=quickstart, Facility=0, priority=6} body: 6B 65 72 6E 65 6C 3A 20 69 6D 6B 6C 6F 67 20 35 kernel: imklog 5 }
15/11/09 14:31:08 INFO sink.LoggerSink: Event: { headers:{timestamp=1447108264000, Severity=6, host=quickstart, Facility=5, priority=46} body: 72 73 79 73 6C 6F 67 64 3A 20 5B 6F 72 69 67 69 rsyslogd: [origi }
15/11/09 14:31:13 INFO sink.LoggerSink: Event: { headers:{timestamp=1447108273000, Severity=5, host=quickstart, Facility=1, priority=13} body: 74 65 73 74 3A 20 54 65 73 74 69 6E 67 20 46 6C test: Testing Fl }
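logger also lets you set the syslog facility and severity explicitly, which is handy for seeing how the Facility and Severity headers of the Flume events change. For example, local1 maps to facility 17 and warning to severity 4, so the priority header should read 17 * 8 + 4 = 140:
$ logger -p local1.warning -t test 'Testing Flume with a local1 warning!'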
Setting up Flume Syslog source - Step 2
Objective
Send logging messages, then intercept and store them to HDFS with a Flume agent.
Instruction
Stop previous Flume agent by pressing Ctrl + C
Create a working directory under the cloudera user's home directory and change into it
$ mkdir ~cloudera/monitoring-with-big-data-technologies-part-two/step-2
$ cd ~cloudera/monitoring-with-big-data-technologies-part-two/step-2
Copy the previous files to the step-2 folder
$ cp ../step-1/* ./
In flume.properties, replace the string syslogAgentStep1 with syslogAgentStep2
$ sed -i 's/syslogAgentStep1/syslogAgentStep2/g' flume.properties
Edit flume.properties and replace the sink section.
Delete lines 39 to 42:
# Describe the sink
syslogAgentStep2.sinks.s1.type = logger
# Maximum number of bytes of the Event body to log
syslogAgentStep2.sinks.s1.maxBytesToLog = 500
Insert the following at line 39:
# Describe the sink
# Save to Hadoop HDFS
syslogAgentStep2.sinks.s1.type = hdfs
# HDFS directory path
syslogAgentStep2.sinks.s1.hdfs.path = /monitoring/syslog/%Y-%m-%d
# Name prefixed to files created by Flume in hdfs directory
syslogAgentStep2.sinks.s1.hdfs.filePrefix = syslog_%Y_%m_%d_
# Prefix used for temporary files that Flume is actively writing to.
# To avoid errors when Hive reads data that is still being written,
# files actively being written are prefixed with the '_' character.
# Files starting with '_' are hidden to Hadoop, so Hive
# does not query these temporary files.
syslogAgentStep2.sinks.s1.hdfs.inUsePrefix = _
# File size to trigger roll, in bytes
syslogAgentStep2.sinks.s1.hdfs.rollSize = 131072
# Timeout after which inactive files get closed
syslogAgentStep2.sinks.s1.hdfs.idleTimeout = 60
# Text file format
syslogAgentStep2.sinks.s1.hdfs.fileType = DataStream
# Number of threads per HDFS sink
syslogAgentStep2.sinks.s1.hdfs.threadsPoolSize = 10
# Store header and text in a plain format
syslogAgentStep2.sinks.s1.serializer = header_and_text
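With rollSize = 131072 (128 KiB) and idleTimeout = 60, the sink rolls a new file after roughly 128 KiB of data or after 60 seconds of inactivity, whichever comes first. Once the agent is running and events have been flushed, you can inspect the resulting files (an optional check; files still being written carry the '_' inUsePrefix):
$ hadoop fs -ls /monitoring/syslog/`date +%Y-%m-%d`/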
Create the log4j.properties file, used to configure log4j
# Flume agent log4j configuration.
# Copyright (C) 2015 Fabio Pirola <fabio@pirola.org>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
flume.root.logger=INFO,logfile,console
flume.log.dir=/home/cloudera/monitoring-with-big-data-technologies-part-two/log
flume.log.file=flume-step2.log
#
log4j.logger.org.apache.flume.lifecycle = INFO
log4j.logger.org.jboss = WARN
log4j.logger.org.mortbay = INFO
log4j.logger.org.apache.avro.ipc.NettyTransceiver = WARN
log4j.logger.org.apache.hadoop = INFO
# Define the root logger to the system property "flume.root.logger".
log4j.rootLogger=${flume.root.logger}
# Stock log4j rolling file appender
# Default log rotation configuration
log4j.appender.logfile=org.apache.log4j.RollingFileAppender
log4j.appender.logfile.MaxFileSize=100MB
log4j.appender.logfile.MaxBackupIndex=10
log4j.appender.logfile.File=${flume.log.dir}/${flume.log.file}
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d{ISO8601} | %-5p | [%t] (%C.%M:%L) %x - %m%n
# console
# Add "console" to flume.root.logger above if you want to use this
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.Threshold=WARN
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{ISO8601} (%t) [%p - %l] %m%n
#
log4j.logger.com.cloudera.cdk.morphline=TRACE
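Note that flume.log.dir above points to a log directory under the project folder. Depending on the log4j version it may not create missing parent directories, so it is safer to create the directory up front:
$ mkdir -p ~cloudera/monitoring-with-big-data-technologies-part-two/log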
Create the HDFS directory where syslog messages will be saved
$ hadoop fs -mkdir -p /monitoring/syslog/
Start Flume agent with the following command
$ flume-ng agent --conf "./" --conf-file "./flume.properties" -n syslogAgentStep2 -Dlog4j.configuration=file:./log4j.properties
Monitor the Flume agent log from another tab with the following command
$ tail -F ~cloudera/monitoring-with-big-data-technologies-part-two/log/flume-step2.log
From another tab, send a test message using the logger command
$ logger -t test 'Testing Flume with rsyslog!'
Final result
You will find the messages stored on HDFS in plain format. To display the content of all files you can use the following command:
$ hadoop fs -cat /monitoring/syslog/`date +%Y-%m-%d`/*
{timestamp=1447108650000, Severity=5, host=quickstart, Facility=1, priority=13} test: Testing Flume with rsyslog!
Setting up Flume Syslog source - Step 3
Objective
Send logging messages, then intercept and store them to HDFS in Avro format with a Flume agent. Query the stored information with Hive.
Instruction
Stop previous Flume agent by pressing Ctrl + C
Create a working directory under the cloudera user's home directory and change into it
$ mkdir ~cloudera/monitoring-with-big-data-technologies-part-two/step-3
$ cd ~cloudera/monitoring-with-big-data-technologies-part-two/step-3
Copy the previous files to the step-3 folder
$ cp ../step-2/* ./
Update the agent name in flume.properties and the log file name in log4j.properties for the new Flume agent
$ sed -i 's/syslogAgentStep2/syslogAgentStep3/g' flume.properties
$ sed -i 's/flume-step2.log/flume-step3.log/g' log4j.properties
Create the file SyslogEvent.avsc, which describes the Avro schema
{"namespace": "org.pirola",
"type": "record",
"name": "SyslogEvent",
"fields": [
{"name": "facility", "type": ["null", "int"]},
{"name": "severity", "type": ["null", "int"]},
{"name": "priority", "type": ["null", "int"]},
{"name": "timestamp", "type": "long"},
{"name": "host", "type": "string"},
{"name": "message", "type": "string"}
]
}
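You can optionally sanity-check the schema before wiring it into Flume; one simple way, assuming avro-tools is on the PATH (as it typically is on the Cloudera VM), is to compile it into a throwaway directory, which fails if the schema is malformed:
$ avro-tools compile schema SyslogEvent.avsc /tmp/avro-check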
Remove all directories and files from /monitoring/syslog/
$ hadoop fs -rm -R /monitoring/syslog/*
Create a logical abstraction on top of the persistence layer
$ kite-dataset create dataset:hdfs:/monitoring/syslog/ --schema SyslogEvent.avsc
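To verify that the dataset was created with the expected schema, you can ask Kite to print it back (the schema and info subcommands are available in recent kite-dataset releases):
$ kite-dataset schema dataset:hdfs:/monitoring/syslog/
$ kite-dataset info dataset:hdfs:/monitoring/syslog/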
Create the morphline.conf file, which describes the morphline workflow
morphlines : [
{
# Name used to identify a morphline. E.g. used if there are multiple
# morphlines in a morphline config file
id : syslogToAvro
# Import all morphline commands in these java packages and their
# subpackages. Other commands that may be present on the classpath are
# not visible to this morphline.
importCommands : ["org.kitesdk.**", "org.apache.solr.**"]
commands : [
{
# Parse input attachment and emit a record for each input line
readLine {
charset : UTF-8
}
}
{ logInfo { format : "Received event: {}", args : ["@{}"] } }
{
toAvro {
schemaFile: /home/cloudera/monitoring-with-big-data-technologies-part-two/step-3/SyslogEvent.avsc
mappings: {"facility":"Facility", "severity" : "Severity"}
}
}
{
writeAvroToByteArray: {
format: containerlessBinary
}
}
]
}
]
Modify the Flume configuration file, flume.properties:
Add the interceptors at line 29, in order to use the morphline for parsing and transformation
# List of interceptors
syslogAgentStep3.sources.r1.interceptors = attach-schema morphline
# Add the schema for our record sink
syslogAgentStep3.sources.r1.interceptors.attach-schema.type = static
syslogAgentStep3.sources.r1.interceptors.attach-schema.key = flume.avro.schema.url
syslogAgentStep3.sources.r1.interceptors.attach-schema.value = file:/home/cloudera/monitoring-with-big-data-technologies-part-two/step-3/SyslogEvent.avsc
# Morphline interceptor configuration
syslogAgentStep3.sources.r1.interceptors.morphline.type = org.apache.flume.sink.solr.morphline.MorphlineInterceptor$Builder
syslogAgentStep3.sources.r1.interceptors.morphline.morphlineFile = /home/cloudera/monitoring-with-big-data-technologies-part-two/step-3/morphline.conf
syslogAgentStep3.sources.r1.interceptors.morphline.morphlineId = syslogToAvro
Change the serializer at lines 70-71
# Store header and text in avro format
syslogAgentStep3.sinks.s1.serializer = org.apache.flume.sink.hdfs.AvroEventSerializer$Builder
Start Flume agent with the following command
$ flume-ng agent --conf "./" --conf-file "./flume.properties" -n syslogAgentStep3 -Dlog4j.configuration=file:./log4j.properties
Monitoring Flume agent log with the following command from another tab
$ tail -F ~cloudera/monitoring-with-big-data-technologies-part-two/log/flume-step3.log
Send from another tab a test message using the logger command to test it
$ logger -t test 'Testing Flume with rsyslog!'
Open the Hive shell (Beeline)
$ beeline
... Hive beeline will be open ...
beeline> !connect jdbc:hive2://localhost:10000 cloudera cloudera org.apache.hive.jdbc.HiveDriver
Create a Hive table in order to query the information
CREATE EXTERNAL TABLE syslog_event
COMMENT "Syslog event"
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES (
'avro.schema.url' = 'file:///home/cloudera/monitoring-with-big-data-technologies-part-two/step-3/SyslogEvent.avsc'
)
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/monitoring/syslog/';
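To confirm that Hive picked up the columns from the Avro schema, you can describe the table; it should list the six fields declared in SyslogEvent.avsc:
DESCRIBE syslog_event;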
Final result
The events are now stored on HDFS in Avro format. To query them from Beeline you can use the following commands:
0: jdbc:hive2://localhost:10000> set hive.cli.print.header = true;
No rows affected (0.027 seconds)
0: jdbc:hive2://localhost:10000> set mapred.input.dir.recursive = true;
No rows affected (0.022 seconds)
0: jdbc:hive2://localhost:10000> set hive.mapred.supports.subdirectories = true;
No rows affected (0.03 seconds)
0: jdbc:hive2://localhost:10000> select * from syslog_event;
+------------------------+------------------------+------------------------+-------------------------+--------------------+--------------------------------------------------+--+
| syslog_event.facility | syslog_event.severity | syslog_event.priority | syslog_event.timestamp | syslog_event.host | syslog_event.message |
+------------------------+------------------------+------------------------+-------------------------+--------------------+--------------------------------------------------+--+
| 9 | 6 | 78 | 1447108801000 | quickstart | CROND[8590]: (root) CMD (/usr/lib64/sa/sa1 1 1) |
| 1 | 5 | 13 | 1447109257000 | quickstart | test: Testing Flume with rsyslog! |
+------------------------+------------------------+------------------------+-------------------------+--------------------+--------------------------------------------------+--+
2 rows selected (4.04 seconds)
0: jdbc:hive2://localhost:10000>
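Any HiveQL now works against the table. For example, a simple breakdown of events per severity (an illustrative query; the column names come from SyslogEvent.avsc):
SELECT severity, COUNT(*) AS events FROM syslog_event GROUP BY severity;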
GOTO
Previous post: Set up Virtual Machine
Next post: Create a monitoring system for a remote Linux machine and show the information with a dashboard
Github repository