Monitoring with big-data technologies - Part Two

In this post: retrieve and query the message logging of a remote Linux machine.

GOTO

Previous post: Set up Virtual Machine
Next post: Create a monitoring system for a remote Linux machine and show the information with a dashboard
GitHub repository


Note:

$ --> execute the command as the cloudera user
# --> execute the command as the root user
Remote server = Big Data server = localhost. In this example the remote machine to monitor is the Big Data server itself, in order to simplify the architecture.


Setting up Flume Syslog source - Step 1

Objective

Send logging messages, then intercept and print them with a Flume agent. See Monitoring with big data technologies briefing - Part One - Step One.


Instructions

  1. Start Cloudera quickstart Virtual Machine

  2. Set up rsyslog to redirect all events to the receiver host; in this step we configure rsyslog on the sender machine. Add the following lines to the end of /etc/rsyslog.conf (a sketch for validating the edited configuration follows this list)

    To edit this root-owned file you can execute the following command:
    $ sudo gedit /etc/rsyslog.conf
    The default root password for the Cloudera VM is "cloudera".

     

    # Max memory space
    $ActionQueueMaxDiskSpace 2g
    # Save in-memory data if flume shuts down
    $ActionQueueSaveOnShutdown on
    # Declare the queue as an in-memory linked list. There is some processing
    # overhead, but this ensures that memory is only allocated in cases where
    # it is needed
    $ActionQueueType LinkedList
    # Infinite retries on insert failure
    $ActionResumeRetryCount -1
    # Forward all logging messages to the Big Data server
    *.* @@localhost:5140


  3. Create a directory under the cloudera user's home directory and change into it

     

    $ mkdir --parent ~cloudera/monitoring-with-big-data-technologies-part-two/step-1
    $ cd ~cloudera/monitoring-with-big-data-technologies-part-two/step-1


  4. Create flume-env.sh to give the Flume agent more memory and pre-allocate it

     

    # Flume environment configuration.
    # Copyright (C) 2015 Fabio Pirola <fabio@pirola.org>
    #
    # This program is free software: you can redistribute it and/or modify
    # it under the terms of the GNU General Public License as published by
    # the Free Software Foundation, either version 3 of the License, or
    # (at your option) any later version.
    #
    # This program is distributed in the hope that it will be useful,
    # but WITHOUT ANY WARRANTY; without even the implied warranty of
    # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    # GNU General Public License for more details.
    #
    # You should have received a copy of the GNU General Public License
    # along with this program. If not, see <http://www.gnu.org/licenses/>.

    # Give Flume more memory and pre-allocate
    export JAVA_OPTS="-Xms50m -Xmx512m"


  5. Create flume.properties, which describes the Flume agent that receives syslog events

     

    # Flume agent configuration.
    # Copyright (C) 2015 Fabio Pirola <fabio@pirola.org>
    #
    # This program is free software: you can redistribute it and/or modify
    # it under the terms of the GNU General Public License as published by
    # the Free Software Foundation, either version 3 of the License, or
    # (at your option) any later version.
    #
    # This program is distributed in the hope that it will be useful,
    # but WITHOUT ANY WARRANTY; without even the implied warranty of
    # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    # GNU General Public License for more details.
    #
    # You should have received a copy of the GNU General Public License
    # along with this program. If not, see <http://www.gnu.org/licenses/>.

    # Name the components on this agent
    syslogAgentStep1.sources = r1
    syslogAgentStep1.sinks = s1
    syslogAgentStep1.channels = c1

    # Describe/configure the source
    # I'll be using TCP based Syslog source
    syslogAgentStep1.sources.r1.type = syslogtcp
    # Host name or IP address to bind to
    syslogAgentStep1.sources.r1.host = localhost
    # Port # to bind to
    syslogAgentStep1.sources.r1.port = 5140

    # Describe/configure the channel
    # Use a channel which buffers events in memory
    syslogAgentStep1.channels.c1.type = memory
    # The maximum number of events stored in the channel
    syslogAgentStep1.channels.c1.capacity = 1000
    # The maximum number of events the channel will take from a source
    # or give to a sink per transaction
    syslogAgentStep1.channels.c1.transactionCapacity = 100

    # Describe the sink
    syslogAgentStep1.sinks.s1.type = logger
    # Maximum number of bytes of the Event body to log
    syslogAgentStep1.sinks.s1.maxBytesToLog = 500

    # Binding source/channel/sink
    syslogAgentStep1.sources.r1.channels = c1
    syslogAgentStep1.sinks.s1.channel = c1


  6. Start the Flume agent with the following command

     

    $ flume-ng agent --conf "./" --conf-file "./flume.properties" -n syslogAgentStep1 -Dflume.root.logger=INFO,console


  7. Wait until you see the message "Syslog TCP Source starting" in the console

  8. Restart the rsyslog agent

     

    $ sudo service rsyslog restart


  9. Send a test message using the logger command

     

    $ logger -t test 'Testing Flume with rsyslog!'
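
Before restarting rsyslog, it can be worth validating the edited configuration and, once the agent is running, confirming that the Flume source is listening. This is a minimal sketch; it assumes that your rsyslog version supports the -N config-check flag and that netstat is available on the quickstart VM.

# Check /etc/rsyslog.conf for syntax errors without starting the daemon
$ sudo rsyslogd -N1
# Verify that the Flume syslog source is bound to TCP port 5140
$ netstat -tln | grep 5140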


Final result

In the command tab you will see something like this:

 

15/11/09 14:30:46 INFO source.SyslogTcpSource: Syslog TCP Source starting...
15/11/09 14:31:08 INFO sink.LoggerSink: Event: { headers:{timestamp=1447108264000, Severity=6, host=quickstart, Facility=0, priority=6} body: 6B 65 72 6E 65 6C 3A 20 69 6D 6B 6C 6F 67 20 35 kernel: imklog 5 }
15/11/09 14:31:08 INFO sink.LoggerSink: Event: { headers:{timestamp=1447108264000, Severity=6, host=quickstart, Facility=5, priority=46} body: 72 73 79 73 6C 6F 67 64 3A 20 5B 6F 72 69 67 69 rsyslogd: [origi }
15/11/09 14:31:13 INFO sink.LoggerSink: Event: { headers:{timestamp=1447108273000, Severity=5, host=quickstart, Facility=1, priority=13} body: 74 65 73 74 3A 20 54 65 73 74 69 6E 67 20 46 6C test: Testing Fl }
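
If no events show up in the console, you can also bypass rsyslog and send a raw, syslog-formatted line straight to the Flume source. This is only a sketch; it assumes netcat (nc) is installed on the VM, and the leading <13> is the syslog priority field that the syslogtcp source expects.

# Send one hand-crafted syslog line over TCP to the Flume source, closing after 1 second
$ echo '<13>Nov  9 14:35:00 quickstart test: raw syslog line sent with netcat' | nc -w 1 localhost 5140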



Setting up Flume Syslog source - Step 2

Objective

Send logging messages, then intercept them with a Flume agent and store them to HDFS. See Monitoring with big data technologies briefing - Part One - Step Two.

Instructions

  1. Stop previous Flume agent by pressing Ctrl + C

  2. Create a directory under the cloudera user's home directory and change into it

     

    $ mkdir ~cloudera/monitoring-with-big-data-technologies-part-two/step-2
    $ cd ~cloudera/monitoring-with-big-data-technologies-part-two/step-2


  3. Copy the previous files to the step-2 folder

     

    $ cp ../step-1/* ./


  4. In flume.properties, replace the string syslogAgentStep1 with syslogAgentStep2

     

    $ sed -i 's/syslogAgentStep1/syslogAgentStep2/g' flume.properties


  5. Edit flume.properties and substitute the sink part as follows:

    • delete from line 39 to line 42

       

      # Describe the sink
      syslogAgentStep2.sinks.s1.type = logger
      # Maximum number of bytes of the Event body to log
      syslogAgentStep2.sinks.s1.maxBytesToLog = 500


    • insert @ line 39

       

      # Describe the sink
      # Save to Hadoop HDFS
      syslogAgentStep2.sinks.s1.type = hdfs
      # HDFS directory path
      syslogAgentStep2.sinks.s1.hdfs.path = /monitoring/syslog/%Y-%m-%d
      # Name prefixed to files created by Flume in the hdfs directory
      syslogAgentStep2.sinks.s1.hdfs.filePrefix = syslog_%Y_%m_%d_
      # Prefix used for temporary files that Flume actively writes into.
      # In order to avoid errors when reading temporary data from
      # Hive, files actively being written will be prefixed by the '_' character.
      # Files starting with '_' are hidden for hadoop, so in this way
      # Hive doesn't query these temporary files.
      syslogAgentStep2.sinks.s1.hdfs.inUsePrefix = _
      # File size to trigger roll, in bytes
      syslogAgentStep2.sinks.s1.hdfs.rollSize = 131072
      # Timeout after which inactive files get closed
      syslogAgentStep2.sinks.s1.hdfs.idleTimeout = 60
      # Text file format
      syslogAgentStep2.sinks.s1.hdfs.fileType = DataStream
      # Number of threads per HDFS sink
      syslogAgentStep2.sinks.s1.hdfs.threadsPoolSize = 10
      # Store header and text in a plain format
      syslogAgentStep2.sinks.s1.serializer = header_and_text


  6. Create the log4j.properties file, used to configure log4j

     

    # Flume agent configuration.
    # Copyright (C) 2015 Fabio Pirola <fabio@pirola.org>
    #
    # This program is free software: you can redistribute it and/or modify
    # it under the terms of the GNU General Public License as published by
    # the Free Software Foundation, either version 3 of the License, or
    # (at your option) any later version.
    #
    # This program is distributed in the hope that it will be useful,
    # but WITHOUT ANY WARRANTY; without even the implied warranty of
    # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    # GNU General Public License for more details.
    #
    # You should have received a copy of the GNU General Public License
    # along with this program. If not, see <http://www.gnu.org/licenses/>.

    flume.root.logger=INFO,logfile,console
    flume.log.dir=/home/cloudera/monitoring-with-big-data-technologies-part-two/log
    flume.log.file=flume-step2.log

    #
    log4j.logger.org.apache.flume.lifecycle = INFO
    log4j.logger.org.jboss = WARN
    log4j.logger.org.mortbay = INFO
    log4j.logger.org.apache.avro.ipc.NettyTransceiver = WARN
    log4j.logger.org.apache.hadoop = INFO

    # Define the root logger to the system property "flume.root.logger".
    log4j.rootLogger=${flume.root.logger}

    # Stock log4j rolling file appender
    # Default log rotation configuration
    log4j.appender.logfile=org.apache.log4j.RollingFileAppender
    log4j.appender.logfile.MaxFileSize=100MB
    log4j.appender.logfile.MaxBackupIndex=10
    log4j.appender.logfile.File=${flume.log.dir}/${flume.log.file}
    log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
    log4j.appender.logfile.layout.ConversionPattern=%d{ISO8601} | %-5p | [%t] (%C.%M:%L) %x - %m%n

    # console
    # Add "console" to flume.root.logger above if you want to use this
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.Threshold=WARN
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{ISO8601} (%t) [%p - %l] %m%n

    # log4j.logger.com.cloudera.cdk.morphline=TRACE


  7. Create the HDFS directory where syslog messages will be saved

     

    $ hadoop fs -mkdir -p /monitoring/syslog/


  8. Start the Flume agent with the following command

     

    $ flume-ng agent --conf "./" --conf-file "./flume.properties" -n syslogAgentStep2 -Dlog4j.configuration=file:./log4j.properties


  9. Monitor the Flume agent log from another tab with the following command

     

    $ tail -F ~cloudera/monitoring-with-big-data-technologies-part-two/log/flume-step2.log


  10. From another tab, send a test message using the logger command

     

    $ logger -t test 'Testing Flume with rsyslog!'


Final result

You will find the messages stored on HDFS in plain format. To display the content of all files you can use the following command:

 

$ hadoop fs -cat /monitoring/syslog/`date +%Y-%m-%d`/*
{timestamp=1447108650000, Severity=5, host=quickstart, Facility=1, priority=13} test: Testing Flume with rsyslog!
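
A couple of additional checks can be handy at this point (a sketch using only standard hadoop fs commands). Note that files Flume is still writing carry the '_' in-use prefix configured above, so they are easy to spot in the listing.

# List today's files; in-progress files start with '_'
$ hadoop fs -ls /monitoring/syslog/`date +%Y-%m-%d`/
# Rough event count for today (one line per stored event)
$ hadoop fs -cat /monitoring/syslog/`date +%Y-%m-%d`/* | wc -l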



Setting up Flume Syslog source - Step 3

Objective

Send logging messages, then intercept them with a Flume agent and store them to HDFS in Avro format. Query the stored information with Hive.

Monitoring with big data technologies briefing - Part One - Step Three

Instructions

  1. Stop previous Flume agent by pressing Ctrl + C

  2. Create a directory under the cloudera user's home directory and change into it

     

    $ mkdir ~cloudera/monitoring-with-big-data-technologies-part-two/step-3
    $ cd ~cloudera/monitoring-with-big-data-technologies-part-two/step-3


  3. Copy the previous files to the step-3 folder

     

    $ cp ../step-2/* ./


  4. In flume.properties and log4j.properties, replace the agent name and the log file name to match the new Flume agent

     

    $ sed -i 's/syslogAgentStep2/syslogAgentStep3/g' flume.properties
    $ sed -i 's/flume-step2.log/flume-step3.log/g' log4j.properties


  5. Create the file SyslogEvent.avsc that describes the Avro schema

     

    {"namespace": "org.pirola", "type": "record", "name": "SyslogEvent", "fields": [ {"name": "facility", "type": ["null", "int"]}, {"name": "severity", "type": ["null", "int"]}, {"name": "priority", "type": ["null", "int"]}, {"name": "timestamp", "type": "long"}, {"name": "host", "type": "string"}, {"name": "message", "type": "string"} ] }


  6. Remove all directories and files from /monitoring/syslog/

     

    $ hadoop fs -rm -R /monitoring/syslog/*


  7. Create a logical abstraction on top of the persistence layer with a Kite dataset (a way to inspect the dataset later is sketched after this list)

     

    $ kite-dataset create dataset:hdfs:/monitoring/syslog/ --schema SyslogEvent.avsc


  8. Create the morphline.conf file that describes the morphline workflow

     

    morphlines : [
      {
        # Name used to identify a morphline. E.g. used if there are multiple
        # morphlines in a morphline config file
        id : syslogToAvro

        # Import all morphline commands in these java packages and their
        # subpackages. Other commands that may be present on the classpath are
        # not visible to this morphline.
        importCommands : ["org.kitesdk.**", "org.apache.solr.**"]

        commands : [
          {
            # Parse input attachment and emit a record for each input line
            readLine {
              charset : UTF-8
            }
          }
          {
            logInfo {
              format : "Received event: {}", args : ["@{}"]
            }
          }
          {
            toAvro {
              schemaFile: /home/cloudera/monitoring-with-big-data-technologies-part-two/step-3/SyslogEvent.avsc
              mappings: {"facility":"Facility", "severity" : "Severity"}
            }
          }
          {
            writeAvroToByteArray: {
              format: containerlessBinary
            }
          }
        ]
      }
    ]


  9. Modify the Flume configuration file flume.properties:

    • Add @ line 29 the interceptors that use morphline for parsing and transformation

       

      # List of interceptors
      syslogAgentStep3.sources.r1.interceptors = attach-schema morphline

      # Add the schema for our record sink
      syslogAgentStep3.sources.r1.interceptors.attach-schema.type = static
      syslogAgentStep3.sources.r1.interceptors.attach-schema.key = flume.avro.schema.url
      syslogAgentStep3.sources.r1.interceptors.attach-schema.value = file:/home/cloudera/monitoring-with-big-data-technologies-part-two/step-3/SyslogEvent.avsc

      # Morphline interceptor configuration
      syslogAgentStep3.sources.r1.interceptors.morphline.type = org.apache.flume.sink.solr.morphline.MorphlineInterceptor$Builder
      syslogAgentStep3.sources.r1.interceptors.morphline.morphlineFile = /home/cloudera/monitoring-with-big-data-technologies-part-two/step-3/morphline.conf
      syslogAgentStep3.sources.r1.interceptors.morphline.morphlineId = syslogToAvro


    • Change serializer @ line 70-71

       

      # Store header and text in avro format
      syslogAgentStep3.sinks.s1.serializer = org.apache.flume.sink.hdfs.AvroEventSerializer$Builder


  10. Start the Flume agent with the following command

     

    $ flume-ng agent --conf "./" --conf-file "./flume.properties" -n syslogAgentStep3 -Dlog4j.configuration=file:./log4j.properties


  11. Monitor the Flume agent log from another tab with the following command

     

    $ tail -F ~cloudera/monitoring-with-big-data-technologies-part-two/log/flume-step3.log


  12. From another tab, send a test message using the logger command

     

    $ logger -t test 'Testing Flume with rsyslog!'


  13. Open the Hive shell (Beeline)

     

    $ beeline
    ... Hive beeline will open ...
    beeline> !connect jdbc:hive2://localhost:10000 cloudera cloudera org.apache.hive.jdbc.HiveDriver


  14. Create a Hive table in order to query the information

     

    CREATE EXTERNAL TABLE syslog_event
    COMMENT "Syslog event"
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
    WITH SERDEPROPERTIES (
      'avro.schema.url' = 'file:///home/cloudera/monitoring-with-big-data-technologies-part-two/step-3/SyslogEvent.avsc'
    )
    STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
    LOCATION '/monitoring/syslog/';
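
Before querying from Hive, you may want to confirm that Avro files have actually been written and double-check the dataset created in step 7. A minimal sketch; it assumes the info subcommand of the bundled kite-dataset CLI, which prints a dataset's schema and location.

# List the Avro files written by the HDFS sink
$ hadoop fs -ls -R /monitoring/syslog/
# Print the descriptor of the Kite dataset created in step 7
$ kite-dataset info dataset:hdfs:/monitoring/syslog/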


Final result

The events are now stored on HDFS in Avro format. You can query them from Beeline with the following commands:

 

0: jdbc:hive2://localhost:10000> set hive.cli.print.header = true;
No rows affected (0.027 seconds)
0: jdbc:hive2://localhost:10000> set mapred.input.dir.recursive = true;
No rows affected (0.022 seconds)
0: jdbc:hive2://localhost:10000> set hive.mapred.supports.subdirectories = true;
No rows affected (0.03 seconds)
0: jdbc:hive2://localhost:10000> select * from syslog_event;
+------------------------+------------------------+------------------------+-------------------------+--------------------+--------------------------------------------------+--+
| syslog_event.facility  | syslog_event.severity  | syslog_event.priority  | syslog_event.timestamp  | syslog_event.host  |               syslog_event.message               |
+------------------------+------------------------+------------------------+-------------------------+--------------------+--------------------------------------------------+--+
| 9                      | 6                      | 78                     | 1447108801000           | quickstart         | CROND[8590]: (root) CMD (/usr/lib64/sa/sa1 1 1)  |
| 1                      | 5                      | 13                     | 1447109257000           | quickstart         | test: Testing Flume with rsyslog!                |
+------------------------+------------------------+------------------------+-------------------------+--------------------+--------------------------------------------------+--+
2 rows selected (4.04 seconds)
0: jdbc:hive2://localhost:10000>
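
As a further illustration of querying the table, an aggregation such as the following should work from the same Beeline session (a sketch; the recursive-directory settings shown above still apply, and the column names come from SyslogEvent.avsc):

-- Count stored events per host and severity
SELECT host, severity, COUNT(*) AS events
FROM syslog_event
GROUP BY host, severity;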



GOTO

Previous post: Set up Virtual Machine
Next post: Create a monitoring system for a remote Linux machine and show the information with a dashboard
GitHub repository


Happy coding ;)
