4.3. Logs User Guide

4.3.1. Prerequisites

  • Three VMs are required to set up K8s

  • $ sudo yum install ansible

  • $ pip install openshift pyyaml kubernetes (required for ansible K8s module)

  • Update the IP addresses in the following files (if they have changed)

    Path                                                               Description
    -----------------------------------------------------------------  -----------------------------------
    ansible-server/group_vars/all.yml                                  IP of K8s apiserver and VM hostname
    ansible-server/hosts                                               IP of VMs to install
    ansible-server/roles/logging/files/persistentVolume.yaml           IP of NFS-Server
    ansible-server/roles/logging/files/elastalert/ealert-rule-cm.yaml  IP of alert-receiver
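
Before editing, it can help to list the IP addresses currently present in these files. The following is a minimal sketch, assuming it is run from the repository root; list_ips is a hypothetical helper name, not part of the repository.

```shell
# list_ips: print every IPv4-looking string found in the given files,
# prefixed with the file name and line number (file:line:match).
# Hypothetical helper for illustration only.
list_ips() {
    grep -H -n -o -E '([0-9]{1,3}\.){3}[0-9]{1,3}' "$@"
}
```

Possible usage:

    $ list_ips ansible-server/group_vars/all.yml ansible-server/hosts
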

4.3.2. Architecture

[Figure: architecture of the logging setup (setup.png)]

4.3.3. Installation - Clientside

4.3.3.1. Nodes

  • Node1 = 10.10.120.21

  • Node4 = 10.10.120.24

4.3.3.2. How is the installation done?

  • TD-agent installation

    $ curl -L https://toolbelt.treasuredata.com/sh/install-redhat-td-agent3.sh | sh

  • Copy the TD-agent config file on Node1

    $ cp tdagent-client-config/node1.conf /etc/td-agent/td-agent.conf

  • Copy the TD-agent config file on Node4

    $ cp tdagent-client-config/node4.conf /etc/td-agent/td-agent.conf

  • Restart the service

    $ sudo service td-agent restart

4.3.4. Installation - Serverside

4.3.4.1. Nodes

Inside Jumphost - POD12
  • VM1 = 10.10.120.211

  • VM2 = 10.10.120.203

  • VM3 = 10.10.120.204

4.3.4.2. How is the installation done?

Using Ansible:
  • K8s
    • Elasticsearch: 1 Master & 1 Data node on each VM

    • Kibana: 1 Replica

    • Nginx: 2 Replicas

    • Fluentd: 2 Replicas

    • Elastalert: 1 Replica (increasing the replica count causes duplicate alerts)

  • NFS Server: on each VM, to store Elasticsearch data at the following paths
    • /srv/nfs/master

    • /srv/nfs/data

4.3.4.3. How to set up?

  • To set up the K8s cluster and EFK, run: $ ansible-playbook ansible/playbooks/setup.yaml

  • To clean everything, run: $ ansible-playbook ansible/playbooks/clean.yaml

4.3.4.4. Do we have HA?

Yes

4.3.5. Configuration

4.3.5.1. K8s

4.3.5.1.1. Path of all yamls (Serverside)

ansible-server/roles/logging/files/

4.3.5.1.2. K8s namespace

logging

4.3.5.1.3. K8s Service details

$ kubectl get svc -n logging

4.3.5.2. Elasticsearch Configuration

4.3.5.2.1. Elasticsearch Setup Structure

[Figure: Elasticsearch setup structure (elasticsearch.png)]

4.3.5.2.2. Elasticsearch service details

Service Name: logging-es-http
Service Port: 9200
Service Type: ClusterIP

4.3.5.2.3. How to get elasticsearch default username & password?

  • User1 (custom user):
    Username: elasticsearch
    Password: password123
  • User2 (by default created by Elastic Operator):
    Username: elastic
    To get default password:
    $ PASSWORD=$(kubectl get secret -n logging logging-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')
    $ echo $PASSWORD
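
Once the password is known, it can be used to confirm that Elasticsearch is reachable. The sketch below wraps the command above in a function and queries the cluster health endpoint. The function names and the default URL (a local port-forward) are assumptions for illustration; -k is used because the certificate is assumed to be self-signed.

```shell
# es_password: read the password of the built-in "elastic" user from the
# secret created by the Elastic Operator (same command as in the guide).
es_password() {
    kubectl get secret -n logging logging-es-elastic-user \
        -o go-template='{{.data.elastic | base64decode}}'
}

# es_health: query cluster health as the "elastic" user.
# $1 is the Elasticsearch base URL. The default assumes a local
# port-forward to the service and is not a value from the guide.
es_health() {
    base_url="${1:-https://localhost:9200}"
    curl -k -s -u "elastic:$(es_password)" "$base_url/_cluster/health?pretty"
}
```

Possible usage, since the service type is ClusterIP:

    $ kubectl port-forward -n logging svc/logging-es-http 9200:9200 &
    $ es_health
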

4.3.5.2.4. How to increase the replicas of an index?

$ curl -k -u "elasticsearch:password123" -H 'Content-Type: application/json' -XPUT "https://10.10.120.211:9200/indexname*/_settings" -d '
{
  "index" : {
    "number_of_replicas" : "2"
  }
}'
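
The same request can be wrapped in small functions so that the change can be applied and then checked. This is a sketch only: the function names and the ES_USER / ES_PASS variables are hypothetical, and the base URL and index pattern are passed in by the caller.

```shell
# replicas_body: print the JSON settings body for a given replica count.
replicas_body() {
    printf '{"index": {"number_of_replicas": %d}}' "$1"
}

# set_replicas: apply the replica count to every index matching a pattern.
# $1 = Elasticsearch base URL, $2 = index pattern, $3 = replica count.
# Credentials come from ES_USER and ES_PASS (hypothetical variable names).
set_replicas() {
    curl -k -s -u "$ES_USER:$ES_PASS" -H 'Content-Type: application/json' \
        -XPUT "$1/$2/_settings" -d "$(replicas_body "$3")"
}

# get_replicas: show the current replica setting of the matching indices.
get_replicas() {
    curl -k -s -u "$ES_USER:$ES_PASS" \
        "$1/$2/_settings/index.number_of_replicas?pretty"
}
```

Possible usage:

    $ ES_USER=elasticsearch ES_PASS=password123
    $ set_replicas https://10.10.120.211:9200 'indexname*' 2
    $ get_replicas https://10.10.120.211:9200 'indexname*'
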

4.3.5.2.5. Index Life

30 Days

4.3.5.3. Kibana Configuration

4.3.5.3.1. Kibana Service details

Service Name: logging-kb-http
Service Port: 5601
Service Type: ClusterIP

4.3.5.4. Nginx Configuration

4.3.5.4.1. IP

Use the IP address with https. Example: "10.10.120.211:32000"

4.3.5.4.2. Nginx Setup Structure

[Figure: Nginx setup structure (nginx.png)]

4.3.5.4.3. Nginx Service details

Service Name: nginx
Service Port: 32000
Service Type: NodePort

4.3.5.4.4. Why is Nginx used?

Nginx is used to secure ELK.

4.3.5.4.5. Nginx Configuration

Path: ansible-server/roles/logging/files/nginx/nginx-conf-cm.yaml

4.3.5.5. Fluentd Configuration - Clientside (Td-agent)

4.3.5.5.1. Fluentd Setup Structure

[Figure: Fluentd client-side setup structure (fluentd-cs.png)]

4.3.5.5.2. Log collection paths

  • /tmp/result*/*.log

  • /tmp/result*/*.dat

  • /tmp/result*/*.csv

  • /tmp/result*/stc-liveresults.dat.*

  • /var/log/userspace*.log

  • /var/log/sriovdp/*.log.*

  • /var/log/pods/**/*.log

4.3.5.5.3. Logs sent to

Another Fluentd instance, running in the K8s cluster (K8s Master: 10.10.120.211) on the Jumphost.

4.3.5.5.4. Td-agent logs

Path of td-agent logs: /var/log/td-agent/td-agent.log

4.3.5.5.5. Td-agent configuration

Path of conf file: /etc/td-agent/td-agent.conf
If td-agent.conf is changed, restart the td-agent service: $ sudo service td-agent restart

4.3.5.5.6. Config Description

  • Get the logs from the collection paths

  • Convert each log line to this format
    {
    msg: "log line"
    log_path: "/file/path"
    file: "file.name"
    host: "pod12-node4"
    }
  • Send it to Fluentd
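
To make the record shape concrete, the sketch below builds one such record in shell. It only illustrates the field layout. The actual conversion is done by td-agent, and the function names here are hypothetical. Escaping covers backslashes and double quotes only, not control characters.

```shell
# json_escape: escape backslashes and double quotes for use inside a JSON
# string. Control characters are not handled in this sketch.
json_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# make_record: print one record in the format described above.
# $1 = log line, $2 = full path of the log file, $3 = host name.
# The "file" field is derived from the path with basename.
make_record() {
    printf '{"msg": "%s", "log_path": "%s", "file": "%s", "host": "%s"}\n' \
        "$(json_escape "$1")" \
        "$(json_escape "$2")" \
        "$(json_escape "$(basename "$2")")" \
        "$(json_escape "$3")"
}
```

Possible usage:

    $ make_record 'log line' /tmp/result1/file.log pod12-node4
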

4.3.5.6. Fluentd Configuration - Serverside

4.3.5.6.1. Fluentd Setup Structure

[Figure: Fluentd server-side setup structure (fluentd-ss.png)]

4.3.5.6.2. Fluentd Service details

Service Name: fluentd
Service Port: 32224
Service Type: NodePort

4.3.5.6.3. Logs sent to

Elasticsearch service (Example: logging-es-http at port 9200)

4.3.5.6.4. Config Description

  • Step 1
    • Get the logs from Node1 & Node4

  • Step 2

    log_path                              add tag (for routing)
    ------------------------------------  ---------------------
    /tmp/result.*/.*errors.dat            errordat.log
    /tmp/result.*/.*counts.dat            countdat.log
    /tmp/result.*/stc-liveresults.dat.tx  stcdattx.log
    /tmp/result.*/stc-liveresults.dat.rx  stcdatrx.log
    /tmp/result.*/.*Statistics.csv        ixia.log
    /tmp/result.*/vsperf-overall*         vsperf.log
    /tmp/result.*/vswitchd*               vswitchd.log
    /var/log/userspace*                   userspace.log
    /var/log/sriovdp*                     sriovdp.log
    /var/log/pods*                        pods.log

  • Step 3
    Then parse each type using its tag.
    • error.conf: to find any error

    • time-series.conf: to parse time series data

    • time-analysis.conf: to calculate time analysis

  • Step 4

    host         add tag (for routing)
    -----------  ---------------------
    pod12-node4  node4
    worker       node1

  • Step 5

    Tag    elasticsearch
    -----  --------------
    node4  index "node4*"
    node1  index "node1*"
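
The routing in Steps 2, 4 and 5 can be summarised as two lookups: log_path to tag, and host to index. The sketch below expresses them as shell case statements, for example to predict where a given file should end up. It is not the Fluentd configuration itself. The function names are hypothetical, and the glob patterns only approximate the patterns in the Step 2 table.

```shell
# tag_for_path: map a log_path to its routing tag (Step 2).
# Prints the tag, or returns 1 if no pattern matches.
tag_for_path() {
    case "$1" in
        /tmp/result*/*errors.dat)            echo errordat.log ;;
        /tmp/result*/*counts.dat)            echo countdat.log ;;
        /tmp/result*/stc-liveresults.dat.tx) echo stcdattx.log ;;
        /tmp/result*/stc-liveresults.dat.rx) echo stcdatrx.log ;;
        /tmp/result*/*Statistics.csv)        echo ixia.log ;;
        /tmp/result*/vsperf-overall*)        echo vsperf.log ;;
        /tmp/result*/vswitchd*)              echo vswitchd.log ;;
        /var/log/userspace*)                 echo userspace.log ;;
        /var/log/sriovdp*)                   echo sriovdp.log ;;
        /var/log/pods*)                      echo pods.log ;;
        *)                                   return 1 ;;
    esac
}

# index_for_host: map the host field to the Elasticsearch index pattern
# (Steps 4 and 5 combined). Returns 1 for an unknown host.
index_for_host() {
    case "$1" in
        pod12-node4) echo 'node4*' ;;
        worker)      echo 'node1*' ;;
        *)           return 1 ;;
    esac
}
```

Possible usage:

    $ tag_for_path /tmp/result_1/stc-liveresults.dat.tx
    $ index_for_host worker
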

4.3.6. Elastalert

4.3.6.1. Send alert if

  • Blacklist
    • "Failed to run test"

    • "Failed to execute in '30' seconds"

    • "('Result', 'Failed')"

    • "could not open socket: connection refused"

    • "Input/output error"

    • "dpdk|ERR|EAL: Error - exiting with code: 1"

    • "dpdk|ERR|EAL: Driver cannot attach the device"

    • "dpdk|EMER|Cannot create lock on"

    • "dpdk|ERR|VHOST_CONFIG: * device not found"

  • Time
    • vswitch_duration > 3 sec

4.3.6.2. How to configure alert?

  • Add your rule in ansible/roles/logging/files/elastalert/ealert-rule-cm.yaml (Elastalert Rule Config)
    name: anything
    type: <check-above-link>  # The rule type to use
    index: node4*             # Index name
    realert:
      minutes: 0              # To get an alert for all cases after each interval
    alert: post               # To send the alert as an HTTP POST
    http_post_url:            # Provide the URL
  • Mount this file into the Elastalert pod in ansible/roles/logging/files/elastalert/elastalert.yaml.

4.3.6.3. Alert Format

{"type": "pattern-match", "label": "failed", "index": "node4-20200815", "log": "error-log-line", "log-path": "/tmp/result/file.log", "reson": "error-message"}
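
To check that the alert-receiver accepts this format, a sample alert can be posted to it by hand. A minimal sketch follows. The function names are hypothetical, and the receiver URL is supplied by the caller because the guide does not state it.

```shell
# sample_alert: print a sample alert in the format shown above.
sample_alert() {
    printf '%s' '{"type": "pattern-match", "label": "failed", "index": "node4-20200815", "log": "error-log-line", "log-path": "/tmp/result/file.log", "reson": "error-message"}'
}

# send_sample_alert: POST the sample alert to the receiver.
# $1 = URL of the alert-receiver (the value configured as http_post_url).
send_sample_alert() {
    curl -s -H 'Content-Type: application/json' -XPOST "$1" -d "$(sample_alert)"
}
```

Possible usage, with the URL replaced by the one configured in the rule:

    $ send_sample_alert "http://<alert-receiver-ip>:<port>/"
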

4.3.7. Data Management

4.3.7.1. Elasticsearch

4.3.7.1.1. Q&As

Where is the data stored now? Data is stored on the NFS server with 1 replica of each index (default). The data paths are as follows:

  • /srv/nfs/data (VM1)

  • /srv/nfs/data (VM2)

  • /srv/nfs/data (VM3)

  • /srv/nfs/master (VM1)

  • /srv/nfs/master (VM2)

  • /srv/nfs/master (VM3)

Can the user change from NFS to local storage? Yes, by configuring the persistent volume (ansible-server/roles/logging/files/persistentVolume.yaml).

Do we have a backup of the data? Yes, 1 replica of each index.

Is the data still accessible after K8s restarts? Yes (if the data is not deleted from /srv/nfs/data).

4.3.8. Troubleshooting

4.3.8.1. If no logs are received in Elasticsearch

  • Check IP & port of server-fluentd in client config.

  • Check client-fluentd logs, $ sudo tail -f /var/log/td-agent/td-agent.log

  • Check server-fluentd logs, $ sudo kubectl logs -n logging <fluentd-pod-name>

4.3.8.2. If no notification is received

  • Search for your "log" in Elasticsearch.

  • Check config of elastalert

  • Check IP of alert-receiver
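
The first check can be done from the command line. The sketch below lists the node indices and searches the msg field for a phrase. The function names and the ES_USER / ES_PASS variables are hypothetical, and the search text is assumed not to contain double quotes.

```shell
# list_indices: list the indices whose names start with "node".
# $1 = Elasticsearch base URL.
list_indices() {
    curl -k -s -u "$ES_USER:$ES_PASS" "$1/_cat/indices/node*?v"
}

# search_log: search the msg field of an index pattern for a phrase.
# $1 = Elasticsearch base URL, $2 = index pattern, $3 = phrase to find.
# -G with --data-urlencode sends the parameters URL-encoded in the query string.
search_log() {
    curl -k -s -G -u "$ES_USER:$ES_PASS" "$1/$2/_search" \
        --data-urlencode "q=msg:\"$3\"" \
        --data-urlencode "size=5" \
        --data-urlencode "pretty=true"
}
```

Possible usage:

    $ ES_USER=elasticsearch ES_PASS=password123
    $ list_indices https://10.10.120.211:9200
    $ search_log https://10.10.120.211:9200 'node4*' 'Failed to run test'
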