
  • Data Analysis: Normalizing Logs for a SIEM

    Log normalization transforms heterogeneous, unstructured telemetry from diverse network appliances into a standardized, unified data schema. This structural alignment enables Security Information and Event Management (SIEM) engines to execute cross-platform correlation, perform high-speed time-series indexing, and trigger automated incident response playbooks without data taxonomy conflicts.

    The Log Normalization Pipeline

    The normalization process operates as a multi-stage data pipeline consisting of ingestion, parsing, schema mapping, and enrichment. Disparate systems (perimeter firewalls, identity providers, cloud control planes) emit telemetry in proprietary, vendor-specific logging formats. An intrusion detection system might label an attacking IP address as src_ip, while a web application firewall logs the same value as ClientAddress. Without normalization, security analysts must chain multiple OR clauses, one per vendor field name, to search for a single indicator across these platforms.

    To resolve this fragmentation, the SIEM ingestion node (such as Logstash, Vector, or a Splunk Heavy Forwarder) receives the raw data stream. The parsing engine applies regular expressions, or Grok patterns (named, reusable regular expressions), to dissect the unstructured text and extract discrete values into key-value pairs.
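
    As a minimal sketch of this parsing step, the Logstash filter below mixes a stock Grok pattern with an inline named-capture regular expression. The log layout it assumes (for example "203.0.113.7 deny 443") and the field names action and dst_port are hypothetical and chosen only for illustration.

    ```logstash
    filter {
      grok {
        # %{IP:src_ip} and %{INT:dst_port} reuse stock Grok patterns;
        # (?<action>allow|deny) is an inline named-capture regular expression.
        # Assumed (hypothetical) input line: 203.0.113.7 deny 443
        match => { "message" => "%{IP:src_ip} (?<action>allow|deny) %{INT:dst_port}" }
      }
    }
    ```

    Events that do not match are tagged _grokparsefailure by default, which gives a convenient handle for routing unparsed lines to a separate index for review.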

    Following extraction, the engine executes schema mapping. Security data architects configure the pipeline to map the extracted vendor-specific fields to a universal taxonomy, such as the Elastic Common Schema (ECS) or the Splunk Common Information Model (CIM). In the previous example, the ingestion node translates both src_ip and ClientAddress into source.ip. This standardization allows threat hunters to write a single query against source.ip that covers every normalized log source at once. Data architecture and schema alignment are among the analytical foundations covered in the Ultimate Guide to CompTIA SecurityX (CAS-005).
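
    The src_ip and ClientAddress mapping described above might be sketched in Logstash as follows. This assumes both fields were already extracted by an earlier parsing stage; the conditionals are an illustrative choice, not a requirement of the mutate filter.

    ```logstash
    filter {
      # IDS events: the attacker address was extracted as src_ip
      if [src_ip] {
        mutate { rename => { "src_ip" => "[source][ip]" } }
      }
      # WAF events: the same value was extracted as ClientAddress
      else if [ClientAddress] {
        mutate { rename => { "ClientAddress" => "[source][ip]" } }
      }
    }
    ```

    The bracketed form [source][ip] is how Logstash addresses the nested field that ECS documentation writes as source.ip.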

    Once mapped, the pipeline enriches the normalized payload. The system looks up reference data, often held in local databases or cached feeds so that lookups keep pace with ingestion, and appends contextual metadata. For example, it resolves a normalized destination.ip to its Autonomous System Number (ASN) or checks it against Threat Intelligence Platform (TIP) feeds before committing the final JSON document to the storage index.
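
    One way to sketch this enrichment stage in Logstash is with the geoip and translate filters. The dictionary path and the threat_intel.label field are hypothetical, and the exact ASN field names that geoip writes depend on the plugin version and its ecs_compatibility setting, so treat this as an outline under those assumptions.

    ```logstash
    filter {
      # ASN lookup against a local MaxMind-format database
      geoip {
        source => "[destination][ip]"
        target => "[destination]"
        default_database_type => "ASN"
      }

      # Threat-intelligence flag from a local dictionary file exported from a TIP.
      # The path and the target field name are hypothetical placeholders.
      translate {
        source => "[destination][ip]"
        target => "[threat_intel][label]"
        dictionary_path => "/etc/logstash/ti_blocklist.yml"
        refresh_interval => 300
      }
    }
    ```

    Both lookups read local files, so enrichment does not add a network round trip per event; translate only sets the target field when the IP appears in the dictionary.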

    Parsing and Mapping with Grok

    Security engineers define normalization logic in pipeline configuration files. The following Logstash filter configuration uses a Grok pattern to parse a raw Cisco ASA syslog message, extract the timestamp, reporting host, and Cisco message tag, and map them to the Elastic Common Schema (ECS).

    logstash

    filter {
      # Intercept the raw syslog message and extract discrete fields using Grok
      grok {
        match => { "message" => "%{SYSLOGTIMESTAMP:timestamp} %{SYSLOGHOST:hostname} %%{CISCOTAG:cisco_tag}: %{GREEDYDATA:cisco_message}" }
      }
      
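      # Execute schema mapping: translate proprietary Cisco fields to ECS fields.
      # Logstash addresses nested fields as [parent][child]; a dotted name such as
      # "observer.hostname" would create one top-level field with a literal dot.
      mutate {
        rename => {
          "hostname" => "[observer][hostname]"
          "cisco_tag" => "[event][code]"
        }
        add_field => {
          "[event][dataset]" => "cisco.asa"
          # ECS uses event.module for the source module name, so "cisco" rather than "firewall"
          "[event][module]" => "cisco"
        }
      }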
      
      # Standardize the timestamp format to ISO8601 for accurate time-series indexing
      date {
        match => [ "timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
        target => "@timestamp"
        remove_field => [ "timestamp" ]
      }
    }
    

    Additional Reading

    https://www.elastic.co/guide/en/ecs/current/ecs-reference.html
    https://docs.splunk.com/Documentation/CIM/latest/User/Overview
    https://csrc.nist.gov/publications/detail/sp/800-92/final