
AWS ELB Access Log Parser

This repository includes Python scripts that can consolidate and analyze AWS ELB Access Logs.

What are AWS ELB Access Logs?

AWS (Amazon Web Services) ELB (Elastic Load Balancer) Access Logs contain one log line per request processed by the ELB. Each line contains many fields, depending on the setup and configuration of the ELB.

Unfortunately, AWS ELB Access Logs are stored in S3 buckets in a rather verbose folder structure, with separate folders for year, month, and day. On top of that, the logs are stored as *.log.gz files, which saves a lot of money on storage but makes the logs harder to process.
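For reference, the S3 key layout typically looks something like the following (bucket name, account ID, and load balancer name are placeholders, and the exact file-name components vary):

```
s3://my-bucket/AWSLogs/123456789012/elasticloadbalancing/us-east-1/2015/05/13/
    123456789012_elasticloadbalancing_us-east-1_my-elb_20150513T2340Z_...log.gz
```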

What can these Scripts do?

To get to the juice of the data within AWS ELB Access Logs, the process needs to go through the following steps:

1. Download the Log Files

For this step it is recommended to use a 3rd-party app (e.g. Cyberduck) to download the log files from the S3 bucket.

2. Consolidate and Parse the Logs

The consolidation script iterates recursively through all *.log.gz files within the given folder. It reads all lines of each *.log.gz file without having to extract it first. Each line is parsed for the following fields:

  • TS (timestamp)
  • PROT (protocol)
  • SRCIP (source IP address and port)
  • COMMAND (http request)
  • AGENT (user agent string)

The parsed data is then written to a single CSV file, which can easily be analyzed in the next step.

The script is fairly fast - I processed 600k lines (gathered from 18k *.log.gz files) in about 30 seconds on my 2014 MBP.

3. Analyze Log Data