By now we’re all well aware that there is a virtually limitless number of logging and monitoring solutions available on the market. Visit the Amazon Web Services (“AWS”) Marketplace, and you’ll find plenty of options. In fact, it gets really crazy when you start examining security monitoring versus application performance monitoring, often with solutions performing one role better than the other, or even just one of the roles altogether. What’s interesting to me is the lack of common Enterprise logging and monitoring solutions available in the AWS Marketplace. Obviously you can deploy instances to handle implementations of solutions like ArcSight, McAfee, LogRhythm, or NetIQ, but Splunk is the only well-known commercial provider with solutions available in the marketplace.
Now, that’s just the commercial side… what about open source? Let’s cover a few terms first, for those new to centralized logging.
Shipper – a system agent that collects and forwards, or ships, system and application logs to a centralized server.
Collector / Broker – a message broker is a system that collects and queues logs as an intermediary step to indexing the logs centrally for analysis, monitoring, and alerting. Its primary purpose is to ensure you don’t lose messages when or if your indexer falls behind, crashes, or otherwise becomes unavailable to receive logs.
Collector / Indexer – a system used to collect, parse, and store logs for searching, analysis, monitoring, and alerting.
Dashboard / Visualizer – the dashboard is used to aid in log analysis by providing a search interface, and in some solutions alerting.
Open source logging and monitoring solutions abound, and like the well-known solutions missing from the AWS Marketplace, are typically implemented on purpose-built instances within your AWS Virtual Private Cloud (You are using a VPC, right?) So what comprises an open source, centralized logging and monitoring solution?
Log shippers like Nxlog, Logstash, Lumberjack, and Fluentd. Brokers like Redis, RabbitMQ, and ZeroMQ. Indexers like Elasticsearch and… well, Elasticsearch seems to be the industry-standard as far as open source goes, but there are also a lot of folks using centralized syslog-ng, or Rsyslog. Dashboards, such as Graylog2, Kibana (for Elasticsearch visibility, I like ElasticHQ), and security agents like OSSEC complete the architecture.
So, with all of these solutions available, why do I run into so many clients already in AWS, or moving to AWS, that have insufficient logging and monitoring, or worse, no logging and monitoring at all in their Cloud environment? Because Logging and Monitoring is Hard. Don’t get me wrong, it doesn’t require a rocket scientist on staff to get one or more of these commercial or open source solutions deployed. There’s preparation, communication, research, and other steps that have to be taken to properly implement logging and monitoring. I spent over a week researching available solutions, and building out proofs-of-concept in my virtualized lab to determine which solutions met my needs. That is the most critical point one should take away from this article; there is no right or wrong way to implement logging and monitoring in your AWS Cloud. As with all things IT, there is more than one way to accomplish your technical and business objectives. The trick is to find the right way for your organization.
Let’s look at some of the decision criteria that will come into play; this is not an exhaustive list:
- What expertise is available from my current staff – network engineering, development (if so, which languages), information security, incident handling, etc.?
- Do we have experience with a particular commercial solution? A particular open source solution?
- Should I train existing staff, or hire staff with the relevant experience?
- Should I forget about managing this myself altogether and go with a Managed Services Provider?
- Have we defined and documented the metrics we care about, and established a policy and process around ensuring this data is available and utilized?
- Have we defined and documented our business objectives behind logging and monitoring?
- Have we defined and documented regulatory mandates related to logging and monitoring? How do we keep our requirements and this documentation current?
- Have we determined roles and responsibilities involved in supporting the logging and monitoring initiative?
- Have we defined and documented technical requirements for our logging and monitoring solution? How do we architect our solution?
- Have we researched available options, and documented their strengths and weaknesses with regard to operating in our environment or culture?
- How do we facilitate a demonstration, proof-of-concept, or evaluation of the targeted solutionWhat do we log? Where do we store logs?
- What do we log? Where do we store logs?
- How do we alert appropriate personnel a problem has been detected?
After extensive research, and comparison of features and functionality, I decided upon a Hybrid ELK Stack for this case study. The ELK Stack is comprised of Elasticsearch Logstash and Kibana. I also added Graylog2 to support alerting, and OSSEC for file integrity and host intrusion prevention. There are numerous guides on the Interwebs to assist with deploying these solutions, so I will not go into installation and configuration in this post. I may write another article later to cover installation and configuration, but I’ve included links to all of the resources I used to get up and running at the bottom of this post. Note that, although this entire process covered a full week, the bulk of the final deployment was completed in about ~12 hours. I built the final environment on AWS’ Free Tier, but didn’t even complete rolling out the dashboards before the Logstash Shipper/Logstash Collector/Elasticsearch Indexer combination on the central server decimated the t1.micro instance (Ubuntu 12.04) I deployed it on (Java consumed all available memory). Rather than tune the overwhelmed box in an attempt to stabilize it, I took advantage of being in AWS and scaled up to a m1.small instance – problem solved. In total, I spent less than $5 bucks on my, admittedly limited, proof-of-concept.
Figure 1: Kibana 3… Dead Sexy!
Take a look at the components I selected:
- Log Shipper – Logstash on Linux servers, Nxlog on Windows servers. Although Logstash is cross-platform, and is perfectly capable of shipping Windows Event Logs, IIS and MSSQL logs, the author of Nxlog convinced me Why Nxlog is better for Windows.
- Broker – This case study doesn’t incorporate the use of a Broker. I was originally going to include RabbitMQ in the architecture, but version dependencies led me down a path that was in danger of kludging up the whole study. In a production environment, you definitely need to use a broker to provide scalability and resiliency, but I pushed onward without including it.
- Indexer – Elasticsearch. Ridiculously easy decision for me, since Windows servers are in my test environment, and I was interested in testing something other than syslog.
- Dashboard / Visualization – Kibana 3 is dead sexy, and I’m an eye-candy kind of guy. I’d gone into this planning to just use Graylog2, since it is a great visualization tool itself, plus includes alerting capability, but after seeing screenshots of the new and improved Kibana 3.x, I couldn’t help deploying it, too. Regarding alerting, Nagios is often used in concert with Graylog2 for its ability to “roll up” alerts. If you’re interested in configuring email alerting/alarms for your Graylog2 deployment, Larry Smith has a great blog post to get you started here. Last, I also installed the ElasticHQ plugin to monitor my cluster of one’s health.
- As an aside, I also deployed OSSEC to the Linux and Windows servers for file integrity monitoring and intrusion prevention.
Figure 2: ElasticHQ… Elasticsearch cluster health, and a whole lot more!
A note about the final deployment; ultimately the redesigned, recently released Graylog2 v0.20.1 didn’t work out like I’d hoped. Everything was running smoothly, and based on configuration guidance and the absence of error output, it seemed I was setup properly, but I never saw the data from Elasticsearch in Graylog2. I spent the last few moments I had allocated to this project experimenting with some alternate configurations, and finally strayed so far from my working example that I had to give up. So, after a week of research and implementation time, a diagram of what we have can be seen in Figure 3.
Figure 3: AWS Logging and Monitoring PoC Architecture
This was a trivial setup – I’m using a single box for a local Logstash shipper, an Elasticsearch index, MongoDB for Graylog2, and three different web interfaces. In a production system, ensure you use a more appropriate architecture including separating each component, utilizing multiple availability zones, inserting a broker to receive messages from log shippers, utilizing SSL, etc. etc.
Although I didn’t have enough time to sort out Graylog2, and get some alerting configured, I’m pleased with the overall outcome of my Security Visibility experiment. I found OSSEC to be an excellent “partner” in my quest for visibility, despite only utilizing and documenting the file integrity portion of its functionality.
Figure 4: OSSEC Web UI
Nxlog works perfectly for shipping Windows event logs, and of course, the lovely Kibana ties everything together and puts a nice bow on the concept of visualization.
Figure 4: Analyzing events with Kibana 3
Although this was not a terribly difficult experiment, from a technical perspective, I still wondered, “Is there another | quicker | better way to gain security visibility in AWS? Well, yes, and no. Yes, there’s an easy way to get security visibility, plus AWS automation to boot – no, because despite this gem of AWS security visibility, I will still recommend a centralized logging and monitoring platform in AWS. So, what’s this solution, you ask? CloudPassage Halo. But wait, there’s more! Halo has an API that’s made it possible for several SIEM solutions to integrate with it, sharing the Halo security visibility love in a centralized way within your existing, or planned, logging and monitoring deployment.
Halo has enough features and functionality to warrant its own blog post, so I won’t go into those here. Suffice it to say, anyone looking for security visibility, automation, or both in AWS should definitely have a look at what CloudPassage has to offer.
Figure 5: Windows security events captured by Halo
Logging and monitoring is hard, but there are more than enough commercial and open source tools available to fit any size organization, with any size of budget. Attaining security visibility and appropriate incident handling isn’t just the right thing to do from a best practice perspective; many standards, regulations, and laws mandate them. So, regardless of the type of solution or solutions you select, choose and implement something, and gain insight into security incidents you may not have any idea are happening. After all, inadequate visibility is better than no visibility at all.
For additional information on this subject and the opportunity to ask questions, please click here to register for our Webinar titled: Security Visibility in the Cloud – Logging and Monitoring in AWS occurring on May 1st, 2pm (EST).