For a while now I have been writing about how to analyze and optimize Hadoop
jobs beyond just tweaking MapReduce options. The other day I took a look at
some of our Outage Analyzer Hadoop jobs and put words into action.
A simple analysis of the Outage Analyzer jobs with Compuware APM 5.5
identified three hotspots and two potential Hadoop problems in one of our
biggest jobs. It took the responsible developer a couple of hours to fix them,
and the result is a 2x improvement overall and a 6x improvement on the Reduce
part of the job. Let's see how we achieved that.
About Outage Analyzer
Outage Analyzer is a free service provided by Compuware that displays in
real-time any availability problems with the most popular third-party content
providers on the Internet. It's available at http://www.outageanalyzer.com.
It uses real-time analytical process technologies to do anomaly d... (more)
Production Monitoring is about ensuring the stability and health of our
system, and that includes the application. We often encounter
production systems that concentrate on System Monitoring, under the
assumption that a stable system leads to stable and healthy applications. So
let’s see what System Monitoring can tell us about our Application.
Let’s take a very simple two-tier Web Application:
A simple two tier web application
This is a simple multi-tier eCommerce solution. Users are concerned about bad
performance when they do a search. Let's see what we can find out a... (more)
Setting up Application Performance Monitoring is a big task, but like
everything else it can be broken down into simple steps. You have to know
what you want to achieve and subsequently where to start. So let’s start at
the beginning and take a top-down approach.
Know What You Want
The first thing to do is to be clear about what we want when monitoring the
application. Let’s face it: we do not really “want” to ensure that CPU utilization
stays below 90 percent or that network latency is under one millisecond. We are
also not really interested in garbage collection activity or whether the
database ... (more)
Last time I explained the logical and organizational prerequisites for
successful production-level application performance monitoring. I originally
wanted to look at the concrete metrics we need on every tier, but was asked
how you can correlate data in a distributed environment, so this will be the
first thing that we look into. So let’s take a look at the technical
prerequisites of successful production monitoring.
Collecting data from a distributed environment
The first problem that we have is the distributed nature of most
applications. In order to isolate response time problems or... (more)
(Note: If you’re interested in WebSphere in a production environment, check
out Michael's upcoming webinar with The Bon-Ton Stores)
Most articles about Garbage Collection ignore the fact that the Sun HotSpot
JVM is not the only game in town. In fact, whenever you have to work with
either IBM WebSphere or Oracle WebLogic you will run on a different runtime.
While the concept of Garbage Collection is the same, the implementation is
not, and neither are the default settings or the tuning options. This often leads
to unexpected problems when running the first load tests or in the worst case... (more)