For a while now I have been writing about how to analyze and optimize Hadoop
jobs beyond just tweaking MapReduce options. The other day I took a look at
some of our Outage Analyzer Hadoop jobs and put words into action.
A simple analysis of the Outage Analyzer jobs with Compuware APM 5.5
identified three hotspots and two potential Hadoop problems in one of our
biggest jobs. It took the responsible developer a couple of hours to fix
them, and the result was a 2x improvement overall and a 6x improvement in the
Reduce part of the job. Let's see how we achieved that.
About Outage Analyzer
Outage Analyzer is a free service provided by Compuware that displays, in
real time, any availability problems with the most popular third-party content
providers on the Internet. It's available at http://www.outageanalyzer.com.
It uses real-time analytical process technologies to do anomaly d... (more)
In a recent article we showed how the Java Garbage Collection MXBean
counters have changed for the Concurrent Mark-and-Sweep Collector. They now
report all GC runs instead of just major collections. That prompted me to
think about what a major GC is, or what it should be. It is surprisingly
hard to find any definition of major and minor GCs. The well-known
Java Memory Management Whitepaper only mentions in passing that a full
collection is sometimes referred to as a major collection.
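To make the counters mentioned above concrete, here is a minimal sketch of reading them through the standard `java.lang.management` API. The class name `GcCounters` and its `snapshot` helper are hypothetical names chosen for illustration. Which collector names show up depends on the JVM and the GC settings in use, so treat any output as specific to your environment. Note that the API only exposes a cumulative count and time per collector bean; it does not label individual runs as major or minor.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, for illustration only.
public class GcCounters {

    // Returns collector name -> cumulative collection count.
    // getCollectionCount() returns -1 if the count is undefined for a collector.
    public static Map<String, Long> snapshot() {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            counts.put(gc.getName(), gc.getCollectionCount());
        }
        return counts;
    }

    public static void main(String[] args) {
        // Print count and accumulated time for every collector the JVM exposes.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d, time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

Taking two snapshots and comparing the counts per collector shows how many runs each bean reported in between, which is the number whose meaning is in question here.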
One of the more popular definitions is that a major GC is a stop-the-w... (more)
In the past few weeks I visited several Cloud and Big Data conferences that
provided me with a lot of insight. Some people only consider the technology
side of Big Data technologies like Hadoop or Cassandra. The real driver,
however, is a different one. Business analysts have discovered Big Data
technologies as a way to leverage tons of existing data and ask questions
about customer behavior and all sorts of relationships to drive business
strategy. By doing that they are pushing their IT departments to run ever
bigger Hadoop environments and ever faster real-time systems.
What's int... (more)
The other day I was looking at a web application that was using MongoDB as
its central database. We were analyzing the application for potential
performance problems, and within five minutes I found what I must consider
to be a MongoDB anti-pattern, one that had a 40% impact on response time. The
funny thing: it was a Java best practice that triggered it.
Analyzing the Application
The first thing I always do is look at the topology of an application to get
a feel for it.
Overall Transaction Flow of the Application
As we see it's a modestly complex web application and it's using Mongo... (more)
Anyone who ever monitored or analyzed an application uses or has used
averages. They are simple to understand and calculate. We tend to ignore just
how wrong the picture is that averages paint of the world. To emphasize the
point, let me give you a real-world example from outside the performance space
that I read recently in a newspaper.
The article explained that the average salary in a certain region in
Europe was 1900 euros (to be clear, this would be quite good in that
region!). However, when looking closer they found out that the majority,
namely 9 out of 10 people, only ea... (more)