Weather Patterns Using Hadoop and R – Part 1

In the post Cloud Weather Reporting with R and Hadoop the meaning of synoptic cloud data was not explored. This post looks at cloud patterns and their impact on world weather.

The cloud weather analytics covered mainly comprises aggregate frequency data (e.g. cloud location, coverage and precipitation types) and time series data (e.g. when precipitation happens and for how long). This post focusses on frequency data analytics, while the next post will focus on time dependant analytics.

Frequency Based Analytics

Frequency based properties are evaluated through MapReduce by extracting relevant weather data using the following R script as Mapper:

# Mapper script
#! /usr/bin/env Rscript

con <- file("stdin", open = "r")
while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0)
 #output latitude, cloud types and precipitation level
 #latitude is coded as 0 for tropical, 1 for mid-latitudes, 2 for polar regions

The Reducer script from a previous post gives counts for each combination of cloud type and precipitation level, from which most frequency based weather analytics can be derived. For example, stratocumulus frequency is the aggregate count of where the low cloud type code is 4,5 or 8, divided by the total number of reports in the dataset.

Ideally from a data mining perspective, error prone work such as interpretation of WMO codes and further data analytics can be done using an R script and, as this is outside Hadoop, can be tested and amended quickly.


This data mining reveals a few properties of world weather, in addition to a few local weather patterns.

Location of Clouds

Nimbostratus is “common in middle latitudes”[1] – in the dataset 85% of nimbostratus occurrences are mid-latitude. Cumulonimbus is “common in tropics and temperate regions, rare at poles”[1], somewhat reflected in the dataset where 23% of occurrences of cumulonimbus are in tropical regions and 4% are in polar regions.

Likelihood of Precipitation

100% of occurrences of nimbostratus occur with precipitation, as WMO codes define precipitation bearing altostratus or altocumulus to be nimbostratus. Indeed nimbostratus is derived from ‘nimbus’, Latin for rain.

Conversely, high cloud types (cirrus, cirrostratus and cirrocumulus) generate no precipitation[1] so any perceived correlation with precipitation levels is meaningless. For remaining cloud types, precipitation occurrence levels range from 3% for cumulus to 38% for stratus. Indeed cumulus is associated with “generally no precipitation”[1].

Types of Precipitation

Cumulonimbus typically results in “Heavy downpours, hail”[1] – in the dataset 91% of precipitation associated with cumulonimbus is in the thunderstorm range. Indeed in the tropics, cumulonimbus often forms from hot weather through convection during monsoon season.

Conversely, nimbostratus is rarely seen with thunderstorms, more with “moderate to heavy rain or snow”[1] – 54% of nimbostratus occurrences happen with rain. Similarly, stratocumulus is usually accompanied by “occasional light rain, snow”[1] – 47% of cumulonimbus occurrences happen with rain.

Cloud Embedding

Cumulonimbus can be embedded with nimbostratus, and in such instances may result in heavy rain, but this is rare[2]. Indeed, further data mining reveals that only 5.9% of the occurrences of nimbostratus show this embedding.

Cloud Coverage Levels

The most significant cloud coverage comes from cirrus, stratocumulus and cumulus. For the dataset cirrus cloud cover is 22%, stratocumulus cloud cover is 24% and altocumulus cloud cover is 25%. This compares quite well to satellite data of 20 to 25% for cirrus[3], 25% for stratocumulus[2] and 20 to 25% for altocumulus[4].


The data mining results compare quite well with established cloud facts and world weather satellite data, particularly on location and coverage of clouds.

In the next post more complex data mining is presented i.e. time dependant analytics. For instance how to determine which clouds cause most prolonged rainfall? And how is this analytics done efficiently, given the amount of re-sorting of data involved?


  1. ^ a b c d e f Common Cloud Names, Shapes, and Altitudes
  2. ^ a Cloudwatching
  3. ^ Cirrus Clouds and Climate
  4. ^ Comparisons and analyses of aircraft and satellite observations for wintertime mixed-phase clouds

2 thoughts on “Weather Patterns Using Hadoop and R – Part 1

Leave a Reply

Please log in using one of these methods to post your comment: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s