Speed up your map plotting in R

I have been forever annoyed at how long it takes to plot data on a large shapefile. And this is a domain where doesn’t matter if you’re working with MapInfo or R. Just zooming the figure takes ages. But a couple of days ago I bumped into an excellent blogpost by John Myles White that solved the problem in the most Daoistic way, reminiscent of Steve Jobs: Do away with the shapefile!

To have a quick overview of one’s data, there is no need for a shapefile. Just plot latitude and longitude directly into a scattergraph.

That’s it.

Here’s how it is done. For our example, let’s plot a map showing the ratio of Foreign-born vs. Australian residents, by postcode, in all of Australia. As a first step, I made a search and found a ready-to-use file with all Australian postcodes and their geographical coordinates. Next, I got some Census data from the Australian Bureau of Statistics, specifically counts of Australian-born and foreign-born residents sorted by postcode. Quick munge and plot (code below), and voilĂ !

scatter Admittedly this is a quick fix: One has to guess the shape of Australia and the location of its major cities from their relative place on the map. Even with this limitation, though, I find this solution really appealing. Instead of waiting for many minutes each time I redraw or resize a graph, I can get a complete overview of my dataset instantaneously(!).

require(ggplot2)

setwd('/Users/myuser/R/map')
poa <- read.csv('data/2011_BCP_ALL_for_AUST_short-header/2011 Census BCP All Geographies for AUST/POA/AUST/2011Census_B01_AUST_POA_short.csv', as.is=TRUE)
poa$poa <- as.integer(sub('[a-zA-Z]{3}','', x=poa$region_id))

# Ratio Aus born vs elsewhere born
ratio <- data.frame('poa'=poa$poa,'Freq'=poa$Birthplace_Australia_P/poa$Birthplace_Elsewhere_P)

zipcodes <- read.csv('data/pc_full_lat_long.csv')
zipcodes <- zipcodes[!duplicated(zipcodes$Pcode),]
zipcodes <- zipcodes[!zipcodes$Lat == 0,]
zipcodes <- zipcodes[!zipcodes$Long == 0,]
poacoord <- data.frame('poa'=zipcodes$Pcode, 'lat'=zipcodes$Lat, 'long'=zipcodes$Long)

popcount <- na.omit(merge(ratio, poacoord, all=FALSE))
popcount <- popcount[rev(order(popcount$Freq)),] # Place the small bubbles on top
popcount$cat <- sapply(popcount$Freq, function(x) if (x < 6) {'> 1:6'} else {'< 1:6'})

ggplot(popcount, aes(x=long, y=lat, colour=cat)) +
scale_size(range = c(1,20), name = 'Population') +
geom_point() +
coord_equal()

And once we get the data presentation right, we can add the shapefile and go get some tea.

# must have gpclib installed
require("rgdal") # requires sp, will use proj.4 if installed
require("maptools")
require("ggplot2")
require("plyr")
gpclibPermit() # required for fortify method
setwd('/Users/myuser/R/map/data/shp')
amap <- readOGR(dsn=".", layer="aust_cd66states")
amap@data$id = rownames(amap@data)
amap.points = fortify(amap, region='id')
amap.df = join(amap.points, amap@data, by='id')

ggplot() +
geom_polygon(data=amap.df, aes(long,lat,group=group)) +
geom_path(data=amap.df, aes(long,lat,group=group), color="grey") +
geom_point(data=popcount, aes(x=long, y=lat, colour=cat)) +
scale_size(range = c(1,20), name = "Population") +
scale_colour_discrete(name="ratio of Foreign vs\nAustralian-born") +
coord_equal()

shp

Advertisements

About dmvianna

Past neuroscientist, now data analyst.
This entry was posted in ABS data, Australia, Data Analysis, ggplot2, Mapping, R and tagged , , , , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s