Sunday, September 25, 2016

Using datashader to visualize billions of data points

Visualization is the best way to explore and communicate insights about data. But when we have a lot of data points, traditional visualization techniques break down. datashader, a tool developed by Continuum Analytics, comes to the rescue. It is extremely easy and fast to plot even billions of data points with it (see the examples or a video from SciPy 2016).
This week, we will use datashader to plot global seismicity for earthquakes larger than M2.0. To speed up the process, I downloaded the earthquake locations/times from the ANSS Catalog from 1898 to 2016 and saved them as a CSV file in the data folder (you can also grab the data directly using the APIs from my previous blog post). Altogether, we have 1,120,244 earthquakes.
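If you would rather pull a slice of the catalog yourself, here is a minimal sketch using the public USGS FDSN event web service, which can return results as CSV. The time window and magnitude cutoff below are placeholders, not the exact query I used for the full catalog:
import pandas as pd
# Sketch: fetch one month of M2.0+ events as CSV from the USGS
# FDSN event service; the dates here are placeholder values
url = ('https://earthquake.usgs.gov/fdsnws/event/1/query'
       '?format=csv&starttime=2016-01-01&endtime=2016-02-01'
       '&minmagnitude=2')
df_sample = pd.read_csv(url)
print(df_sample[['time', 'latitude', 'longitude', 'depth', 'mag']].head())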
With just a couple of lines of code, we get a very nice-looking map of the global seismicity. You can find the script and data on Qingkai's Github.
import pandas as pd
import datashader as ds
import datashader.transfer_functions as tf
from functools import partial
from datashader.utils import export_image
from datashader.colors import colormap_select, Greys9, Hot, viridis, inferno
# read in the earthquake data from file, note that I only read
# in 5 columns with Origin time, Latitude, Longitude, Depth, and
# Magnitude of the earthquakes
df_eq = pd.read_csv('./data/catalog.csv', usecols=[0,1,2,3,4])
# now let's plot them using a black background
background = "black"
export = partial(export_image, background=background)
cm = partial(colormap_select, reverse=(background != "black"))

# set up the canvas and aggregate the points onto a 1600 x 1000 grid,
# counting the earthquakes that fall into each pixel
cvs = ds.Canvas(plot_width=1600, plot_height=1000)
agg = cvs.points(df_eq, 'Longitude', 'Latitude')
# map the counts to colors with histogram equalization and save the image
export(tf.interpolate(agg, cmap=cm(Hot, 0.2), how='eq_hist'), "global_seismicity")
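One note: in later datashader releases tf.interpolate was renamed tf.shade, so if you are on a newer version just swap in that name. The aggregation step is also flexible: instead of counting points per pixel (the default), you can aggregate any column of the dataframe. As a sketch, assuming the Depth column from the catalog is numeric (in km), coloring each pixel by the mean earthquake depth would look like this:
# Sketch: aggregate mean depth per pixel instead of event counts;
# assumes 'Depth' in the catalog CSV is a numeric column in km
agg_depth = cvs.points(df_eq, 'Longitude', 'Latitude', ds.mean('Depth'))
export(tf.interpolate(agg_depth, cmap=cm(viridis), how='eq_hist'),
       "global_seismicity_depth")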
