A few weeks back I wrote a blog post featuring the nice little .gif below, which shows the change over time in median Melbourne property prices ($) from 2005 to 2016 (see my previous post from 29 Sep 2016).
Well, I’ve just come back to that data set, and this time I’ve plotted the % change per annum and overall, as well as the absolute $ change from 2005 to 2016, on some interactive plots.
These plots allow you to zoom in, hover over a suburb to see more info, or click on a suburb to open a new window and explore that suburb in more detail.
The R code I used to make the plots below is here.
Explore below. It’s interesting to see that SYNDAL has the greatest per annum and overall % growth, but it’s TOORAK that has by far the highest absolute $ growth over the same period.
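To make those three measures concrete, here is a minimal Python sketch. It is not the R code linked above: the prices are made up, the function names are hypothetical, and it assumes “per annum” means the compound annual growth rate, which may not be how the plots were calculated.

```python
def absolute_change(start_price, end_price):
    """Absolute $ change between the first and last year."""
    return end_price - start_price


def overall_pct_change(start_price, end_price):
    """Overall % change relative to the starting price."""
    return (end_price - start_price) / start_price * 100.0


def per_annum_pct_change(start_price, end_price, years):
    """Compound annual growth rate in % (an assumed definition of 'per annum')."""
    return ((end_price / start_price) ** (1.0 / years) - 1.0) * 100.0


# Hypothetical median prices for one suburb (not taken from the real data set)
start_price = 400_000   # 2005
end_price = 800_000     # 2016
years = 2016 - 2005     # 11 years of growth

abs_change = absolute_change(start_price, end_price)
overall = overall_pct_change(start_price, end_price)
per_annum = per_annum_pct_change(start_price, end_price, years)

print(f"Absolute change: ${abs_change:,}")
print(f"Overall change:  {overall:.1f}%")
print(f"Per annum:       {per_annum:.2f}%")
```

A suburb with a low starting price can top the % rankings with a modest $ gain, while an expensive suburb can lead on absolute $ growth with a smaller % change, which is the SYNDAL versus TOORAK pattern described above.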
First make up some fake revenue data for a company with a number of shops operating in each State from 2012 to 2015:
### Install/load required packages
# List of R packages required for this analysis:
required_packages <- c("ggplot2", "stringr", "plotly", "dplyr")
# Install any required packages that are missing:
new.packages <- required_packages[!(required_packages %in% installed.packages()[, "Package"])]
if (length(new.packages)) install.packages(new.packages)
# Load required_packages:
lapply(required_packages, require, character.only = TRUE)
# Set decimal points and disable scientific notation
options(digits = 3, scipen = 999)
# Make up some fake data: 6 states x 4 years x 9 shops = 216 rows
# (tibble() replaces data_frame(), which is deprecated)
df <- tibble(state = rep(c("New South Wales", "Victoria", "Queensland",
                           "Western Australia", "South Australia", "Tasmania"), 36)) %>%
  group_by(state) %>%
  mutate(year = c(rep(2012, 9), rep(2013, 9), rep(2014, 9), rep(2015, 9))) %>%
  group_by(state, year) %>%
  mutate(`store ID` = str_c("shop_#", as.character(seq_along(state)))) %>%
  # One row per group, so each shop gets its own random revenue
  group_by(state, year, `store ID`) %>%
  mutate(`Revenue ($)` =
    ifelse(state == "New South Wales", sample(x = c(1000000:9000000), 1),
    ifelse(state == "Victoria", sample(x = c(1000000:7000000), 1),
    ifelse(state == "Queensland", sample(x = c(1000000:5000000), 1),
    ifelse(state == "Western Australia", sample(x = c(100000:2000000), 1),
    ifelse(state == "South Australia", sample(x = c(100000:900000), 1),
    ifelse(state == "Tasmania", sample(x = c(100000:2000000), 1),
    NA)))))))
Now visualise this data using ggplot:
ggplot(df, aes(state, `Revenue ($)`, colour = state, label = `store ID`)) +
  geom_boxplot() +
  geom_point() +
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.title.y = element_text(face = "bold", size = 12),
        axis.text.y = element_text(angle = 0, vjust = 0.5, size = 11),
        legend.title = element_text(size = 12, face = "bold"),
        legend.text = element_text(size = 12, face = "bold"),
        plot.title = element_text(face = "bold", size = 14)) +
  ggtitle("Store Revenue per State from 2012 to 2015") +
  facet_wrap(~year)
Now make the plot reactive to the user’s mouse by wrapping plotly’s ggplotly() function around it:
p <- ggplotly(
  ggplot(df, aes(state, `Revenue ($)`, colour = state, label = `store ID`)) +
    geom_boxplot() +
    geom_point() +
    theme(axis.title.x = element_blank(),
          axis.text.x = element_blank(),
          axis.title.y = element_text(face = "bold", size = 12),
          axis.text.y = element_text(angle = 0, vjust = 0.5, size = 10),
          legend.title = element_text(size = 12, face = "bold"),
          legend.text = element_text(size = 12, face = "bold")) +
    facet_wrap(~year)
)
## Publish to plotly
# plotly_POST(p, filename = "dans_plotly_example")
Simple plots like this, made using plotly and ggplot2 in R, are great because they have some basic “reactivity” to user input (e.g. hover the mouse over a data point and a label appears with info about that point, such as its “store ID”), but they do not need to be hosted on a server - they are simple enough to be knitted into a stand-alone HTML document.
So I wrote some R code to import the data from an Excel spreadsheet, tidy it, and then make this animated plot, which I think is cool because you can get some insight from it much faster than by looking through the raw data in a spreadsheet. I thought it was interesting how much faster Houses are going up in value compared to Apartments. It’s also interesting that most of the price increases are on the south-east side of Melbourne. I will drill into this dataset further when I get time, zooming in to certain suburbs, plotting vacant land over time (also included in the raw data), etc.
I managed to find a few spare hours this weekend so I’m trying out Python for the first time. I usually use Matlab and R for data processing, visualisation and statistics, but I wanted to give Python a try, since some of my friends at Vokke seem to really love it.
It’s early days so I haven’t actually managed to produce anything useful with Python yet, but I thought I’d start to document the steps I’m taking to learn Python for data science, from the point of view of a Matlab and R user.
First off, I downloaded and installed Anaconda which includes a distribution of Python, plus all the popular python packages you might need for data science.
Then I searched for an IDE that I like the feel of. Anaconda comes with a couple of IDEs, including one called “Spyder”, which I thought seemed very good. However, I ended up deciding on the Rodeo IDE for starters. The reason I chose Rodeo is that it is laid out very similarly to the RStudio and Matlab IDEs, so I’m a little more comfortable with it to start with.
Third, I started searching for the “Python equivalents” of my favourite R packages for data science. I’m a major fan of most of Hadley Wickham’s R packages, including ggplot2, dplyr, tidyr, lubridate, readr and readxl. So far for Python I’ve found:
Pandas seems to be the popular package for manipulating data in Python, but another package that seems closer to dplyr in R is dplython, which maintains the functional programming ideas of dplyr, including my favourite feature from magrittr and dplyr: the pipe operator!
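To give R users a feel for how a dplyr-style pipe can exist in Python at all, here is a minimal sketch using only the standard library. It is not how dplython is implemented and it does not use dplython's API: the `Pipeable` class, the `sift`, `mutate` and `arrange` helpers, and the shop data are all hypothetical names and values made up for illustration.

```python
class Pipeable:
    """Wrap a function so that `value >> Pipeable(f)` calls `f(value)`."""

    def __init__(self, func):
        self.func = func

    def __rrshift__(self, value):
        # Python falls back to this method for `value >> self` when the
        # left-hand value cannot handle `>>` with a Pipeable itself.
        return self.func(value)


def sift(predicate):
    """Keep only the rows for which predicate(row) is true (like dplyr's filter)."""
    return Pipeable(lambda rows: [r for r in rows if predicate(r)])


def mutate(**new_cols):
    """Add columns computed from each row (like dplyr's mutate)."""
    return Pipeable(
        lambda rows: [{**r, **{k: f(r) for k, f in new_cols.items()}} for r in rows]
    )


def arrange(key, descending=False):
    """Sort rows by one column (like dplyr's arrange)."""
    return Pipeable(lambda rows: sorted(rows, key=lambda r: r[key], reverse=descending))


# Made-up data: a list of dicts standing in for a data frame
shops = [
    {"state": "Victoria", "store": "shop_#1", "revenue": 5_000_000},
    {"state": "Victoria", "store": "shop_#2", "revenue": 2_000_000},
    {"state": "Tasmania", "store": "shop_#1", "revenue": 1_500_000},
]

# Reads top to bottom, much like a %>% chain in R
result = (
    shops
    >> sift(lambda r: r["state"] == "Victoria")
    >> mutate(revenue_m=lambda r: r["revenue"] / 1_000_000)
    >> arrange("revenue_m")
)

for row in result:
    print(row["store"], row["revenue_m"])
```

The trick is operator overloading: Python lets a class define what `>>` means when it appears on the right-hand side, so each step receives the result of the step before it, just as `%>%` passes the left-hand side into the next function in R.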
The Python plotting packages seaborn, bokeh and matplotlib all seem really nice. Matplotlib in particular seems very similar to the plotting system in Matlab. But since I’ve recently become very comfortable with Hadley’s ggplot2 ‘grammar of graphics’ style of plotting, I think ggplot for Python will suit me perfectly for starters!
…annnd that’s all I’ve got time for today, BUT I plan to keep updating this post with more info, as I come across it, that I think could be useful for somebody learning python for data science who is coming from a background of R and Matlab….so stay tuned!!