A box plot is an effective way to visualize the distribution of your data. It only takes a few lines of code in R to produce a basic one.
If you are new to box plots, I would recommend watching this video to get an idea of the range, the median and the quartiles.
For this example I am using the Social Security Payments dataset, which can be downloaded from data.gov.au.
    # Load the dataset (adjust the path to wherever you saved the download)
    ssp <- read.csv("C:/data/dsselectoratedatamarch2014flat.csv", header = TRUE)

    # Draw one box per state
    boxplot(disability_support_pension ~ state_of_commonwealth_electoral_,
            data = ssp,
            main = "Disability Support Stats")

    # Label the x-axis (side 1) and the y-axis (side 2)
    mtext("State", side = 1, line = 3)
    mtext("Number of Disability Support Payments", side = 2, line = 3)
Here I am using boxplot() to plot the number of disability support pension payments made in each state over a certain period. The axes are labelled using mtext(). The result is shown in Screen Capture 1.
As you can see from Screen Capture 1, the x-axis is labelled with the state names in ascending order. However, some state names are missing, especially where the preceding label is long, because R skips axis labels that would overlap. This can be corrected by drawing the x-axis labels vertically, which is done by setting the argument las=2. We can also give each box a different colour using the col argument.
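To see the numbers behind each box, boxplot() can be called with plot = FALSE, in which case it returns the statistics instead of drawing. The sketch below uses a small made-up data frame (the `toy` data, its `state` and `payments` columns and their values are assumptions for illustration, not part of the real dataset).

```r
# Hypothetical toy data standing in for the real dataset
toy <- data.frame(
  state    = rep(c("A", "B"), each = 5),
  payments = c(10, 20, 30, 40, 50, 5, 15, 25, 35, 100)
)

# plot = FALSE returns the summary statistics without drawing anything
bp <- boxplot(payments ~ state, data = toy, plot = FALSE)

# One column per group; rows are lower whisker, lower hinge,
# median, upper hinge and upper whisker
print(bp$stats)

# Group labels, in the order the boxes would be drawn
print(bp$names)

# Values beyond the whiskers, which would be drawn as individual points
print(bp$out)
```

In this toy data the value 100 in group B lies more than 1.5 times the box height above the upper hinge, so it is reported in `bp$out` rather than stretching the whisker.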
    # las = 2 draws the axis labels perpendicular to the axis; col sets one colour per box
    boxplot(disability_support_pension ~ state_of_commonwealth_electoral_,
            data = ssp,
            main = "Disability Support Stats",
            las = 2,
            col = c("violet", "turquoise", "blue", "green", "yellow", "orange", "red", "cyan"))
Screen Capture 2 looks better; however, the x-axis labels are now truncated. The best way to fix this is to replace the label names with abbreviations using the names argument.
    # names replaces the group labels; list them in the same order as the boxes
    boxplot(disability_support_pension ~ state_of_commonwealth_electoral_,
            data = ssp,
            main = "Disability Support Stats",
            las = 2,
            col = c("violet", "turquoise", "blue", "green", "yellow", "orange", "red", "cyan"),
            names = c("ACT", "NSW", "NT", "QLD", "SA", "TAS", "VIC", "WA"))
It’s important to note that the groups displayed on the axis are sorted in ascending order of their original names. Hence the abbreviations must be listed in that same order.
The box plots can be drawn horizontally by setting the argument horizontal=TRUE. Just remember to swap the mtext() axis labels as well.
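One way to avoid getting the order wrong is to look up the abbreviations by name rather than typing them in position. The following is a minimal sketch on a hypothetical vector of state names; the `states` sample and the `abbrev` lookup are made up for illustration.

```r
# Hypothetical sample of state names, in no particular order
states <- c("Victoria", "New South Wales", "Tasmania",
            "Queensland", "New South Wales")

# Grouping a character column sorts the distinct values,
# which is the order in which the boxes are drawn
group_order <- levels(factor(states))
print(group_order)

# Named lookup vector: each abbreviation is tied to its full name
abbrev <- c("New South Wales" = "NSW",
            "Queensland"      = "QLD",
            "Tasmania"        = "TAS",
            "Victoria"        = "VIC")

# Indexing by name returns the abbreviations in the plotting order
short_names <- unname(abbrev[group_order])
print(short_names)
```

The resulting `short_names` vector could then be passed to the names argument, so the labels stay matched to the right boxes even if the set of states changes.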
    ssp <- read.csv("C:/data/dsselectoratedatamarch2014flat.csv", header = TRUE)

    # horizontal = TRUE draws the boxes sideways; las = 1 keeps all labels horizontal
    boxplot(disability_support_pension ~ state_of_commonwealth_electoral_,
            data = ssp,
            main = "Disability Support Stats",
            las = 1,
            horizontal = TRUE,
            col = c("violet", "turquoise", "blue", "green", "yellow", "orange", "red", "cyan"),
            names = c("ACT", "NSW", "NT", "QLD", "SA", "TAS", "VIC", "WA"))

    # The axes have swapped, so the labels swap sides too
    mtext("Number of Disability Support Payments", side = 1, line = 3)
    mtext("State", side = 2, line = 3)

You can also restrict the box plot to certain groups. For example, if I am only interested in the eastern states, I can use subset() to filter the data before passing it on to boxplot().
    ssp <- read.csv("C:/data/dsselectoratedatamarch2014flat.csv", header = TRUE)

    # Keep only the two eastern states. %in% tests membership; == would recycle
    # the two names against the rows and silently drop matching rows.
    # droplevels() discards the unused states if the column is a factor.
    sspEast <- droplevels(subset(ssp,
        state_of_commonwealth_electoral_ %in% c("New South Wales", "Queensland")))

    # Only two groups remain, so col and names need two entries each
    boxplot(disability_support_pension ~ state_of_commonwealth_electoral_,
            data = sspEast,
            main = "Disability Support Stats",
            las = 1,
            horizontal = TRUE,
            col = c("violet", "turquoise"),
            names = c("NSW", "QLD"))

    mtext("Number of Disability Support Payments", side = 1, line = 3)
    mtext("State", side = 2, line = 3)
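When filtering on more than one value, it is worth knowing that == compares element by element and recycles the shorter vector, whereas %in% tests whether each value belongs to the set. A small sketch on a made-up vector (the `state` values here are hypothetical) shows the difference:

```r
# Hypothetical column of state names
state <- c("New South Wales", "Queensland", "Queensland",
           "New South Wales", "Victoria", "Queensland")
wanted <- c("New South Wales", "Queensland")

# == recycles `wanted` along `state`, so row 3 is compared with
# "New South Wales" and row 4 with "Queensland": both are missed
eq_match <- state == wanted

# %in% asks, for every row, whether its value is in the set
in_match <- state %in% wanted

print(eq_match)
print(in_match)

# Number of rows each approach would keep
print(c(equals = sum(eq_match), membership = sum(in_match)))
```

With ==, rows are kept only when they happen to line up with the recycled pattern, and R gives no warning when the row count is a multiple of the number of values. For a subset() on several states, %in% is the safer choice.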

That’s it. As you can see, it only takes a few lines of code to visualize the distribution of your data using R.