R: Box Plot

A box plot is an effective way to visualize the distribution of your data. It only takes a few lines of code in R to come up with a basic box plot.

If you are new to box plots, I recommend watching this video to get an idea of range, mean and the four quartiles.
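For a quick feel of the numbers behind a box plot, R can print them directly. The sketch below uses a handful of made-up values (they are not taken from the dataset used in this post) and the base R functions fivenum() and boxplot.stats().

```r
# Made-up sample values, purely for illustration (assumption)
payments <- c(12, 15, 18, 21, 24, 30, 95)

# Minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(payments)
print(five)

# boxplot.stats() reports the values boxplot() actually draws:
# whisker ends, hinges and median in $stats, and the points that lie
# more than 1.5 times the box height beyond the box in $out
bs <- boxplot.stats(payments)
print(bs$stats)
print(bs$out)
```

The difference between the two outputs is the upper whisker: fivenum() reports the true maximum, while boxplot.stats() stops the whisker at the largest value that is not flagged as an outlier.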

For this example I am using the Social Security Payments dataset, which can be downloaded from data.gov.au.

    # Load the dataset and draw one box per state
    ssp <- read.csv("C:/data/dsselectoratedatamarch2014flat.csv",header=TRUE)
    boxplot(disability_support_pension~state_of_commonwealth_electoral_
        ,data=ssp
        ,main="Disability Support Stats")
    # Label the x-axis (side=1) and the y-axis (side=2)
    mtext("State",side=1,line=3)
    mtext("Number of Disability Support Payments",side=2,line=3)

Here I am plotting the number of disability support pension payments made for each state in a certain period using boxplot(). The axes are labelled using mtext(). The result is shown in Screen Capture 1.

Screen Capture 1 - Basic Box Plot

As you can see from Screen Capture 1, the x-axis is labelled with the state names in alphabetical order. However, some state names are missing, because R skips a label when it would overlap the one before it. This can be corrected by drawing the x-axis labels vertically, which is done by setting the argument las=2. We can also give each box a different colour using the col argument.

    boxplot(disability_support_pension~state_of_commonwealth_electoral_,data=ssp
        ,main="Disability Support Stats",las=2
        ,col=c("violet","turquoise","blue","green","yellow","orange","red","cyan"))

Screen Capture 2 - Box Plot Version 2

Screen Capture 2 looks better; however, the labels on the x-axis are now truncated. The best way to fix this is to replace the label names with abbreviations using the names argument.

    boxplot(disability_support_pension~state_of_commonwealth_electoral_,data=ssp
        ,main="Disability Support stats",las=2
        ,col=c("violet","turquoise","blue","green","yellow","orange","red","cyan")
        ,names=c("ACT","NSW","NT","QLD","SA","TAS","VIC","WA"))

It’s important to note that the groups on the axis are sorted in ascending (alphabetical) order of the original state names. Hence the abbreviations passed to names must follow the same order.
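One way to avoid getting the order wrong is to ask R for the group order and build the abbreviations from it. The sketch below is only an illustration: the state vector and the abbrev lookup are made up for the example and are not part of the dataset.

```r
# Made-up stand-in for the state column (assumption, not the real data)
state <- c("Victoria", "New South Wales", "Queensland", "Victoria",
           "New South Wales", "Tasmania")

# boxplot() draws one box per factor level, in level order,
# which is alphabetical by default
group_order <- levels(factor(state))
print(group_order)

# Hypothetical lookup from full name to abbreviation; indexing it by
# group_order keeps the abbreviations in the plotting order
abbrev <- c("Victoria" = "VIC", "Tasmania" = "TAS",
            "Queensland" = "QLD", "New South Wales" = "NSW")
short_names <- unname(abbrev[group_order])
print(short_names)
```

The resulting short_names vector can then be passed as the names argument, so the labels stay correct even if a state is added or removed.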

Screen Capture 3 - Box Plot Version 3
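If the full state names need to stay on the axis, an alternative to abbreviating them is to enlarge the bottom margin before plotting. This is a minimal sketch with made-up data (the demo data frame and its column names are assumptions), and the margin of 12 lines is a guess that may need adjusting for your labels and device.

```r
# Made-up data standing in for the real dataset (assumption)
set.seed(1)
demo <- data.frame(
  state = rep(c("New South Wales", "Australian Capital Territory"), each = 20),
  payments = c(rnorm(20, 5000, 800), rnorm(20, 900, 150))
)

# mar is c(bottom, left, top, right) in lines of text;
# par() returns the previous settings so they can be restored later
old_par <- par(mar = c(12, 4, 4, 2) + 0.1)

# xlab = "" leaves the x-axis title empty so it cannot collide
# with the vertical labels
boxplot(payments ~ state, data = demo, las = 2, xlab = "",
        main = "Disability Support Stats")

# Restore the previous margins
par(old_par)
```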

The box plots can be drawn horizontally by setting the argument horizontal=TRUE. Just remember to swap the mtext() axis labels as well.

    ssp <- read.csv("C:/data/dsselectoratedatamarch2014flat.csv",header=TRUE)
    boxplot(disability_support_pension~state_of_commonwealth_electoral_
        ,data=ssp
        ,main="Disability Support Stats",las=1,horizontal=TRUE
        ,col=c("violet","turquoise","blue","green","yellow","orange","red","cyan")
        ,names=c("ACT","NSW","NT","QLD","SA","TAS","VIC","WA"))
    # The value axis is now at the bottom (side=1) and the states on the left (side=2)
    mtext("Number of Disability Support Payments",side=1,line=3)
    mtext("State",side=2,line=3)

Screen Capture 4 - Horizontal Box Plot Version

You can also restrict the box plot to certain groups. For example, if I am only interested in the eastern states, I can use subset() to filter the data before passing it on to boxplot().

    ssp <- read.csv("C:/data/dsselectoratedatamarch2014flat.csv",header=TRUE)
    # %in% keeps every row whose state is in the list;
    # == would recycle the vector along the rows and silently drop matches
    sspEast <- subset(ssp
        ,state_of_commonwealth_electoral_ %in% c("New South Wales","Queensland"))
    # Drop unused factor levels so only the two selected states are plotted
    sspEast <- droplevels(sspEast)
    boxplot(disability_support_pension~state_of_commonwealth_electoral_
        ,data=sspEast
        ,main="Disability Support Stats",las=1,horizontal=TRUE
        ,col=c("turquoise","green")
        ,names=c("NSW","QLD"))
    mtext("Number of Disability Support Payments",side=1,line=3)
    mtext("State",side=2,line=3)

Screen Capture 5 - Box Plot Subset
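A detail worth knowing when filtering on more than one value: comparing a column with == against a vector recycles that vector along the rows, whereas %in% tests each row for membership in the whole set. The sketch below shows the difference on a tiny made-up data frame (demo and its columns are assumptions, not the real dataset).

```r
# Made-up rows standing in for the real dataset (assumption)
demo <- data.frame(
  state = c("Queensland", "New South Wales", "Victoria",
            "New South Wales", "Queensland"),
  payments = c(10, 20, 30, 40, 50)
)
east <- c("New South Wales", "Queensland")

# == pairs row 1 with east[1], row 2 with east[2], row 3 with east[1], ...
# so whether a row matches depends on its position in the data
recycled <- suppressWarnings(demo$state == east)
print(sum(recycled))

# %in% checks every row against both names
member <- demo$state %in% east
print(sum(member))

# The same test inside subset()
east_only <- subset(demo, state %in% east)
print(east_only)
```

In this example the == comparison matches none of the four eastern rows, while %in% keeps all of them.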

That’s it. As you can see it only takes a few lines of code to visualize your data distribution using R.
