# From Fabbe’s R toolbox: How to make a publication-friendly correlation matrix with R

This tutorial will provide you with 4 straightforward steps on how to create a neat, publication-friendly correlation matrix with RStudio and R Markdown. It took me a lot of time and googling to figure out how to create the automated, well-formatted tables that I wanted for my own manuscripts, and I am happy to share this with anyone who is interested.

The result will look something like this, and can be copy/pasted directly from RStudio to your article manuscript in Word:

The biggest advantage of automated tables is that you do not need to copy/paste individual values from your statistics program to Word, which is a potential major source of errors, especially with large tables. Also, any re-calculation is much quicker with an automated table than rebuilding the table by hand.

## Tutorial

You will find the complete R Markdown code at the end of this tutorial. Feel free to use it as a template for your own analyses and let me know how it worked.

### Step 2: Get the data and pre-process as needed

Get dataset from a csv file

The data in this example are the results from the 2018 CrossFit Open competition, which I had access to for some strange reason. Believe it or not, CrossFit is indeed a sport that people compete in, and the people who do look extremely ripped, like this guy Mat Fraser, for example. But hey, stop googling pictures now and get back to business: the only thing you need to know for the sake of this tutorial is that competitors get a score, and the lower the score the better.

In this data set we need to take care of some weird variable names, which we rename with the colnames function.
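As a minimal sketch of the import and renaming step, the snippet below writes a tiny made-up CSV to a temporary file, reads it back and renames the columns. The file contents and the raw header names Height_(m) and Weight_(kg) are assumptions for illustration; only the names Height_.m. and Weight_.kg. come from the real data set.

```r
# Toy stand-in for the real file: a small CSV written to a temporary location.
# The values and the raw header names are made up for illustration.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Height_(m),Weight_(kg),Age,Overall_score",
  "1.80,85,28,1200",
  "1.65,62,34,5400",
  "1.75,78,25,800"
), csv_path)

# read.csv() makes column names syntactically valid,
# which turns "(" and ")" into "."
data <- read.csv(csv_path)
print(colnames(data))

# Rename the awkward columns
colnames(data)[colnames(data) == "Height_.m."] <- "Height"
colnames(data)[colnames(data) == "Weight_.kg."] <- "Weight"
print(colnames(data))
```

If your own file has other odd names, the same pattern works for each of them.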

For further analysis, we make a separate data frame “data_sub” of the variables of interest (age, weight, height and overall score in the CrossFit Open).
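A small sketch of that selection on a toy data frame. The values and the extra Name column are hypothetical; the four selected column names are the ones used in this tutorial.

```r
# Toy data frame; the Name column and all values are made up
data <- data.frame(
  Name = c("A", "B", "C"),
  Height = c(1.80, 1.65, 1.75),
  Weight = c(85, 62, 78),
  Age = c(28, 34, 25),
  Overall_score = c(1200, 5400, 800)
)

# Keep only the variables of interest
data_sub <- subset(data, select = c(Height, Weight, Age, Overall_score))
str(data_sub)
```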

And, as I know that there are some very unlikely outliers, let’s limit the data to more realistic values.
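The filter can be sketched like this, using the same cut-offs as the complete code at the end of the post. The toy values are made up so that three rows fall outside the limits.

```r
# Toy data with a few implausible entries (all values are made up)
data_sub <- data.frame(
  Height = c(1.80, 0.50, 1.65, 1.75, 3.10),
  Weight = c(85, 70, 62, 450, 78),
  Age = c(28, 31, 34, 25, 40),
  Overall_score = c(1200, 3000, 5400, 800, 2500)
)

# Keep heights between 1.20 m and 2.2 m and weights between 30 kg and 200 kg
data_sub <- subset(data_sub, Height > 1.20 & Height < 2.2 & Weight > 30 & Weight < 200)
print(data_sub)
```

Rows with a missing height or weight are dropped as well, because subset() only keeps rows where the condition is TRUE.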

### Step 3: Take a look at the data

Now, this is a little detour, but I always like to see what my data look like. When it comes to correlation coefficients, we often assume that there is some kind of linear relationship between the variables, which is sometimes true and sometimes not. A graph can help us see what the associations between variables look like.

The chart.Correlation function from the PerformanceAnalytics package is an awesome way to get to know your data. It gives us this:

We get the correlation coefficients, the distribution of each variable and a trend line. Interestingly, there is a non-linear association between weight and overall score, with an optimal weight around 80 kg for a minimal score. Being lighter or heavier than this will, on average, give a worse score. Good to know!
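To see why the plot matters, here is a small sketch with simulated numbers (not the CrossFit data) in which the score is lowest at exactly 80 kg. Both correlation coefficients come out at essentially zero, even though the score depends entirely on weight.

```r
# Simulated U-shaped relationship: the score is minimal at 80 kg
weight <- seq(50, 110, by = 1)
score <- (weight - 80)^2

# Neither coefficient picks up the symmetric U-shape
pearson <- cor(weight, score, method = "pearson")
spearman <- cor(weight, score, method = "spearman")
print(round(c(pearson = pearson, spearman = spearman), 3))
```

A correlation close to zero therefore does not mean "no relationship", which is a good reason to look at the scatter plots first.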

That graph is produced by this code:
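A sketch of the call, using simulated data in place of data_sub. The toy data frame and its values are assumptions; the chart.Correlation call itself is the one from the complete code at the end of the post.

```r
# Simulated stand-in for data_sub (all values are made up)
set.seed(42)
n <- 200
toy <- data.frame(
  Height = rnorm(n, mean = 1.75, sd = 0.08),
  Age = runif(n, min = 18, max = 50)
)
toy$Weight <- 22 * toy$Height^2 + rnorm(n, sd = 5)
toy$Overall_score <- (toy$Weight - 80)^2 + rnorm(n, sd = 50)

# Draw the chart only if the package is installed
if (requireNamespace("PerformanceAnalytics", quietly = TRUE)) {
  PerformanceAnalytics::chart.Correlation(toy, method = "spearman")
}
```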

Anyhow, the above graph is not appropriate for publication. Let’s make a proper correlation matrix now.

### Step 4: Creating our beautifully formatted table

We need to create a function for a table with stars that represent significance levels (***, ** and * for p < .001, .01 and .05, respectively). The corstarsl function is a great way to do it (see the source link further down), but it requires a bit of coding. You can find the complete code below.
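The core idea of the star notation can be sketched in a few lines. The helper name p_to_stars and the example numbers are hypothetical; the thresholds are the ones given above.

```r
# Map p-values to significance stars (hypothetical helper, not part of corstarsl)
p_to_stars <- function(p) {
  ifelse(p < .001, "***", ifelse(p < .01, "**", ifelse(p < .05, "*", "")))
}

# Made-up correlations and p-values for illustration
r_values <- c("0.45", "0.31", "0.12", "0.05")
p_values <- c(0.0004, 0.008, 0.03, 0.20)

stars <- p_to_stars(p_values)
print(paste0(r_values, stars))
```

The corstarsl function applies the same nested ifelse to the whole matrix of p-values and pastes the stars onto the rounded correlations.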

Finally, calling stargazer(cortab, type = "html", summary = FALSE, rownames = TRUE) from the stargazer package gives you this:

Now simply knit your document to HTML and copy/paste the table to Word. Voilà: done.

For more info on the very versatile stargazer package see https://www.jakeruss.com/cheatsheets/stargazer/ and for information regarding the corstarsl function see http://myowelt.blogspot.com/2008/04/beautiful-correlation-tables-in-r.html.

Hope this was useful, let me know how it worked!

The complete R markdown code:

``````
---
title: "Correlation matrix example"
author: "Fabian Lenhard"
date: '2019-01-30'
output:
  html_document:
    df_print: paged
  pdf_document: default
  word_document: default
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```

```{r get the data}
# Get dataset from csv file (replace the placeholder path with your own file)
data <- read.csv("path/to/your_data.csv")

# First we need to take care of some weird variable names
colnames(data)[colnames(data)=="Height_.m."] <- "Height"
colnames(data)[colnames(data)=="Weight_.kg."] <- "Weight"

# Now, we select variables of interest
data_sub <- subset(data, select  = c(Height, Weight, Age, Overall_score))

# And, as I know that there are some very unlikely outliers, let's limit the data to more realistic values
data_sub <- subset(data_sub, Height < 2.2 & Height > 1.20 & Weight < 200 & Weight > 30)

library(psych)
desctab <- describe(data_sub, skew = FALSE, ranges = FALSE)

# This is a nice graphical way to get to know your data. However, you may not want to publish this chart.
library(PerformanceAnalytics)
chart.Correlation(data_sub, method = "spearman")

# So, now we want to get published with a nice formatted table. We need to write a function first:

corstarsl <- function(x){
  library(Hmisc)
  x <- as.matrix(x)
  # rcorr() computes Pearson correlations by default; run it once and reuse the result
  rc <- rcorr(x)
  R <- rc$r
  p <- rc$P

  ## define notation for significance levels; spacing is important.
  mystars <- ifelse(p < .001, "***", ifelse(p < .01, "** ", ifelse(p < .05, "*  ", "   ")))

  ## truncate the matrix that holds the correlations to two decimals
  R <- format(round(cbind(rep(-1.11, ncol(x)), R), 2))[,-1]

  ## build a new matrix that includes the correlations with their appropriate stars
  Rnew <- matrix(paste(R, mystars, sep=""), ncol=ncol(x))
  diag(Rnew) <- paste(diag(R), " ", sep="")
  rownames(Rnew) <- colnames(x)
  colnames(Rnew) <- paste(colnames(x), "", sep="")

  ## remove upper triangle
  Rnew <- as.matrix(Rnew)
  Rnew[upper.tri(Rnew, diag = TRUE)] <- ""
  Rnew <- as.data.frame(Rnew)

  ## remove last column and return the matrix (which is now a data frame)
  Rnew <- Rnew[seq_len(ncol(Rnew) - 1)]
  return(Rnew)
}

cortab <- corstarsl(data_sub)
```

## Table 1: Descriptive statistics
```{r mylatextable, results = "asis"}
library(stargazer)

stargazer(data_sub, type = "html")

```

## Table 2: Correlation matrix
```{r mylatextable2, results = "asis"}

stargazer(cortab, type = "html", summary = FALSE, rownames = TRUE)

```
``````