This is a summary of the workshop on the same topic.
R Markdown is an authoring framework for data science that enables easy creation of dynamic documents, presentations, and reports from R. It provides a notebook interface to connect data and run code as well as to generate reports that can be shared with an audience. R Markdown documents are fully reproducible.
As a language, R Markdown is an extension of the markdown syntax that has embedded R code chunks; as an R package, it is a library to process and convert .Rmd files into a number of formats (see more discussions here).
We can install the R Markdown package from CRAN. Make sure you are running a recent version of R.
install.packages("rmarkdown")
R Markdown supports a variety of static and dynamic output formats.
See the full list of output formats supported by R Markdown here.
Figure 1. Output formats. Screenshot from the RStudio introduction video (1:01).
An R Markdown file is a simple plain text file that has the file extension .Rmd. It consists of three types of content: YAML metadata, text, and code chunks.
Figure 2. An Rmd file.
A report can be generated from an .Rmd file by simply clicking the “Knit” button in RStudio. The default output format of a knitted file is HTML.
To generate PDF output from R Markdown, you need to have a LaTeX distribution installed.
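Outside the RStudio interface, the same rendering step can be run from the R console with `rmarkdown::render()`. The sketch below is only illustrative: the file name `example.Rmd` and its contents are hypothetical, and it assumes Pandoc is available (it is bundled with RStudio).

```r
library(rmarkdown)

# Build a small, hypothetical .Rmd file in a temporary directory
fence <- strrep("`", 3)  # three backticks, the chunk delimiter
rmd_path <- file.path(tempdir(), "example.Rmd")
writeLines(c(
  "---",
  "title: \"Example report\"",
  "output: html_document",
  "---",
  "",
  "Some text.",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
), rmd_path)

# Render to HTML; render() returns the path of the output file
out_path <- render(rmd_path, output_format = "html_document", quiet = TRUE)
out_path
```

Passing `output_format = "pdf_document"` instead should produce a PDF, provided a LaTeX distribution is installed.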
R Markdown documents start with a metadata section, the YAML metadata header, which can include document metadata such as title, author, date and output format. We can also control the appearance and style of a document by including a custom CSS file, specifying a theme, and adjusting the organization of its sections, among other options.
In the example below, we set the title, author, date, and output format of the document. We also include several customization options:
- `theme: readable` sets the readable HTML theme (check the theme gallery)
- `highlight: textmate` specifies the syntax highlighting style
- `toc: true` and `toc_float: true` add a floating table of contents
- `css: contents.css` applies a pre-defined style sheet to the document
Read more about customizing output here.
Figure 3. Metadata at the top of an .Rmd file.
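As a rough text equivalent of the options described above, a YAML header along these lines could sit at the top of an .Rmd file. The title, author, and date values are placeholders chosen for illustration, and `contents.css` is assumed to exist next to the .Rmd file.

```yaml
---
title: "My report"          # placeholder value
author: "Author Name"       # placeholder value
date: "2020-01-01"          # placeholder value
output:
  html_document:
    theme: readable         # HTML theme
    highlight: textmate     # syntax highlighting style
    toc: true               # add a table of contents
    toc_float: true         # float it beside the text
    css: contents.css       # custom style sheet
---
```

Note that the customization options are nested under the output format they apply to, here `html_document`.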
Text can be formatted with Pandoc’s Markdown, which we discuss below.
R code chunks, embedded with the Markdown syntax, can be executed independently and interactively. Code outputs are rendered immediately beneath the inputs. A variety of objects such as text, tables, and graphics can be produced in a code chunk.
Figure 4. Code chunks.
To insert a code chunk, we can type the chunk delimiters ```{r} and ``` around our code. Below we discuss the code chunk in more detail.
In the upper right corner of a code chunk, we can see three small icons.
Figure 5. Chunk options.
The first icon lets us modify chunk options without typing code. Chunk options give us fine control over the output.
Figure 6. Modify Chunk Options.
There are a variety of chunk options for customizing components of a code chunk.
Figure 7. Chunk Options.
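These chunk options include:
- `echo=FALSE`: hide the code but show its output
- `echo=TRUE`: show the code along with its output
- `include=FALSE`: run the code but show neither the code nor its output
- `eval=FALSE, include=FALSE`: neither run the code nor show anything
- `warning=TRUE`: show warnings generated by the code
- `message=TRUE`: show messages generated by the code
- `fig.height=, fig.width=`: set the height and width of figures, in inches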
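To see the effect of a chunk option without opening RStudio, one possibility is to knit a tiny document supplied as text. The sketch below is an illustration only: the one-line chunk is made up, and it uses `knitr::knit(text = ...)`, which returns the knitted Markdown as a string.

```r
library(knitr)

fence <- strrep("`", 3)  # three backticks, the chunk delimiter

# A one-chunk document given as text; echo=FALSE hides the code
src <- c(
  paste0(fence, "{r, echo=FALSE}"),
  "1 + 1",
  fence
)

# knit() returns the resulting Markdown as a character string
out <- knit(text = src, quiet = TRUE)
cat(out)
```

The result contains the output line `## [1] 2` but not the source line `1 + 1`; switching the option to `echo=TRUE` would bring the source back.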
The second icon calls R to run all chunks above the current one.
Figure 8. Run All Chunks Above.
The third icon tells R to run the current chunk.
Figure 9. Run Current Chunk.
To mark text as inline code, use a pair of backticks.
Input:
`code`
Output:
This is inline `code`.
To create a code block, put code in a pair of triple backticks ```.
Input:
```
code
```
Output:
This is a code block.
In addition to R, an .Rmd file can execute code in many other languages, including Python, SQL, and Bash.
To process a code chunk in another language, we replace the `r` at the start of the chunk header with the name of that language. For instance,
```{python echo=FALSE, results=TRUE}
l = [1,2,45,'Hello World!']
for i in l:
    print(l)
```
l = [1,2,45,'Hello World!']
for i in l:
    print(l)
## [1, 2, 45, 'Hello World!']
## [1, 2, 45, 'Hello World!']
## [1, 2, 45, 'Hello World!']
## [1, 2, 45, 'Hello World!']
In an .Rmd file, we can create figures with code and insert images.
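One thing we often do with figures is adjust their size. For figures generated by code, there are several places to do that. To start with, we may set `fig_height` and `fig_width` for the output format in the YAML header (the dotted forms `fig.height` and `fig.width` are the corresponding chunk options).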
Figure 10. Set figure size in header.
We may also set the figure height and width as global options that apply to every chunk in the file by calling `knitr::opts_chunk$set` in a code chunk, usually placed at the beginning of the file.
In the case below, every figure in the document will have a width of 6 inches and a height of 4 inches.
knitr::opts_chunk$set(fig.width = 6, fig.height = 4)
Note that what we pass to `knitr::opts_chunk$set` can be overridden in individual chunk headers.
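To see the override behaviour concretely, here is a minimal sketch that knits a two-chunk document supplied as text. It relies on `knitr::opts_current$get()` to report the option in effect inside each chunk; the chunk contents are purely illustrative.

```r
library(knitr)

# Remember the current defaults so they can be put back afterwards
old <- opts_chunk$get(c("fig.width", "fig.height"))
opts_chunk$set(fig.width = 6, fig.height = 4)

fence <- strrep("`", 3)  # three backticks, the chunk delimiter
src <- c(
  paste0(fence, "{r}"),               # uses the global default
  "knitr::opts_current$get('fig.width')",
  fence,
  "",
  paste0(fence, "{r, fig.width=3}"),  # overrides it for this chunk only
  "knitr::opts_current$get('fig.width')",
  fence
)

out <- knit(text = src, quiet = TRUE)
cat(out)

# Restore the previous defaults
do.call(opts_chunk$set, old)
```

The first chunk reports the global value of 6, while the second reports 3, the value given in its own header.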
Lastly, we can also set the figure height and width as chunk options:
Figure 11. Set figure size as chunk options
We may insert an image in an R Markdown file in several ways.
We can use the Markdown syntax, as shown below, to include a path, the width (optional), and a caption (optional). The syntax starts with an exclamation mark. The path can be a local path or a web URL. We can set the image size in curly braces `{}` at the end.
![optional caption text](url)
![optional caption text](path)
![optional caption text](path){width=20px}
![optional caption text](path){width=20%}
![optional caption text](path){width=20% height=40%}
Input:
![NYU Shanghai Library](https://i0.wp.com/oncenturyavenue.org/wp-content/uploads/2017/03/nyushlib.jpg?w=1280)
Output:
We may also center the image with `<center> </center>`.
<center>
![](path)
NYU Shanghai Library
</center>
The other way to include an image is to use the knitr function `knitr::include_graphics()` in a code chunk. The chunk options include `out.width` and `out.height` to set the image width and height, `fig.align` to set the alignment (`center`, `left`, `right`, and `default`), and `fig.cap` to set the caption.
Figure 12. Embed image in code chunk.
R Markdown displays data frames and matrices as they would appear in the R console.
data("iris")
iris[1:6,]
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
If we need additional table formatting, we may use `knitr::kable()`.
library(knitr)
kable(iris[1:6,], caption = 'This is a title')
Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
---|---|---|---|---|
5.1 | 3.5 | 1.4 | 0.2 | setosa |
4.9 | 3.0 | 1.4 | 0.2 | setosa |
4.7 | 3.2 | 1.3 | 0.2 | setosa |
4.6 | 3.1 | 1.5 | 0.2 | setosa |
5.0 | 3.6 | 1.4 | 0.2 | setosa |
5.4 | 3.9 | 1.7 | 0.4 | setosa |
To embellish tables with more advanced styling, we may use the package kableExtra, which provides a variety of functions to build LaTeX and HTML tables.
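As a small illustration of what kableExtra adds on top of `kable()`, the sketch below styles the same six rows of `iris` as an HTML table. It assumes the kableExtra package is installed; the caption text and the particular Bootstrap options are arbitrary choices.

```r
library(knitr)
library(kableExtra)

# Build an HTML table, then add Bootstrap striping and hover highlighting
styled <- kable(iris[1:6, ], format = "html",
                caption = "First six rows of iris")
styled <- kable_styling(styled,
                        bootstrap_options = c("striped", "hover"),
                        full_width = FALSE)
styled
```

Inside an R Markdown document that knits to HTML, printing `styled` in a chunk should display the formatted table.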
To create professional-looking tables that summarize regression models, we recommend the package stargazer.
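A minimal sketch of a stargazer regression table follows. It assumes the stargazer package is installed; the model on the built-in `mtcars` data is an arbitrary example, not taken from the workshop.

```r
library(stargazer)

# Fit a simple linear model on the built-in mtcars data
fit <- lm(mpg ~ wt + hp, data = mtcars)

# type = "text" prints a plain-text table, handy for checking in the console
tbl <- capture.output(stargazer(fit, type = "text"))
cat(tbl, sep = "\n")
```

When knitting to HTML, `type = "html"` together with the chunk option `results='asis'` is the usual combination, so that the generated HTML is passed through unchanged.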
Additionally, we can create tables in Markdown by typing the dashes and pipes manually.
Input:
First Header | Second Header
-------------|--------------
Content Cell | Content Cell
Content Cell | Content Cell
Output:
First Header | Second Header |
---|---|
Content Cell | Content Cell |
Content Cell | Content Cell |
Input:
  Right Left     Center   Default
------- ------ ---------- -------
   Cell Cell      cell    cell
   Cell Cell      cell    cell
Output:
Right | Left | Center | Default |
---|---|---|---|
Cell | Cell | cell | cell |
Cell | Cell | cell | cell |
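Notes on column alignment: the position of the header text relative to the dashed line below it determines the alignment. A header flush with the right end of the dashes gives a right-aligned column, and one flush with the left end gives a left-aligned column; dashes extending beyond the header on both sides give a centered column, and a header flush on both ends gives the default alignment (usually left).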
We can format the text in an R Markdown file with Pandoc’s Markdown, a set of conventions for marking up plain text so that it can be rendered with formatting. When we render an R Markdown file, knitr first runs the code chunks and compiles the file to Markdown, and Pandoc then converts the Markdown to an output document (e.g., PDF, HTML, or Word).
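The two stages can also be run separately, which makes the intermediate Markdown file visible. The sketch below is illustrative: the file names are hypothetical, and it assumes Pandoc is available. In everyday use, `rmarkdown::render()` performs both steps in one call.

```r
library(knitr)
library(rmarkdown)

fence <- strrep("`", 3)  # three backticks, the chunk delimiter
rmd_path  <- file.path(tempdir(), "pipeline.Rmd")
md_path   <- file.path(tempdir(), "pipeline.md")
html_path <- file.path(tempdir(), "pipeline.html")
writeLines(c("# A heading", "", paste0(fence, "{r}"), "mean(1:10)", fence),
           rmd_path)

# Step 1: knitr runs the chunks and writes plain Markdown
knit(rmd_path, output = md_path, quiet = TRUE)

# Step 2: Pandoc converts the Markdown to the output format
pandoc_convert(md_path, to = "html", output = html_path)
```

Opening `pipeline.md` after the first step shows the code and its result already written out as ordinary Markdown.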
Now let’s look at how to mark up some common elements.
Section headers can be created on six levels, indicated by one to six pound signs.
Input:
# Header 1
## Header 2
### Header 3
#### Header 4
##### Header 5
###### Header 6
Output:
Unordered list items start with `*`, `-`, or `+`. We can nest one list within another by indenting the sub-list.
Input:
* Item 1
* Item 2
    + Item 2.1
    + Item 2.2
        - Item 2.21
        - Item 2.22
Output:
Ordered list items start with numbers. Ordered lists can also be nested.
Input:
1. Item 1
2. Item 2
3. Item 3
Output:
Hyperlinks are created using the syntax `[text](link)`.
Input:
[R Markdown cheat sheet](https://shiny.rstudio.com/articles/rm-cheatsheet.html)
<https://shiny.rstudio.com/articles/rm-cheatsheet.html>
Output:
https://shiny.rstudio.com/articles/rm-cheatsheet.html
Blockquotes start with `>`.
Input:
Einstein once said
> I never said that.
Output:
Einstein once said
I never said that.
A horizontal rule is created with a line of three or more asterisks or dashes.
Input:
******
------
Output:
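Footnote references are written as a caret and a label inside square brackets, such as `[^1]`; the footnote text is defined on a separate line starting with `[^1]:`. A footnote can also be written inline as `^[footnote text]`.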
Input:
two footnotes [^1][^2]
Check the notes at the bottom of this page.
[^1]: This is the footnote.
[^2]: This is another footnote.
Output:
two footnotes ¹² Check the notes at the bottom of this page.
Italicized text can be created using a pair of asterisks or underscores.
Input:
*text*
_text_
Output:
text
text
Bold text can be created using a pair of double asterisks or double underscores.
Input:
**text**
__text__
Output:
text
text
A pair of carets (^) produces a superscript.
Input:
2^10^
Output:
2¹⁰
A pair of tildes (~) turns text into a subscript.
Input:
H~2~O
Output:
H₂O
The mathematical typesetting in R Markdown is based on LaTeX, a powerful tool to write mathematical equations and display mathematical notations. Read more about LaTeX here.
Note that in the tables below, notation is written in inline mode using a pair of `$`, while the other examples use display mode, surrounded by a pair of `$$`.
Inline LaTeX equations can be written in a pair of `$`.
Input:
This is a math $expression$ in inline mode.
Output:
This is a math \(expression\) in inline mode.
Math expressions in display mode can be written in a pair of `$$`.
Input:
This is a math $$expression$$ in display mode.
Output:
This is a math \[expression\] in display mode.
Alternatively:
Input:
This is a math \[expression\] in display mode.
Output:
This is a math \[expression\] in display mode.
Input:
$$\hat{a}$$
Output:
\[\hat{a}\]
Input:
$$\bar{a}$$
Output:
\[\bar{a}\]
Input:
$$\tilde{a}$$
Output:
\[\tilde{a}\]
Input | Output |
---|---|
$\pi$ | \(\pi\) |
$\Pi$ | \(\Pi\) |
$h(\theta)$ | \(h(\theta)\) |
$\Delta$ | \(\Delta\) |
$\epsilon$ | \(\epsilon\) |
$\alpha$ | \(\alpha\) |
Input | Output |
---|---|
$\beta_0$ | \(\beta_0\) |
$\theta_1x_1$ | \(\theta_1x_1\) |
Input | Output |
---|---|
$p^{k}$ | \(p^{k}\) |
$e^{-z}$ | \(e^{-z}\) |
Example:
Input:
$y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_3 + \epsilon$
$h(\theta) = \theta_0 + \theta_1x_1 + \theta_2x_2 + \theta_3x_3$
Output:
\(y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_3 + \epsilon\)
\(h(\theta) = \theta_0 + \theta_1x_1 + \theta_2x_2 + \theta_3x_3\)
Input:
$$\sum_{i=1}^{m}$$
Output:
\[\sum_{i=1}^{m}\]
Example:
Input:
$\sum_{i=1}^{n}{(rating-\hat{rating})^2}$
$l(\theta)=\sum_{i=1}^{m}{[y^i\log(h_\theta(x^i)) + (1-y^i)\log(1-h_\theta(x^i))]}$
Output:
\(\sum_{i=1}^{n}{(rating-\hat{rating})^2}\)
\(l(\theta)=\sum_{i=1}^{m}{[y^i\log(h_\theta(x^i)) + (1-y^i)\log(1-h_\theta(x^i))]}\)
Format:
$$\frac{numerator}{denominator}$$
Input:
$$\frac{a+b}{b}$$
Output:
\[\frac{a+b}{b}\]
Input:
$$1 + \frac{a}{b}$$
Output:
\[1 + \frac{a}{b}\]
Input:
$$g(z) = \frac{1}{1+e^{-z}}$$
Output:
\[g(z) = \frac{1}{1+e^{-z}}\]
Example:
Input:
$J(\theta)=\frac{1}{2m}\sum_{i=1}^{m}{(h(x^i)-y^i)^2}$
Output:
\(J(\theta)=\frac{1}{2m}\sum_{i=1}^{m}{(h(x^i)-y^i)^2}\)
Format:
$$\sqrt[n]{expression}$$
Input:
$$\frac{-b + \sqrt{b^2 - 4ac}}{2a}$$
Output:
\[\frac{-b + \sqrt{b^2 - 4ac}}{2a}\]
Input:
$$\sqrt[3]{q + \sqrt{ q^2 - p^3 }}$$
Output:
\[\sqrt[3]{q + \sqrt{ q^2 - p^3 }}\]
Format:
$$\int^a_b$$
Input:
$$\int^a_b \frac{1}{3}x^3$$
Output:
\[\int^a_b \frac{1}{3}x^3\]
Input:
$$\frac{\partial u}{\partial t}$$
Output:
\[\frac{\partial u}{\partial t}\]
Example:
Input:
$\frac{\partial}{\partial \theta_j}J(\theta)=\frac{1}{m}\sum_{i=1}^{m}{(h(x^i)-y^i)x^i_j}$
Output:
\(\frac{\partial}{\partial \theta_j}J(\theta)=\frac{1}{m}\sum_{i=1}^{m}{(h(x^i)-y^i)x^i_j}\)
Input:
$$
\begin{matrix}
a & b \\
c & d
\end{matrix}
\quad
\begin{pmatrix}
a & b \\
c & d
\end{pmatrix}
\quad
\begin{bmatrix}
a & b \\
c & d
\end{bmatrix}
\quad
$$
Output:
\[ \begin{matrix} a & b \\ c & d \end{matrix} \quad \begin{pmatrix} a & b \\ c & d \end{pmatrix} \quad \begin{bmatrix} a & b \\ c & d \end{bmatrix} \quad \]