Experimental Economics: Data Workflow

Data Presentation

Author

Matteo Ploner

Published

May 9, 2023

1 Data Representation

1.1 A Grammar for graphics

  • Wickham (2010) defines a layered grammar of graphics
    • Building a graph from multiple layers of data
  • The main layers of a graph are
    • data and aesthetic mappings
    • geometric objects
    • scales
    • facet specification
  • In addition we may have
    • statistical transformations
    • coordinate system
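
As a rough illustration of how these layers translate into code, the sketch below builds a single plot and labels each call with the layer it supplies. It assumes the ggplot2 and tibble packages are installed; the data frame toy and its columns are made up for this example.

```r
library(ggplot2)
library(tibble)

# Hypothetical toy data, used only for this illustration
toy <- tibble(
  x = rep(1:5, times = 2),
  y = c(1, 4, 9, 16, 25, 2, 3, 5, 8, 13),
  group = rep(c("A", "B"), each = 5)
)

p_layers <- ggplot(data = toy, mapping = aes(x = x, y = y)) +  # data and aesthetic mappings
  geom_point() +                                               # geometric object
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) +    # statistical transformation
  scale_y_continuous(breaks = seq(0, 25, 5)) +                 # scale
  coord_cartesian(xlim = c(0, 6)) +                            # coordinate system
  facet_wrap(~group)                                           # facet specification

p_layers
```

Each call adds one component with `+`, so any of them can be dropped or swapped without touching the others.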

1.2 ggplot

  • We rely on the library ggplot2 to provide a graphical representation of data
    • Provide a data frame in the form of a tibble
    • Define the aesthetic mapping gathered from the data frame
      • x: x-dimension
      • y: y-dimension
      • fill: color to fill the graph
      • color: color of the graph
      • size: dimension of graph elements
      • label: labels in the graph
ggplot(data=DATA, mapping=aes(x=X, y=Y, ...))
  • This “sets the ground” for the graph
    • Still need to specify the exact “geometry” of the graph
library(tidyverse)  # provides tibble(), ggplot(), and as_factor()
dt <- tibble(x=1:10, y=x^2)
dt
# A tibble: 10 × 2
       x     y
   <int> <dbl>
 1     1     1
 2     2     4
 3     3     9
 4     4    16
 5     5    25
 6     6    36
 7     7    49
 8     8    64
 9     9    81
10    10   100
ggplot(data=dt, aes(x=x,y=y))
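
One point the template leaves implicit: aesthetics placed inside aes() are mapped from columns of the data frame, while the same arguments given outside aes() are fixed constants. A minimal sketch, assuming ggplot2 and tibble are installed and using geom_point() (introduced in the next section) only to make the difference visible; dt is rebuilt here so the block runs on its own.

```r
library(ggplot2)
library(tibble)

dt <- tibble(x = 1:10, y = x^2)

# Inside aes(): color varies with the value of y, and a legend is drawn
p_mapped <- ggplot(data = dt, mapping = aes(x = x, y = y, color = y)) +
  geom_point()

# Outside aes(): every point gets the same fixed color, and no legend is drawn
p_fixed <- ggplot(data = dt, mapping = aes(x = x, y = y)) +
  geom_point(color = "red")

p_mapped
p_fixed
```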

1.3 Geometry

1.3.1 geom_point()

ggplot(data=dt, aes(x=x,y=y))+
  geom_point()

1.3.2 geom_line()

ggplot(data=dt, aes(x=x,y=y))+
  geom_line()

1.3.3 geom_col()

ggplot(data=dt, aes(x=x,y=y))+
  geom_col()

1.3.4 Combined

ggplot(data=dt, aes(x=x,y=y))+
  geom_col()+
  geom_line()+
  geom_point()

1.4 Markers

1.4.1 Lines

  • linetype controls the style of line
    • Can also be used inside aes(linetype=) when the line type is conditional upon a variable

1.4.2 Points

  • pch (an alias of the shape aesthetic) controls the style of points
    • Can be used inside aes(pch=) when the point type is conditional upon a variable
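
The conditional use described above can be sketched as follows. The column parity is a hypothetical grouping variable added for the illustration, and shape is used in place of its alias pch.

```r
library(ggplot2)
library(tibble)

# parity is a hypothetical grouping column added for this illustration
dt_par <- tibble(
  x = 1:10,
  y = x^2,
  parity = ifelse(x %% 2 == 0, "Even", "Odd")
)

p_markers <- ggplot(data = dt_par, aes(x = x, y = y)) +
  geom_line(aes(linetype = parity)) +           # one line style per parity value
  geom_point(aes(shape = parity), size = 3)     # one point symbol per parity value

p_markers
```

ggplot2 picks the line types and symbols automatically and adds a legend for each mapped aesthetic.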

1.5 Markers: example

ggplot(data=dt, aes(x=x,y=y))+
    geom_point(pch=4)+
    geom_line(linetype=2)

1.6 Axes

  • Axes provide a guide to read the graph
  • Possible to control
    • axis dimensions
    • axis type
    • tick marks
    • tick mark labels
ggplot(data=dt, aes(x=x,y=y))+
  geom_point()+
  scale_x_continuous(
    limits=c(0,max(dt$x)),
    minor_breaks=seq(0,10,1),
    breaks=seq(0,10,1),
    labels=seq(0,10,1)
  )+
  scale_y_continuous(
    limits=c(0,max(dt$y)),
    minor_breaks=seq(0,max(dt$y),1),
    breaks=seq(0,max(dt$y),10),
    labels=seq(0,max(dt$y),10)
  )
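
The example above adjusts dimensions, tick marks, and tick labels. The axis type can be changed as well; as a small sketch, the y-axis below is switched from linear to logarithmic with scale_y_log10(). The choice of breaks is an assumption made for readability.

```r
library(ggplot2)
library(tibble)

dt <- tibble(x = 1:10, y = x^2)

p_log <- ggplot(data = dt, aes(x = x, y = y)) +
  geom_point() +
  scale_x_continuous(breaks = 1:10) +
  scale_y_log10(breaks = c(1, 10, 100))   # logarithmic instead of linear y-axis

p_log
```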

2 Non-graphical elements

2.1 Labels

  • We can specify labels of the graph
    • Title and subtitle
    • Axis labels
    • Caption
ggplot(data=dt, aes(x=x,y=y,color=y))+
  geom_point()+
  labs(
    title="THIS IS THE TITLE",
    subtitle="SUBTITLE HERE",
    y="This is the y-axis",
    x="This is the x-axis",
    caption="CAPTION HERE"
  )

2.2 Themes

  • You can easily control the size, the orientation, and the color of non-graphical elements with theme()
    • Axis
      • axis.title, axis.text …
    • Legend
      • legend.key, legend.key.size, legend.background, legend.margin, legend.spacing, legend.key.height, legend.key.width, legend.text, legend.text.align, legend.title, legend.position …
    • Facets
      • strip.background, strip.text …
ggplot(data=dt, aes(x=x,y=y,color=y))+
  geom_point()+
  theme(
    legend.position="right",
    axis.text=element_text(size=8),
    axis.title=element_text(size=14,face="bold"),
    # linewidth replaces the size argument, deprecated in ggplot2 3.4.0
    legend.background=element_rect(fill="grey", linewidth=2, linetype="solid")
  )
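
Orientation and color are set through the same element functions as size. A minimal sketch, in which the angle and color values are arbitrary choices for illustration:

```r
library(ggplot2)
library(tibble)

dt <- tibble(x = 1:10, y = x^2)

p_theme <- ggplot(data = dt, aes(x = x, y = y, color = y)) +
  geom_point() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),  # rotate the x tick labels
    axis.title = element_text(color = "darkblue"),      # recolor both axis titles
    legend.position = "bottom"                          # move the legend below the plot
  )

p_theme
```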

2.3 Colors

  • Colors can be used to fill a geometric element (fill; e.g., bars, points) or to define its outline or stroke (color; e.g., points, lines, …)
  • Colors can also encode the values of a variable
    • aes(… color=VAR, fill=VAR)
  • Fill and color for a discrete variable
    • scale_fill_brewer() or scale_color_brewer() to use library(RColorBrewer) palettes
    • scale_fill_manual() or scale_color_manual() to manually define colors
  • Fill and color for a continuous variable
    • scale_fill_gradient() or scale_color_gradient() two-color gradient
    • scale_fill_gradientn() or scale_color_gradientn() n-color gradient, equally spaced
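
The manual and n-color variants in the list above can be sketched as follows. The column parity and the specific colors are assumptions chosen for the illustration.

```r
library(ggplot2)
library(tibble)

# parity is a hypothetical discrete column added for this illustration
dt_par <- tibble(
  x = 1:10,
  y = x^2,
  parity = ifelse(x %% 2 == 0, "Even", "Odd")
)

# Discrete variable: one hand-picked color per level
p_manual <- ggplot(data = dt_par, aes(x = x, y = y, color = parity)) +
  geom_point(size = 3) +
  scale_color_manual(values = c(Even = "steelblue", Odd = "darkorange"))

# Continuous variable: gradient running through three colors
p_gradn <- ggplot(data = dt_par, aes(x = x, y = y, color = y)) +
  geom_point(size = 3) +
  scale_color_gradientn(colours = c("blue", "yellow", "red"))

p_manual
p_gradn
```

Naming the elements of values ties each color to a level explicitly, so the assignment does not depend on the order of the levels.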

2.4 RColorBrewer palettes
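
The available palettes can be inspected directly from R. A short sketch, assuming the RColorBrewer package is installed:

```r
library(RColorBrewer)

# Names and properties (size, category, colorblind-safe) of the palettes
head(brewer.pal.info)

# Hex codes of the first three colors of the qualitative palette "Set1"
set1_cols <- brewer.pal(n = 3, name = "Set1")
set1_cols

# Draw all palettes in a single overview chart
display.brewer.all()
```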

2.5 Colors: examples

  • Use color to provide a measure of the y-value
ggplot(data=dt, aes(x=x,y=y,color=y))+
  geom_point()+
  scale_colour_gradient(low="blue",high="red")

  • Use color to distinguish between Odd and Even x-values (discrete mapping)
# x %% 2 is 1 for odd x and 0 for even x; as_factor() makes it a discrete variable
ggplot(data=dt, aes(x=x,y=y,colour=as_factor(x%%2)))+
  geom_point()+
    scale_colour_brewer(palette="Set1")

2.6 Default themes

  • We can modify the theme of the graph with predefined themes (theme_dark(), theme_bw(), theme_classic(), …)
    • Overall look
    • Position of the legend
    • Font size
  • A gallery of themes can be found here
ggplot(data=dt, aes(x=x,y=y, color=x))+
  geom_point()+
  theme_dark()+
      labs(
      title="THIS IS THE TITLE"
      )

ggplot(data=dt, aes(x=x,y=y, color=x))+
  geom_point()+
  theme_bw()+
      labs(
      title="THIS IS THE TITLE"
      )

ggplot(data=dt, aes(x=x,y=y, color=x))+
  geom_point()+
  theme_classic()+
    labs(
      title="THIS IS THE TITLE"
      )

2.7 Facets

  • We can “split” different values into different independent panels
    • Facets are conditional upon a variable
  • As an example, divide odd and even outcomes
ggplot(data=dt, aes(x=x,y=y, color=x))+
  geom_point()+
  facet_wrap(~ifelse(x%%2==0,"Even","Odd"))
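
An alternative sketch stores the facet variable as a column first, which keeps the formula short and makes the panel layout easier to control. The column name parity and the layout options are assumptions for the illustration.

```r
library(ggplot2)
library(tibble)

# parity is a hypothetical column holding the facet variable
dt_par <- tibble(
  x = 1:10,
  y = x^2,
  parity = ifelse(x %% 2 == 0, "Even", "Odd")
)

p_facet <- ggplot(data = dt_par, aes(x = x, y = y)) +
  geom_point() +
  facet_wrap(~parity, ncol = 1, scales = "free_y")  # stacked panels, separate y ranges

p_facet
```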

2.8 Combining layers

  • Now we can combine different layers of graphical and non-graphical elements to get the desired output
g <-
  ggplot(data=dt, aes(x=x,y=y, color=as_factor(x%%2)))+
  # linewidth replaces size for lines (ggplot2 3.4.0 and later)
  geom_line(linetype=2, linewidth=1, color="grey")+
  geom_point(size=4, alpha=.5)+
  scale_x_continuous(
    limits=c(0,max(dt$x)),
    minor_breaks=seq(0,10,1),
    breaks=seq(0,10,1),
    labels=seq(0,10,1)
  )+
  scale_y_continuous(
    limits=c(0,max(dt$y)),
    minor_breaks=seq(0,max(dt$y),1),
    breaks=seq(0,max(dt$y),10),
    labels=seq(0,max(dt$y),10)
  )+
  facet_wrap(~ifelse(x%%2==0,"Even","Odd"))+
  scale_colour_brewer(palette="Set1")+
  theme_bw()+
  labs(
    title="My first graph",
    y="This is the y-axis",
    x="This is the x-axis",
    caption="Proudly made by me",
    color="Odd"
  )+
  theme(
    legend.position="bottom",
    axis.text=element_text(size=8),
    axis.title=element_text(size=14,face="bold"),
    legend.background=element_rect(fill="grey", linewidth=2, linetype="solid")
  )
g  # print the stored plot

3 Reporting

3.1 R markdown

  • What is R Markdown?
    • It combines the simple style of Markdown and the powerful computational environment of R
    • R Markdown provides an authoring framework for data science
    • A single R Markdown file is used to both save and execute code
    • It generates high-quality reports that can be shared with an audience
  • We rely on Quarto to create a report from R Markdown
    • A multi-language, next-generation version of R Markdown from Posit (formerly RStudio), with many new features and capabilities
  • See the official guide here

3.2 R Markdown: elements

  • An R Markdown document is made of 3 main components
    1. Markdown textual elements
      • Contain the text that comments on or describes the output of the R code chunks
    2. R code chunks
      • Contain the R code that generates the desired output (table, graph, results…)
    3. YAML header
      • Contains information about the document and formatting styles
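
To make the three components concrete, the sketch below writes a minimal document to a temporary file from R and prints it. The file name minimal.qmd and its contents are made up for the illustration; in the file itself, the YAML header comes first.

```r
# Fence marker built programmatically so the example nests cleanly
fence <- strrep("`", 3)

qmd_lines <- c(
  "---",                                     # YAML header (start)
  "title: 'Minimal report'",
  "format: html",
  "---",                                     # YAML header (end)
  "",
  "The plot below shows *y* against *x*.",   # Markdown textual element
  "",
  paste0(fence, "{r}"),                      # R code chunk (start)
  "plot(1:10, (1:10)^2)",
  fence                                      # R code chunk (end)
)

qmd_file <- file.path(tempdir(), "minimal.qmd")  # hypothetical file name
writeLines(qmd_lines, qmd_file)
cat(readLines(qmd_file), sep = "\n")
```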

3.3 Textual elements

  • Markdown is a lightweight markup language with plain text formatting syntax
    • The first block below shows the Markdown source, the second its rendered output
Plain text
*italics* and _italics_
**bold** and __bold__
superscript^2^
~~strikethrough~~
[link](www.rstudio.com)
inline equation: $A = \pi*r^{2}$

Plain text
italics and italics
bold and bold
superscript2
strikethrough
link
inline equation: A = π*r²

3.4 What is a code chunk?

  • You can insert R code into a code chunk
  • The chunk has this generic format
```{r}
Your code here
```
  • Several features of the code chunk can be controlled (see here for a detailed list)
    • eval=FALSE ⇒ the chunk is not evaluated
    • echo=FALSE ⇒ no source code is printed in the output, only the result of the code
    • include=FALSE ⇒ the chunk is excluded from the output, but still evaluated
    • warning=FALSE and message=FALSE ⇒ warnings and messages are not printed in the output
    • error=FALSE ⇒ rendering stops at the first error instead of printing it in the output
  • One can also control figure dimensions (in inches) with fig-width and fig-height (fig.width and fig.height in knitr's own syntax)
```{r}
#| echo: false
#| include: true
#| eval: true
#| fig-width: 9
#| fig-height: 6

# Quarto reads the #| lines as YAML, so echo=FALSE is written echo: false
Your code here
```
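
Options that should apply to every chunk do not need to be repeated. A minimal sketch, assuming the document is rendered with the knitr engine: placing the call below in an early chunk sets defaults for all chunks that follow, and individual chunks can still override them.

```r
library(knitr)

# Defaults for every chunk that follows in the document
opts_chunk$set(
  echo = FALSE,     # hide source code
  fig.width = 9,    # figure width in inches
  fig.height = 6    # figure height in inches
)

# Check the value currently in force
opts_chunk$get("echo")
```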

3.5 What is the YAML header?

  • The YAML header contains the properties of the document
    • Title, author, date
    • The output format
      • html, pdf, word …
    • Reference to external sources
      • .bib for bibliography
      • .css for style elements
---
title: 'YOURTITLE'
subtitle: 'Yoursubtitle'
author: 'YOU'
date: 'today'
date-format: long
format: html
---

3.6 How to render your document

  • In RStudio (Quarto is fully supported in v2022.07 and later)
    • Click on Render
  • In Microsoft VS Code
    • Install the Quarto extension
    • Click on Render
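
Rendering can also be triggered from the R console instead of the Render button. The sketch below assumes the quarto R package and the Quarto command-line tool are installed and that Report.qmd sits in the working directory; the guard simply skips rendering when any of these is missing.

```r
# Render only if the source file, the quarto R package, and the Quarto CLI are available
can_render <- file.exists("Report.qmd") &&
  requireNamespace("quarto", quietly = TRUE) &&
  !is.null(quarto::quarto_path())

if (can_render) {
  quarto::quarto_render("Report.qmd")  # uses the format declared in the YAML header
}
```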

3.7 An example of an R Markdown report

  • See
    • Source file: Report.qmd
    • Rendered file: Report.html

4 Appendix

4.1 Assignment

  • Create slides from Report.qmd
    • Hint: use the following YAML header
---
title: 'Risk preferences'
subtitle: 'Evidence from a MPL experiment'
author: 'Me'
date: 'today'
date-format: long
format: 
    revealjs: default
---

4.2 References

Wickham, Hadley. 2010. “A Layered Grammar of Graphics.” Journal of Computational and Graphical Statistics 19 (1): 3–28.