Median Mean 3rd Qu. Max. NA's
0.06745 0.14271 0.22620 0.30409 0.35694 1.52230       4

                    Estimate Std. Error t value Pr(>|t|)
(Intercept)        1.626e+00  3.996e-01   4.068  0.00021 ***
median_house_inc  -4.351e-06  4.122e-06  -1.056  0.29729
share_vote_trump  -2.006e+00  3.880e-01  -5.169  6.5e-06 ***
share_non_citizen -2.083e+00  1.149e+00  -1.814  0.07707 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1881 on 41 degrees of freedom
(6 observations deleted due to missingness)
Multiple R-squared: 0.4791, Adjusted R-squared: 0.441
F-statistic: 12.57 on 3 and 41 DF, p-value: 5.751e-06

Many types of analysis

Network

.container[ .left-plot[
```r
# Load the libraries
library(igraph)
library(igraphdata)

# Load a sample network dataset
data("USairports")

# Plot the network graph
plot(USairports, vertex.size=5, vertex.label=NA, edge.arrow.size=0.3, main="US Airports Network")
```
]
.right-plot[
![](figures/networkexample-1.png)
]
]

Spatial

.container[ .left-plot[
```r
# Load the library
library(tmap)

# Load the inbuilt dataset
data("World")

# Create a quick map
tm_shape(World) +
  tm_polygons("pop_est", title = "Population") +
  tm_layout(main.title = "World Population Map", frame = FALSE)
```
]
.right-plot[
![](figures/spatialexample-1.png)
]
]

Text

.container[ .left-plot[
```r
# Load the libraries
library(wordcloud)
library(tm)

# Load a sample text dataset
data("crude")

# Create a term-document matrix
tdm <- TermDocumentMatrix(crude)

# Convert the matrix to a format for word cloud
m <- as.matrix(tdm)
word_freq <- sort(rowSums(m), decreasing=TRUE)

# Generate the word cloud
wordcloud(names(word_freq), word_freq, max.words=100)
```
]
.right-plot[
![](figures/textexample-1.png)
]
]

Machine learning

.container[ .left-plot[
```r
# Load the libraries
library(rpart)
library(ggparty)

# Load the dataset
data(iris)

# Create a decision tree model using rpart directly
model <- rpart(Species ~ ., data = iris)
```
]
.right-plot[
<img src="figures/mlexampleplot-1.png" width="302.4" />
]
]
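
A minimal sketch of how the fitted tree could be inspected, assuming only `rpart` (shipped with standard R installations) and base R. The names `tree_model`, `predicted` and `accuracy` are illustrative and not from the original slides, and the `ggparty` plot shown on the slide is not reproduced here.

```r
library(rpart)

# Fit the same classification tree as on the slide
tree_model <- rpart(Species ~ ., data = iris)

# Print the splitting rules as text
print(tree_model)

# Predict the class of each flower (on the training data itself,
# so the resulting accuracy is optimistic)
predicted <- predict(tree_model, newdata = iris, type = "class")

# Cross-tabulate predicted vs. observed species
table(Predicted = predicted, Observed = iris$Species)

# Share of flowers classified correctly
accuracy <- mean(predicted == iris$Species)
accuracy
```

Because the tree is evaluated on the data it was trained on, a train/test split like the one on the next slide gives a fairer picture.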

More ML

.container[ .left-code[
```r
library(caret)

# Load the iris dataset
data(iris)

# Split the data into training and testing sets (70% training, 30% testing)
set.seed(123)
trainIndex <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
trainData <- iris[trainIndex, ]
testData <- iris[-trainIndex, ]

# Train a KNN model
model <- train(Species ~ ., data = trainData, method = "knn", tuneLength = 5)

# Predict on the test data
predictions <- predict(model, testData)

# Evaluate the model with a confusion matrix
confusionMatrix(predictions, testData$Species)
```

```
## Confusion Matrix and Statistics
##
##             Reference
## Prediction   setosa versicolor virginica
##   setosa         15          0         0
##   versicolor      0         14         1
##   virginica       0          1        14
##
## Overall Statistics
##
##               Accuracy : 0.9556
##                 95% CI : (0.8485, 0.9946)
##    No Information Rate : 0.3333
##    P-Value [Acc > NIR] : < 2.2e-16
##
##                  Kappa : 0.9333
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
##                      Class: setosa Class: versicolor Class: virginica
## Sensitivity                 1.0000            0.9333           0.9333
## Specificity                 1.0000            0.9667           0.9667
## Pos Pred Value              1.0000            0.9333           0.9333
## Neg Pred Value              1.0000            0.9667           0.9667
## Prevalence                  0.3333            0.3333           0.3333
## Detection Rate              0.3333            0.3111           0.3111
## Detection Prevalence        0.3333            0.3333           0.3333
## Balanced Accuracy           1.0000            0.9500           0.9500
```
]
.right-plot[
![](figures/mlexample2-1.png)
]
]
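
The headline statistics in the output above can be recomputed by hand from the confusion matrix. The sketch below uses base R only; the object names (`cm`, `cohen_kappa`, and so on) are illustrative, and the counts are copied from the slide rather than re-estimated.

```r
# Confusion matrix copied from the output above
# (rows = predictions, columns = reference)
classes <- c("setosa", "versicolor", "virginica")
cm <- matrix(c(15,  0,  0,
                0, 14,  1,
                0,  1, 14),
             nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = classes, Reference = classes))

n <- sum(cm)

# Accuracy: correctly classified cases / all cases
accuracy <- sum(diag(cm)) / n

# Sensitivity: correct predictions / all true members of the class
sensitivity <- diag(cm) / colSums(cm)

# Specificity: true negatives / all cases that are not in the class
true_neg <- n - rowSums(cm) - colSums(cm) + diag(cm)
specificity <- true_neg / (n - colSums(cm))

# Cohen's kappa: accuracy corrected for agreement expected by chance
expected <- sum(rowSums(cm) * colSums(cm)) / n^2
cohen_kappa <- (accuracy - expected) / (1 - expected)

round(c(accuracy = accuracy, kappa = cohen_kappa), 4)
round(rbind(sensitivity, specificity), 4)
```

With 43 of 45 test cases on the diagonal, the accuracy is 43/45, which matches the 0.9556 reported by `confusionMatrix()`.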

Sentiment analysis

.container[ .left-plot[
```r
# Load the libraries
library(tidytext)
library(janeaustenr)
library(dplyr)
library(ggplot2)

# Load the books and convert them to a tidy one-word-per-row format
# (austen_books() is a function, so no data() call is needed)
tidy_books <- austen_books() %>%
  unnest_tokens(word, text)

# Perform sentiment analysis using the Bing lexicon
sentiments <- tidy_books %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  group_by(book, sentiment) %>%
  count()
```
]
.right-plot[
<img src="figures/sentiexplot-1.png" width="302.4" />
]
]

Plotting

.container[ .left-plot[
```r
# Load necessary library
library(ggplot2)

# Basic scatter plot
ggplot(hate_crimes, aes(x = median_house_inc, y = hate_crimes_per_100k_splc)) +
  geom_point() +
  labs(title = "Hate Crimes vs. Income",
       x = "Median Household Income",
       y = "Hate Crimes per 100k") +
  theme_minimal()
```
]
.right-plot[
![](figures/scatter1-1.png)
]
]

.container[ .left-plot[
```r
# Scatter plot with colour
ggplot(hate_crimes, aes(x = median_house_inc, y = hate_crimes_per_100k_splc, color = share_pop_metro)) +
  geom_point() +
  labs(title = "Hate Crimes vs. Income",
       x = "Median Household Income",
       y = "Hate Crimes per 100k",
       color = "Population Share (Metro)") +
  theme_minimal() +
  scale_color_gradient(low = "#FFCC33", high = "#660099")
```
]
.right-plot[
![](figures/scatter2-1.png)
]
]

.container[ .left-plot[
```r
# Scatter plot with regression line
ggplot(hate_crimes, aes(x = median_house_inc, y = hate_crimes_per_100k_splc, color = share_pop_metro)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "black") +
  labs(title = "Hate Crimes vs. Income",
       x = "Median Household Income",
       y = "Hate Crimes per 100k",
       color = "Population Share (Metro)") +
  theme_minimal() +
  scale_color_gradient(low = "#FFCC33", high = "#660099")
```
]
.right-plot[
![](figures/scatter3-1.png)
]
]
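
With `method = "lm"`, `geom_smooth()` draws the fitted line from a simple linear regression of y on x. A rough base R sketch of the same idea follows; it uses the built-in `mtcars` data as a stand-in for `hate_crimes` (an assumption made only so the example runs without extra packages), and `fit` is an illustrative name.

```r
# Simple linear regression of y on x, using mtcars as a stand-in dataset
fit <- lm(mpg ~ wt, data = mtcars)

# Intercept and slope of the fitted line
coef(fit)

# Base R equivalent of a scatter plot with a regression line
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(fit, col = "black")
```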

.container[ .left-plot[
```r
# Scatter plot faceted by binned Trump vote share
ggplot(hate_crimes, aes(x = median_house_inc, y = hate_crimes_per_100k_splc, color = share_pop_metro)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "black") +
  labs(title = "The importance of Trump Voters",
       x = "Median Household Income",
       y = "Hate Crimes per 100k",
       color = "Population") +
  theme_minimal() +
  scale_color_gradient(low = "#FFCC33", high = "#660099") +
  facet_wrap(~ cut(share_vote_trump, breaks = c(-Inf, 0.4, 0.6, 0.8, 1), labels = c("Low", "Medium", "High", "Very High")), labeller = label_both)
```
]
.right-plot[
![](figures/scatter4-1.png)
]
]
]
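
The facets above come from `cut()`, which turns the continuous vote share into labelled bins. A small base R sketch of how those breaks behave, using made-up vote shares (the vector `share` is hypothetical, not taken from `hate_crimes`):

```r
# Hypothetical vote shares: one per bin, plus one exactly on a boundary
share <- c(0.30, 0.40, 0.45, 0.65, 0.85)

bins <- cut(share,
            breaks = c(-Inf, 0.4, 0.6, 0.8, 1),
            labels = c("Low", "Medium", "High", "Very High"))

# Intervals are closed on the right by default, so 0.40 counts as "Low"
data.frame(share, bins)

# Number of observations per bin
table(bins)
```

Creating the binned variable once, under a readable name, and faceting on that name would also give tidier facet labels than faceting on the `cut()` expression directly.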

.container[ .left-plot[
```r
mean_hc <- mean(hate_crimes$hate_crimes_per_100k_splc, na.rm = TRUE)
mean_inc <- mean(hate_crimes$median_house_inc, na.rm = TRUE)

ggplot(hate_crimes, aes(x = median_house_inc, y = hate_crimes_per_100k_splc, color = share_pop_metro)) +
  geom_point(size = 3, alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE, color = "black", linetype = "dashed") +
  labs(title = "Hate Crimes vs. Income",
       x = "Median Household Income",
       y = "Hate Crimes per 100k",
       color = "Population Share (Metro)") +
  theme_minimal() +
  scale_color_gradient(low = "#FFCC33", high = "#660099") +
  theme(plot.title = element_text(hjust = 0.5, size = 14),
        axis.title = element_text(size = 12),
        legend.position = "bottom") +
  geom_hline(yintercept = mean_hc, linetype = "dotted", color = "grey") +
  geom_vline(xintercept = mean_inc, linetype = "dotted", color = "grey")
```
]
.right-plot[
![](figures/scatter5-1.png)
]
]
]
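
The `na.rm = TRUE` arguments above matter because a single missing value otherwise makes the whole mean missing, and the reference lines would not be drawn where intended. A tiny sketch with made-up numbers (`rates` is a hypothetical vector):

```r
# Hypothetical rates with one missing value
rates <- c(0.10, 0.25, NA, 0.40)

# Without na.rm the result is NA
mean(rates)

# With na.rm = TRUE the missing value is dropped before averaging
mean(rates, na.rm = TRUE)
```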

# Reason 3: Reproducibility

For open science

- **Enhances Transparency:** Open science promotes transparency in research methods and data, allowing for better scrutiny and validation of findings.
- **Fosters Collaboration:** Sharing data and methodologies encourages collaboration among criminologists, leading to more comprehensive studies.
- **Accelerates Innovation:** Open access to research can accelerate the development of new theories and approaches in criminology.
- Key Resources
  - **CrimRxiv:** An open-access preprint repository for the criminology community
  - **Open Science Working Group at the ESC**

For yourself (e.g., changes for a reviewer, repeat analysis, or batch processing)

.container[ .left-plot[
```r
# Same scatter plot with regression line, now in greyscale
ggplot(hate_crimes, aes(x = median_house_inc, y = hate_crimes_per_100k_splc, color = share_pop_metro)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "black") +
  labs(title = "Hate Crimes vs. Income",
       x = "Median Household Income",
       y = "Hate Crimes per 100k",
       color = "Population Share (Metro)") +
  theme_minimal() +
  scale_color_gradient(low = "grey90", high = "black")
```
]
.right-plot[
![](figures/scatterbw-1.png)
]
]

Less point and click = more transparency, less space for errors
- In 2010, economists Carmen Reinhart and Kenneth Rogoff published a paper titled "Growth in a Time of Debt."
- The paper claimed that countries with debt levels above 90% of GDP experienced slower economic growth.
- A spreadsheet error in their data analysis, uncovered in 2013 when other researchers tried to replicate the results, contributed to faulty conclusions.
- The findings influenced austerity measures globally, including in the EU and the US.
- Policymakers used the conclusions to justify harsh budget cuts and economic policies.
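
To illustrate the kind of mistake that is easy to make by pointing and clicking, here is a sketch with purely made-up growth rates (not the Reinhart and Rogoff data; all names and numbers are hypothetical). In a script, the rows that go into an average are written down and can be checked by anyone.

```r
# Hypothetical growth rates (%) for ten countries; NOT the original data
growth <- c(A = 2.1, B = 1.8, C = 2.6, D = 3.0, E = 2.4,
            F = -0.3, G = 0.9, H = 1.2, I = 0.4, J = 1.5)

# Point-and-click style mistake: the selected range stops short
# and silently leaves out the first five countries
mean_partial <- mean(growth[6:10])

# Scripted version: all countries are used, and the code documents
# exactly which observations went into the average
mean_full <- mean(growth)

c(partial = mean_partial, full = mean_full)
```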

Beyond analysis...

- Slides/reporting (demo next)
- Websites
- Books

Is it R?
<img src="figures/is_it_cake.png" width="20%" />

- These slides?
- This ESC presentation
- This book <img src="figures/cm_book.png" width="40%" />

Getting Started with R
- **R for Data Science**: A free online book that's an excellent resource for learning R and data science workflows.
- **Hands-On Programming with R**: A friendly introduction to the R language for non-programmers.
- **DataCamp: Introduction to R**: A free course offering an interactive way to learn R basics.

Visualisation
- **Data Visualization: A practical introduction**: A beautiful book on data visualisation.
- **ggplot2: Elegant Graphics for Data Analysis**: A resource for those who want to understand the logic of the Grammar of Graphics that ggplot2 uses.

R Resources for Criminology/Social Science
- **Quantitative Social Science: An Introduction**: My favourite social science stats textbook, which also uses R
- **Discovering Statistics Using R**: Very comprehensive, from the field of psychology
- **A Beginner's Guide to Statistics for Criminology and Criminal Justice Using R**: Written as a companion to the classic stats book by Weisburd and Britt
- **R for Criminologists**: Our course material for PGT
- **R for Criminologists**: Our course material for UG

Additional Learning
- **Stack Overflow**: Get help from the R community by browsing or asking questions.
- **R Bloggers**: A collection of blogs about using R in a variety of disciplines, including criminology.
- Twitter (or maybe Bluesky now? I don't know!!!)

# Demo

Exercise

Code to copy:
```r
# Install necessary packages if they are not already installed
if(!require(ggplot2)) install.packages("ggplot2", dependencies = TRUE)
if(!require(fivethirtyeight)) install.packages("fivethirtyeight", dependencies = TRUE)

# Load the packages
library(ggplot2)
library(fivethirtyeight)

# Create a basic scatter plot of median_house_inc vs. hate