Natural Language Processing: Sentiments of Obama’s and Trump’s Inauguration Speeches

We will take a look at Obama’s first inauguration speech from 2009 and Trump’s inauguration speech from 2017. We will analyse the sentiment of both speeches and compare them.

Introduction

As always, we start by loading the required packages. We need

rJava and qdap for text mining
plyr and dplyr for data preparation
ggplot2 and ggrepel for visualisation
rio for data import

library(rJava)
suppressPackageStartupMessages(library(qdap))
suppressPackageStartupMessages(library(plyr))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(ggplot2))
library(ggrepel)
library(rio)

Data Import

Next, I load both inauguration speeches, which I downloaded beforehand and saved as individual text files in the ./data/ folder. Each file is read and collapsed into a single string, and its file name (e.g. 2009_Obama) serves as the label of the speech.

files <- c("2009_Obama.txt", "2017_Trump.txt")
lng_files <- length(files)
sentences <- data.frame(speech = rep(NA_character_, lng_files),
                        year_president = rep(NA_character_, lng_files),
                        stringsAsFactors = FALSE)

for (i in seq_len(lng_files)) {
    # import each individual speech; reading it as UTF-8 keeps dashes and apostrophes intact
    file_path <- file.path("./data", files[i])
    temp <- readLines(file_path, encoding = "UTF-8", warn = FALSE)
    temp <- paste(temp, collapse = " ")  # concatenate all lines to one string
    sentences$speech[i] <- temp
    # label = file name without extension, e.g. "2009_Obama"
    sentences$year_president[i] <- strsplit(files[i], split = ".", fixed = TRUE)[[1]][1]
}
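The import loop can also be written without explicit indexing. The sketch below is a variant under stated assumptions, not the original code: it writes two placeholder files to a temporary folder (the folder name `speeches_demo` and the file contents are hypothetical), reads each file with `readLines()`, collapses it with `paste()` and derives the label with `tools::file_path_sans_ext()`.

```r
# Hypothetical demo data: two placeholder "speeches" in a temporary folder
data_dir <- file.path(tempdir(), "speeches_demo")
dir.create(data_dir, showWarnings = FALSE)
writeLines(c("Placeholder line one.", "Placeholder line two."),
           file.path(data_dir, "2009_Obama.txt"))
writeLines("Placeholder line three.",
           file.path(data_dir, "2017_Trump.txt"))

files <- c("2009_Obama.txt", "2017_Trump.txt")

# read one file and collapse all of its lines into a single string
read_speech <- function(f) {
  lines <- readLines(file.path(data_dir, f), encoding = "UTF-8", warn = FALSE)
  paste(lines, collapse = " ")
}

sentences <- data.frame(
  speech = vapply(files, read_speech, character(1), USE.NAMES = FALSE),
  # file name without extension, e.g. "2009_Obama"
  year_president = tools::file_path_sans_ext(files),
  stringsAsFactors = FALSE
)
print(sentences)
```

`vapply()` guarantees that every file yields exactly one string, so a malformed result fails loudly instead of silently producing a list column.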

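# split each speech into sentences (one row per sentence)
sentences <- sentSplit(sentences, "speech", verbose = FALSE)

# sentence-level polarity, grouped by speech
pol <- polarity(sentences$speech, sentences$year_president)
pol_df <- pol$all
pol_df <- pol_df %>% filter(!is.na(year_president))
pol_df$year_president <- as.factor(pol_df$year_president)
# the lists of matched positive and negative words are not needed
pol_df$pos.words <- NULL
pol_df$neg.words <- NULL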
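qdap’s `polarity()` looks up each word in a sentiment dictionary and divides the summed scores of a sentence by the square root of its word count. The following is a simplified sketch of that idea, not qdap’s implementation: it uses a tiny hypothetical lexicon (`pos_words`, `neg_words`) and a hypothetical helper `simple_polarity()`, and it ignores the amplifiers, de-amplifiers and negators that qdap weights around each polarized word.

```r
# Hypothetical mini lexicon; qdap ships a much larger dictionary
pos_words <- c("great", "prosperity", "strength", "free", "happiness")
neg_words <- c("hardships", "fear", "decline")

# Simplified polarity: (positive hits - negative hits) / sqrt(word count).
# Valence shifters (amplifiers, negators, ...) are deliberately ignored here.
simple_polarity <- function(sentence) {
  words <- tolower(unlist(strsplit(sentence, "[^A-Za-z']+")))
  words <- words[words != ""]
  n_pos <- sum(words %in% pos_words)
  n_neg <- sum(words %in% neg_words)
  (n_pos - n_neg) / sqrt(length(words))
}

toy_sentences <- c("We will confront hardships.",
                   "Protection will lead to great prosperity and strength.")
scores <- vapply(toy_sentences, simple_polarity, numeric(1), USE.NAMES = FALSE)
print(round(scores, 3))
```

Because of the division by the square root of the word count, a short sentence with several polarized words can score above 1 in this sketch (the second sentence gets 3/√8 ≈ 1.06), so scores are not confined to [-1, 1].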

Now we split the sentence-level results by speech and add each speech’s total word count.

# polarity results per speech, plus each speech's total word count
Obama_2009 <- pol_df[pol_df$year_president == "2009_Obama", ]
Obama_2009$total_words <- sum(Obama_2009$wc, na.rm = TRUE)
Trump_2017 <- pol_df[pol_df$year_president == "2017_Trump", ]
Trump_2017$total_words <- sum(Trump_2017$wc, na.rm = TRUE)
Obama_Trump <- rbind(Obama_2009, Trump_2017)

Sentiments in Both Speeches

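The result of the sentiment analysis can be plotted with ggplot(). Each sentence receives a single polarity score, and all words of that sentence are drawn in the same color, so each sentence appears as a bar segment whose length equals its word count. The color scale rainbow(3) runs from red (most negative) through green to blue (most positive) across the observed range of polarity scores.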

g <- ggplot(data = Obama_Trump)
g <- g + geom_bar(aes(x = as.factor(year_president),
                      y = wc,
                      fill = polarity),
                  stat = "identity",
                  position = "stack")
g <- g + coord_flip()
g <- g + ylab("Words [-]")
g <- g + xlab("Year and President")
g <- g + ggtitle("Sentiments in Speech")
g <- g + theme_bw()
g <- g + scale_fill_gradientn(colours = rainbow(3))
g

We see that Obama’s speech was much longer than Trump’s 2017 speech.

Which are the most negative and the most positive sentences in Trump’s speech according to the algorithm? We can find out with which.min() and which.max().

pos_most_negative <- which.min(Trump_2017$polarity)
pol_most_negative <- Trump_2017$polarity[pos_most_negative]
Trump_2017$text.var[pos_most_negative]

## [1] "We will confront hardships."

pos_most_positive <- which.max(Trump_2017$polarity)
pol_most_positive <- Trump_2017$polarity[pos_most_positive]
Trump_2017$text.var[pos_most_positive]

## [1] "Protection will lead to great prosperity and strength."
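Note that which.min() and which.max() return only the first position of the extreme value; if several sentences happen to share the same score, the others are silently skipped. A small sketch on hypothetical scores shows how to list every tied sentence with which().

```r
# Hypothetical polarity scores; two sentences share the minimum of -0.5
polarity <- c(0.2, -0.5, 0.0, -0.5, 1.06)

# which.min() returns the index of the FIRST minimum only
first_min <- which.min(polarity)
# comparing against min() returns all tied positions
all_min <- which(polarity == min(polarity))
print(first_min)
print(all_min)
```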

Ok. What about Obama’s speech?

pos_most_negative <- which.min(Obama_2009$polarity)
pol_most_negative <- Obama_2009$polarity[pos_most_negative]
Obama_2009$text.var[pos_most_negative]

## [1] "Less measurable but no less profound is a sapping of confidence across our land—a nagging fear that America’s decline is inevitable, that the next generation must lower its sights."

pos_most_positive <- which.max(Obama_2009$polarity)
pol_most_positive <- Obama_2009$polarity[pos_most_positive]
Obama_2009$text.var[pos_most_positive]

## [1] "The time has come to reaffirm our enduring spirit; to choose our better history; to carry forward that precious gift, that noble idea, passed on from generation to generation: the God-given promise that all are equal, all are free, and all deserve a chance to pursue their full measure of happiness."

It might be a coincidence, but Obama’s most extreme sentences are much longer than Trump’s.

Let’s check whether this holds in general by comparing the median sentence lengths (in words).

median(Trump_2017$wc)

## [1] 13

median(Obama_2009$wc)

## [1] 19

Our gut feeling was right: Obama’s median sentence has 19 words versus 13 for Trump, so his sentences are typically about 46% longer.
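The median is a better summary than the mean here, because a few very long sentences would pull the mean upwards. A minimal sketch with placeholder sentences illustrates this; the naive whitespace word count is an assumption and may differ slightly from how qdap fills its wc column.

```r
# Placeholder sentences (not taken from either speech)
toy <- c("One two three.",
         "One two three four five.",
         "One two three four five six seven eight nine ten eleven twelve.")

# naive word count: split each sentence on runs of whitespace
wc <- lengths(strsplit(trimws(toy), "\\s+"))
print(wc)          # words per sentence
print(median(wc))  # robust to the single long sentence
print(mean(wc))    # pulled upwards by the long sentence
```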

Sentiment Histogram

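I will plot a histogram of sentiment for Obama and Trump. Polarity scores are rounded to one decimal, and each bar shows the percentage of a speech’s words that fall into sentences with that rounded polarity.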

# bin sentences by polarity rounded to one decimal, then compute each bin's
# share of the speech's total word count (in %)
Obama_2009$polarity_cut <- round(Obama_2009$polarity, 1)
pol_wc_count_Obama <- Obama_2009 %>%
    ddply(.(polarity_cut), plyr::summarise,
          wc_freq = sum(wc, na.rm = TRUE) / sum(Obama_2009$wc, na.rm = TRUE) * 100)
pol_wc_count_Obama$president <- "Obama"

Trump_2017$polarity_cut <- round(Trump_2017$polarity, 1)
pol_wc_count_Trump <- Trump_2017 %>%
    ddply(.(polarity_cut), plyr::summarise,
          wc_freq = sum(wc, na.rm = TRUE) / sum(Trump_2017$wc, na.rm = TRUE) * 100)
pol_wc_count_Trump$president <- "Trump"

pol_wc_count <- rbind(pol_wc_count_Trump, pol_wc_count_Obama)
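The histogram is weighted by words, not by sentences: a bin’s height is the summed word count of its sentences divided by the speech’s total word count, so long sentences count more. The sketch below reproduces that calculation with base R’s aggregate() instead of plyr::ddply(), on hypothetical sentence-level values.

```r
# Hypothetical sentence-level results for one speech
toy <- data.frame(polarity = c(-0.52, 0.04, 0.11, 0.13, 1.06),
                  wc       = c(4, 10, 6, 12, 8))

# bin polarity to one decimal, as done with round() in the code above
toy$polarity_cut <- round(toy$polarity, 1)

# share of all words (in %) that fall into sentences of each bin
share <- aggregate(wc ~ polarity_cut, data = toy, FUN = sum)
share$wc_freq <- share$wc / sum(toy$wc) * 100
print(share)
```

The shares always add up to 100% per speech, which makes speeches of very different length comparable in one plot.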

g <- ggplot(pol_wc_count, aes(x = polarity_cut,
                              y = wc_freq,
                              fill = president))
g <- g + geom_bar(stat = "identity", position = "dodge")
g <- g + theme_bw()
# zoom to [-1, 1]; bins outside this window are not shown
g <- g + coord_cartesian(xlim = c(-1, 1))
g <- g + xlab("Sentiment")
g <- g + ylab("Share of Words [%]")
g <- g + ggtitle("Sentiment Frequencies")
g

Apart from one very negative sentence, Trump’s speech spans a narrower range of sentiment. Obama’s speech covers a wider range of emotions and appears more balanced.
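The visual comparison of ranges can also be put into numbers. A minimal sketch on hypothetical polarity scores (not the actual speech results): the helper polarity_spread() is hypothetical and reports the range and the interquartile range, optionally after dropping the single most negative sentence.

```r
# Hypothetical sentence-level polarity scores for two speeches
speech_a <- c(-0.9, -0.3, 0.0, 0.4, 0.8)
speech_b <- c(-1.2, 0.1, 0.2, 0.3, 0.4)

# range and interquartile range of polarity; optionally drop the single
# most negative sentence first (hypothetical helper)
polarity_spread <- function(pol, drop_min = FALSE) {
  if (drop_min) pol <- pol[-which.min(pol)]
  c(range = diff(range(pol)), iqr = IQR(pol))
}

print(polarity_spread(speech_a))
print(polarity_spread(speech_b))
print(polarity_spread(speech_b, drop_min = TRUE))
```

With these made-up numbers, speech_b has almost the same range as speech_a until its one outlier is dropped, after which its spread shrinks sharply.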