Beauty and the Beast and regular expressions

Dook sucks

Why this matters

What you will learn

R can read

GASTON: How can you read this? There’s no pictures!

BELLE: Well, some people use their imaginations.

Beauty and the beast text

http://www.fpx.de/fp/Disney/Scripts/

First some tidy data

library(tidyverse)
library(stringr) # attached by library(tidyverse) since tidyverse 1.2.0; loading it explicitly is harmless
library(RColorBrewer)

# read in the cleaned Beauty and the Beast data frame
# see the .Rmd for the code that creates beauty_clean_df
beauty <- read_csv('https://raw.githubusercontent.com/idc9/stor390/master/data/beauty_clean_df.csv')
beauty[1, 1]
## # A tibble: 1 × 1
##     person
##      <chr>
## 1 NARRATOR
beauty[1, 2]
## # A tibble: 1 × 1
##                                                                          line
##                                                                         <chr>
## 1 Once upon a time, in a faraway land, a young prince lived in a;shining cast

base R vs. stringr

How many times do the main characters speak?

# person vector
beauty$person[1:5]
## [1] "NARRATOR"    "BELLE"       "TOWNSFOLK 1" "TOWNSFOLK 2" "TOWNSFOLK 3"
sum(str_count(beauty$person, "GASTON")) # using stringr
## [1] 66
sum(grepl("GASTON", beauty$person)) # using base R
## [1] 66
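The two counts agree here because each entry of beauty$person contains "GASTON" at most once. The sketch below uses a made-up vector (toy_people is a hypothetical name, not part of the script data) to show where the two approaches can differ: str_count() counts matches inside each string, while str_detect() and grepl() only report whether a string matches at all.

```r
library(stringr)

# Hypothetical toy vector, not taken from the script
toy_people <- c("GASTON", "BELLE", "GASTON AND GASTON", "LEFOU")

# str_count() returns the number of matches within each string
match_counts <- str_count(toy_people, "GASTON")

# str_detect() and grepl() return one TRUE/FALSE per string
detected_stringr <- str_detect(toy_people, "GASTON")
detected_base <- grepl("GASTON", toy_people)

sum(match_counts)      # total number of matches across all strings
sum(detected_stringr)  # number of strings with at least one match
sum(detected_base)     # base R equivalent of the line above
```

If a name could appear more than once in a single entry, sum(str_detect(...)) is the safer way to count lines.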

How many characters did each character speak?

beauty$line[str_detect(beauty$person, "GASTON")] %>% 
  nchar() %>% 
  sum()
## [1] 4903

Question 1

Make a data frame with three columns:

What is the standard deviation of the number of characters spoken by each character? What is the max number of lines spoken by a character?

A few points on general regular expression logic:

Extraction

A quick example on that last point to demonstrate a string extraction using pattern matching.

str_extract_all("TOWNFOLK2 townfolk!", "[A-Z]+[0-9]+|[a-z]+[[:punct:]]+")
str_extract_all("TOWNFOLK2 townfolk!", "([A-Z]+[0-9]+|[a-z]+)[[:punct:]]+")
str_extract_all("TOWNFOLK2 townfolk!", "[A-Z]+([0-9]+|[a-z]+[[:punct:]]+)")

Matches capital letters AND subsequent numbers OR lower case letters AND subsequent punctuation

str_extract_all("TOWNFOLK2 townfolk!", "[A-Z]+[0-9]+|[a-z]+[[:punct:]]+")
## [[1]]
## [1] "TOWNFOLK2" "townfolk!"

Matches (capitals AND numbers OR lower case) AND punctuation

str_extract_all("TOWNFOLK2 townfolk!", "([A-Z]+[0-9]+|[a-z]+)[[:punct:]]+")
## [[1]]
## [1] "townfolk!"

Matches capitals AND (numbers OR lower case AND punctuation)

str_extract_all("TOWNFOLK2 townfolk!", "[A-Z]+([0-9]+|[a-z]+[[:punct:]]+)")
## [[1]]
## [1] "TOWNFOLK2"
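The three patterns above differ only in where the parentheses sit, because | has the lowest precedence of any regex operator. A smaller sketch of the same idea, using a made-up vector (toy_words is a hypothetical name):

```r
library(stringr)

# Hypothetical toy vector chosen to make the grouping visible
toy_words <- c("gray", "grey", "gra", "ey")

# Without parentheses, | splits the whole pattern: "gra" OR "ey"
ungrouped <- str_detect(toy_words, "gra|ey")

# With parentheses, only "a" and "e" alternate: "gray" OR "grey"
grouped <- str_detect(toy_words, "gr(a|e)y")

ungrouped
grouped
```

The ungrouped pattern matches all four strings, while the grouped pattern matches only the first two.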

Replacement

Replace TOWNSFOLK k (for any number k) with townsfolk

str_replace_all(beauty$person, "TOWNSFOLK[0-9\\s]*", "townsfolk")
##   [1] "NARRATOR"      "BELLE"         "townsfolk"     "townsfolk"    
##   [5] "townsfolk"     "townsfolk"     "townsfolk"     "BELLE"        
##   [9] "BAKER"         "BELLE"         "BAKER"         "BELLE"        
##  [13] "BAKER"         "townsfolk"     "WOMAN 1"       "BARBER"       
##  [17] "townsfolk"     "DRIVER"        "WOMAN 2"       "DRIVER"       
##  [21] "WOMAN 3"       "MERCHANT"      "WOMAN 3"       "WOMAN 4"      
##  [25] "MAN 1"         "BELLE"         "BOOKSELLER"    "BELLE"        
##  [29] "BOOKSELLER"    "BELLE"         "BOOKSELLER"    "BELLE"        
##  [33] "BOOKSELLER"    "BELLE"         "BOOKSELLER"    "BELLE"        
##  [37] "BOOKSELLER"    "BELLE"         "MEN"           "MEN"          
##  [41] "MEN"           "BELLE"         "WOMAN 5"       "MERCHANT"     
##  [45] "ALL"           "LEFOU"         "GASTON"        "LEFOU"        
##  [49] "GASTON"        "LEFOU"         "GASTON"        "LEFOU"        
##  [53] "GASTON"        "LEFOU"         "GASTON"        "LEFOU"        
##  [57] "GASTON"        "BIMBETTES"     "MAN 1"         "GASTON"       
##  [61] "MAN 2"         "MAN 3"         "WOMAN 1"       "WOMAN 2"      
##  [65] "MAN 4"         "WOMAN 3"       "MAN 4"         "GASTON"       
##  [69] "MAN 4"         "GASTON"        "WOMAN 4"       "MAN 5"        
##  [73] "WOMAN 4"       "MAN 5"         "MAN 6"         "BELLE"        
##  [77] "ALL"           "GASTON"        "ALL"           "GROUP 1"      
##  [81] "GROUP 2"       "ALL"           "GASTON"        "BELLE"        
##  [85] "GASTON"        "BELLE"         "GASTON"        "BELLE"        
##  [89] "GASTON"        "BELLE"         "BIMBETTE 1"    "BIMBETTE 2"   
##  [93] "BIMBETTE 3"    "BELLE"         "LEFOU"         "BELLE"        
##  [97] "GASTON"        "BELLE"         "BELLE"         "MAURICE"      
## [101] "BELLE"         "MAURICE"       "BELLE"         "MAURICE"      
## [105] "BELLE"         "MAURICE"       "BELLE"         "MAURICE"      
## [109] "BELLE"         "MAURICE"       "BELLE"         "MAURICE"      
## [113] "BELLE"         "MAURICE"       "BELLE"         "MAURICE"      
## [117] "BELLE"         "MAURICE"       "BELLE"         "MAURICE"      
## [121] "BELLE"         "MAURICE"       "MAURICE"       "MAURICE"      
## [125] "MAURICE"       "MAURICE"       "MAURICE"       "MAURICE"      
## [129] "LUMIERE"       "COGSWORTH"     "MAURICE"       "COGSWORTH"    
## [133] "MAURICE"       "LUMIERE"       "COGSWORTH"     "LUMIERE"      
## [137] "MAURICE"       "LUMIERE"       "MAURICE"       "LUMIERE"      
## [141] "MAURICE"       "COGSWORTH"     "MAURICE"       "COGSWORTH"    
## [145] "MAURICE"       "LUMIERE"       "MAURICE"       "COGSWORTH"    
## [149] "MAURICE"       "COGSWORTH"     "MRS. POTTS"    "COGSWORTH"    
## [153] "CHIP"          "MAURICE"       "CHIP"          "BEAST"        
## [157] "LUMIERE"       "COGSWORTH"     "BEAST"         "MAURICE"      
## [161] "BEAST"         "MAURICE"       "BEAST"         "MAURICE"      
## [165] "BEAST"         "MAURICE"       "BEAST"         "LEFOU"        
## [169] "GASTON"        "GASTON"        "LEFOU"         "GASTON"       
## [173] "LEFOU"         "BELLE"         "GASTON"        "BELLE"        
## [177] "GASTON"        "BELLE"         "GASTON"        "BELLE"        
## [181] "GASTON"        "BELLE"         "GASTON"        "BELLE"        
## [185] "GASTON"        "BELLE"         "LEFOU"         "GASTON"       
## [189] "LEFOU"         "PIERRE"        "BELLE"         "BELLE"        
## [193] "BELLE"         "BELLE"         "COGSWORTH"     "LUMIERE"      
## [197] "BELLE"         "CHIP"          "MRS. POTTS"    "CHIP"         
## [201] "MRS. POTTS"    "FEATHERDUSTER" "CHIP"          "COGSWORTH"    
## [205] "BELLE"         "LUMIERE"       "COGSWORTH"     "LUMIERE"      
## [209] "COGSWORTH"     "BELLE"         "MAURICE"       "BELLE"        
## [213] "MAURICE"       "BELLE"         "MAURICE"       "BELLE"        
## [217] "MAURICE"       "BELLE"         "BEAST"         "MAURICE"      
## [221] "BELLE"         "BEAST"         "BELLE"         "BEAST"        
## [225] "BELLE"         "BEAST"         "BELLE"         "BEAST"        
## [229] "MAURICE"       "BELLE"         "BEAST"         "BELLE"        
## [233] "MAURICE"       "BELLE"         "BEAST"         "MAURICE"      
## [237] "BELLE"         "MAURICE"       "BELLE"         "MAURICE"      
## [241] "BEAST"         "MAURICE"       "LUMIERE"       "BEAST"        
## [245] "LUMIERE"       "BELLE"         "BEAST"         "BELLE"        
## [249] "BEAST"         "BELLE"         "BEAST"         "LUMIERE"      
## [253] "BEAST"         "BELLE"         "BEAST"         "BEAST"        
## [257] "LUMIERE"       "BEAST"         "GASTON"        "LEFOU"        
## [261] "GASTON"        "LEFOU"         "GASTON"        "LEFOU"        
## [265] "OLD CRONIES"   "LEFOU"         "GASTON"        "OLD CRONIES"  
## [269] "OLD CRONIES"   "LEFOU"         "ALL"           "WRESTLER"     
## [273] "BIMBETTES"     "GASTON"        "LEFOU"         "GASTON"       
## [277] "OLD CRONIES"   "LEFOU"         "GASTON"        "ALL"          
## [281] "GASTON"        "ALL"           "LEFOU"         "GASTON"       
## [285] "ALL"           "MAURICE"       "MAN"           "MAURICE"      
## [289] "LEFOU"         "MAURICE"       "GASTON"        "MAURICE"      
## [293] "CRONY 1"       "MAURICE"       "CRONY 2"       "MAURICE"      
## [297] "CRONY 3"       "MAURICE"       "GASTON"        "MAURICE"      
## [301] "CRONY 1"       "GASTON"        "LEFOU"         "GASTON"       
## [305] "GASTON"        "LEFOU"         "GASTON"        "LEFOU"        
## [309] "GASTON"        "LEFOU"         "BOTH"          "BOTH"         
## [313] "LEFOU"         "ALL"           "MAURICE"       "BELLE"        
## [317] "MRS. POTTS"    "BELLE"         "WARDROBE"      "BELLE"        
## [321] "WARDROBE"      "CHIP"          "MRS. POTTS"    "BELLE"        
## [325] "CHIP"          "MRS. POTTS"    "CHIP"          "MRS. POTTS"   
## [329] "WARDROBE"      "BELLE"         "MRS. POTTS"    "CHIP"         
## [333] "WARDROBE"      "BELLE"         "WARDROBE"      "COGSWORTH"    
## [337] "BEAST"         "MRS. POTTS"    "LUMIERE"       "BEAST"        
## [341] "LUMIERE"       "MRS. POTTS"    "LUMIERE"       "BEAST"        
## [345] "MRS. POTTS"    "BEAST"         "MRS. POTTS"    "LUMIERE"      
## [349] "MRS. POTTS"    "LUMIERE"       "MRS. POTTS"    "LUMIERE"      
## [353] "MRS. POTTS"    "LUMIERE"       "BOTH"          "LUMIERE"      
## [357] "COGSWORTH"     "BEAST"         "COGSWORTH"     "BEAST"        
## [361] "COGSWORTH"     "BEAST"         "BELLE"         "BEAST"        
## [365] "LUMIERE"       "COGSWORTH"     "BEAST"         "MRS. POTTS"   
## [369] "BEAST"         "BELLE"         "COGSWORTH"     "BEAST"        
## [373] "COGSWORTH"     "BEAST"         "BELLE"         "BEAST"        
## [377] "BELLE"         "BEAST"         "MRS. POTTS"    "COGSWORTH"    
## [381] "LUMIERE"       "COGSWORTH"     "BEAST"         "WARDROBE"     
## [385] "BELLE"         "BEAST"         "FEATHERDUSTER" "LUMIERE"      
## [389] "FEATHERDUSTER" "LUMIERE"       "FEATHERDUSTER" "FEATHERDUSTER"
## [393] "LUMIERE"       "MRS. POTTS"    "CHIP"          "MRS. POTTS"   
## [397] "CHIP"          "STOVE"         "MRS. POTTS"    "COGSWORTH"    
## [401] "MRS. POTTS"    "COGSWORTH"     "LUMIERE"       "COGSWORTH"    
## [405] "BELLE"         "MRS. POTTS"    "COGSWORTH"     "MRS. POTTS"   
## [409] "COGSWORTH"     "LUMIERE"       "COGSWORTH"     "LUMIERE"      
## [413] "COGSWORTH"     "LUMIERE"       "MUGS"          "ALL"          
## [417] "LUMIERE"       "ALL"           "LUMIERE"       "COGSWORTH"    
## [421] "LUMIERE"       "LUMIERE"       "MRS. POTTS"    "ALL"          
## [425] "MRS. POTTS"    "ALL"           "ALL"           "BELLE"        
## [429] "COGSWORTH"     "BELLE"         "COGSWORTH"     "BELLE"        
## [433] "LUMIERE"       "COGSWORTH"     "BELLE"         "COGSWORTH"    
## [437] "COGSWORTH"     "BELLE"         "COGSWORTH"     "BELLE"        
## [441] "LUMIERE"       "BELLE"         "LUMIERE"       "BELLE"        
## [445] "COGSWORTH"     "BELLE"         "LUMIERE"       "BELLE"        
## [449] "COGSWORTH"     "LUMIERE"       "COGSWORTH"     "LUMIERE"      
## [453] "COGSWORTH"     "LUMIERE"       "COGSWORTH"     "LUMIERE"      
## [457] "COGSWORTH"     "BEAST"         "BELLE"         "BEAST"        
## [461] "BELLE"         "BEAST"         "BELLE"         "BEAST"        
## [465] "LUMIERE"       "BELLE"         "COGSWORTH"     "BELLE"        
## [469] "BEAST"         "BELLE"         "BEAST"         "BELLE"        
## [473] "BEAST"         "BELLE"         "BEAST"         "ARQUE"        
## [477] "GASTON"        "LEFOU"         "GASTON"        "ARQUE"        
## [481] "GASTON"        "LEFOU"         "ARQUE"         "MAURICE"      
## [485] "GASTON"        "LEFOU"         "GASTON"        "LEFOU"        
## [489] "BEAST"         "COGSWORTH"     "LUMIERE"       "BEAST"        
## [493] "BELLE"         "BEAST"         "BELLE"         "BEAST"        
## [497] "BELLE"         "BEAST"         "BELLE"         "BEAST"        
## [501] "BEAST"         "MRS. POTTS"    "LUMIERE"       "CHIP"         
## [505] "COGSWORTH"     "FEATHERDUSTER" "CHIP"          "MRS. POTTS"   
## [509] "CHIP"          "BELLE"         "BEAST"         "BELLE"        
## [513] "LUMIERE"       "MRS. POTTS"    "COGSWORTH"     "MRS. POTTS"   
## [517] "LUMIERE"       "MRS. POTTS"    "ALL"           "COGSWORTH"    
## [521] "CHIP"          "MRS. POTTS"    "CHIP"          "MRS. POTTS"   
## [525] "LUMIERE"       "BEAST"         "LUMIERE"       "BEAST"        
## [529] "LUMIERE"       "BEAST"         "LUMIERE"       "BEAST"        
## [533] "LUMIERE"       "BEAST"         "LUMIERE"       "COGSWORTH"    
## [537] "MRS. POTTS"    "MRS. POTTS"    "BEAST"         "BELLE"        
## [541] "BEAST"         "BELLE"         "BEAST"         "BEAST"        
## [545] "BELLE"         "BELLE"         "BEAST"         "BELLE"        
## [549] "BEAST"         "BELLE"         "BEAST"         "BELLE"        
## [553] "BEAST"         "BELLE"         "COGSWORTH"     "BEAST"        
## [557] "COGSWORTH"     "BEAST"         "COGSWORTH"     "BEAST"        
## [561] "ALL"           "COGSWORTH"     "CHIP"          "LUMIERE"      
## [565] "MRS. POTTS"    "LUMIERE"       "MRS. POTTS"    "COGSWORTH"    
## [569] "LEFOU"         "MAURICE"       "BELLE"         "MAURICE"      
## [573] "BELLE"         "MAURICE"       "BELLE"         "MAURICE"      
## [577] "BELLE"         "CHIP"          "BELLE"         "MAURICE"      
## [581] "CHIP"          "BELLE"         "BELLE"         "ARQUE"        
## [585] "BELLE"         "ARQUE"         "BELLE"         "LEFOU"        
## [589] "BYSTANDERS"    "BELLE"         "MAURICE"       "LEFOU"        
## [593] "MAURICE"       "LEFOU"         "MAURICE"       "LEFOU"        
## [597] "MAURICE"       "BELLE"         "GASTON"        "BELLE"        
## [601] "GASTON"        "BELLE"         "GASTON"        "BELLE"        
## [605] "GASTON"        "BELLE"         "GASTON"        "MAURICE"      
## [609] "BELLE"         "WOMAN 1"       "BELLE"         "GASTON"       
## [613] "BELLE"         "GASTON"        "BELLE"         "GASTON"       
## [617] "MAN 1"         "MAN 2"         "WOMAN 1"       "MAN 3"        
## [621] "GASTON"        "BELLE"         "GASTON"        "MAURICE"      
## [625] "GASTON"        "BELLE"         "GASTON"        "MOB"          
## [629] "GASTON"        "MOB"           "GASTON"        "BELLE"        
## [633] "MAURICE"       "MOB"           "COGSWORTH"     "LUMIERE"      
## [637] "LUMIERE"       "MRS. POTTS"    "LUMIERE"       "COGSWORTH"    
## [641] "MRS. POTTS"    "COGSWORTH"     "GASTON"        "OBJECTS"      
## [645] "MOB"           "MRS. POTTS"    "BEAST"         "MRS. POTTS"   
## [649] "MOB"           "LUMIERE"       "FEATHERDUSTER" "LUMIERE"      
## [653] "MOB"           "MRS. POTTS"    "BEAST"         "MOB"          
## [657] "LUMIERE"       "CHIP"          "MAURICE"       "CHIP"         
## [661] "COGSWORTH"     "GASTON"        "GASTON"        "BELLE"        
## [665] "BEAST"         "BELLE"         "BELLE"         "GASTON"       
## [669] "GASTON"        "GASTON"        "BEAST"         "BELLE"        
## [673] "BEAST"         "BEAST"         "BEAST"         "BELLE"        
## [677] "BEAST"         "BELLE"         "BEAST"         "BELLE"        
## [681] "PRINCE"        "BELLE"         "PRINCE"        "CHIP"         
## [685] "MRS. POTTS"    "LUMIERE"       "LUMIERE"       "COGSWORTH"    
## [689] "LUMIERE"       "COGSWORTH"     "LUMIERE"       "COGSWORTH"    
## [693] "LUMIERE"       "CHIP"          "MRS. POTTS"    "CHIP"         
## [697] "CHORUS"
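To see what the pattern is doing without scrolling through the full output, here is a minimal sketch on a made-up vector (toy_people is a hypothetical name). It also contrasts str_replace(), which changes only the first match in each string, with str_replace_all().

```r
library(stringr)

# Hypothetical toy vector, not taken from the script
toy_people <- c("TOWNSFOLK 1", "TOWNSFOLK 12", "TOWNSFOLK", "BELLE",
                "TOWNSFOLK 1, TOWNSFOLK 2")

# [0-9\\s]* matches zero or more digits or whitespace characters,
# so a bare "TOWNSFOLK" is replaced too
replaced_all <- str_replace_all(toy_people, "TOWNSFOLK[0-9\\s]*", "townsfolk")

# str_replace() stops after the first match in each string
replaced_first <- str_replace(toy_people, "TOWNSFOLK[0-9\\s]*", "townsfolk")

replaced_all
replaced_first
```

Only the last element differs between the two results, since it is the only string with more than one match.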

What is this regex finding?

str_extract(beauty$person, "CRONY[0-9\\s]*")
str_extract_all(beauty$person, "CRONY[0-9\\s]*")
str_extract_all(beauty$person, "CRONY[0-9\\s]*", simplify = TRUE)
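The three calls return different shapes. A short sketch on a made-up vector (toy_people is a hypothetical name) makes the difference easier to see than the full-length output:

```r
library(stringr)

# Hypothetical toy vector; the last entry contains two matches
toy_people <- c("BELLE", "CRONY 1", "CRONY 2/CRONY 3")

# Character vector: first match per string, NA where nothing matches
first_match <- str_extract(toy_people, "CRONY[0-9\\s]*")

# List: every match per string, character(0) where nothing matches
all_matches <- str_extract_all(toy_people, "CRONY[0-9\\s]*")

# Character matrix: one row per string, padded with "" where needed
match_matrix <- str_extract_all(toy_people, "CRONY[0-9\\s]*", simplify = TRUE)

first_match
all_matches
match_matrix
```

In this toy case the matrix has three rows and two columns, because the most matches found in any one string is two.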

str_extract

str_extract(beauty$person, "CRONY[0-9\\s]*")
##   [1] NA        NA        NA        NA        NA        NA        NA       
##   [8] NA        NA        NA        NA        NA        NA        NA       
##  [15] NA        NA        NA        NA        NA        NA        NA       
##  [22] NA        NA        NA        NA        NA        NA        NA       
##  [29] NA        NA        NA        NA        NA        NA        NA       
##  [36] NA        NA        NA        NA        NA        NA        NA       
##  [43] NA        NA        NA        NA        NA        NA        NA       
##  [50] NA        NA        NA        NA        NA        NA        NA       
##  [57] NA        NA        NA        NA        NA        NA        NA       
##  [64] NA        NA        NA        NA        NA        NA        NA       
##  [71] NA        NA        NA        NA        NA        NA        NA       
##  [78] NA        NA        NA        NA        NA        NA        NA       
##  [85] NA        NA        NA        NA        NA        NA        NA       
##  [92] NA        NA        NA        NA        NA        NA        NA       
##  [99] NA        NA        NA        NA        NA        NA        NA       
## [106] NA        NA        NA        NA        NA        NA        NA       
## [113] NA        NA        NA        NA        NA        NA        NA       
## [120] NA        NA        NA        NA        NA        NA        NA       
## [127] NA        NA        NA        NA        NA        NA        NA       
## [134] NA        NA        NA        NA        NA        NA        NA       
## [141] NA        NA        NA        NA        NA        NA        NA       
## [148] NA        NA        NA        NA        NA        NA        NA       
## [155] NA        NA        NA        NA        NA        NA        NA       
## [162] NA        NA        NA        NA        NA        NA        NA       
## [169] NA        NA        NA        NA        NA        NA        NA       
## [176] NA        NA        NA        NA        NA        NA        NA       
## [183] NA        NA        NA        NA        NA        NA        NA       
## [190] NA        NA        NA        NA        NA        NA        NA       
## [197] NA        NA        NA        NA        NA        NA        NA       
## [204] NA        NA        NA        NA        NA        NA        NA       
## [211] NA        NA        NA        NA        NA        NA        NA       
## [218] NA        NA        NA        NA        NA        NA        NA       
## [225] NA        NA        NA        NA        NA        NA        NA       
## [232] NA        NA        NA        NA        NA        NA        NA       
## [239] NA        NA        NA        NA        NA        NA        NA       
## [246] NA        NA        NA        NA        NA        NA        NA       
## [253] NA        NA        NA        NA        NA        NA        NA       
## [260] NA        NA        NA        NA        NA        NA        NA       
## [267] NA        NA        NA        NA        NA        NA        NA       
## [274] NA        NA        NA        NA        NA        NA        NA       
## [281] NA        NA        NA        NA        NA        NA        NA       
## [288] NA        NA        NA        NA        NA        "CRONY 1" NA       
## [295] "CRONY 2" NA        "CRONY 3" NA        NA        NA        "CRONY 1"
## [302] NA        NA        NA        NA        NA        NA        NA       
## [309] NA        NA        NA        NA        NA        NA        NA       
## [316] NA        NA        NA        NA        NA        NA        NA       
## [323] NA        NA        NA        NA        NA        NA        NA       
## [330] NA        NA        NA        NA        NA        NA        NA       
## [337] NA        NA        NA        NA        NA        NA        NA       
## [344] NA        NA        NA        NA        NA        NA        NA       
## [351] NA        NA        NA        NA        NA        NA        NA       
## [358] NA        NA        NA        NA        NA        NA        NA       
## [365] NA        NA        NA        NA        NA        NA        NA       
## [372] NA        NA        NA        NA        NA        NA        NA       
## [379] NA        NA        NA        NA        NA        NA        NA       
## [386] NA        NA        NA        NA        NA        NA        NA       
## [393] NA        NA        NA        NA        NA        NA        NA       
## [400] NA        NA        NA        NA        NA        NA        NA       
## [407] NA        NA        NA        NA        NA        NA        NA       
## [414] NA        NA        NA        NA        NA        NA        NA       
## [421] NA        NA        NA        NA        NA        NA        NA       
## [428] NA        NA        NA        NA        NA        NA        NA       
## [435] NA        NA        NA        NA        NA        NA        NA       
## [442] NA        NA        NA        NA        NA        NA        NA       
## [449] NA        NA        NA        NA        NA        NA        NA       
## [456] NA        NA        NA        NA        NA        NA        NA       
## [463] NA        NA        NA        NA        NA        NA        NA       
## [470] NA        NA        NA        NA        NA        NA        NA       
## [477] NA        NA        NA        NA        NA        NA        NA       
## [484] NA        NA        NA        NA        NA        NA        NA       
## [491] NA        NA        NA        NA        NA        NA        NA       
## [498] NA        NA        NA        NA        NA        NA        NA       
## [505] NA        NA        NA        NA        NA        NA        NA       
## [512] NA        NA        NA        NA        NA        NA        NA       
## [519] NA        NA        NA        NA        NA        NA        NA       
## [526] NA        NA        NA        NA        NA        NA        NA       
## [533] NA        NA        NA        NA        NA        NA        NA       
## [540] NA        NA        NA        NA        NA        NA        NA       
## [547] NA        NA        NA        NA        NA        NA        NA       
## [554] NA        NA        NA        NA        NA        NA        NA       
## [561] NA        NA        NA        NA        NA        NA        NA       
## [568] NA        NA        NA        NA        NA        NA        NA       
## [575] NA        NA        NA        NA        NA        NA        NA       
## [582] NA        NA        NA        NA        NA        NA        NA       
## [589] NA        NA        NA        NA        NA        NA        NA       
## [596] NA        NA        NA        NA        NA        NA        NA       
## [603] NA        NA        NA        NA        NA        NA        NA       
## [610] NA        NA        NA        NA        NA        NA        NA       
## [617] NA        NA        NA        NA        NA        NA        NA       
## [624] NA        NA        NA        NA        NA        NA        NA       
## [631] NA        NA        NA        NA        NA        NA        NA       
## [638] NA        NA        NA        NA        NA        NA        NA       
## [645] NA        NA        NA        NA        NA        NA        NA       
## [652] NA        NA        NA        NA        NA        NA        NA       
## [659] NA        NA        NA        NA        NA        NA        NA       
## [666] NA        NA        NA        NA        NA        NA        NA       
## [673] NA        NA        NA        NA        NA        NA        NA       
## [680] NA        NA        NA        NA        NA        NA        NA       
## [687] NA        NA        NA        NA        NA        NA        NA       
## [694] NA        NA        NA        NA
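Most of the output above is NA, since str_extract() returns one element per input string. One way to look only at the matches is sketched below on a made-up vector (toy_people is a hypothetical name); the same calls should work on beauty$person.

```r
library(stringr)

# Hypothetical toy vector, not taken from the script
toy_people <- c("BELLE", "CRONY 1", "GASTON", "CRONY 2")

extracted <- str_extract(toy_people, "CRONY[0-9\\s]*")

# Keep only the strings that were actually extracted
matches_only <- extracted[!is.na(extracted)]

# Positions of the matching entries in the original vector
match_positions <- which(!is.na(extracted))

# str_subset() returns the full original strings that contain a match
matching_strings <- str_subset(toy_people, "CRONY[0-9\\s]*")

matches_only
match_positions
matching_strings
```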

str_extract_all

str_extract_all(beauty$person, "CRONY[0-9\\s]*")
## [[1]]
## character(0)
## 
## [[2]]
## character(0)
## 
## [[3]]
## character(0)
## 
## [[4]]
## character(0)
## 
## [[5]]
## character(0)
## 
## [[6]]
## character(0)
## 
## [[7]]
## character(0)
## 
## [[8]]
## character(0)
## 
## [[9]]
## character(0)
## 
## [[10]]
## character(0)
## 
## [[11]]
## character(0)
## 
## [[12]]
## character(0)
## 
## [[13]]
## character(0)
## 
## [[14]]
## character(0)
## 
## [[15]]
## character(0)
## 
## [[16]]
## character(0)
## 
## [[17]]
## character(0)
## 
## [[18]]
## character(0)
## 
## [[19]]
## character(0)
## 
## [[20]]
## character(0)
## 
## [[21]]
## character(0)
## 
## [[22]]
## character(0)
## 
## [[23]]
## character(0)
## 
## [[24]]
## character(0)
## 
## [[25]]
## character(0)
## 
## [[26]]
## character(0)
## 
## [[27]]
## character(0)
## 
## [[28]]
## character(0)
## 
## [[29]]
## character(0)
## 
## [[30]]
## character(0)
## 
## [[31]]
## character(0)
## 
## [[32]]
## character(0)
## 
## [[33]]
## character(0)
## 
## [[34]]
## character(0)
## 
## [[35]]
## character(0)
## 
## [[36]]
## character(0)
## 
## [[37]]
## character(0)
## 
## [[38]]
## character(0)
## 
## [[39]]
## character(0)
## 
## [[40]]
## character(0)
## 
## [[41]]
## character(0)
## 
## [[42]]
## character(0)
## 
## [[43]]
## character(0)
## 
## [[44]]
## character(0)
## 
## [[45]]
## character(0)
## 
## [[46]]
## character(0)
## 
## [[47]]
## character(0)
## 
## [[48]]
## character(0)
## 
## [[49]]
## character(0)
## 
## [[50]]
## character(0)
## 
## [[51]]
## character(0)
## 
## [[52]]
## character(0)
## 
## [[53]]
## character(0)
## 
## [[54]]
## character(0)
## 
## [[55]]
## character(0)
## 
## [[56]]
## character(0)
## 
## [[57]]
## character(0)
## 
## [[58]]
## character(0)
## 
## [[59]]
## character(0)
## 
## [[60]]
## character(0)
## 
## [[61]]
## character(0)
## 
## [[62]]
## character(0)
## 
## [[63]]
## character(0)
## 
## [[64]]
## character(0)
## 
## [[65]]
## character(0)
## 
## [[66]]
## character(0)
## 
## [[67]]
## character(0)
## 
## [[68]]
## character(0)
## 
## [[69]]
## character(0)
## 
## [[70]]
## character(0)
## 
## [[71]]
## character(0)
## 
## [[72]]
## character(0)
## 
## [[73]]
## character(0)
## 
## [[74]]
## character(0)
## 
## [[75]]
## character(0)
## 
## [[76]]
## character(0)
## 
## [[77]]
## character(0)
## 
## [[78]]
## character(0)
## 
## [[79]]
## character(0)
## 
## [[80]]
## character(0)
## 
## [[81]]
## character(0)
## 
## [[82]]
## character(0)
## 
## [[83]]
## character(0)
## 
## [[84]]
## character(0)
## 
## [[85]]
## character(0)
## 
## [[86]]
## character(0)
## 
## [[87]]
## character(0)
## 
## [[88]]
## character(0)
## 
## [[89]]
## character(0)
## 
## [[90]]
## character(0)
## 
## [[91]]
## character(0)
## 
## [[92]]
## character(0)
## 
## [[93]]
## character(0)
## 
## [[94]]
## character(0)
## 
## [[95]]
## character(0)
## 
## [[96]]
## character(0)
## 
## [[97]]
## character(0)
## 
## [[98]]
## character(0)
## 
## [[99]]
## character(0)
## 
## [[100]]
## character(0)
## 
## [[101]]
## character(0)
## 
## [[102]]
## character(0)
## 
## [[103]]
## character(0)
## 
## [[104]]
## character(0)
## 
## [[105]]
## character(0)
## 
## [[106]]
## character(0)
## 
## [[107]]
## character(0)
## 
## [[108]]
## character(0)
## 
## [[109]]
## character(0)
## 
## [[110]]
## character(0)
## 
## [[111]]
## character(0)
## 
## [[112]]
## character(0)
## 
## [[113]]
## character(0)
## 
## [[114]]
## character(0)
## 
## [[115]]
## character(0)
## 
## [[116]]
## character(0)
## 
## [[117]]
## character(0)
## 
## [[118]]
## character(0)
## 
## [[119]]
## character(0)
## 
## [[120]]
## character(0)
## 
## [[121]]
## character(0)
## 
## [[122]]
## character(0)
## 
## [[123]]
## character(0)
## 
## [[124]]
## character(0)
## 
## [[125]]
## character(0)
## 
## [[126]]
## character(0)
## 
## [[127]]
## character(0)
## 
## [[128]]
## character(0)
## 
## [[129]]
## character(0)
## 
## [[130]]
## character(0)
## 
## [[131]]
## character(0)
## 
## [[132]]
## character(0)
## 
## [[133]]
## character(0)
## 
## [[134]]
## character(0)
## 
## [[135]]
## character(0)
## 
## [[136]]
## character(0)
## 
## [[137]]
## character(0)
## 
## [[138]]
## character(0)
## 
## [[139]]
## character(0)
## 
## [[140]]
## character(0)
## 
## [[141]]
## character(0)
## 
## [[142]]
## character(0)
## 
## [[143]]
## character(0)
## 
## [[144]]
## character(0)
## 
## [[145]]
## character(0)
## 
## [[146]]
## character(0)
## 
## [[147]]
## character(0)
## 
## [[148]]
## character(0)
## 
## [[149]]
## character(0)
## 
## [[150]]
## character(0)
## 
## [[151]]
## character(0)
## 
## [[152]]
## character(0)
## 
## [[153]]
## character(0)
## 
## [[154]]
## character(0)
## 
## [[155]]
## character(0)
## 
## [[156]]
## character(0)
## 
## [[157]]
## character(0)
## 
## [[158]]
## character(0)
## 
## [[159]]
## character(0)
## 
## [[160]]
## character(0)
## 
## [[161]]
## character(0)
## 
## [[162]]
## character(0)
## 
## [[163]]
## character(0)
## 
## [[164]]
## character(0)
## 
## [[165]]
## character(0)
## 
## [[166]]
## character(0)
## 
## [[167]]
## character(0)
## 
## [[168]]
## character(0)
## 
## [[169]]
## character(0)
## 
## [[170]]
## character(0)
## 
## [[171]]
## character(0)
## 
## [[172]]
## character(0)
## 
## [[173]]
## character(0)
## 
## [[174]]
## character(0)
## 
## [[175]]
## character(0)
## 
## [[176]]
## character(0)
## 
## [[177]]
## character(0)
## 
## [[178]]
## character(0)
## 
## [[179]]
## character(0)
## 
## [[180]]
## character(0)
## 
## [[181]]
## character(0)
## 
## [[182]]
## character(0)
## 
## [[183]]
## character(0)
## 
## [[184]]
## character(0)
## 
## [[185]]
## character(0)
## 
## [[186]]
## character(0)
## 
## [[187]]
## character(0)
## 
## [[188]]
## character(0)
## 
## [[189]]
## character(0)
## 
## [[190]]
## character(0)
## 
## [[191]]
## character(0)
## 
## [[192]]
## character(0)
## 
## [[193]]
## character(0)
## 
## [[194]]
## character(0)
## 
## [[195]]
## character(0)
## 
## [[196]]
## character(0)
## 
## [[197]]
## character(0)
## 
## [[198]]
## character(0)
## 
## [[199]]
## character(0)
## 
## [[200]]
## character(0)
## 
## [[201]]
## character(0)
## 
## [[202]]
## character(0)
## 
## [[203]]
## character(0)
## 
## [[204]]
## character(0)
## 
## [[205]]
## character(0)
## 
## [[206]]
## character(0)
## 
## [[207]]
## character(0)
## 
## [[208]]
## character(0)
## 
## [[209]]
## character(0)
## 
## [[210]]
## character(0)
## 
## [[211]]
## character(0)
## 
## [[212]]
## character(0)
## 
## [[213]]
## character(0)
## 
## [[214]]
## character(0)
## 
## [[215]]
## character(0)
## 
## [[216]]
## character(0)
## 
## [[217]]
## character(0)
## 
## [[218]]
## character(0)
## 
## [[219]]
## character(0)
## 
## [[220]]
## character(0)
## 
## [[221]]
## character(0)
## 
## [[222]]
## character(0)
## 
## [[223]]
## character(0)
## 
## [[224]]
## character(0)
## 
## [[225]]
## character(0)
## 
## [[226]]
## character(0)
## 
## [[227]]
## character(0)
## 
## [[228]]
## character(0)
## 
## [[229]]
## character(0)
## 
## [[230]]
## character(0)
## 
## [[231]]
## character(0)
## 
## [[232]]
## character(0)
## 
## [[233]]
## character(0)
## 
## [[234]]
## character(0)
## 
## [[235]]
## character(0)
## 
## [[236]]
## character(0)
## 
## [[237]]
## character(0)
## 
## [[238]]
## character(0)
## 
## [[239]]
## character(0)
## 
## [[240]]
## character(0)
## 
## [[241]]
## character(0)
## 
## [[242]]
## character(0)
## 
## [[243]]
## character(0)
## 
## [[244]]
## character(0)
## 
## [[245]]
## character(0)
## 
## [[246]]
## character(0)
## 
## [[247]]
## character(0)
## 
## [[248]]
## character(0)
## 
## [[249]]
## character(0)
## 
## [[250]]
## character(0)
## 
## [[251]]
## character(0)
## 
## [[252]]
## character(0)
## 
## [[253]]
## character(0)
## 
## [[254]]
## character(0)
## 
## [[255]]
## character(0)
## 
## [[256]]
## character(0)
## 
## [[257]]
## character(0)
## 
## [[258]]
## character(0)
## 
## [[259]]
## character(0)
## 
## [[260]]
## character(0)
## 
## [[261]]
## character(0)
## 
## [[262]]
## character(0)
## 
## [[263]]
## character(0)
## 
## [[264]]
## character(0)
## 
## [[265]]
## character(0)
## 
## [[266]]
## character(0)
## 
## [[267]]
## character(0)
## 
## [[268]]
## character(0)
## 
## [[269]]
## character(0)
## 
## [[270]]
## character(0)
## 
## [[271]]
## character(0)
## 
## [[272]]
## character(0)
## 
## [[273]]
## character(0)
## 
## [[274]]
## character(0)
## 
## [[275]]
## character(0)
## 
## [[276]]
## character(0)
## 
## [[277]]
## character(0)
## 
## [[278]]
## character(0)
## 
## [[279]]
## character(0)
## 
## [[280]]
## character(0)
## 
## [[281]]
## character(0)
## 
## [[282]]
## character(0)
## 
## [[283]]
## character(0)
## 
## [[284]]
## character(0)
## 
## [[285]]
## character(0)
## 
## [[286]]
## character(0)
## 
## [[287]]
## character(0)
## 
## [[288]]
## character(0)
## 
## [[289]]
## character(0)
## 
## [[290]]
## character(0)
## 
## [[291]]
## character(0)
## 
## [[292]]
## character(0)
## 
## [[293]]
## [1] "CRONY 1"
## 
## [[294]]
## character(0)
## 
## [[295]]
## [1] "CRONY 2"
## 
## [[296]]
## character(0)
## 
## [[297]]
## [1] "CRONY 3"
## 
## [[298]]
## character(0)
## 
## [[299]]
## character(0)
## 
## [[300]]
## character(0)
## 
## [[301]]
## [1] "CRONY 1"
## 
## [[302]]
## character(0)
## 
## [[303]]
## character(0)
## 
## [[304]]
## character(0)
## 
## [[305]]
## character(0)
## 
## [[306]]
## character(0)
## 
## [[307]]
## character(0)
## 
## [[308]]
## character(0)
## 
## [[309]]
## character(0)
## 
## [[310]]
## character(0)
## 
## [[311]]
## character(0)
## 
## [[312]]
## character(0)
## 
## [[313]]
## character(0)
## 
## [[314]]
## character(0)
## 
## [[315]]
## character(0)
## 
## [[316]]
## character(0)
## 
## [[317]]
## character(0)
## 
## [[318]]
## character(0)
## 
## [[319]]
## character(0)
## 
## [[320]]
## character(0)
## 
## [[321]]
## character(0)
## 
## [[322]]
## character(0)
## 
## [[323]]
## character(0)
## 
## [[324]]
## character(0)
## 
## [[325]]
## character(0)
## 
## [[326]]
## character(0)
## 
## [[327]]
## character(0)
## 
## [[328]]
## character(0)
## 
## [[329]]
## character(0)
## 
## [[330]]
## character(0)
## 
## [[331]]
## character(0)
## 
## [[332]]
## character(0)
## 
## [[333]]
## character(0)
## 
## [[334]]
## character(0)
## 
## [[335]]
## character(0)
## 
## [[336]]
## character(0)
## 
## [[337]]
## character(0)
## 
## [[338]]
## character(0)
## 
## [[339]]
## character(0)
## 
## [[340]]
## character(0)
## 
## [[341]]
## character(0)
## 
## [[342]]
## character(0)
## 
## [[343]]
## character(0)
## 
## [[344]]
## character(0)
## 
## [[345]]
## character(0)
## 
## [[346]]
## character(0)
## 
## [[347]]
## character(0)
## 
## [[348]]
## character(0)
## 
## [[349]]
## character(0)
## 
## [[350]]
## character(0)
## 
## [[351]]
## character(0)
## 
## [[352]]
## character(0)
## 
## [[353]]
## character(0)
## 
## [[354]]
## character(0)
## 
## [[355]]
## character(0)
## 
## [[356]]
## character(0)
## 
## [[357]]
## character(0)
## 
## [[358]]
## character(0)
## 
## [[359]]
## character(0)
## 
## [[360]]
## character(0)
## 
## [[361]]
## character(0)
## 
## [[362]]
## character(0)
## 
## [[363]]
## character(0)
## 
## [[364]]
## character(0)
## 
## [[365]]
## character(0)
## 
## [[366]]
## character(0)
## 
## [[367]]
## character(0)
## 
## [[368]]
## character(0)
## 
## [[369]]
## character(0)
## 
## [[370]]
## character(0)
## 
## [[371]]
## character(0)
## 
## [[372]]
## character(0)
## 
## [[373]]
## character(0)
## 
## [[374]]
## character(0)
## 
## [[375]]
## character(0)
## 
## [[376]]
## character(0)
## 
## [[377]]
## character(0)
## 
## [[378]]
## character(0)
## 
## [[379]]
## character(0)
## 
## [[380]]
## character(0)
## 
## [[381]]
## character(0)
## 
## [[382]]
## character(0)
## 
## [[383]]
## character(0)
## 
## [[384]]
## character(0)
## 
## [[385]]
## character(0)
## 
## [[386]]
## character(0)
## 
## [[387]]
## character(0)
## 
## [[388]]
## character(0)
## 
## [[389]]
## character(0)
## 
## [[390]]
## character(0)
## 
## [[391]]
## character(0)
## 
## [[392]]
## character(0)
## 
## [[393]]
## character(0)
## 
## [[394]]
## character(0)
## 
## [[395]]
## character(0)
## 
## [[396]]
## character(0)
## 
## [[397]]
## character(0)
## 
## [[398]]
## character(0)
## 
## [[399]]
## character(0)
## 
## [[400]]
## character(0)
## 
## [[401]]
## character(0)
## 
## [[402]]
## character(0)
## 
## [[403]]
## character(0)
## 
## [[404]]
## character(0)
## 
## [[405]]
## character(0)
## 
## [[406]]
## character(0)
## 
## [[407]]
## character(0)
## 
## [[408]]
## character(0)
## 
## [[409]]
## character(0)
## 
## [[410]]
## character(0)
## 
## [[411]]
## character(0)
## 
## [[412]]
## character(0)
## 
## [[413]]
## character(0)
## 
## [[414]]
## character(0)
## 
## [[415]]
## character(0)
## 
## [[416]]
## character(0)
## 
## [[417]]
## character(0)
## 
## [[418]]
## character(0)
## 
## [[419]]
## character(0)
## 
## [[420]]
## character(0)
## 
## [[421]]
## character(0)
## 
## [[422]]
## character(0)
## 
## [[423]]
## character(0)
## 
## [[424]]
## character(0)
## 
## [[425]]
## character(0)
## 
## [[426]]
## character(0)
## 
## [[427]]
## character(0)
## 
## [[428]]
## character(0)
## 
## [[429]]
## character(0)
## 
## [[430]]
## character(0)
## 
## [[431]]
## character(0)
## 
## [[432]]
## character(0)
## 
## [[433]]
## character(0)
## 
## [[434]]
## character(0)
## 
## [[435]]
## character(0)
## 
## [[436]]
## character(0)
## 
## [[437]]
## character(0)
## 
## [[438]]
## character(0)
## 
## [[439]]
## character(0)
## 
## [[440]]
## character(0)
## 
## [[441]]
## character(0)
## 
## [[442]]
## character(0)
## 
## [[443]]
## character(0)
## 
## [[444]]
## character(0)
## 
## [[445]]
## character(0)
## 
## [[446]]
## character(0)
## 
## [[447]]
## character(0)
## 
## [[448]]
## character(0)
## 
## [[449]]
## character(0)
## 
## [[450]]
## character(0)
## 
## [[451]]
## character(0)
## 
## [[452]]
## character(0)
## 
## [[453]]
## character(0)
## 
## [[454]]
## character(0)
## 
## [[455]]
## character(0)
## 
## [[456]]
## character(0)
## 
## [[457]]
## character(0)
## 
## [[458]]
## character(0)
## 
## [[459]]
## character(0)
## 
## [[460]]
## character(0)
## 
## [[461]]
## character(0)
## 
## [[462]]
## character(0)
## 
## [[463]]
## character(0)
## 
## [[464]]
## character(0)
## 
## [[465]]
## character(0)
## 
## [[466]]
## character(0)
## 
## [[467]]
## character(0)
## 
## [[468]]
## character(0)
## 
## [[469]]
## character(0)
## 
## [[470]]
## character(0)
## 
## [[471]]
## character(0)
## 
## [[472]]
## character(0)
## 
## [[473]]
## character(0)
## 
## [[474]]
## character(0)
## 
## [[475]]
## character(0)
## 
## [[476]]
## character(0)
## 
## [[477]]
## character(0)
## 
## [[478]]
## character(0)
## 
## [[479]]
## character(0)
## 
## [[480]]
## character(0)
## 
## [[481]]
## character(0)
## 
## [[482]]
## character(0)
## 
## [[483]]
## character(0)
## 
## [[484]]
## character(0)
## 
## [[485]]
## character(0)
## 
## [[486]]
## character(0)
## 
## [[487]]
## character(0)
## 
## [[488]]
## character(0)
## 
## [[489]]
## character(0)
## 
## [[490]]
## character(0)
## 
## [[491]]
## character(0)
## 
## [[492]]
## character(0)
## 
## [[493]]
## character(0)
## 
## [[494]]
## character(0)
## 
## [[495]]
## character(0)
## 
## [[496]]
## character(0)
## 
## [[497]]
## character(0)
## 
## [[498]]
## character(0)
## 
## [[499]]
## character(0)
## 
## [[500]]
## character(0)
## 
## [[501]]
## character(0)
## 
## [[502]]
## character(0)
## 
## [[503]]
## character(0)
## 
## [[504]]
## character(0)
## 
## [[505]]
## character(0)
## 
## [[506]]
## character(0)
## 
## [[507]]
## character(0)
## 
## [[508]]
## character(0)
## 
## [[509]]
## character(0)
## 
## [[510]]
## character(0)
## 
## [[511]]
## character(0)
## 
## [[512]]
## character(0)
## 
## [[513]]
## character(0)
## 
## [[514]]
## character(0)
## 
## [[515]]
## character(0)
## 
## [[516]]
## character(0)
## 
## [[517]]
## character(0)
## 
## [[518]]
## character(0)
## 
## [[519]]
## character(0)
## 
## [[520]]
## character(0)
## 
## [[521]]
## character(0)
## 
## [[522]]
## character(0)
## 
## [[523]]
## character(0)
## 
## [[524]]
## character(0)
## 
## [[525]]
## character(0)
## 
## [[526]]
## character(0)
## 
## [[527]]
## character(0)
## 
## [[528]]
## character(0)
## 
## [[529]]
## character(0)
## 
## [[530]]
## character(0)
## 
## [[531]]
## character(0)
## 
## [[532]]
## character(0)
## 
## [[533]]
## character(0)
## 
## [[534]]
## character(0)
## 
## [[535]]
## character(0)
## 
## [[536]]
## character(0)
## 
## [[537]]
## character(0)
## 
## [[538]]
## character(0)
## 
## [[539]]
## character(0)
## 
## [[540]]
## character(0)
## 
## [[541]]
## character(0)
## 
## [[542]]
## character(0)
## 
## [[543]]
## character(0)
## 
## [[544]]
## character(0)
## 
## [[545]]
## character(0)
## 
## [[546]]
## character(0)
## 
## [[547]]
## character(0)
## 
## [[548]]
## character(0)
## 
## [[549]]
## character(0)
## 
## [[550]]
## character(0)
## 
## [[551]]
## character(0)
## 
## [[552]]
## character(0)
## 
## [[553]]
## character(0)
## 
## [[554]]
## character(0)
## 
## [[555]]
## character(0)
## 
## [[556]]
## character(0)
## 
## [[557]]
## character(0)
## 
## [[558]]
## character(0)
## 
## [[559]]
## character(0)
## 
## [[560]]
## character(0)
## 
## [[561]]
## character(0)
## 
## [[562]]
## character(0)
## 
## [[563]]
## character(0)
## 
## [[564]]
## character(0)
## 
## [[565]]
## character(0)
## 
## [[566]]
## character(0)
## 
## [[567]]
## character(0)
## 
## [[568]]
## character(0)
## 
## [[569]]
## character(0)
## 
## [[570]]
## character(0)
## 
## [[571]]
## character(0)
## 
## [[572]]
## character(0)
## 
## [[573]]
## character(0)
## 
## [[574]]
## character(0)
## 
## [[575]]
## character(0)
## 
## [[576]]
## character(0)
## 
## [[577]]
## character(0)
## 
## [[578]]
## character(0)
## 
## [[579]]
## character(0)
## 
## [[580]]
## character(0)
## 
## [[581]]
## character(0)
## 
## [[582]]
## character(0)
## 
## [[583]]
## character(0)
## 
## [[584]]
## character(0)
## 
## [[585]]
## character(0)
## 
## [[586]]
## character(0)
## 
## [[587]]
## character(0)
## 
## [[588]]
## character(0)
## 
## [[589]]
## character(0)
## 
## [[590]]
## character(0)
## 
## [[591]]
## character(0)
## 
## [[592]]
## character(0)
## 
## [[593]]
## character(0)
## 
## [[594]]
## character(0)
## 
## [[595]]
## character(0)
## 
## [[596]]
## character(0)
## 
## [[597]]
## character(0)
## 
## [[598]]
## character(0)
## 
## [[599]]
## character(0)
## 
## [[600]]
## character(0)
## 
## [[601]]
## character(0)
## 
## [[602]]
## character(0)
## 
## [[603]]
## character(0)
## 
## [[604]]
## character(0)
## 
## [[605]]
## character(0)
## 
## [[606]]
## character(0)
## 
## [[607]]
## character(0)
## 
## [[608]]
## character(0)
## 
## [[609]]
## character(0)
## 
## [[610]]
## character(0)
## 
## [[611]]
## character(0)
## 
## [[612]]
## character(0)
## 
## [[613]]
## character(0)
## 
## [[614]]
## character(0)
## 
## [[615]]
## character(0)
## 
## [[616]]
## character(0)
## 
## [[617]]
## character(0)
## 
## [[618]]
## character(0)
## 
## [[619]]
## character(0)
## 
## [[620]]
## character(0)
## 
## [[621]]
## character(0)
## 
## [[622]]
## character(0)
## 
## [[623]]
## character(0)
## 
## [[624]]
## character(0)
## 
## [[625]]
## character(0)
## 
## [[626]]
## character(0)
## 
## [[627]]
## character(0)
## 
## [[628]]
## character(0)
## 
## [[629]]
## character(0)
## 
## [[630]]
## character(0)
## 
## [[631]]
## character(0)
## 
## [[632]]
## character(0)
## 
## [[633]]
## character(0)
## 
## [[634]]
## character(0)
## 
## [[635]]
## character(0)
## 
## [[636]]
## character(0)
## 
## [[637]]
## character(0)
## 
## [[638]]
## character(0)
## 
## [[639]]
## character(0)
## 
## [[640]]
## character(0)
## 
## [[641]]
## character(0)
## 
## [[642]]
## character(0)
## 
## [[643]]
## character(0)
## 
## [[644]]
## character(0)
## 
## [[645]]
## character(0)
## 
## [[646]]
## character(0)
## 
## [[647]]
## character(0)
## 
## [[648]]
## character(0)
## 
## [[649]]
## character(0)
## 
## [[650]]
## character(0)
## 
## [[651]]
## character(0)
## 
## [[652]]
## character(0)
## 
## [[653]]
## character(0)
## 
## [[654]]
## character(0)
## 
## [[655]]
## character(0)
## 
## [[656]]
## character(0)
## 
## [[657]]
## character(0)
## 
## [[658]]
## character(0)
## 
## [[659]]
## character(0)
## 
## [[660]]
## character(0)
## 
## [[661]]
## character(0)
## 
## [[662]]
## character(0)
## 
## [[663]]
## character(0)
## 
## [[664]]
## character(0)
## 
## [[665]]
## character(0)
## 
## [[666]]
## character(0)
## 
## [[667]]
## character(0)
## 
## [[668]]
## character(0)
## 
## [[669]]
## character(0)
## 
## [[670]]
## character(0)
## 
## [[671]]
## character(0)
## 
## [[672]]
## character(0)
## 
## [[673]]
## character(0)
## 
## [[674]]
## character(0)
## 
## [[675]]
## character(0)
## 
## [[676]]
## character(0)
## 
## [[677]]
## character(0)
## 
## [[678]]
## character(0)
## 
## [[679]]
## character(0)
## 
## [[680]]
## character(0)
## 
## [[681]]
## character(0)
## 
## [[682]]
## character(0)
## 
## [[683]]
## character(0)
## 
## [[684]]
## character(0)
## 
## [[685]]
## character(0)
## 
## [[686]]
## character(0)
## 
## [[687]]
## character(0)
## 
## [[688]]
## character(0)
## 
## [[689]]
## character(0)
## 
## [[690]]
## character(0)
## 
## [[691]]
## character(0)
## 
## [[692]]
## character(0)
## 
## [[693]]
## character(0)
## 
## [[694]]
## character(0)
## 
## [[695]]
## character(0)
## 
## [[696]]
## character(0)
## 
## [[697]]
## character(0)

str_extract_all, simplify=TRUE

str_extract_all(beauty$person, "CRONY[0-9\\s]*", simplify = TRUE)
##        [,1]     
##   [1,] ""       
##   [2,] ""       
##   [3,] ""       
##   [4,] ""       
##   [5,] ""       
##   [6,] ""       
##   [7,] ""       
##   [8,] ""       
##   [9,] ""       
##  [10,] ""       
##  [11,] ""       
##  [12,] ""       
##  [13,] ""       
##  [14,] ""       
##  [15,] ""       
##  [16,] ""       
##  [17,] ""       
##  [18,] ""       
##  [19,] ""       
##  [20,] ""       
##  [21,] ""       
##  [22,] ""       
##  [23,] ""       
##  [24,] ""       
##  [25,] ""       
##  [26,] ""       
##  [27,] ""       
##  [28,] ""       
##  [29,] ""       
##  [30,] ""       
##  [31,] ""       
##  [32,] ""       
##  [33,] ""       
##  [34,] ""       
##  [35,] ""       
##  [36,] ""       
##  [37,] ""       
##  [38,] ""       
##  [39,] ""       
##  [40,] ""       
##  [41,] ""       
##  [42,] ""       
##  [43,] ""       
##  [44,] ""       
##  [45,] ""       
##  [46,] ""       
##  [47,] ""       
##  [48,] ""       
##  [49,] ""       
##  [50,] ""       
##  [51,] ""       
##  [52,] ""       
##  [53,] ""       
##  [54,] ""       
##  [55,] ""       
##  [56,] ""       
##  [57,] ""       
##  [58,] ""       
##  [59,] ""       
##  [60,] ""       
##  [61,] ""       
##  [62,] ""       
##  [63,] ""       
##  [64,] ""       
##  [65,] ""       
##  [66,] ""       
##  [67,] ""       
##  [68,] ""       
##  [69,] ""       
##  [70,] ""       
##  [71,] ""       
##  [72,] ""       
##  [73,] ""       
##  [74,] ""       
##  [75,] ""       
##  [76,] ""       
##  [77,] ""       
##  [78,] ""       
##  [79,] ""       
##  [80,] ""       
##  [81,] ""       
##  [82,] ""       
##  [83,] ""       
##  [84,] ""       
##  [85,] ""       
##  [86,] ""       
##  [87,] ""       
##  [88,] ""       
##  [89,] ""       
##  [90,] ""       
##  [91,] ""       
##  [92,] ""       
##  [93,] ""       
##  [94,] ""       
##  [95,] ""       
##  [96,] ""       
##  [97,] ""       
##  [98,] ""       
##  [99,] ""       
## [100,] ""       
## [101,] ""       
## [102,] ""       
## [103,] ""       
## [104,] ""       
## [105,] ""       
## [106,] ""       
## [107,] ""       
## [108,] ""       
## [109,] ""       
## [110,] ""       
## [111,] ""       
## [112,] ""       
## [113,] ""       
## [114,] ""       
## [115,] ""       
## [116,] ""       
## [117,] ""       
## [118,] ""       
## [119,] ""       
## [120,] ""       
## [121,] ""       
## [122,] ""       
## [123,] ""       
## [124,] ""       
## [125,] ""       
## [126,] ""       
## [127,] ""       
## [128,] ""       
## [129,] ""       
## [130,] ""       
## [131,] ""       
## [132,] ""       
## [133,] ""       
## [134,] ""       
## [135,] ""       
## [136,] ""       
## [137,] ""       
## [138,] ""       
## [139,] ""       
## [140,] ""       
## [141,] ""       
## [142,] ""       
## [143,] ""       
## [144,] ""       
## [145,] ""       
## [146,] ""       
## [147,] ""       
## [148,] ""       
## [149,] ""       
## [150,] ""       
## [151,] ""       
## [152,] ""       
## [153,] ""       
## [154,] ""       
## [155,] ""       
## [156,] ""       
## [157,] ""       
## [158,] ""       
## [159,] ""       
## [160,] ""       
## [161,] ""       
## [162,] ""       
## [163,] ""       
## [164,] ""       
## [165,] ""       
## [166,] ""       
## [167,] ""       
## [168,] ""       
## [169,] ""       
## [170,] ""       
## [171,] ""       
## [172,] ""       
## [173,] ""       
## [174,] ""       
## [175,] ""       
## [176,] ""       
## [177,] ""       
## [178,] ""       
## [179,] ""       
## [180,] ""       
## [181,] ""       
## [182,] ""       
## [183,] ""       
## [184,] ""       
## [185,] ""       
## [186,] ""       
## [187,] ""       
## [188,] ""       
## [189,] ""       
## [190,] ""       
## [191,] ""       
## [192,] ""       
## [193,] ""       
## [194,] ""       
## [195,] ""       
## [196,] ""       
## [197,] ""       
## [198,] ""       
## [199,] ""       
## [200,] ""       
## [201,] ""       
## [202,] ""       
## [203,] ""       
## [204,] ""       
## [205,] ""       
## [206,] ""       
## [207,] ""       
## [208,] ""       
## [209,] ""       
## [210,] ""       
## [211,] ""       
## [212,] ""       
## [213,] ""       
## [214,] ""       
## [215,] ""       
## [216,] ""       
## [217,] ""       
## [218,] ""       
## [219,] ""       
## [220,] ""       
## [221,] ""       
## [222,] ""       
## [223,] ""       
## [224,] ""       
## [225,] ""       
## [226,] ""       
## [227,] ""       
## [228,] ""       
## [229,] ""       
## [230,] ""       
## [231,] ""       
## [232,] ""       
## [233,] ""       
## [234,] ""       
## [235,] ""       
## [236,] ""       
## [237,] ""       
## [238,] ""       
## [239,] ""       
## [240,] ""       
## [241,] ""       
## [242,] ""       
## [243,] ""       
## [244,] ""       
## [245,] ""       
## [246,] ""       
## [247,] ""       
## [248,] ""       
## [249,] ""       
## [250,] ""       
## [251,] ""       
## [252,] ""       
## [253,] ""       
## [254,] ""       
## [255,] ""       
## [256,] ""       
## [257,] ""       
## [258,] ""       
## [259,] ""       
## [260,] ""       
## [261,] ""       
## [262,] ""       
## [263,] ""       
## [264,] ""       
## [265,] ""       
## [266,] ""       
## [267,] ""       
## [268,] ""       
## [269,] ""       
## [270,] ""       
## [271,] ""       
## [272,] ""       
## [273,] ""       
## [274,] ""       
## [275,] ""       
## [276,] ""       
## [277,] ""       
## [278,] ""       
## [279,] ""       
## [280,] ""       
## [281,] ""       
## [282,] ""       
## [283,] ""       
## [284,] ""       
## [285,] ""       
## [286,] ""       
## [287,] ""       
## [288,] ""       
## [289,] ""       
## [290,] ""       
## [291,] ""       
## [292,] ""       
## [293,] "CRONY 1"
## [294,] ""       
## [295,] "CRONY 2"
## [296,] ""       
## [297,] "CRONY 3"
## [298,] ""       
## [299,] ""       
## [300,] ""       
## [301,] "CRONY 1"
## [302,] ""       
## [303,] ""       
## [304,] ""       
## [305,] ""       
## [306,] ""       
## [307,] ""       
## [308,] ""       
## [309,] ""       
## [310,] ""       
## [311,] ""       
## [312,] ""       
## [313,] ""       
## [314,] ""       
## [315,] ""       
## [316,] ""       
## [317,] ""       
## [318,] ""       
## [319,] ""       
## [320,] ""       
## [321,] ""       
## [322,] ""       
## [323,] ""       
## [324,] ""       
## [325,] ""       
## [326,] ""       
## [327,] ""       
## [328,] ""       
## [329,] ""       
## [330,] ""       
## [331,] ""       
## [332,] ""       
## [333,] ""       
## [334,] ""       
## [335,] ""       
## [336,] ""       
## [337,] ""       
## [338,] ""       
## [339,] ""       
## [340,] ""       
## [341,] ""       
## [342,] ""       
## [343,] ""       
## [344,] ""       
## [345,] ""       
## [346,] ""       
## [347,] ""       
## [348,] ""       
## [349,] ""       
## [350,] ""       
## [351,] ""       
## [352,] ""       
## [353,] ""       
## [354,] ""       
## [355,] ""       
## [356,] ""       
## [357,] ""       
## [358,] ""       
## [359,] ""       
## [360,] ""       
## [361,] ""       
## [362,] ""       
## [363,] ""       
## [364,] ""       
## [365,] ""       
## [366,] ""       
## [367,] ""       
## [368,] ""       
## [369,] ""       
## [370,] ""       
## [371,] ""       
## [372,] ""       
## [373,] ""       
## [374,] ""       
## [375,] ""       
## [376,] ""       
## [377,] ""       
## [378,] ""       
## [379,] ""       
## [380,] ""       
## [381,] ""       
## [382,] ""       
## [383,] ""       
## [384,] ""       
## [385,] ""       
## [386,] ""       
## [387,] ""       
## [388,] ""       
## [389,] ""       
## [390,] ""       
## [391,] ""       
## [392,] ""       
## [393,] ""       
## [394,] ""       
## [395,] ""       
## [396,] ""       
## [397,] ""       
## [398,] ""       
## [399,] ""       
## [400,] ""       
## [401,] ""       
## [402,] ""       
## [403,] ""       
## [404,] ""       
## [405,] ""       
## [406,] ""       
## [407,] ""       
## [408,] ""       
## [409,] ""       
## [410,] ""       
## [411,] ""       
## [412,] ""       
## [413,] ""       
## [414,] ""       
## [415,] ""       
## [416,] ""       
## [417,] ""       
## [418,] ""       
## [419,] ""       
## [420,] ""       
## [421,] ""       
## [422,] ""       
## [423,] ""       
## [424,] ""       
## [425,] ""       
## [426,] ""       
## [427,] ""       
## [428,] ""       
## [429,] ""       
## [430,] ""       
## [431,] ""       
## [432,] ""       
## [433,] ""       
## [434,] ""       
## [435,] ""       
## [436,] ""       
## [437,] ""       
## [438,] ""       
## [439,] ""       
## [440,] ""       
## [441,] ""       
## [442,] ""       
## [443,] ""       
## [444,] ""       
## [445,] ""       
## [446,] ""       
## [447,] ""       
## [448,] ""       
## [449,] ""       
## [450,] ""       
## [451,] ""       
## [452,] ""       
## [453,] ""       
## [454,] ""       
## [455,] ""       
## [456,] ""       
## [457,] ""       
## [458,] ""       
## [459,] ""       
## [460,] ""       
## [461,] ""       
## [462,] ""       
## [463,] ""       
## [464,] ""       
## [465,] ""       
## [466,] ""       
## [467,] ""       
## [468,] ""       
## [469,] ""       
## [470,] ""       
## [471,] ""       
## [472,] ""       
## [473,] ""       
## [474,] ""       
## [475,] ""       
## [476,] ""       
## [477,] ""       
## [478,] ""       
## [479,] ""       
## [480,] ""       
## [481,] ""       
## [482,] ""       
## [483,] ""       
## [484,] ""       
## [485,] ""       
## [486,] ""       
## [487,] ""       
## [488,] ""       
## [489,] ""       
## [490,] ""       
## [491,] ""       
## [492,] ""       
## [493,] ""       
## [494,] ""       
## [495,] ""       
## [496,] ""       
## [497,] ""       
## [498,] ""       
## [499,] ""       
## [500,] ""       
## [501,] ""       
## [502,] ""       
## [503,] ""       
## [504,] ""       
## [505,] ""       
## [506,] ""       
## [507,] ""       
## [508,] ""       
## [509,] ""       
## [510,] ""       
## [511,] ""       
## [512,] ""       
## [513,] ""       
## [514,] ""       
## [515,] ""       
## [516,] ""       
## [517,] ""       
## [518,] ""       
## [519,] ""       
## [520,] ""       
## [521,] ""       
## [522,] ""       
## [523,] ""       
## [524,] ""       
## [525,] ""       
## [526,] ""       
## [527,] ""       
## [528,] ""       
## [529,] ""       
## [530,] ""       
## [531,] ""       
## [532,] ""       
## [533,] ""       
## [534,] ""       
## [535,] ""       
## [536,] ""       
## [537,] ""       
## [538,] ""       
## [539,] ""       
## [540,] ""       
## [541,] ""       
## [542,] ""       
## [543,] ""       
## [544,] ""       
## [545,] ""       
## [546,] ""       
## [547,] ""       
## [548,] ""       
## [549,] ""       
## [550,] ""       
## [551,] ""       
## [552,] ""       
## [553,] ""       
## [554,] ""       
## [555,] ""       
## [556,] ""       
## [557,] ""       
## [558,] ""       
## [559,] ""       
## [560,] ""       
## [561,] ""       
## [562,] ""       
## [563,] ""       
## [564,] ""       
## [565,] ""       
## [566,] ""       
## [567,] ""       
## [568,] ""       
## [569,] ""       
## [570,] ""       
## [571,] ""       
## [572,] ""       
## [573,] ""       
## [574,] ""       
## [575,] ""       
## [576,] ""       
## [577,] ""       
## [578,] ""       
## [579,] ""       
## [580,] ""       
## [581,] ""       
## [582,] ""       
## [583,] ""       
## [584,] ""       
## [585,] ""       
## [586,] ""       
## [587,] ""       
## [588,] ""       
## [589,] ""       
## [590,] ""       
## [591,] ""       
## [592,] ""       
## [593,] ""       
## [594,] ""       
## [595,] ""       
## [596,] ""       
## [597,] ""       
## [598,] ""       
## [599,] ""       
## [600,] ""       
## [601,] ""       
## [602,] ""       
## [603,] ""       
## [604,] ""       
## [605,] ""       
## [606,] ""       
## [607,] ""       
## [608,] ""       
## [609,] ""       
## [610,] ""       
## [611,] ""       
## [612,] ""       
## [613,] ""       
## [614,] ""       
## [615,] ""       
## [616,] ""       
## [617,] ""       
## [618,] ""       
## [619,] ""       
## [620,] ""       
## [621,] ""       
## [622,] ""       
## [623,] ""       
## [624,] ""       
## [625,] ""       
## [626,] ""       
## [627,] ""       
## [628,] ""       
## [629,] ""       
## [630,] ""       
## [631,] ""       
## [632,] ""       
## [633,] ""       
## [634,] ""       
## [635,] ""       
## [636,] ""       
## [637,] ""       
## [638,] ""       
## [639,] ""       
## [640,] ""       
## [641,] ""       
## [642,] ""       
## [643,] ""       
## [644,] ""       
## [645,] ""       
## [646,] ""       
## [647,] ""       
## [648,] ""       
## [649,] ""       
## [650,] ""       
## [651,] ""       
## [652,] ""       
## [653,] ""       
## [654,] ""       
## [655,] ""       
## [656,] ""       
## [657,] ""       
## [658,] ""       
## [659,] ""       
## [660,] ""       
## [661,] ""       
## [662,] ""       
## [663,] ""       
## [664,] ""       
## [665,] ""       
## [666,] ""       
## [667,] ""       
## [668,] ""       
## [669,] ""       
## [670,] ""       
## [671,] ""       
## [672,] ""       
## [673,] ""       
## [674,] ""       
## [675,] ""       
## [676,] ""       
## [677,] ""       
## [678,] ""       
## [679,] ""       
## [680,] ""       
## [681,] ""       
## [682,] ""       
## [683,] ""       
## [684,] ""       
## [685,] ""       
## [686,] ""       
## [687,] ""       
## [688,] ""       
## [689,] ""       
## [690,] ""       
## [691,] ""       
## [692,] ""       
## [693,] ""       
## [694,] ""       
## [695,] ""       
## [696,] ""       
## [697,] ""
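
With simplify = TRUE the result is a character matrix in which every non-match is an empty string, so most of the rows above are noise. A minimal sketch of keeping only the matching rows, using a small made-up `person` vector as a stand-in for `beauty$person` (the vector and the names `m` and `hits` are assumptions for illustration):

```r
library(stringr)

# hypothetical stand-in for beauty$person
person <- c("NARRATOR", "CRONY 1", "BELLE", "CRONY 2", "CRONY 1")

# simplify = TRUE gives a character matrix; non-matches become ""
m <- str_extract_all(person, "CRONY[0-9\\s]*", simplify = TRUE)

# keep only the rows of the first column that hold a match
hits <- m[m[, 1] != "", 1]
hits         # "CRONY 1" "CRONY 2" "CRONY 1"

# tabulate how often each extracted string occurs
table(hits)
```

The same filtering should work on the full matrix as long as at least one element matches, since the matrix then has at least one column.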

What is this regex finding?

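# logical vector: TRUE where the person string contains a match
str_detect(beauty$person, "CRONY[0-9\\s]*")
# subset the person vector to the strings that match
beauty$person[str_detect(beauty$person, "CRONY[0-9\\s]*")]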
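
One way to answer the question is to try the pattern on a few hand-made strings. The sketch below uses a hypothetical `toy` vector (not part of the script data). The pattern reads: the literal text CRONY, followed by zero or more characters that are each a digit or whitespace.

```r
library(stringr)

# hypothetical test strings, chosen to probe the pattern
toy <- c("CRONY 1", "CRONY", "CRONY 12 3", "CRONIES", "LEFOU", "OLD CRONY 2!")
pattern <- "CRONY[0-9\\s]*"

# does each string contain a match?
str_detect(toy, pattern)
# TRUE TRUE TRUE FALSE FALSE TRUE

# what exactly is matched? (NA where there is no match)
str_extract(toy, pattern)
# "CRONY 1" "CRONY" "CRONY 12 3" NA NA "CRONY 2"
```

Because `*` allows zero repetitions, a bare "CRONY" also matches, so for detection alone this pattern behaves like the plain pattern "CRONY". The character class only changes how much text gets extracted.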

str_detect

str_detect(beauty$person, "CRONY[0-9\\s]*")
##   [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [12] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [23] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [34] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [45] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [56] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [67] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [78] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [89] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [100] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [111] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [122] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [133] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [144] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [155] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [166] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [177] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [188] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [199] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [210] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [221] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [232] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [243] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [254] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [265] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [276] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [287] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE
## [298] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [309] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [320] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [331] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [342] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [353] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [364] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [375] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [386] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [397] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [408] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [419] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [430] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [441] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [452] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [463] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [474] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [485] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [496] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [507] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [518] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [529] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [540] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [551] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [562] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [573] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [584] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [595] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [606] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [617] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [628] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [639] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [650] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [661] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [672] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [683] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [694] FALSE FALSE FALSE FALSE
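
A logical vector of this length is hard to read by eye. In the output above the TRUE values sit at positions 293, 295, 297 and 301. A small sketch of summarising such a vector instead of printing it, again with a made-up `person` vector standing in for `beauty$person` (the names `person` and `is_crony` are assumptions):

```r
library(stringr)

# hypothetical stand-in for beauty$person
person <- c("NARRATOR", "CRONY 1", "BELLE", "CRONY 2")
pattern <- "CRONY[0-9\\s]*"

is_crony <- str_detect(person, pattern)

sum(is_crony)     # number of matching elements: 2
which(is_crony)   # positions of the matches: 2 4
person[is_crony]  # the matching strings themselves

# str_subset() is a shortcut for the logical subsetting above
str_subset(person, pattern)
```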

look at detected strings

str_detect(beauty$person, "CRONY[0-9\\s]*")
##   [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [12] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [23] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [34] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [45] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [56] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [67] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [78] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [89] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [100] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [111] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [122] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [133] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [144] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [155] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [166] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [177] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [188] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [199] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [210] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [221] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [232] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [243] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [254] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [265] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [276] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [287] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE
## [298] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [309] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [320] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [331] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [342] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [353] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [364] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [375] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [386] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [397] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [408] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [419] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [430] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [441] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [452] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [463] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [474] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [485] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [496] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [507] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [518] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [529] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [540] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [551] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [562] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [573] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [584] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [595] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [606] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [617] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [628] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [639] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [650] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [661] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [672] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [683] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [694] FALSE FALSE FALSE FALSE

Two different ways

# first way
str_extract(beauty$person, "CRONY[0-9\\s]*")
##   [1] NA        NA        NA        NA        NA        NA        NA       
##   [8] NA        NA        NA        NA        NA        NA        NA       
##  [15] NA        NA        NA        NA        NA        NA        NA       
##  [22] NA        NA        NA        NA        NA        NA        NA       
##  [29] NA        NA        NA        NA        NA        NA        NA       
##  [36] NA        NA        NA        NA        NA        NA        NA       
##  [43] NA        NA        NA        NA        NA        NA        NA       
##  [50] NA        NA        NA        NA        NA        NA        NA       
##  [57] NA        NA        NA        NA        NA        NA        NA       
##  [64] NA        NA        NA        NA        NA        NA        NA       
##  [71] NA        NA        NA        NA        NA        NA        NA       
##  [78] NA        NA        NA        NA        NA        NA        NA       
##  [85] NA        NA        NA        NA        NA        NA        NA       
##  [92] NA        NA        NA        NA        NA        NA        NA       
##  [99] NA        NA        NA        NA        NA        NA        NA       
## [106] NA        NA        NA        NA        NA        NA        NA       
## [113] NA        NA        NA        NA        NA        NA        NA       
## [120] NA        NA        NA        NA        NA        NA        NA       
## [127] NA        NA        NA        NA        NA        NA        NA       
## [134] NA        NA        NA        NA        NA        NA        NA       
## [141] NA        NA        NA        NA        NA        NA        NA       
## [148] NA        NA        NA        NA        NA        NA        NA       
## [155] NA        NA        NA        NA        NA        NA        NA       
## [162] NA        NA        NA        NA        NA        NA        NA       
## [169] NA        NA        NA        NA        NA        NA        NA       
## [176] NA        NA        NA        NA        NA        NA        NA       
## [183] NA        NA        NA        NA        NA        NA        NA       
## [190] NA        NA        NA        NA        NA        NA        NA       
## [197] NA        NA        NA        NA        NA        NA        NA       
## [204] NA        NA        NA        NA        NA        NA        NA       
## [211] NA        NA        NA        NA        NA        NA        NA       
## [218] NA        NA        NA        NA        NA        NA        NA       
## [225] NA        NA        NA        NA        NA        NA        NA       
## [232] NA        NA        NA        NA        NA        NA        NA       
## [239] NA        NA        NA        NA        NA        NA        NA       
## [246] NA        NA        NA        NA        NA        NA        NA       
## [253] NA        NA        NA        NA        NA        NA        NA       
## [260] NA        NA        NA        NA        NA        NA        NA       
## [267] NA        NA        NA        NA        NA        NA        NA       
## [274] NA        NA        NA        NA        NA        NA        NA       
## [281] NA        NA        NA        NA        NA        NA        NA       
## [288] NA        NA        NA        NA        NA        "CRONY 1" NA       
## [295] "CRONY 2" NA        "CRONY 3" NA        NA        NA        "CRONY 1"
## [302] NA        NA        NA        NA        NA        NA        NA       
## [309] NA        NA        NA        NA        NA        NA        NA       
## [316] NA        NA        NA        NA        NA        NA        NA       
## [323] NA        NA        NA        NA        NA        NA        NA       
## [330] NA        NA        NA        NA        NA        NA        NA       
## [337] NA        NA        NA        NA        NA        NA        NA       
## [344] NA        NA        NA        NA        NA        NA        NA       
## [351] NA        NA        NA        NA        NA        NA        NA       
## [358] NA        NA        NA        NA        NA        NA        NA       
## [365] NA        NA        NA        NA        NA        NA        NA       
## [372] NA        NA        NA        NA        NA        NA        NA       
## [379] NA        NA        NA        NA        NA        NA        NA       
## [386] NA        NA        NA        NA        NA        NA        NA       
## [393] NA        NA        NA        NA        NA        NA        NA       
## [400] NA        NA        NA        NA        NA        NA        NA       
## [407] NA        NA        NA        NA        NA        NA        NA       
## [414] NA        NA        NA        NA        NA        NA        NA       
## [421] NA        NA        NA        NA        NA        NA        NA       
## [428] NA        NA        NA        NA        NA        NA        NA       
## [435] NA        NA        NA        NA        NA        NA        NA       
## [442] NA        NA        NA        NA        NA        NA        NA       
## [449] NA        NA        NA        NA        NA        NA        NA       
## [456] NA        NA        NA        NA        NA        NA        NA       
## [463] NA        NA        NA        NA        NA        NA        NA       
## [470] NA        NA        NA        NA        NA        NA        NA       
## [477] NA        NA        NA        NA        NA        NA        NA       
## [484] NA        NA        NA        NA        NA        NA        NA       
## [491] NA        NA        NA        NA        NA        NA        NA       
## [498] NA        NA        NA        NA        NA        NA        NA       
## [505] NA        NA        NA        NA        NA        NA        NA       
## [512] NA        NA        NA        NA        NA        NA        NA       
## [519] NA        NA        NA        NA        NA        NA        NA       
## [526] NA        NA        NA        NA        NA        NA        NA       
## [533] NA        NA        NA        NA        NA        NA        NA       
## [540] NA        NA        NA        NA        NA        NA        NA       
## [547] NA        NA        NA        NA        NA        NA        NA       
## [554] NA        NA        NA        NA        NA        NA        NA       
## [561] NA        NA        NA        NA        NA        NA        NA       
## [568] NA        NA        NA        NA        NA        NA        NA       
## [575] NA        NA        NA        NA        NA        NA        NA       
## [582] NA        NA        NA        NA        NA        NA        NA       
## [589] NA        NA        NA        NA        NA        NA        NA       
## [596] NA        NA        NA        NA        NA        NA        NA       
## [603] NA        NA        NA        NA        NA        NA        NA       
## [610] NA        NA        NA        NA        NA        NA        NA       
## [617] NA        NA        NA        NA        NA        NA        NA       
## [624] NA        NA        NA        NA        NA        NA        NA       
## [631] NA        NA        NA        NA        NA        NA        NA       
## [638] NA        NA        NA        NA        NA        NA        NA       
## [645] NA        NA        NA        NA        NA        NA        NA       
## [652] NA        NA        NA        NA        NA        NA        NA       
## [659] NA        NA        NA        NA        NA        NA        NA       
## [666] NA        NA        NA        NA        NA        NA        NA       
## [673] NA        NA        NA        NA        NA        NA        NA       
## [680] NA        NA        NA        NA        NA        NA        NA       
## [687] NA        NA        NA        NA        NA        NA        NA       
## [694] NA        NA        NA        NA
# second way
str_extract(beauty$person, "CRONY\\s[0-9]")
##   [1] NA        NA        NA        NA        NA        NA        NA       
##   [8] NA        NA        NA        NA        NA        NA        NA       
##  [15] NA        NA        NA        NA        NA        NA        NA       
##  [22] NA        NA        NA        NA        NA        NA        NA       
##  [29] NA        NA        NA        NA        NA        NA        NA       
##  [36] NA        NA        NA        NA        NA        NA        NA       
##  [43] NA        NA        NA        NA        NA        NA        NA       
##  [50] NA        NA        NA        NA        NA        NA        NA       
##  [57] NA        NA        NA        NA        NA        NA        NA       
##  [64] NA        NA        NA        NA        NA        NA        NA       
##  [71] NA        NA        NA        NA        NA        NA        NA       
##  [78] NA        NA        NA        NA        NA        NA        NA       
##  [85] NA        NA        NA        NA        NA        NA        NA       
##  [92] NA        NA        NA        NA        NA        NA        NA       
##  [99] NA        NA        NA        NA        NA        NA        NA       
## [106] NA        NA        NA        NA        NA        NA        NA       
## [113] NA        NA        NA        NA        NA        NA        NA       
## [120] NA        NA        NA        NA        NA        NA        NA       
## [127] NA        NA        NA        NA        NA        NA        NA       
## [134] NA        NA        NA        NA        NA        NA        NA       
## [141] NA        NA        NA        NA        NA        NA        NA       
## [148] NA        NA        NA        NA        NA        NA        NA       
## [155] NA        NA        NA        NA        NA        NA        NA       
## [162] NA        NA        NA        NA        NA        NA        NA       
## [169] NA        NA        NA        NA        NA        NA        NA       
## [176] NA        NA        NA        NA        NA        NA        NA       
## [183] NA        NA        NA        NA        NA        NA        NA       
## [190] NA        NA        NA        NA        NA        NA        NA       
## [197] NA        NA        NA        NA        NA        NA        NA       
## [204] NA        NA        NA        NA        NA        NA        NA       
## [211] NA        NA        NA        NA        NA        NA        NA       
## [218] NA        NA        NA        NA        NA        NA        NA       
## [225] NA        NA        NA        NA        NA        NA        NA       
## [232] NA        NA        NA        NA        NA        NA        NA       
## [239] NA        NA        NA        NA        NA        NA        NA       
## [246] NA        NA        NA        NA        NA        NA        NA       
## [253] NA        NA        NA        NA        NA        NA        NA       
## [260] NA        NA        NA        NA        NA        NA        NA       
## [267] NA        NA        NA        NA        NA        NA        NA       
## [274] NA        NA        NA        NA        NA        NA        NA       
## [281] NA        NA        NA        NA        NA        NA        NA       
## [288] NA        NA        NA        NA        NA        "CRONY 1" NA       
## [295] "CRONY 2" NA        "CRONY 3" NA        NA        NA        "CRONY 1"
## [302] NA        NA        NA        NA        NA        NA        NA       
## [309] NA        NA        NA        NA        NA        NA        NA       
## [316] NA        NA        NA        NA        NA        NA        NA       
## [323] NA        NA        NA        NA        NA        NA        NA       
## [330] NA        NA        NA        NA        NA        NA        NA       
## [337] NA        NA        NA        NA        NA        NA        NA       
## [344] NA        NA        NA        NA        NA        NA        NA       
## [351] NA        NA        NA        NA        NA        NA        NA       
## [358] NA        NA        NA        NA        NA        NA        NA       
## [365] NA        NA        NA        NA        NA        NA        NA       
## [372] NA        NA        NA        NA        NA        NA        NA       
## [379] NA        NA        NA        NA        NA        NA        NA       
## [386] NA        NA        NA        NA        NA        NA        NA       
## [393] NA        NA        NA        NA        NA        NA        NA       
## [400] NA        NA        NA        NA        NA        NA        NA       
## [407] NA        NA        NA        NA        NA        NA        NA       
## [414] NA        NA        NA        NA        NA        NA        NA       
## [421] NA        NA        NA        NA        NA        NA        NA       
## [428] NA        NA        NA        NA        NA        NA        NA       
## [435] NA        NA        NA        NA        NA        NA        NA       
## [442] NA        NA        NA        NA        NA        NA        NA       
## [449] NA        NA        NA        NA        NA        NA        NA       
## [456] NA        NA        NA        NA        NA        NA        NA       
## [463] NA        NA        NA        NA        NA        NA        NA       
## [470] NA        NA        NA        NA        NA        NA        NA       
## [477] NA        NA        NA        NA        NA        NA        NA       
## [484] NA        NA        NA        NA        NA        NA        NA       
## [491] NA        NA        NA        NA        NA        NA        NA       
## [498] NA        NA        NA        NA        NA        NA        NA       
## [505] NA        NA        NA        NA        NA        NA        NA       
## [512] NA        NA        NA        NA        NA        NA        NA       
## [519] NA        NA        NA        NA        NA        NA        NA       
## [526] NA        NA        NA        NA        NA        NA        NA       
## [533] NA        NA        NA        NA        NA        NA        NA       
## [540] NA        NA        NA        NA        NA        NA        NA       
## [547] NA        NA        NA        NA        NA        NA        NA       
## [554] NA        NA        NA        NA        NA        NA        NA       
## [561] NA        NA        NA        NA        NA        NA        NA       
## [568] NA        NA        NA        NA        NA        NA        NA       
## [575] NA        NA        NA        NA        NA        NA        NA       
## [582] NA        NA        NA        NA        NA        NA        NA       
## [589] NA        NA        NA        NA        NA        NA        NA       
## [596] NA        NA        NA        NA        NA        NA        NA       
## [603] NA        NA        NA        NA        NA        NA        NA       
## [610] NA        NA        NA        NA        NA        NA        NA       
## [617] NA        NA        NA        NA        NA        NA        NA       
## [624] NA        NA        NA        NA        NA        NA        NA       
## [631] NA        NA        NA        NA        NA        NA        NA       
## [638] NA        NA        NA        NA        NA        NA        NA       
## [645] NA        NA        NA        NA        NA        NA        NA       
## [652] NA        NA        NA        NA        NA        NA        NA       
## [659] NA        NA        NA        NA        NA        NA        NA       
## [666] NA        NA        NA        NA        NA        NA        NA       
## [673] NA        NA        NA        NA        NA        NA        NA       
## [680] NA        NA        NA        NA        NA        NA        NA       
## [687] NA        NA        NA        NA        NA        NA        NA       
## [694] NA        NA        NA        NA

Question 2

str_extract('CRONY 0 9 0 9', "CRONY[0-9\\s]*")
str_extract('CRONY 0 9 0 9', "CRONY\\s[0-9]")

Putting it all together

beauty$person <- str_replace_all(beauty$person, "TOWNSFOLK[0-9\\s]*", "townsfolk") %>%
  str_replace_all("CRONY[0-9\\s]*|CRONIES|OLD CRONIES", "crony") %>%
  str_replace_all("WOMAN[0-9\\s]*|BIMBETTES?[0-9\\s]*", "woman") %>%
  str_replace_all("MAN[0-9\\s]*|MEN", "man") %>%
  str_replace_all("GROUP[0-9\\s]*|ALL|BOTH|CHORUS|OBJECTS|BYSTANDERS|MUGS|MOB", "group") %>%
  tolower
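
A minimal sketch of the same recoding steps on a small vector, to make it easier to see what each replacement does. The vector toy_person is made up for illustration and is not taken from the script; the patterns are the ones used above, applied one step at a time instead of through a pipe.

```r
library(stringr)

# hypothetical speaker labels, chosen to hit each pattern once
toy_person <- c("TOWNSFOLK 1", "CRONY 2", "OLD CRONIES", "WOMAN 1",
                "MAN 2", "BIMBETTE 1", "ALL")

# same replacements as above, one step at a time
x <- str_replace_all(toy_person, "TOWNSFOLK[0-9\\s]*", "townsfolk")
x <- str_replace_all(x, "CRONY[0-9\\s]*|CRONIES|OLD CRONIES", "crony")
x <- str_replace_all(x, "WOMAN[0-9\\s]*|BIMBETTES?[0-9\\s]*", "woman")
x <- str_replace_all(x, "MAN[0-9\\s]*|MEN", "man")
x <- str_replace_all(x, "GROUP[0-9\\s]*|ALL|BOTH|CHORUS|OBJECTS|BYSTANDERS|MUGS|MOB", "group")
toy_clean <- tolower(x)

toy_clean
```

The trailing [0-9\\s]* swallows the space and number after a label, so "CRONY 2" and "OLD CRONIES" both end up as "crony".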

splitting a string

str_split('the quick brown fox', '\\s', simplify = TRUE)
##      [,1]  [,2]    [,3]    [,4] 
## [1,] "the" "quick" "brown" "fox"

count number of words in each line

beauty %>% 
    mutate(num_words=???)

count number of words in each line

beauty %>% 
    mutate(num_words=length(str_split(line, '\\s', simplify = T ))) %>%
    select(num_words)
## # A tibble: 697 × 1
##    num_words
##        <int>
## 1     165189
## 2     165189
## 3     165189
## 4     165189
## 5     165189
## 6     165189
## 7     165189
## 8     165189
## 9     165189
## 10    165189
## # ... with 687 more rows
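
Every row shows the same number because length() is not vectorised: with simplify = TRUE, str_split returns one character matrix for the whole column, and length() counts all of its cells. 165189 equals 697 × 237, which is consistent with a 697-row matrix padded out to 237 columns. A minimal sketch of two per-line alternatives, using a made-up vector toy_lines rather than the real data:

```r
library(stringr)

# hypothetical stand-in for beauty$line
toy_lines <- c("the quick brown fox", "Run, Belle!", "No")

# what the attempt above computes: one number for the whole vector
# (3 rows x 4 columns of the padded matrix)
n_cells <- length(str_split(toy_lines, "\\s", simplify = TRUE))

# option 1: split into a list, then take the length of each element
n_words_split <- lengths(str_split(toy_lines, "\\s+"))

# option 2: count runs of non-whitespace characters directly
n_words_count <- str_count(toy_lines, "\\S+")
```

Either vectorised version could go inside mutate(), for example mutate(num_words = str_count(line, "\\S+")). The two options can disagree on empty strings or lines with leading whitespace, so it is worth checking which definition of a word you want.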

character counts by person

group_by(beauty, person) %>% 
  summarise(N = sum(nchar(line))) %>%
  arrange(desc(N)) %>% slice(1:10) %>%
  ggplot(data = ., aes(x = person, y = N)) + 
  geom_bar(stat = "identity") +
  theme_minimal() + theme(axis.text.x  = element_text(angle=75, vjust=0.5, size=10), axis.title.x = element_blank())

Clean the raw data

Download the raw text data

# read the raw data in
beauty <- read_lines('http://www.fpx.de/fp/Disney/Scripts/BeautyAndTheBeast.txt')

Goal

The goal is to break this long string up into the lines each character speaks. Specifically,

look for a way to identify transitions between lines

The plan

Collapse

# collapse each line into a single string
# separate lines by a ;
beauty <- beauty%>% 
           str_trim(side = "both") %>% 
           paste(collapse = ";")
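
A minimal sketch of what the trim-and-collapse step does, on a made-up three-line vector (toy_raw is hypothetical, not the real script text). It shows why blank lines in the file show up as ";;" in the collapsed string.

```r
library(stringr)

# hypothetical raw lines: padded with spaces, with one blank line
toy_raw <- c("  BELLE: Hello  ", "", "GASTON: Hi ")

# trim whitespace on both sides of each line
toy_trimmed <- str_trim(toy_raw, side = "both")

# glue everything into one string, separating the original lines with ;
toy_collapsed <- paste(toy_trimmed, collapse = ";")

toy_collapsed
```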



str_sub(beauty, 1, 100)
## [1] "<pre>;Beauty and the Beast;The Complete Script;;Compiled by Ben Scripps <34rqnpq@cmuvm.csv.cmich.edu"
# To avoid annoying issues later, since we don't try to distinguish individuals in group dialogue
beauty <- str_replace(beauty, " \\(ex. COGSWORTH\\):", ":") %>% str_replace(" \\(esp. LUMIERE\\):", ":")

Step 2: Extraction, first try

We want a data frame with one column for the dialogue identifier (speaker name) and one for the line. Since every line starts with an identifier, we could try to:

Won’t work!

look ahead/behind

test case

test <- "I don't usually leave the asylum in the middle of the night, but they said you'd make it worth my while. GASTON: It's like this.  I've got my heart set on marrying Belle, but she needs a little persuasion."

look ahead

str_detect(c('candle!!?', 'candlemaker', 'smart candle%'), 'candle(?=[[:punct:]]+)')
## [1]  TRUE FALSE  TRUE
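
A short sketch of the three lookaround forms side by side. The key point is that a lookaround checks the neighbouring text without including it in the match. The GASTON string is a shortened version of the test case; the variable names are made up for illustration.

```r
library(stringr)

words <- c("candle!!?", "candlemaker", "smart candle%")

# look ahead (?=...): "candle" only if punctuation follows;
# the punctuation itself is not part of the extracted match
ahead <- str_extract(words, "candle(?=[[:punct:]]+)")

# negative look ahead (?!...): "candle" only if punctuation does NOT follow
not_ahead <- str_detect(words, "candle(?![[:punct:]])")

# look behind (?<=...): everything that comes after "GASTON: "
behind <- str_extract("GASTON: It's like this.", "(?<=GASTON: ).+")
```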

How do you extract everything before "GASTON:"?

Use the catch-all .

str_extract(test, ".+(?=GASTON:)")
## [1] "I don't usually leave the asylum in the middle of the night, but they said you'd make it worth my while. "

any character

[A-Z\\s[:punct:]]+[:] will match any character name. Names vary in length, and some include punctuation and spaces.

another example

test <- "BEAST: What are you doing here? MAURICE: Run, Belle! BEAST: The master of this castle. BELLE: I've come for my father.  Please let him out!  Can't you see he's sick? BEAST: Then he shouldn't have trespassed here."

str_extract(test, ".+(?=[A-Z\\s[:punct:]]+:)")
## [1] "BEAST: What are you doing here? MAURICE: Run, Belle! BEAST: The master of this castle. BELLE: I've come for my father.  Please let him out!  Can't you see he's sick? BEAS"

Didn’t quite work…
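
Why it stops at "BEAS": .+ is greedy, so it grabs as much as it can and only gives back characters until the look ahead is satisfied. The look ahead needs just one character from the class followed by a colon, and the final "T:" is enough. A minimal sketch of greedy versus lazy matching on a made-up string (toy is hypothetical):

```r
library(stringr)

toy <- "A: hi B: yo C: ok"

# greedy: runs up to the LAST place where " [A-Z]:" follows
greedy <- str_extract(toy, ".+(?= [A-Z]:)")

# lazy (.+?): stops at the FIRST place where " [A-Z]:" follows
lazy <- str_extract(toy, ".+?(?= [A-Z]:)")
```

Switching to a lazy quantifier does not rescue the pattern above on its own, because the look ahead there can be satisfied by a single capital letter inside a name.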

some other failed attempts

Matching only lowercase letters, spaces, and punctuation in the first pattern fails to pick up the capital letters that start sentences and proper names in the dialogue.

str_extract_all(test, "[a-z[:punct:]\\s]+(?=[A-Z\\s[:punct:]]+:)")
## [[1]]
## [1] "hat are you doing here? "   "elle! "                    
## [3] "he master of this castle. " "an't you see he's sick? "

some other failed attempts

Here we list specific punctuation marks and leave out the colon, so each match stops at the colon after a name. But colons also show up in our data outside of person identifiers, so we can’t do that either.

str_extract_all(test, "[A-z;.,'!?\\s]+(?![A-Z\\s[:punct:]]+:)")
## [[1]]
## [1] "BEAST"                                                                         
## [2] " What are you doing here? MAURICE"                                             
## [3] " Run, Belle! BEAST"                                                            
## [4] " The master of this castle. BELLE"                                             
## [5] " I've come for my father.  Please let him out!  Can't you see he's sick? BEAST"
## [6] " Then he shouldn't have trespassed here."

A recap of the problem, and a fix

But the person identifiers are still distinct enough that we can match them, which means we can replace them with identifiers that are different enough from the dialogue to be good split criteria.

add some bogus text

Extracting the person identifiers, adding some bogus lines to show this works for character names with punctuation and numbers:

test <- "BEAST: What are you doing here? MAURICE: Run, Belle! BEAST: The master of this castle. BELLE: I've come for my father.  Please let him out!  Can't you see he's sick? BEAST: Then he shouldn't have trespassed here. TOWNSFOLK 2: He's a monster! MRS. POTTS: Now pipe down!"

str_extract_all(test, "[A-Z]+[\\s0-9[:punct:]]*:|MRS\\. POTTS:")
## [[1]]
## [1] "BEAST:"       "MAURICE:"     "BEAST:"       "BELLE:"      
## [5] "BEAST:"       "TOWNSFOLK 2:" "MRS. POTTS:"

Replacing them:

str_replace_all(test, "[A-Z]+[\\s0-9[:punct:]]*:|MRS\\. POTTS:", "&&&&&&&&")
## [1] "&&&&&&&& What are you doing here? &&&&&&&& Run, Belle! &&&&&&&& The master of this castle. &&&&&&&& I've come for my father.  Please let him out!  Can't you see he's sick? &&&&&&&& Then he shouldn't have trespassed here. &&&&&&&& He's a monster! &&&&&&&& Now pipe down!"

New plan

replace character names

str_replace_all(test, c("BEAST:" = "001>", "MAURICE:" = "002>", "BELLE:" = "003>"))
## [1] "001> What are you doing here? 002> Run, Belle! 001> The master of this castle. 003> I've come for my father.  Please let him out!  Can't you see he's sick? 001> Then he shouldn't have trespassed here. TOWNSFOLK 2: He's a monster! MRS. POTTS: Now pipe down!"

find all character names

str_extract_all(test, "[A-Z]+[\\s0-9[:punct:]]*:|MRS\\. POTTS:")
## [[1]]
## [1] "BEAST:"       "MAURICE:"     "BEAST:"       "BELLE:"      
## [5] "BEAST:"       "TOWNSFOLK 2:" "MRS. POTTS:"

give each character a unique code

unique(str_extract_all(test, "[A-Z]+[\\s0-9[:punct:]]*:|MRS\\. POTTS:")[[1]])
## [1] "BEAST:"       "MAURICE:"     "BELLE:"       "TOWNSFOLK 2:"
## [5] "MRS. POTTS:"

give each character a unique code

codes <- unique(str_extract_all(test, "[A-Z]+[\\s0-9[:punct:]]*:|MRS\\. POTTS:")[[1]])
as.list(paste0(seq(from = 100, to = 100 + length(codes) - 1), ">"))
## [[1]]
## [1] "100>"
## 
## [[2]]
## [1] "101>"
## 
## [[3]]
## [1] "102>"
## 
## [[4]]
## [1] "103>"
## 
## [[5]]
## [1] "104>"

give each character a unique code

codes <- unique(str_extract_all(test, "[A-Z]+[\\s0-9[:punct:]]*:|MRS\\. POTTS:")[[1]])
codes_list <- as.list(paste0(seq(from = 100, to = 100 + length(codes) - 1), ">"))
names(codes_list) <- codes
codes_list
## $`BEAST:`
## [1] "100>"
## 
## $`MAURICE:`
## [1] "101>"
## 
## $`BELLE:`
## [1] "102>"
## 
## $`TOWNSFOLK 2:`
## [1] "103>"
## 
## $`MRS. POTTS:`
## [1] "104>"

now replace names with their codes

test_coded <- str_replace_all(test, codes_list)
test_coded
## [1] "100> What are you doing here? 101> Run, Belle! 100> The master of this castle. 102> I've come for my father.  Please let him out!  Can't you see he's sick? 100> Then he shouldn't have trespassed here. 103> He's a monster! 104> Now pipe down!"

extract dialog

str_extract_all(test_coded, "[A-z[:punct:][:space:]]+(?![0-9]{3}>)")
## [[1]]
## [1] " What are you doing here?"                                               
## [2] " Run, Belle!"                                                            
## [3] " The master of this castle."                                             
## [4] " I've come for my father.  Please let him out!  Can't you see he's sick?"
## [5] " Then he shouldn't have trespassed here."                                
## [6] " He's a monster!"                                                        
## [7] " Now pipe down!"

extract speakers

str_extract_all(test_coded, "[0-9]{3}>")
## [[1]]
## [1] "100>" "101>" "100>" "102>" "100>" "103>" "104>"
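
A minimal sketch of how the pieces could be assembled into the two-column data frame we are after. It uses a shortened test string and a named character vector instead of a list for the codes. The names toy, code_map and toy_df are made up for illustration, and splitting on the codes (rather than extracting with a negative look ahead) is one possible choice, not necessarily what the notes do.

```r
library(stringr)

toy <- "BEAST: What are you doing here? MAURICE: Run, Belle! BEAST: The master of this castle."

# find the distinct speaker identifiers
ids <- unique(str_extract_all(toy, "[A-Z]+[\\s0-9[:punct:]]*:")[[1]])

# named character vector: names are the patterns, values are the codes
code_map <- setNames(paste0(seq(100, length.out = length(ids)), ">"), ids)
toy_coded <- str_replace_all(toy, code_map)

# speakers: the codes, in order of appearance
speaker_code <- str_extract_all(toy_coded, "[0-9]{3}>")[[1]]

# dialogue: split on the codes; the first piece is the empty string
# in front of the first code, so drop it
dialogue <- str_trim(str_split(toy_coded, "[0-9]{3}>")[[1]][-1])

# map each code back to its name and drop the trailing colon
toy_df <- data.frame(
  person = str_remove(names(code_map)[match(speaker_code, code_map)], ":$"),
  line = dialogue,
  stringsAsFactors = FALSE
)

toy_df
```

This assumes the dialogue itself never contains three digits followed by ">"; if it could, a more unusual code format would be safer.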

see the notes for the full data