Comprehensive and up-to-date data on international investment agreements (IIAs) concluded between (groups of) countries are hard to come by, making it difficult to study their effects on bilateral investment flows. In this post, I propose and describe a simple approach to obtain and manipulate data from UNCTAD’s IIA Mapping Project using R.
Bilateral investment treaties (BITs) and related bilateral tax treaties (BTTs) are intended to promote, among other things, capital flows between countries. For example, such treaties facilitate cross-border investment by multinationals by eliminating double taxation. On the other hand, these treaties can deter foreign direct investment (FDI) in some cases by limiting investors’ ability to circumvent taxes by investing elsewhere. Understandably, the potentially ambiguous effects of these treaties on capital flows, such as aggregate FDI and mergers and acquisitions (M&A), have received much attention in the empirical international economics literature.
Blonigen and Davies (2004), one of the earliest studies on this front, could not find persuasive evidence that BTTs facilitated US inbound and outbound FDI activity over the period 1980–1999. Although it is not the study’s primary focus, Di Giovanni (2005) shows that capital tax treaties promoted cross-border M&A deals over the period 1990–1999, beyond the advantages gleaned from countries’ capital tax rates, perhaps as a result of improved transparency in doing business across borders.1 Blonigen and Piger (2014) use Bayesian statistical techniques to highlight negotiated bilateral agreements, like BITs and service agreements, as important determinants of M&A activity worldwide. Specifically, they suggest that these treaties are consequential drivers of FDI into poorer countries.
However, keeping track of the status of BITs and other treaties with investment provisions (TIPs) remains rather difficult. Ever-evolving lists of signatories and economic union members, as well as differences between ratification, enforcement and termination dates, complicate matters. In an effort to construct my own BIT measure to include in the model specification of my research paper, “Gravity and Cross-Border Influence: The Importance of Distance in Mobile Telecommunications”, I turned to web scraping. In what follows, I describe the corresponding data source and methodology.
I apply a simple and undemanding web scraping method to the International Investment Agreements (IIA) Navigator hosted by the United Nations Conference on Trade and Development (UNCTAD).2 UNCTAD provides this and other tools to aid the monitoring, analysis and improvement of international investment policymaking. The IIA Navigator comprises a database of IIAs, a product of the collaborative IIA Mapping Project,3 which is regularly updated as the mapping of IIA content continues. For more information, see the relevant project description and methodology.4
IIAs are categorised as either bilateral investment treaties (BITs) or treaties with investment provisions (TIPs). The Navigator conceives of this distinction as follows:
“A bilateral investment treaty is an agreement between two countries regarding promotion and protection of investments made by investors from respective countries in each other’s territory. The great majority of IIAs are BITs.
The category of treaties with investment provisions brings together various types of investment treaties that are not BITs. Three main types of TIPs can be distinguished:
broad economic treaties that include obligations commonly found in BITs (e.g. a free trade agreement with an investment chapter);
treaties with limited investment-related provisions (e.g. only those concerning establishment of investments or free transfer of investment-related funds); and
treaties that only contain “framework” clauses such as the ones on cooperation in the area of investment and/or for a mandate for future negotiations on investment issues.”
Although not considered in this post, a comparable resource for bilateral tax treaties can be found in the Tax Treaties Explorer provided by the International Centre for Tax and Development (ICTD).5
In this section, I describe and provide the code employed to obtain and clean IIA data. First, I’ll load the necessary packages. In addition, I set the plan for executing futures across multiple sessions in parallel.
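The loading step itself is not reproduced here. A minimal sketch, assuming the packages named elsewhere in this post and in the session info are installed, might look as follows. The choice of `multisession` follows the "multiple sessions" wording above, and relying on the default number of workers is an assumption.

```r
# Scraping, wrangling, dates, summaries, parallel mapping, country codes
library(rvest)
library(tidyverse)
library(lubridate)
library(skimr)
library(future)
library(furrr)
library(countrycode)

# Evaluate futures in separate background R sessions
# (assumed: default number of workers)
plan(multisession)
```

Calling `plan(sequential)` instead would run everything in the current session, which can be easier to debug.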
The focus of my web scraping is merely the table on the Mapping of IIA Content webpage. I am primarily concerned with the timing of countries’ IIA affiliations. Hence, it is important to standardise the coding of countries’ names, as well as all date variables.
Using rvest, I scrape the webpage and parse the table. I save it as an RDS file for future use, instead of repeatedly scraping and parsing.
iia_url <- "https://investmentpolicy.unctad.org/international-investment-agreements/iia-mapping"
iia_webpage <- read_html(iia_url)
iia_table <- html_table(iia_webpage, header = TRUE, trim = TRUE)[[1]]
saveRDS(iia_table, "data/iia_table.rds")
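The scrape-once, read-many pattern can be sketched as a small caching helper. The names `read_or_fetch` and `fake_scrape` are hypothetical and not part of the original workflow; the stand-in scraper only exists so that the sketch runs offline.

```r
# Hypothetical helper: return the cached object if the file exists,
# otherwise call `fetch()` once, cache its result, and return it.
read_or_fetch <- function(path, fetch) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- fetch()
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  saveRDS(result, path)
  result
}

# Stand-in for the scraping step so the sketch runs offline
calls <- 0
fake_scrape <- function() {
  calls <<- calls + 1
  data.frame(No. = 1:2, Parties = c("Afghanistan, Germany", "Albania, Austria"))
}

cache_path <- file.path(tempdir(), "iia_cache", "iia_table.rds")
unlink(cache_path)  # start from an empty cache

first  <- read_or_fetch(cache_path, fake_scrape)  # scrapes and caches
second <- read_or_fetch(cache_path, fake_scrape)  # reads the cached copy
```

In this post's setting, `fetch` would wrap the `read_html()` and `html_table()` calls, so the website is only contacted when no cached copy exists.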
The table is then read back into the R environment. I skim the resulting table to get an overview of the data. At first glance, there seem to be plenty of empty cells and clunky variable names. I create parsimonious variable names and replace empty cells with missing values.6
iia_table <- read_rds("data/iia_table.rds") %>%
  rename(Index = No.,
         Full = `Full title`,
         Short = `Short title`,
         Signature = `Date of signature`,
         Entry = `Date of entry into force`,
         Termination = `Termination date`) %>%
  select(-Text) %>%
  # Treat empty strings as missing values
  mutate(across(where(is.character), ~ifelse(. == "", NA, .)))
skim(iia_table)
Name | iia_table |
Number of rows | 2591 |
Number of columns | 9 |
_______________________ | |
Column type frequency: | |
character | 8 |
numeric | 1 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
Full | 2503 | 0.03 | 21 | 207 | 0 | 88 | 0 |
Short | 0 | 1.00 | 10 | 96 | 0 | 2591 | 0 |
Type | 0 | 1.00 | 4 | 4 | 0 | 2 | 0 |
Status | 0 | 1.00 | 8 | 21 | 0 | 3 | 0 |
Parties | 0 | 1.00 | 10 | 165 | 0 | 2504 | 0 |
Signature | 0 | 1.00 | 10 | 10 | 0 | 2104 | 0 |
Entry | 317 | 0.88 | 10 | 10 | 0 | 1897 | 0 |
Termination | 2121 | 0.18 | 7 | 10 | 0 | 301 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
Index | 0 | 1 | 1296 | 748.1 | 1 | 648.5 | 1296 | 1943.5 | 2591 | ▇▇▇▇▇ |
The resulting data frame comprises 2591 rows, each a distinct IIA. The table below presents the first five rows.
iia_table %>%
  select(Index, Parties, Signature, Entry, Termination) %>%
  head(5)
Index | Parties | Signature | Entry | Termination |
---|---|---|---|---|
1 | Afghanistan, Germany | 20/04/2005 | 12/10/2007 | NA |
2 | Afghanistan, Türkiye | 10/07/2004 | 19/07/2005 | NA |
3 | Albania, Austria | 18/03/1993 | 01/08/1995 | NA |
4 | Albania, Azerbaijan | 09/02/2012 | 13/07/2012 | NA |
5 | Albania, BLEU (Belgium-Luxembourg Economic Union) | 01/02/1999 | 18/10/2002 | NA |
The tables above highlight some important issues with the data, which I will address in turn:
Some variables have many missing observations. Missing Entry values indicate that some IIAs have not entered into force.
Date variables are given as character variables.
Individual parties are delimited by commas.
Individual parties are sometimes given by their non-English names.
Parties include economic blocs and unions, which are referred to as “Country Groupings” by the Navigator.
I convert date columns into the appropriate format with the lubridate package. The conditional mutate of Termination was necessary because some observations were given in the mm/yyyy format.
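The conversion code is not shown here. The idea can be sketched as follows, using base R's `as.Date()` in place of lubridate so that the snippet has no dependencies. The helper name `parse_iia_date` is hypothetical, and mapping an mm/yyyy value to the first day of that month is an assumption rather than necessarily the rule applied in this post.

```r
parse_iia_date <- function(x) {
  # mm/yyyy values are 7 characters long; pad them to dd/mm/yyyy
  month_only <- !is.na(x) & nchar(x) == 7
  x[month_only] <- paste0("01/", x[month_only])
  as.Date(x, format = "%d/%m/%Y")
}

parsed <- parse_iia_date(c("20/04/2005", "07/2019", NA))
parsed
```

Missing values pass through as `NA`, which matters because most agreements have no termination date.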
Parties to a given IIA are presented as a character string delimited by commas. For each IIA, I transform this value into a list containing elements representing each party. However, some parties’ names contain commas too, e.g., Republic of Korea is given as “Korea, Republic of”. To delimit parties, I first correct the names of parties which contain commas. These names and their replacement values were identified manually and are stored in the list correctnames.7 Using correctname_function, country names are standardised.
correctnames <- list(
  # list(original, replacement)
  list("Korea, Republic of", "Republic of Korea"),
  list("Moldova, Republic of", "Republic of Moldova"),
  list("Iran, Islamic Republic of", "Islamic Republic of Iran"),
  list("Bolivia, Plurinational State of", "Plurinational State of Bolivia"),
  list("Korea, Dem. People's Rep. of", "Dem. People's Rep. of Korea"),
  list("Venezuela, Bolivarian Republic of", "Bolivarian Republic of Venezuela"),
  list("Congo, Democratic Republic of the", "Democratic Republic of the Congo"),
  list("Tanzania, United Republic of", "United Republic of Tanzania"),
  list("Micronesia, Federated States of", "Federated States of Micronesia")
)
correctname_function <- function(incorrectnamevector){
  temp_names <- incorrectnamevector
  # Apply each (original, replacement) pair in turn
  for (i in seq_along(correctnames)) {
    temp_names <- gsub(
      pattern = correctnames[[i]][[1]],
      replacement = correctnames[[i]][[2]],
      x = temp_names,
      ignore.case = TRUE
    )
  }
  return(temp_names)
}
# gsub() is vectorised, so the function can be applied to the whole column
iia_table$Parties <- correctname_function(iia_table$Parties)
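To see why the names must be corrected before splitting on commas, consider a toy party string. This is a self-contained base R sketch: `fixes` and `fix_names` are hypothetical names, only two of the replacements are included, and literal matching (`fixed = TRUE`) is used here instead of the case-insensitive regular expressions used for the post.

```r
# Two of the replacements from the post, as a named character vector
fixes <- c(
  "Korea, Republic of"   = "Republic of Korea",
  "Moldova, Republic of" = "Republic of Moldova"
)

fix_names <- function(x) {
  for (original in names(fixes)) {
    # fixed = TRUE matches the name literally rather than as a regex
    x <- gsub(original, fixes[[original]], x, fixed = TRUE)
  }
  x
}

raw <- "Japan, Korea, Republic of"

# Splitting the raw string breaks the second party in two
naive <- strsplit(raw, ", ", fixed = TRUE)[[1]]

# Correcting first yields exactly two parties
parties <- strsplit(fix_names(raw), ", ", fixed = TRUE)[[1]]
```

Here `naive` has three elements, whereas `parties` contains the two actual signatories.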
As it stands, the Parties column consists of character strings containing countries’ standardised names, as well as the names of country groupings, all delimited by commas. I want to replace country groupings in Parties with their corresponding sets of countries.
Hence, I obtain a list of all country groupings by scraping the table on the IIAs by Country Grouping webpage with an approach akin to that used before. Each grouping in the table contains a link to its dedicated webpage. I use a CSS selector to identify the elements in groupings_webpage which contain these links.8 I parse these links and merge them with the table listing all country groupings, i.e., groupings_table.
groupings_url <- "https://investmentpolicy.unctad.org/international-investment-agreements/by-country-grouping"
groupings_webpage <- read_html(groupings_url)
groupings_table <- html_table(groupings_webpage, header = TRUE, trim = TRUE)[[1]] %>%
  select(Index = No., Name)
# Extract each grouping's relative link from the href attribute
groupings_table$Links <- html_elements(groupings_webpage, ".min-one-line a") %>%
  html_attr("href")
Groupings’ webpages, like that of the ACP, are devoted to providing additional information regarding their members, and the status of their IIA affiliations. For now, I am only interested in determining the country compositions of each grouping. I write a function to access groupings’ webpages using their corresponding links, which are relative to the website’s base URL.9 On each webpage, the function scrapes a vector of members, again with the use of the appropriate CSS selector.
I use future_map from the furrr package to execute the function across multiple sessions to speed up operations. For each link, the function returns a single character string containing a grouping’s members delimited by commas, as in the original table of IIAs or iia_table. As before, I save the table as an RDS file for later use, and to avoid repeatedly scraping multiple pages.
grouping_members_function <- function(groupinglink){
  # Links are relative, so prepend the website's base URL
  paste0("https://investmentpolicy.unctad.org", groupinglink) %>%
    read_html() %>%
    html_elements("#general a") %>%
    html_text2() %>%
    paste(collapse = ", ")
}
groupings_table$Members <- groupings_table$Links %>%
  future_map(~grouping_members_function(groupinglink = .x)) %>%
  unlist()
saveRDS(groupings_table, "data/groupings_table.rds")
I read groupings_table back into the global environment and, as before, clean non-standard country names with correctname_function. Some country groupings’ members are country groupings in and of themselves. I create a minor function to highlight the country groupings of concern.
groupings_table <- read_rds("data/groupings_table.rds")
groupings_table$Members <- groupings_table$Members %>%
  lapply(correctname_function) %>% unlist()
# Create short name to look for matches in members
groupings_table <- groupings_table %>%
  mutate(Short = str_extract(Name, "\\(([^)]+)\\)"),
         Short = gsub("\\(|\\)", "", Short))
groupingroup_function <- function(groupingname){
  member_vector <- groupings_table %>%
    filter(Name == groupingname) %>%
    pull(Members) %>%
    str_split(", ") %>%
    unlist() %>%
    unique()
  # Keep only members that are themselves country groupings
  member_vector <- member_vector[member_vector %in% (groupings_table$Short)]
  if(length(member_vector) > 0){
    return(data.frame(Name = groupingname, Issue = paste(member_vector, collapse = ", ")))
  } else{
    return(data.frame(Name = groupingname, Issue = NA))
  }
}
groupings_table$Name %>%
  map_dfr(groupingroup_function) %>%
  filter(!is.na(Issue))
Name | Issue |
---|---|
Energy Charter Treaty members | European Union |
EU (European Union) | European Union |
Seemingly, the European Union is the only country grouping listed as a member of other groupings. For example, the Energy Charter Treaty lists the European Union as one of its members. To address this, I first remove European Union as a member of the EU (European Union) country grouping. I then replace every remaining occurrence of European Union with the EU’s list of member countries.
groupings_table <- groupings_table %>%
  mutate(Members = case_when(
    Short == "European Union" ~ gsub("European Union,", "", Members),
    TRUE ~ Members
  ))
# Replace remaining mentions of the European Union with its member countries
groupings_table <- groupings_table %>%
  mutate(Members = gsub(
    "European Union",
    (groupings_table %>% filter(Short == "European Union") %>% pull(Members)),
    x = Members))
# test if cleaning worked
groupings_table$Name %>%
  map_dfr(groupingroup_function) %>%
  filter(!is.na(Issue)) %>%
  nrow(.) == 0
[1] TRUE
Now, I need to replace the country groupings, as they appear in the Parties column of iia_table, with their corresponding Members. To this end, I first transform Parties into character vectors by delimiting individual parties, and ensuring there is no stray whitespace. I create and execute a function to identify country groupings in party vectors and, in turn, replace them with groupings’ members. The result is iia_table with a Parties column consisting of vectors of strictly countries.
iia_table$Parties <- iia_table$Parties %>%
  str_split(", ") %>%
  lapply(str_squish)
append_groupings_function <- function(partylist){
  temp_vector <- unlist(partylist)
  for (i in seq_along(temp_vector)) {
    if (temp_vector[i] %in% groupings_table$Name) {
      # Look up the grouping's members and append them to the parties
      temp_members <- groupings_table %>%
        filter(Name == temp_vector[i]) %>%
        pull(Members) %>%
        str_split(", ") %>%
        unlist()
      temp_vector <- temp_vector %>% append(temp_members)
    }
  }
  # Drop the grouping names themselves, leaving only countries
  temp_vector <- temp_vector[!temp_vector %in% groupings_table$Name]
  return(temp_vector)
}
iia_table$Parties <- iia_table$Parties %>%
  lapply(append_groupings_function)
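The effect of this step can be illustrated on a toy example. The sketch below is a base R variant, not the function above: `grouping_members` and `expand_groupings` are hypothetical names, the lookup holds a single illustrative grouping, and groupings are replaced in place (with duplicates removed) rather than appended and filtered.

```r
# Toy lookup from grouping name to member countries (illustrative subset)
grouping_members <- list(
  "BLEU (Belgium-Luxembourg Economic Union)" = c("Belgium", "Luxembourg")
)

expand_groupings <- function(parties) {
  expanded <- lapply(parties, function(party) {
    members <- grouping_members[[party]]
    # Keep the party as-is unless it is a known grouping
    if (is.null(members)) party else members
  })
  unique(unlist(expanded))
}

expanded_parties <- expand_groupings(
  c("Albania", "BLEU (Belgium-Luxembourg Economic Union)")
)
expanded_parties
```

The fifth IIA in the table above, between Albania and BLEU, would thus be treated as an agreement among Albania, Belgium and Luxembourg.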
Subsequently, I use the formidable countrycode package to convert the country names in each vector of countries in Parties to their corresponding three-letter ISO codes (iso3c). Once again, I execute the function using futures.
iia_table$Parties <- iia_table$Parties %>%
  future_map(~countrycode(.x,
                          origin = "country.name",
                          destination = "iso3c"))
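A quick check on a small vector shows what the conversion returns. This is a minimal sketch assuming the countrycode package is installed; the `unmatched` check is an added suggestion rather than a step taken in this post.

```r
library(countrycode)

parties <- c("Afghanistan", "Germany")

# Convert English country names to ISO 3166-1 alpha-3 codes
codes <- countrycode(parties, origin = "country.name", destination = "iso3c")

# Names that cannot be matched come back as NA and are worth inspecting
unmatched <- parties[is.na(codes)]
```

Inspecting `unmatched` for the full data is a cheap way to catch names that still need manual correction.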
The product of all operations thus far is a data frame comprising IIAs already mapped by UNCTAD. The processed iia_table presents each IIA alongside information about its type, signatories, dates, and status. It is now possible to construct a time-varying, bilateral matrix of countries’ mutual involvement in IIAs.
I consider only those IIAs which are or have been enforced, as opposed to those which have been signed but not yet enforced. 317 out of 2591 IIAs have been signed without ever entering into force. I exclude these IIAs from the construction of the bilateral matrix.
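The exclusion step is not shown in the code of this post. A minimal sketch, assuming it amounts to dropping rows with a missing date of entry into force, is given below on a toy stand-in for iia_table (`toy_iias` and `in_force` are hypothetical names). Such a filter also matters in practice because `seq.int()`, used further on to build enforcement periods, cannot start a sequence from a missing year.

```r
# Toy stand-in for iia_table with a parsed Date column
toy_iias <- data.frame(
  Index = 1:3,
  Entry = as.Date(c("2007-10-12", NA, "1995-08-01"))
)

# Keep only agreements with a recorded date of entry into force
in_force <- toy_iias[!is.na(toy_iias$Entry), ]
in_force
```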
The matrix consists of a dummy variable indicating the presence of an IIA between an origin and destination country in a given year. I take 1 July as the cut-off date to convert Entry and Termination into year columns, Start and End. In other words, if an IIA takes effect before 1 July of year \(t\), the particular IIA is indicated for \(t\). If the IIA takes effect on or after this cut-off, it is only indicated from \(t+1\). On the other hand, if an IIA is terminated on or after 1 July of year \(t\), it is still deemed in place during \(t\). If the IIA is terminated before the cut-off date in \(t\), it is deemed in place until the end of \(t-1\). Where no termination date is given, I take the current year as the end of the enforcement period.
iia_table <- iia_table %>%
  mutate(Start = case_when(
    Entry < ymd(paste(year(Entry), "-07-01", sep = "")) ~ year(Entry),
    Entry >= ymd(paste(year(Entry), "-07-01", sep = "")) ~ year(Entry) + 1,
    TRUE ~ NA)) %>%
  mutate(End = case_when(
    is.na(Termination) ~ year(Sys.Date()),
    Termination < ymd(paste(year(Termination), "-07-01", sep = "")) ~ year(Termination) - 1,
    Termination >= ymd(paste(year(Termination), "-07-01", sep = "")) ~ year(Termination),
    TRUE ~ NA))
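The cut-off rule can be checked on a handful of dates with two compact helpers. This is a base R sketch: `start_year` and `end_year` are hypothetical names, and comparing the month number against July is an equivalent shortcut for comparing against 1 July of the same year.

```r
# Hypothetical helpers mirroring the 1 July cut-off described in the text
start_year <- function(entry) {
  yr  <- as.integer(format(entry, "%Y"))
  mon <- as.integer(format(entry, "%m"))
  # In force before 1 July counts from the same year, otherwise from the next
  yr + as.integer(mon >= 7L)
}

end_year <- function(termination) {
  yr  <- as.integer(format(termination, "%Y"))
  mon <- as.integer(format(termination, "%m"))
  # Terminated before 1 July counts only until the previous year
  out <- yr - as.integer(mon < 7L)
  # No termination date: treat as in place up to the current year
  out[is.na(termination)] <- as.integer(format(Sys.Date(), "%Y"))
  out
}

starts <- start_year(as.Date(c("2007-10-12", "2005-06-30", "2005-07-01")))
ends   <- end_year(as.Date(c("2019-03-15", "2019-07-01", NA)))
```

The first start date is the entry into force of the Afghanistan-Germany BIT, which maps to a start year of 2008; the other two dates show the behaviour on either side of the cut-off.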
Thus, I have determined the start and end years of each IIA’s enforcement. As with the Parties variable, I create a vector containing all years of enforcement for each IIA.
iia_table <- iia_table %>%
  mutate(Period = future_map2(Start, End, ~seq.int(.x, .y, by = 1)))
For example, the first IIA in iia_table, titled Afghanistan - Germany BIT (2005), was signed on 2005-04-20, entered into force on 2007-10-12, and has no termination date. Hence, its enforcement period is as follows:
iia_table$Period[[1]]
[1] 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020
[14] 2021 2022 2023 2024
Using the Index, Parties and Period columns from iia_table, I can begin to construct this matrix, for example:
iia_matrix <- iia_table %>%
  select(Index, Origin = Parties, Destination = Parties, Period)
# Example
print(filter(iia_matrix, Index == 1))
# A tibble: 1 × 4
Index Origin Destination Period
<int> <list> <list> <list>
1 1 <chr [2]> <chr [2]> <dbl [17]>
As opposed to imposing some structure on a matrix of countries, and subsequently applying a function to determine a country pair’s joint affiliation to the same IIA in a given year, I instead allow for the matrix to spring from IIA data. The unnest function is very useful in this regard. I create a data frame explicitly containing all possible Origin-Destination combinations of countries, which appear in mapped IIAs. In addition, I create the indicator variable of interest agree_iia. Following the typical structure of gravity covariate datasets, agree_iia need not be dyadic across Origin-Destination pairs.
iia_matrix <- iia_matrix %>%
  unnest(Origin) %>%
  unnest(Destination) %>%
  distinct() %>% # to eliminate duplicate country pairs per IIA
  mutate(agree_iia = 1)
# Example
print(filter(iia_matrix, Index == 1))
# A tibble: 4 × 5
Index Origin Destination Period agree_iia
<int> <chr> <chr> <list> <dbl>
1 1 AFG AFG <dbl [17]> 1
2 1 AFG DEU <dbl [17]> 1
3 1 DEU AFG <dbl [17]> 1
4 1 DEU DEU <dbl [17]> 1
I follow the same unnesting procedure for the enforcement periods of each IIA. Subsequently, I aggregate agree_iia from the IIA-Origin-Destination-Year level to the Origin-Destination-Year level. Thus, agree_iia is now indicated for a given year if at least one IIA is enforced among a particular Origin-Destination pair in said year.
iia_matrix <- iia_matrix %>%
  unnest(Period) %>%
  group_by(Origin, Destination, Period) %>%
  # One row per Origin-Destination-Year; drop grouping afterwards
  summarise(agree_iia = max(agree_iia, na.rm = TRUE), .groups = "drop")
There are 127833 Origin-Destination-Year combinations arising from mapped IIAs. The earliest enforcement period start is 1962, and the latest end is, by construction, 2024. There are 188 unique countries covered by previously mapped IIAs, and IIAs cover 5558 actual combinations of Origins and Destinations.
Using the spread and gather functions, I transform the data frame into a balanced matrix format. If a particular Origin-Destination-Year observation of agree_iia was not present in the original set of 127833 observations, agree_iia now takes the value of zero for said combination. In addition, I set agree_iia to zero where the Origin and Destination are the same country, which is standard practice in the construction of bilateral trade facilitation variables.
iia_matrix <- iia_matrix %>%
  spread(Period, agree_iia, fill = 0) %>%
  gather(Year, agree_iia, 3:ncol(.)) %>%
  spread(Destination, agree_iia, fill = 0) %>%
  gather(Destination, agree_iia, 3:ncol(.)) %>%
  select(Year, Origin, Destination, agree_iia) %>%
  mutate(agree_iia = case_when(
    Origin == Destination ~ 0,
    TRUE ~ agree_iia
  ))
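The balancing logic can also be sketched in base R on toy data, using `expand.grid()` and `merge()` rather than the spread and gather calls above. The object names and the two-country, two-year example are illustrative assumptions.

```r
# Toy Origin-Destination-Year observations where an IIA is in place
observed <- data.frame(
  Year = c(2008L, 2008L),
  Origin = c("AFG", "DEU"),
  Destination = c("DEU", "AFG"),
  agree_iia = 1
)

countries <- sort(unique(c(observed$Origin, observed$Destination)))
years <- 2007:2008

# Every Year x Origin x Destination combination, observed or not
grid <- expand.grid(
  Year = years,
  Origin = countries,
  Destination = countries,
  stringsAsFactors = FALSE
)

# Left join the observations onto the full grid; unmatched rows get NA
balanced <- merge(grid, observed,
                  by = c("Year", "Origin", "Destination"), all.x = TRUE)
balanced$agree_iia[is.na(balanced$agree_iia)] <- 0

# A country cannot have an agreement with itself
balanced$agree_iia[balanced$Origin == balanced$Destination] <- 0
```

With two countries and two years, the balanced frame has 2 × 2 × 2 = 8 rows, of which only the two observed 2008 pairs carry a one. The same arithmetic gives 188 × 188 × 63 = 2226672 rows for the full data.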
The final product is a balanced bilateral matrix of countries’ joint affiliation to at least one IIA, or lack thereof, in a given year. It comprises 2226672 observations of agree_iia, spanning the years 1962 to 2024. It encompasses 35344 unique combinations of Origin and Destination countries, that is, all 188 × 188 ordered pairs of the countries covered by mapped IIAs. An example subset of this matrix is tabulated below; that is, observations of agree_iia for 2024. The tabulation is accompanied by a download link for the corresponding .csv data.
I want to urge caution when adopting a similar approach to the one given here, or when employing the resulting dataset in empirical research. Please note the following limitations:
As it stands, the bilateral matrix of IIA involvement does not yet account for the entry and exit of countries into and out of country groupings, such as economic blocs and unions. Doing so would entail acquiring a membership timeline for each grouping addressed in the Country Groupings section. Country groupings’ lists of members should then be made time-varying by accounting for deviations from current lists over time. Finally, these time-adjusted lists should replace country groupings wherever they appear in the IIA data, prior to constructing the bilateral matrix.
Thanks to the great work of the team at UNCTAD, and their collaborators, we are able to construct a matrix such as this one. However, it should be noted that the mapping of IIAs has not yet been completed, and will also be updated regularly. Hence, the matrix is not exhaustive (in a retrospective sense), and will not necessarily be up-to-date henceforth.
Related to the point above, the outcome of these operations is only as comprehensive as the coverage of previously mapped IIAs permits. It may be that agree_iia should actually be indicated for a particular Origin-Destination-Year combination, but has instead been computed as \(0\), or not computed at all, because the IIA(s) concerned have not yet been mapped. Even though constructing the matrix from mapped IIAs (from the ground up, as I do here) is computationally efficient, the resulting coverage of countries, country pairs, and enforcement periods is similarly constrained. This is an important limitation to consider when using the matrix in empirical research, as it may not be representative of the universe of IIAs or, similarly, of countries’ investment relationships.
Web scraping allows us to stand on the shoulders of giants when it comes to acquiring useful data in the public domain. However, unless performed ethically and within reasonable limits, web scraping may come at a great cost to the host. Read more about ethical data collection here.
I kindly invite willing and eager readers to reach out to me regarding any flaws, concerns, suggestions, etc. I would love to hear them!
This post was last updated on 04 August 2024.
─ Session info ─────────────────────────────────────────────────────
setting value
version R version 4.4.1 (2024-06-14 ucrt)
os Windows 11 x64 (build 22631)
system x86_64, mingw32
ui RTerm
language (EN)
collate English_South Africa.utf8
ctype English_South Africa.utf8
tz Africa/Johannesburg
date 2024-08-04
pandoc 3.1.11 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
─ Packages ─────────────────────────────────────────────────────────
package * version date (UTC) lib source
base64enc 0.1-3 2015-07-28 [1] CRAN (R 4.4.0)
bslib 0.7.0 2024-03-29 [1] CRAN (R 4.4.1)
cachem 1.1.0 2024-05-16 [1] CRAN (R 4.4.1)
cli 3.6.3 2024-06-21 [1] CRAN (R 4.4.1)
codetools 0.2-20 2024-03-31 [2] CRAN (R 4.4.1)
colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.4.1)
countrycode * 1.6.0 2024-03-22 [1] CRAN (R 4.4.1)
crosstalk 1.2.1 2023-11-23 [1] CRAN (R 4.4.1)
digest 0.6.36 2024-06-23 [1] CRAN (R 4.4.1)
distill * 1.6 2023-10-06 [1] CRAN (R 4.4.1)
downlit 0.4.4 2024-06-10 [1] CRAN (R 4.4.1)
dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.4.1)
DT * 0.33 2024-04-04 [1] CRAN (R 4.4.1)
evaluate 0.24.0 2024-06-10 [1] CRAN (R 4.4.1)
fansi 1.0.6 2023-12-08 [1] CRAN (R 4.4.1)
fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.4.1)
fontawesome 0.5.2 2023-08-19 [1] CRAN (R 4.4.1)
forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.4.1)
furrr * 0.3.1 2022-08-15 [1] CRAN (R 4.4.1)
future * 1.33.2 2024-03-26 [1] CRAN (R 4.4.1)
generics 0.1.3 2022-07-05 [1] CRAN (R 4.4.1)
ggplot2 * 3.5.1 2024-04-23 [1] CRAN (R 4.4.1)
globals 0.16.3 2024-03-08 [1] CRAN (R 4.4.0)
glue 1.7.0 2024-01-09 [1] CRAN (R 4.4.1)
gtable 0.3.5 2024-04-22 [1] CRAN (R 4.4.1)
here * 1.0.1 2020-12-13 [1] CRAN (R 4.4.1)
highr 0.11 2024-05-26 [1] CRAN (R 4.4.1)
hms 1.1.3 2023-03-21 [1] CRAN (R 4.4.1)
htmltools * 0.5.8.1 2024-04-04 [1] CRAN (R 4.4.1)
htmlwidgets 1.6.4 2023-12-06 [1] CRAN (R 4.4.1)
httr 1.4.7 2023-08-15 [1] CRAN (R 4.4.1)
jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.4.1)
jsonlite 1.8.8 2023-12-04 [1] CRAN (R 4.4.1)
kableExtra * 1.4.0.4 2024-07-19 [1] local
knitr * 1.48 2024-07-07 [1] CRAN (R 4.4.1)
lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.4.1)
listenv 0.9.1 2024-01-29 [1] CRAN (R 4.4.1)
lubridate * 1.9.3 2023-09-27 [1] CRAN (R 4.4.1)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.4.1)
memoise 2.0.1 2021-11-26 [1] CRAN (R 4.4.1)
MetBrewer * 0.2.0 2022-03-21 [1] CRAN (R 4.4.1)
munsell 0.5.1 2024-04-01 [1] CRAN (R 4.4.1)
pacman * 0.5.1 2019-03-11 [1] CRAN (R 4.4.1)
parallelly 1.37.1 2024-02-29 [1] CRAN (R 4.4.0)
pillar 1.9.0 2023-03-22 [1] CRAN (R 4.4.1)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.4.1)
printr * 0.3 2023-03-08 [1] CRAN (R 4.4.1)
purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.4.1)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.4.1)
ragg 1.3.2 2024-05-15 [1] CRAN (R 4.4.1)
readr * 2.1.5 2024-01-10 [1] CRAN (R 4.4.1)
repr 1.1.7 2024-03-22 [1] CRAN (R 4.4.1)
rlang 1.1.4 2024-06-04 [1] CRAN (R 4.4.1)
rmarkdown * 2.27 2024-05-17 [1] CRAN (R 4.4.1)
rprojroot 2.0.4 2023-11-05 [1] CRAN (R 4.4.1)
rstudioapi 0.16.0 2024-03-24 [1] CRAN (R 4.4.1)
rvest * 1.0.4 2024-02-12 [1] CRAN (R 4.4.1)
sass 0.4.9 2024-03-15 [1] CRAN (R 4.4.1)
scales 1.3.0 2023-11-28 [1] CRAN (R 4.4.1)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.4.1)
skimr * 2.1.5 2022-12-23 [1] CRAN (R 4.4.1)
stringi 1.8.4 2024-05-06 [1] CRAN (R 4.4.0)
stringr * 1.5.1 2023-11-14 [1] CRAN (R 4.4.1)
svglite 2.1.3 2023-12-08 [1] CRAN (R 4.4.1)
systemfonts 1.1.0 2024-05-15 [1] CRAN (R 4.4.1)
textshaping 0.4.0 2024-05-24 [1] CRAN (R 4.4.1)
tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.4.1)
tidyr * 1.3.1 2024-01-24 [1] CRAN (R 4.4.1)
tidyselect 1.2.1 2024-03-11 [1] CRAN (R 4.4.1)
tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.4.1)
timechange 0.3.0 2024-01-18 [1] CRAN (R 4.4.1)
tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.4.1)
utf8 1.2.4 2023-10-22 [1] CRAN (R 4.4.1)
uuid 1.2-0 2024-01-14 [1] CRAN (R 4.4.0)
vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.4.1)
viridisLite 0.4.2 2023-05-02 [1] CRAN (R 4.4.1)
withr 3.0.0 2024-01-16 [1] CRAN (R 4.4.1)
xaringanExtra * 0.8.0 2024-05-19 [1] CRAN (R 4.4.1)
xfun 0.45 2024-06-16 [1] CRAN (R 4.4.1)
xml2 1.3.6 2023-12-04 [1] CRAN (R 4.4.1)
yaml 2.3.9 2024-07-05 [1] CRAN (R 4.4.1)
[1] C:/Users/marai/AppData/Local/R/win-library/4.4
[2] C:/Program Files/R/R-4.4.1/library
────────────────────────────────────────────────────────────────────
https://investmentpolicy.unctad.org/international-investment-agreements
https://investmentpolicy.unctad.org/international-investment-agreements/iia-mapping
https://investmentpolicy.unctad.org/uploaded-files/document/Mapping%20Project%20Description%20and%20Methodology.pdf
I remove the Text column, because I am not interested in IIAs’ original documentation.
This was done by observing the parties with the longest names, for those containing additional commas typically have longer names.
I like the user-friendly and simple to use CSS selector offered by the SelectorGadget Chrome extension.
For example, /international-investment-agreements/groupings/11/acp-african-caribbean-and-pacific-group-of-states-.
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Text and figures are licensed under Creative Commons Attribution CC BY-SA 4.0. Source code is available at https://github.com/WihanZA/wihan_distill, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Marais (2024, May 6). Wihan Marais: Scraping International Investment Agreement Data. Retrieved from https://www.wihanza.com/posts/2024-05-06-scraping-international-investment-agreement-data/
BibTeX citation
@misc{marais2024scraping, author = {Marais, Wihan}, title = {Wihan Marais: Scraping International Investment Agreement Data}, url = {https://www.wihanza.com/posts/2024-05-06-scraping-international-investment-agreement-data/}, year = {2024} }