For this lecture we need to packages PlackettLuce, ggplot2, gosset and igraph. We load the packages with library()
. We also load some extra functions from the workflow employed in the ClimMob platform.
library("PlackettLuce")
library("ggplot2")
library("gosset")
library("igraph")
source("https://raw.githubusercontent.com/AgrDataSci/ClimMob-analysis/master/R/functions.R")
Load the data of beans tricot trials from the package PlackettLuce
data("beans", package = "PlackettLuce")
str(beans)
## 'data.frame': 842 obs. of 14 variables:
## $ variety_a : chr "BRT 103-182" "INTA Rojo" "INTA Ferroso" "INTA Centro Sur" ...
## $ variety_b : chr "SJC 730-79" "INTA Centro Sur" "INTA Matagalpa" "INTA Rojo" ...
## $ variety_c : chr "PM2 Don Rey" "INTA Sequia" "BRT 103-182" "ALS 0532-6" ...
## $ best : chr "C" "B" "A" "B" ...
## $ worst : chr "A" "A" "C" "C" ...
## $ var_a : chr "Worse" "Worse" "Better" "Better" ...
## $ var_b : chr "Worse" "Better" "Worse" "Better" ...
## $ var_c : chr "Better" "Better" "Worse" "Better" ...
## $ season : Factor w/ 5 levels "Po - 15","Ap - 15",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ year : num 2015 2015 2015 2015 2015 ...
## $ maxTN : num 19.4 18.9 18.4 18.9 18.9 ...
## $ lon : num -85.7 -85.4 -85.4 -85.4 -85.4 ...
## $ lat : num 13.1 13.3 13.3 13.3 13.3 ...
## $ planting_date: Date, format: "2015-12-18" "2015-12-18" ...
The function table()
builds a contingency table of the counts at each combination of characters or factor levels. Letâs see the number of points in the beans data per season.
table(beans$season)
##
## Po - 15 Ap - 15 Pr - 16 Po - 16 Ap - 16
## 177 481 64 33 87
Now number of varieties that were tested in the project. For this we use the function unlist()
to coerce the three columns with the variety names into a vector of length 2526 (nrow(beans) x 3)
items <- unlist(beans[,c("variety_a", "variety_b", "variety_c")])
table(items)
## items
## ALS 0532-6 BRT 103-182 INTA Centro Sur INTA Ferroso INTA Matagalpa
## 261 245 260 234 258
## INTA Precoz INTA Rojo INTA Sequia PM2 Don Rey SJC 730-79
## 246 249 259 244 270
And now we see how these varieties were tested across each season. For this, we pass the vector items as first argument and repeat the vector beans$season three times to match the length of the items using the function rep()
.
table(items, rep(beans$season, 3))
##
## items Po - 15 Ap - 15 Pr - 16 Po - 16 Ap - 16
## ALS 0532-6 58 149 18 11 25
## BRT 103-182 51 140 18 9 27
## INTA Centro Sur 56 143 21 12 28
## INTA Ferroso 47 135 20 9 23
## INTA Matagalpa 53 152 19 9 25
## INTA Precoz 56 134 19 10 27
## INTA Rojo 45 144 19 11 30
## INTA Sequia 52 152 19 10 26
## PM2 Don Rey 54 139 16 10 25
## SJC 730-79 59 155 23 8 25
We can use the function density()
to compute a kernel density estimate of the maximum night temperature registed in the locations where the on-farm trials were stablished. To visualize this density, we use the function plot()
.
plot(density(beans$maxTN),
main = "Kernel density of maximun night temperature")
We can visualize this variable through their quartiles and check for any outlier with boxplot()
for visualization and boxplot.stats()
to get the statistics of the box plot.
boxplot(beans$maxTN)
With the function plot()
we can also plot the coordinates of each on-farm trial registered in the bean trial.
plot(beans[,c("lon", "lat")])
The output is not so informative as it is simply a scatterplot where the axys Y is the latitude and X is the longitude. We can improve this visualization by applying a layer of geographic coverage to get an idea where these trials were set. We use an internal function of the ClimMob workflow powered by the package leaflet.
plot_map(beans, c("lon", "lat"), map_provider = "OpenStreetMap.Mapnik")
First we coerce the tricot rankings into a PlackettLuce rankings
R <- rank_tricot(data = beans,
items = c("variety_a","variety_b","variety_c"),
input = c("best","worst"))
head(R)
## [1] "PM2 Don Rey > SJC 730-79 > BRT 103-182"
## [2] "INTA Centro Sur > INTA Sequia > INTA ..."
## [3] "INTA Ferroso > INTA Matagalpa > BRT ..."
## [4] "INTA Rojo > INTA Centro Sur > ALS 0532-6"
## [5] "PM2 Don Rey > INTA Sequia > SJC 730-79"
## [6] "ALS 0532-6 > INTA Matagalpa > INTA Rojo"
We can see the network (how these varieties are connected) of this trial by computing the adjacency of rankings with the function adjacency()
from PlackettLuce and the function network()
from the ClimMob workflow.
a <- adjacency(R)
a <- network(a)
plot(a)
The function summarise_favourite()
computes the favourability score from permutations contests. The favourability score is the difference between the wins and losses of a given item in a paired comparison.
fav <- summarise_favourite(R)
fav
## # A tibble: 10 x 6
## items N best worst wins fav_score
## <chr> <int> <dbl> <dbl> <dbl> <dbl>
## 1 INTA Sequia 259 41.3 24.7 75.3 16.6
## 2 INTA Centro Sur 260 37.3 28.1 71.9 9.23
## 3 BRT 103-182 245 38.8 33.5 66.5 5.31
## 4 INTA Rojo 249 33.7 30.9 69.1 2.81
## 5 INTA Matagalpa 258 31.8 34.1 65.9 -2.33
## 6 PM2 Don Rey 244 31.1 35.2 64.8 -4.10
## 7 INTA Ferroso 234 29.9 35.0 65.0 -5.13
## 8 SJC 730-79 270 31.1 36.3 63.7 -5.19
## 9 ALS 0532-6 261 30.3 39.1 60.9 -8.81
## 10 INTA Precoz 246 27.6 36.6 63.4 -8.94
It has a plot method powered by the package ggplot2 which offer many options to change its layout.
plot(fav) +
theme_bw() +
theme(panel.grid = element_blank())
The function summarise_victories()
summarises the relative number of times a âplayer1â wins against a âplayer2â in a paired comparison.
vic <- summarise_victories(R)
head(vic)
## player1 player2 win ncontest victories
## 1 ALS 0532-6 BRT 103-182 27 64 0.4218750
## 2 ALS 0532-6 INTA Centro Sur 21 58 0.3620690
## 4 ALS 0532-6 INTA Ferroso 29 53 0.5471698
## 7 ALS 0532-6 INTA Matagalpa 29 54 0.5370370
## 11 ALS 0532-6 INTA Precoz 27 55 0.4909091
## 16 ALS 0532-6 INTA Rojo 34 63 0.5396825
Also with a plot method that passes ggplot2 arguments
plot(vic) +
theme_bw() +
theme(panel.grid = element_blank())
This is the data from Moyo et al (2021) who tested the consumersâ preference on sweetpotato materials (varieties and genotypes) in Ghana and Uganda. The data is available here and also in the classroom materials in the âdata/â folder.
We first read the .csv file with the function read.csv()
dt <- read.csv("data/sweet_potato.csv")
str(dt)
## 'data.frame': 1433 obs. of 20 variables:
## $ id : int 6 18 24 25 27 56 62 63 73 74 ...
## $ country : chr "Uganda" "Uganda" "Uganda" "Uganda" ...
## $ district : chr "Gulu" "Gulu" "Gulu" "Gulu" ...
## $ gender : chr "Woman" "Woman" "Man" "Man" ...
## $ trial : chr "community" "community" "community" "community" ...
## $ item_A : chr "NASPOT 10 (Kabode)" "Kakamega" "NASPOT 13" "NASPOT 12" ...
## $ item_B : chr "NASPOT 12" "NASPOT 8" "NASPOT 10 (Kabode)" "NASPOT 10 (Kabode)" ...
## $ item_C : chr "NASPOT 13" "NASPOT 13" "Kakamega" "NASPOT 13" ...
## $ best_overall : chr "B" "B" "A" "A" ...
## $ worst_overall : chr "A" "A" "C" "B" ...
## $ best_taste : chr "C" "B" "A" "A" ...
## $ worst_taste : chr "B" "C" "C" "B" ...
## $ best_color : chr "B" "C" "B" "C" ...
## $ worst_color : chr "C" "B" "A" "A" ...
## $ best_describe : chr "It has high dry matter content" "nice, sweet" "sweet" "very sweet and nice" ...
## $ worst_describe: chr "not too sweet" "sweet" "wet" "it is wet and not sweet" ...
## $ community : chr NA NA NA NA ...
## $ age : chr NA NA NA NA ...
## $ geno_test : chr NA NA NA NA ...
## $ region : chr NA NA NA NA ...
We are going to compare rankings from the three traits evaluated in the sweetpotato trial (taste, color and overall) using overall as baseline. We use the function summarise_agreement()
which computes the Kendall correlation between rankings, and the agreement between the first and last ranked varieties.
First, we subset the main dataset to retain only the Uganda data. The condition dt$country == "Uganda"
passed in the rows of dt
return a logical vector (TRUE FALSE
) for the observations in dt$country
that are equal to Uganda.
keep <- dt$country == "Uganda"
dt <- dt[keep, ]
dim(dt)
## [1] 418 20
Now we isolate the strings for the traits assessed in Uganda. We use the function paste0()
to create a vector that represents the traits and their names in dt
with the strings best_
and worst_
traits <- c("overall","taste","color")
sel <- paste0(c("best_","worst_"), rep(traits, each = 2))
sel
## [1] "best_overall" "worst_overall" "best_taste" "worst_taste"
## [5] "best_color" "worst_color"
We use an iteration with for
to create the rankings. Iterations are often used to optimize processes and enable a better debugging structure
We create an empty list and the rankings that are going to be created in each iteration i
performed along the vector traits
R <- list()
for(i in seq_along(traits)){
R[[i]] <- rank_tricot(dt,
items = c("item_A", "item_B", "item_C"),
input = paste0(c("best_","worst_"), traits[[i]]))
}
We can now compare the rankings. The first argument is the baseline wich is the first element in the list R
, we indicate it with R[[1]]
where 1 is the index for the first element in the list. The second argument is the list with rankings to compare, we indicate it with R[-1]
where -1 tells to R to use all the elements in the list besides the first.
a <- summarise_agreement(R[[1]],
compare.to = R[-1],
labels = c("Taste","Colour"))
a
## # A tibble: 2 x 4
## labels kendall first last
## <chr> <dbl> <dbl> <dbl>
## 1 Taste 79.9 88.5 81.3
## 2 Colour -3.19 27.0 18.2
We can also plot it with the ggplot2 methods
plot(a) +
theme_bw() +
theme(panel.grid = element_blank())