Set up

For this lecture we need to packages PlackettLuce, ggplot2, gosset and igraph. We load the packages with library(). We also load some extra functions from the workflow employed in the ClimMob platform.

library("PlackettLuce")
library("ggplot2")
library("gosset")
library("igraph")
source("https://raw.githubusercontent.com/AgrDataSci/ClimMob-analysis/master/R/functions.R")

Example 1: Beans data

Load the data of beans tricot trials from the package PlackettLuce

data("beans", package = "PlackettLuce")
str(beans)
## 'data.frame':    842 obs. of  14 variables:
##  $ variety_a    : chr  "BRT 103-182" "INTA Rojo" "INTA Ferroso" "INTA Centro Sur" ...
##  $ variety_b    : chr  "SJC 730-79" "INTA Centro Sur" "INTA Matagalpa" "INTA Rojo" ...
##  $ variety_c    : chr  "PM2 Don Rey" "INTA Sequia" "BRT 103-182" "ALS 0532-6" ...
##  $ best         : chr  "C" "B" "A" "B" ...
##  $ worst        : chr  "A" "A" "C" "C" ...
##  $ var_a        : chr  "Worse" "Worse" "Better" "Better" ...
##  $ var_b        : chr  "Worse" "Better" "Worse" "Better" ...
##  $ var_c        : chr  "Better" "Better" "Worse" "Better" ...
##  $ season       : Factor w/ 5 levels "Po - 15","Ap - 15",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ year         : num  2015 2015 2015 2015 2015 ...
##  $ maxTN        : num  19.4 18.9 18.4 18.9 18.9 ...
##  $ lon          : num  -85.7 -85.4 -85.4 -85.4 -85.4 ...
##  $ lat          : num  13.1 13.3 13.3 13.3 13.3 ...
##  $ planting_date: Date, format: "2015-12-18" "2015-12-18" ...

Contingency table

The function table() builds a contingency table of the counts at each combination of characters or factor levels. Let’s see the number of points in the beans data per season.

table(beans$season)
## 
## Po - 15 Ap - 15 Pr - 16 Po - 16 Ap - 16 
##     177     481      64      33      87

Now number of varieties that were tested in the project. For this we use the function unlist() to coerce the three columns with the variety names into a vector of length 2526 (nrow(beans) x 3)

items <- unlist(beans[,c("variety_a", "variety_b", "variety_c")])

table(items)
## items
##      ALS 0532-6     BRT 103-182 INTA Centro Sur    INTA Ferroso  INTA Matagalpa 
##             261             245             260             234             258 
##     INTA Precoz       INTA Rojo     INTA Sequia     PM2 Don Rey      SJC 730-79 
##             246             249             259             244             270

And now we see how these varieties were tested across each season. For this, we pass the vector items as first argument and repeat the vector beans$season three times to match the length of the items using the function rep().

table(items, rep(beans$season, 3))
##                  
## items             Po - 15 Ap - 15 Pr - 16 Po - 16 Ap - 16
##   ALS 0532-6           58     149      18      11      25
##   BRT 103-182          51     140      18       9      27
##   INTA Centro Sur      56     143      21      12      28
##   INTA Ferroso         47     135      20       9      23
##   INTA Matagalpa       53     152      19       9      25
##   INTA Precoz          56     134      19      10      27
##   INTA Rojo            45     144      19      11      30
##   INTA Sequia          52     152      19      10      26
##   PM2 Don Rey          54     139      16      10      25
##   SJC 730-79           59     155      23       8      25

Plot methods

We can use the function density() to compute a kernel density estimate of the maximum night temperature registed in the locations where the on-farm trials were stablished. To visualize this density, we use the function plot().

plot(density(beans$maxTN), 
     main = "Kernel density of maximun night temperature")

We can visualize this variable through their quartiles and check for any outlier with boxplot() for visualization and boxplot.stats() to get the statistics of the box plot.

boxplot(beans$maxTN)

With the function plot() we can also plot the coordinates of each on-farm trial registered in the bean trial.

plot(beans[,c("lon", "lat")])

The output is not so informative as it is simply a scatterplot where the axys Y is the latitude and X is the longitude. We can improve this visualization by applying a layer of geographic coverage to get an idea where these trials were set. We use an internal function of the ClimMob workflow powered by the package leaflet.

plot_map(beans, c("lon", "lat"), map_provider = "OpenStreetMap.Mapnik")

Rank visualization

First we coerce the tricot rankings into a PlackettLuce rankings

R <- rank_tricot(data = beans,
                 items = c("variety_a","variety_b","variety_c"),
                 input = c("best","worst"))
head(R)
## [1] "PM2 Don Rey > SJC 730-79 > BRT 103-182"  
## [2] "INTA Centro Sur > INTA Sequia > INTA ..."
## [3] "INTA Ferroso > INTA Matagalpa > BRT  ..."
## [4] "INTA Rojo > INTA Centro Sur > ALS 0532-6"
## [5] "PM2 Don Rey > INTA Sequia > SJC 730-79"  
## [6] "ALS 0532-6 > INTA Matagalpa > INTA Rojo"

We can see the network (how these varieties are connected) of this trial by computing the adjacency of rankings with the function adjacency() from PlackettLuce and the function network() from the ClimMob workflow.

a <- adjacency(R)

a <- network(a)

plot(a)

Favourability score

The function summarise_favourite() computes the favourability score from permutations contests. The favourability score is the difference between the wins and losses of a given item in a paired comparison.

fav <- summarise_favourite(R)

fav
## # A tibble: 10 x 6
##    items               N  best worst  wins fav_score
##    <chr>           <int> <dbl> <dbl> <dbl>     <dbl>
##  1 INTA Sequia       259  41.3  24.7  75.3     16.6 
##  2 INTA Centro Sur   260  37.3  28.1  71.9      9.23
##  3 BRT 103-182       245  38.8  33.5  66.5      5.31
##  4 INTA Rojo         249  33.7  30.9  69.1      2.81
##  5 INTA Matagalpa    258  31.8  34.1  65.9     -2.33
##  6 PM2 Don Rey       244  31.1  35.2  64.8     -4.10
##  7 INTA Ferroso      234  29.9  35.0  65.0     -5.13
##  8 SJC 730-79        270  31.1  36.3  63.7     -5.19
##  9 ALS 0532-6        261  30.3  39.1  60.9     -8.81
## 10 INTA Precoz       246  27.6  36.6  63.4     -8.94

It has a plot method powered by the package ggplot2 which offer many options to change its layout.

plot(fav) + 
  theme_bw() +
  theme(panel.grid = element_blank())

Victories

The function summarise_victories() summarises the relative number of times a “player1” wins against a “player2” in a paired comparison.

vic <- summarise_victories(R)

head(vic)
##       player1         player2 win ncontest victories
## 1  ALS 0532-6     BRT 103-182  27       64 0.4218750
## 2  ALS 0532-6 INTA Centro Sur  21       58 0.3620690
## 4  ALS 0532-6    INTA Ferroso  29       53 0.5471698
## 7  ALS 0532-6  INTA Matagalpa  29       54 0.5370370
## 11 ALS 0532-6     INTA Precoz  27       55 0.4909091
## 16 ALS 0532-6       INTA Rojo  34       63 0.5396825

Also with a plot method that passes ggplot2 arguments

plot(vic) +
  theme_bw() +
  theme(panel.grid = element_blank())

Example 2: Sweetpotato data

This is the data from Moyo et al (2021) who tested the consumers’ preference on sweetpotato materials (varieties and genotypes) in Ghana and Uganda. The data is available here and also in the classroom materials in the “data/” folder.

We first read the .csv file with the function read.csv()

dt <- read.csv("data/sweet_potato.csv")
str(dt)
## 'data.frame':    1433 obs. of  20 variables:
##  $ id            : int  6 18 24 25 27 56 62 63 73 74 ...
##  $ country       : chr  "Uganda" "Uganda" "Uganda" "Uganda" ...
##  $ district      : chr  "Gulu" "Gulu" "Gulu" "Gulu" ...
##  $ gender        : chr  "Woman" "Woman" "Man" "Man" ...
##  $ trial         : chr  "community" "community" "community" "community" ...
##  $ item_A        : chr  "NASPOT 10 (Kabode)" "Kakamega" "NASPOT 13" "NASPOT 12" ...
##  $ item_B        : chr  "NASPOT 12" "NASPOT 8" "NASPOT 10 (Kabode)" "NASPOT 10 (Kabode)" ...
##  $ item_C        : chr  "NASPOT 13" "NASPOT 13" "Kakamega" "NASPOT 13" ...
##  $ best_overall  : chr  "B" "B" "A" "A" ...
##  $ worst_overall : chr  "A" "A" "C" "B" ...
##  $ best_taste    : chr  "C" "B" "A" "A" ...
##  $ worst_taste   : chr  "B" "C" "C" "B" ...
##  $ best_color    : chr  "B" "C" "B" "C" ...
##  $ worst_color   : chr  "C" "B" "A" "A" ...
##  $ best_describe : chr  "It has high dry matter content" "nice, sweet" "sweet" "very sweet and nice" ...
##  $ worst_describe: chr  "not too sweet" "sweet" "wet" "it is wet and not sweet" ...
##  $ community     : chr  NA NA NA NA ...
##  $ age           : chr  NA NA NA NA ...
##  $ geno_test     : chr  NA NA NA NA ...
##  $ region        : chr  NA NA NA NA ...

Agreement between rankings

We are going to compare rankings from the three traits evaluated in the sweetpotato trial (taste, color and overall) using overall as baseline. We use the function summarise_agreement() which computes the Kendall correlation between rankings, and the agreement between the first and last ranked varieties.

First, we subset the main dataset to retain only the Uganda data. The condition dt$country == "Uganda" passed in the rows of dt return a logical vector (TRUE FALSE) for the observations in dt$country that are equal to Uganda.

keep <- dt$country == "Uganda"

dt <- dt[keep, ]

dim(dt)
## [1] 418  20

Now we isolate the strings for the traits assessed in Uganda. We use the function paste0() to create a vector that represents the traits and their names in dt with the strings best_ and worst_

traits <- c("overall","taste","color")

sel <- paste0(c("best_","worst_"), rep(traits, each = 2))

sel
## [1] "best_overall"  "worst_overall" "best_taste"    "worst_taste"  
## [5] "best_color"    "worst_color"

We use an iteration with for to create the rankings. Iterations are often used to optimize processes and enable a better debugging structure

We create an empty list and the rankings that are going to be created in each iteration i performed along the vector traits

R <- list()

for(i in seq_along(traits)){
  
  R[[i]] <- rank_tricot(dt, 
                        items = c("item_A", "item_B", "item_C"),
                        input = paste0(c("best_","worst_"), traits[[i]]))
  
}

We can now compare the rankings. The first argument is the baseline wich is the first element in the list R, we indicate it with R[[1]] where 1 is the index for the first element in the list. The second argument is the list with rankings to compare, we indicate it with R[-1] where -1 tells to R to use all the elements in the list besides the first.

a <- summarise_agreement(R[[1]],
                         compare.to = R[-1],
                         labels = c("Taste","Colour"))

a
## # A tibble: 2 x 4
##   labels kendall first  last
##   <chr>    <dbl> <dbl> <dbl>
## 1 Taste    79.9   88.5  81.3
## 2 Colour   -3.19  27.0  18.2

We can also plot it with the ggplot2 methods

plot(a) + 
  theme_bw() +
  theme(panel.grid = element_blank())



desousakaue k.desousa@cgiar.org