Berkson's Paradox: why handsome people are such a jerk?

PUBLISHED ON FEB 14, 2021 — CRITICAL-THINKING, STAT, VIZ

Many times we are in presence of two variables which they don’t have a relationship at all. Let say, \(X\) and \(Y\), but we observed both variables together could indicate just the opposite, that indeed variables are correlated. Maybe also we observed a negative correlation between variables but, actually, the correlation between \(X\) and \(Y\) is positive. This kind of cases most of the time could be explained by Berkson’s paradox.

Let’s think about the idea while we observed \(X\) and \(Y\) without taking much care or attention in the way we observed, i.e. without thinking about which size of the sample is necessary to draw significant conclusions or consider if different groups across the sample are well represented. In our daily lives, we are surrounded by plenty of these scenarios in which we are looking through an “observational lens”, a kind of kaleidoscope whose beautiful and crazy patterns have effects on how \(X\) and \(Y\) are related. When the kaleidoscope influences the observation itself inducing our belief about the association, we are in presence of a bias called selection bias.

Before jump into an example, its necessary to understand what I mean by \(X\) and \(Y\) are associated or related. The correlation (\(\rho\)) is a metric that measures the grade of a relationship between two variables. In general, a correlation measure the grade in which two variables are associated linearly. It’s a metric that goes from -1 to 1, in which -1 means a perfect negative linear relationship: more \(X\) is associate with less \(Y\), or vice-versa. A correlation of 1 is just the opposite: more \(X\) is associate with more \(Y\) and vice-versa. Finally, when \(X\) couldn’t be associated with more or less \(Y\), we said that the correlation between variables is 0, so there isn’t a linear relationship involved at all.

Imagine that you came up with an idea of how niceness and attractiveness are related, our \(X=Niceness\) and \(Y=Attractiveness\). This relationship was inferred by your own dates experiences and from the stories of your fine selection of friends. This is the kaleidoscope we referred to before, in which we deduce the idea of how attributes are associated just from the information around us without any concern. Sorry about the apparent banality of the example, but I promise that it’s a very good one to illustrate the concept. You can find this example better explained and a lot of other interesting things in the book Calling Bullshit (Bergstrom, West, 2020) Ok…let’s back to the point. it’s really common to hear from the popular wisdom library that handsome people are jerks or people who are enjoyable they’re not usually attractive at all, but…why it’s so common to hear this? Who is responsible to put in conflict attractiveness and niceness? The truth is that we don’t have many reasons to believe any relationship between these attributes, indeed both variables, in reality, correlate near to 0. But what could it explain this apparent contradiction? Berkson’s Paradox!

Below we have an animation of a scatterplot in what we measure each attribute from 0 to 100, i.e. for the absent to the plentifully of the attribute. In the title, you can see the correlation (\(\rho\)) of the current state.

It’s likely and honest to think that on average one restrict the potential candidates in which we are interested to get a date. You accept to date with someone who is not so good-looking if her or his eloquent and nice behaviour compensate the absence of visual gracefulness. Otherwise, you could tolerate an idiot in the way in her or his handsomeness diminished its jerkiness fragrance. As you can appreciate, from the above animation, your kaleidoscope removes from the map the group of candidates who don’t meet your minimal criteria, let’s call this group “You would not date”.

Unfortunately, there is another group who would use similar arguments to avoid our perfect combination of attractiveness and niceness and which we call “Would not date you”. This double mechanism of restrictions prevent us from “observed the world as it is” and explain how we go from a correlation close to \(0\) between attractiveness and niceness to a strong negative correlation of \(-0.7\). The illusion of the kaleidoscope gives us the impression of a false dichotomy between these two variables.

Additional resources

Code

The gif made by mounting three different plots created with ggplot2 (R) and using the tool Animated Gif Maker.

Here is the code if you want to reproduce the example:

library(ggplot2)
library(dplyr)
library(latex2exp)

# Multivariate normal dist parameters:
mu <- c(50, 50)
Sigma <- matrix(c(200, 4, 8, 200), 2)

# Generate 10.000 data points
set.seed(323)
df <- as.data.frame(MASS::mvrnorm(5e3, mu, Sigma))

# Compute correlation between V1 and V2
cor0 <- round(cor(df$V1, df$V2), 2)

# Initial plot
p0 <- df %>% 
  ggplot() +
  geom_point(aes(V1, V2), alpha = .3, size = .2, color = "steelblue") +
  scale_x_continuous(breaks = seq(0, 100, 20), n.breaks = seq(0, 100, 5)) +
  scale_y_continuous(breaks = seq(0, 100, 20)) +
  coord_cartesian(xlim = c(0, 100), ylim = c(0, 100), clip = "off") +  
  labs(x = "Niceness",
       y = "Attractiveness",
       title = TeX(paste("$\\rho$ =", as.character(cor0)))) +
  annotate(geom = "text", x = c(10, 90), y = -15,
           label = c("JERK", "NICE"), size = 4) +
  annotate(geom = "text", x = -10, y = c(10, 90),
           label = c("NOT", "HOT"), size = 4) +
  theme_bw(base_size = 8) +
  theme(plot.margin = margin(1.5, 1.5, 1.5, 1.5, "cm"),
        plot.title = element_text(hjust = 0.5),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())

# Lower line:
# ------------------------------------------------------------------------------
# Create a lower diagonal line and assign an identifier to each group (2 labels)
# C: {LA=Lower Area, MA = Middle Area}
lower_line <- function(x, m=-1, b = 85) m * x + b

df <- df %>% 
  mutate(C = case_when(lower_line(V1) < V2 ~ "LA",
                        TRUE ~ "MA")) 

# Compute cor1
cor1 <- round(cor(df[df$C == "MA", "V1"], df[df$C == "MA", "V2"]), 2)

# Create p1
p1 <- df %>% 
  ggplot() +
  geom_point(aes(V1, V2, colour = C), alpha = .27, size = .2) +
  scale_x_continuous(breaks = seq(0, 100, 20)) +
  scale_y_continuous(breaks = seq(0, 100, 20)) +
  coord_cartesian(xlim = c(0, 100), ylim = c(0, 100), clip = "off") +
  scale_colour_manual(guide = FALSE, values = c("steelblue", "grey45")) +
  labs(x = "Niceness",
       y = "Attractiveness",
       colour = NULL,
       title = TeX(paste("$\\rho$ =", as.character(cor1)))) +
  annotate(geom = "text", x = c(10, 90), y = -15,
           label = c("JERK", "NICE"), size = 4) +
  annotate(geom = "text", x = -10, y = c(10, 90),
           label = c("NOT", "HOT"), size = 4) +
  annotate("text", label = "YOU WOULD\nNOT DATE", x = 20, y = 20, size = 3.5) + 
  geom_segment(aes(x=-5, y=lower_line(-5), xend=90, yend=lower_line(90))) +
  theme_bw(base_size = 8) + 
  theme(plot.margin = margin(1.5, 1.5, 1.5, 1.5, "cm"),
      plot.title = element_text(hjust = 0.5),
      panel.grid.major = element_blank(),
      panel.grid.minor = element_blank())

# Upper line:
# ------------------------------------------------------------------------------
# Create a upper diagonal line and assign an identifier (3 labels)
upper_line <- function(x, m=-1, b=115) b + x*m


df <- df %>% 
  mutate(C = case_when((lower_line(V1) <= V2) & (V2 < upper_line(V1)) ~ "MA",
                       upper_line(V1) >= V2 ~ "UA",
                       lower_line(V1) < V2 ~ "LA"))

# Compute cor2
cor2 <- round(cor(df[df$C == "MA", "V1"], df[df$C == "MA", "V2"]), 2)

# Create p2
p2 <- df %>% 
  ggplot() +
  geom_point(aes(V1, V2, colour = C), alpha = .27, size = .2) +
  scale_x_continuous(breaks = seq(0, 100, 20)) +
  scale_y_continuous(breaks = seq(0, 100, 20)) +
  coord_cartesian(xlim = c(0, 100), ylim = c(0, 100), clip = "off") +
  scale_colour_manual(guide = FALSE, values = c("grey45", "steelblue", "grey45")) +
  labs(x = "Niceness",
       y = "Attractiveness",
       colour = NULL,
       title = TeX(paste("$\\rho$ =", as.character(cor2)))) +
  annotate(geom = "text", x = c(10, 90), y = -15,
           label = c("JERK", "NICE"), size = 4) +
  annotate(geom = "text", x = -10, y = c(10, 90),
           label = c("NOT", "HOT"), size = 4) +
  annotate("text", label = "YOU WOULD\nNOT DATE", x = 20, y = 20, size = 3.5) + 
  geom_segment(aes(x=-5, y=lower_line(-5), xend=90, yend=lower_line(90))) +
  annotate("text", label = "WOULD NOT\nDATE YOU", x = 80, y = 80, size = 3.5) + 
  geom_segment(aes(x=10, y=upper_line(10), xend=105, yend=upper_line(105))) +
  theme_bw(base_size = 8) +
  theme(plot.margin = margin(1.5, 1.5, 1.5, 1.5, "cm"),
      plot.title = element_text(hjust = 0.5),
      panel.grid.major = element_blank(),
      panel.grid.minor = element_blank())


# Save plots
ggsave("/Users/YourUser/Desktop/bp0.png", p0, width = 7.5, height = 7.5, dpi = "retina")
ggsave("/Users/YourUser/Desktop/bp1.png", p1, width = 7.5, height = 7.5, dpi = "retina")
ggsave("/Users/YourUser/Desktop/bp2.png", p2, width = 7.5, height = 7.5, dpi = "retina")
comments powered by Disqus Go th the home page