Chapter 7 Env Data Analysis 2017-01-29
When a data set contains several variables, what can we learn by looking at them together, and what by looking at each one individually?
Use the R command pairs() to generate a scatterplot for every pair of variables in a data.frame (e.g. water level, water flow, water depth).
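A minimal sketch of pairs() in use. It assumes R's built-in airquality data set as a stand-in for the hydrologic variables mentioned above; the data set and the name env_vars are illustrative choices, not part of the course material.

```r
# Built-in daily air quality measurements (New York, 1973); used here only
# as a convenient environmental example.
env_vars <- airquality[, c("Ozone", "Solar.R", "Wind", "Temp")]

# One scatterplot for every pair of columns; the diagonal labels each variable.
pairs(env_vars, main = "Pairwise scatterplots")

# The same relationships summarised numerically (rows containing NA are dropped).
round(cor(env_vars, use = "complete.obs"), 2)
```

Curved or fan-shaped clouds in the panels are a hint that a single linear correlation coefficient will not tell the whole story.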
How do we model these relationships?
We’ll start with a linear model and go on from there.
7.1 Probability
\[ P\big((x, y) \in A\big) = \iint_A f(x, y) \, \text{d}x \, \text{d}y \]
where \(f(x, y)\) is the joint density of \(x\) and \(y\).
Conditional and joint probability: see Bayes’ rule. Both carry over to the multivariate setting, as does the marginal distribution. Can add an example here.
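One way to make the double integral concrete is to evaluate it numerically. The sketch below assumes a bivariate normal density with standard normal margins and correlation 0.6, and takes A to be the unit square; both choices, and all function names, are illustrative assumptions.

```r
# Joint density: bivariate normal, standard margins, correlation rho
# (rho = 0.6 is an arbitrary illustrative value).
rho <- 0.6
f_xy <- function(x, y) {
  exp(-(x^2 - 2 * rho * x * y + y^2) / (2 * (1 - rho^2))) /
    (2 * pi * sqrt(1 - rho^2))
}

# P((x, y) in A) for the rectangle A = [0, 1] x [0, 1]:
# integrate over y for each x, then integrate the result over x.
inner <- function(x) {
  sapply(x, function(xi) integrate(function(y) f_xy(xi, y), 0, 1)$value)
}
p_A <- integrate(inner, 0, 1)$value

# Marginal density of x: integrate the joint density over all y.
f_x <- function(x) {
  sapply(x, function(xi) integrate(function(y) f_xy(xi, y), -Inf, Inf)$value)
}

# Conditional density of y given x = x0: joint divided by marginal.
f_y_given_x <- function(y, x0) f_xy(x0, y) / f_x(x0)
```

For this particular density the marginal f_x should agree with dnorm(), and the conditional density should integrate to one for any fixed x0, which gives two quick sanity checks.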
If you’ve seen linear regression, it gives you the conditional mean \(\mathbb{E}(y \mid x)\), but we may want the full conditional distribution of \(y\) given \(x\), especially in our environmental context, where we’re often most interested in rare events (floods, droughts, hazardous pollution levels, extreme wind, etc.).
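A rough sketch of the difference, using synthetic data. The coefficients, the threshold, and the normal, constant-variance error model are all assumptions made for illustration. The regression fit supplies the conditional mean; a statement about a rare event additionally needs a model for the spread around that mean.

```r
set.seed(1)
n <- 500
x <- rnorm(n)
y <- 2 + 0.8 * x + rnorm(n, sd = 0.6)   # synthetic data, illustrative only

fit <- lm(y ~ x)

# Linear regression estimates the conditional mean E(y | x = x0) ...
x0 <- 1.5
mu0 <- unname(predict(fit, newdata = data.frame(x = x0)))

# ... but an exceedance probability needs the whole conditional distribution.
# ASSUMPTION: errors are normal with constant variance, so y | x = x0 is
# approximately normal with mean mu0 and sd equal to the residual sd.
s <- summary(fit)$sigma
threshold <- 4.5
p_exceed <- pnorm(threshold, mean = mu0, sd = s, lower.tail = FALSE)
```

If the errors are skewed or heavy-tailed, as is common for floods or pollution peaks, this normal approximation can badly misstate p_exceed even when the fitted mean is fine.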
See sample code: multivariate_normal.Rmd.
Reading: Helsel & Hirsch, Chapter 8: monotonic vs. linear dependence, rank correlation, Kendall’s tau
- Kendall’s \(\tau\)
- Spearman’s \(\rho\)
- Mutual information
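The first two measures are available in base R through cor(). The sketch below uses synthetic data, chosen only for illustration, in which y is a monotonic but strongly nonlinear function of x, so the rank-based measures and the linear (Pearson) correlation disagree.

```r
set.seed(42)
x <- runif(100, 0, 5)
y <- exp(x)   # monotonic in x, but far from linear

pearson  <- cor(x, y, method = "pearson")    # measures linear dependence
spearman <- cor(x, y, method = "spearman")   # Pearson correlation of the ranks
kendall  <- cor(x, y, method = "kendall")    # based on concordant/discordant pairs

c(pearson = pearson, spearman = spearman, kendall = kendall)
```

Because the ordering of y matches the ordering of x exactly, both rank correlations equal 1, while the Pearson correlation is below 1.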
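# install.packages("locfit")
# Mutual information between x and y, rescaled to lie between 0 and 1
# (for a bivariate normal the rescaled value equals the absolute correlation).
mutual_information <- function(x, y) {
  require(locfit)
  xy <- cbind(x, y)
  # Local-likelihood density estimates, evaluated at the observations
  fxy <- predict(locfit(~xy), xy)  # joint density f(x, y)
  fx <- predict(locfit(~x), x)     # marginal density f(x)
  fy <- predict(locfit(~y), y)     # marginal density f(y)
  # The sample mean of log(f(x, y) / (f(x) f(y))) estimates the mutual information
  r <- log(fxy / fx / fy)
  mi <- (1 - exp(-2 * mean(r))) ^ 0.5
  return(mi)
}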
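To see what the final rescaling step does, here is a small numerical check in base R that does not use locfit. It assumes a bivariate normal with correlation 0.6 (an arbitrary illustrative value), computes the mutual information by summing over a grid, and then applies the same transformation as the last line of the function.

```r
rho <- 0.6          # illustrative correlation
h <- 0.05           # grid spacing
g <- seq(-6, 6, by = h)

# Joint density of a standard bivariate normal on the grid
f_xy <- outer(g, g, function(x, y) {
  exp(-(x^2 - 2 * rho * x * y + y^2) / (2 * (1 - rho^2))) /
    (2 * pi * sqrt(1 - rho^2))
})

# Product of the two standard normal marginal densities
f_x_f_y <- outer(dnorm(g), dnorm(g))

# Mutual information: integral of f(x, y) * log(f(x, y) / (f(x) f(y))),
# approximated by a Riemann sum over the grid.
mi <- sum(f_xy * log(f_xy / f_x_f_y)) * h^2

# Same rescaling as in mutual_information()
mi_rescaled <- (1 - exp(-2 * mi)) ^ 0.5
```

For the bivariate normal the mutual information is \(-\tfrac{1}{2}\log(1 - \rho^2)\), so the rescaled value recovers \(|\rho|\). This is why the rescaled measure can be read on the same scale as the magnitude of a correlation coefficient.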