Carpe Datum
Data Science for Life’s Big Questions
Preface
“[T]he most important questions of life are for the most part only problems of probability. It may even be said, strictly speaking, that almost all our knowledge is only probabilistic.”
— Pierre-Simon Laplace
Caveat
This book is an incomplete draft of a work in progress being developed as lecture notes for an online course. Content is provisional, contingent, and possibly wrong, but always well intended.
Guiding principles in this book
Question-driven
Because the presentation of topics in this book is question-driven rather than method-driven, this coursebook has some idiosyncrasies. Some topics that might be considered rather basic may be omitted, while some topics that are typically considered advanced will get a (simplified) treatment.
No proofs
As a mathematical subject, statistics is often taught with derivation and proof using definitions, simple assumptions, and the logic of algebra and calculus. Mathematical formulas are the standard language of statistics. This approach to learning is powerful if the math supports rather than gets in the way of understanding. However, for many learners, the math obscures rather than clarifies, and another way, using demonstrations and simulations, might enable understanding, as Johnson & Johnson once said, without tears.
Now, demonstrating an example or two is not a proof. Here is a false proof that all odd numbers are prime: 3 is prime, 5 is prime, 7 is prime… all odd numbers are prime! In fact, 9 is a counterexample that disproves the claim. That said, many repeated experiments can sometimes be convincing even in the absence of proof. For example, I can convince you that if you take any whole number (e.g., 1, 2, 3, 7, 21, 118, 8675309), multiply it by 9, and then sum the individual digits of the resulting product, the sum itself will be a multiple of 9.
Example:
7 * 9 = 63; 6 + 3 = 9.
21 * 9 = 189; 1 + 8 + 9 = 18.
An elegant and simple proof can be constructed (one way uses modular arithmetic), but if you try it out yourself enough times, you won’t need the proof to be convinced. Twice, by the way, is probably not enough! You could, however, write a computer program to test this claim a thousand times using a thousand (or a million!) random whole numbers. This is called a computer simulation. Still not a proof, but it can be convincing under the right circumstances.
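As a rough illustration of what such a simulation might look like, here is a minimal sketch in Python. The function names (digit_sum, simulate) and the parameters (trials, upper, seed) are made up for this example and are not part of the course materials. The sketch draws random whole numbers, multiplies each by 9, sums the digits of the product, and counts how often the digit sum fails to be a multiple of 9.

```python
import random


def digit_sum(n: int) -> int:
    """Return the sum of the decimal digits of a non-negative whole number."""
    return sum(int(d) for d in str(n))


def simulate(trials: int = 1000, upper: int = 10**9, seed: int = 0) -> int:
    """Test the multiples-of-9 claim on random whole numbers.

    Returns the number of trials in which the digit sum of 9 * n
    was NOT a multiple of 9 (hypothetical helper for illustration).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    failures = 0
    for _ in range(trials):
        n = rng.randint(1, upper)  # random whole number between 1 and upper
        if digit_sum(9 * n) % 9 != 0:
            failures += 1
    return failures


if __name__ == "__main__":
    print("failures in 1000 trials:", simulate(trials=1000))
    print("failures in 1000000 trials:", simulate(trials=1_000_000))
```

Both runs should report zero failures. That is evidence, not proof: the simulation checks only the numbers it happened to draw.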
Now, problems like this one about multiples of 9 are often used to teach proof technique rather than to encode cute number-facts in memory. And indeed, for training statisticians, a rigorous mathematical presentation is important. So, for that matter, is computer simulation. For most users of this book, however, intuition and understanding are the priority, and the ability to derive formulas is not necessary. We will, in due course, bring out some computer simulations.