1 Introduction: Why, what, how, and who
Data science acts on the belief that if you approach data in just the right way, you can discover and unlock its meanings. As data become more varied and complex, data science helps remove impediments to data’s meanings so that no data are off limits and no data have to go unlearned. Sometimes I approach data gently, as a data-whisperer intent on codiscovering with the data its own potential to reveal things about the world and to inform action in the world. Sometimes I wade in gingerly; sometimes I dive in; and sometimes I catch and ride the waves as the story within the data comes to the surface.
It’s easy to be skeptical of the concept of data science, especially when it seems like it means many things but not much of anything. “Data science is what data scientists do,” wrote Davenport and Patil (2012). Does the phrase convey anything substantive? Does it offer anything new compared, say, to the data-oriented fields of statistics and informatics? Let’s open with the why, what, how, and who of data science and then unpack these themes.
Why: Foremost, data science is about learning from data. Its purpose is broadly to bring together, in a rigorous way, all that goes into doing good things with data. Data science promotes principled use of the full breadth of methods, from the familiar to the unfamiliar, along with the norms to ensure that methods and results stand up to scrutiny. Data science helps us to keep up with evolving methods, tools, and technology for learning from data of all structures, sizes, shapes, and speeds in a way that other disciplines do not. Dynamic and complex technologies and data motivate but do not define data science.
What: Data science studies how to learn from data, especially complex or nontraditional data. It combines analytic, computational, and subject-matter methods to connect the whole life cycle of data: Frame what you want to figure out. Obtain and prepare data to engage the question. Analyze the data to learn from it. Preserve and share what was learned, how it was learned, and how that learning fits in with what is already known and with other choices that could have been made.
How: At the individual level, data science calls for technical and nontechnical skills. At the collective level, it calls for a forward-looking but grounded culture that supports putting those skills to use for doing good things with data. Technical skills cover analytic methods, such as statistics, machine learning, or causal inference, and computational methods, such as data wrangling and implementing and scaling algorithms. Nontechnical skills, such as the ability to approach a problem with curiosity, attentiveness, perseverance, open-mindedness, and creativity, support good science generally and good data science specifically.
Who: Everyone who wants to do good things with data should get to make the effort, as long as they are rigorous and accountable. A rich culture for data science includes expert and nonexpert doers, learners, mentors, supporters, and advocates, organized to apply and keep pace with fast-moving methods, tools, and technology so that they can do good things with data effectively and sustainably.
I unpack these four themes (the why, what, how, and who) in the next four sections.