4 How

4.1 Foster a progressive culture

If we think of data as an asset, how does that asset produce value within the public health mission and with the resources available to it? How can public health scientists who care about data keep up with fast-moving methods, tools, and technology for learning from data? A progressive culture intentionally orients itself proactively, not only reactively, and toward advancement, not just tradition. A progressive culture encourages innovation, but more importantly, its community continually expands the set of tools for doing good things with data and applies judgment in choosing among familiar or conventional options and unfamiliar or unconventional ones. A progressive culture for data remains rooted in history, continues to learn from old data in new ways, anticipates the future, and adapts to evolving demands as methods, tools, and technology advance.

Here I sketch a vision for fostering the practice of data science across disciplines and levels of experience by describing 3 components of a progressive culture for data:

  1. developing know-how through data-savvy technical skills to bridge domain knowledge and methods for learning from data,

  2. cultivating data-wise nontechnical skills to drive problem-solving with data (start inquiry, keep it on track, and deal with obstacles), and

  3. participating in an empowering community of mentors and peers to enable self-learning and foster practical wisdom.

After describing technical and nontechnical skills, I map those skills to an expanded treatment of Peng and Matsui’s core activities of data science. Then I sketch functions and roles in an empowering community. In the next section, on who does data science, I more fully articulate those functions and roles along with the level of technical and nontechnical skill needed for each.

4.2 Foster technical skills

What skills are required to practice data science rigorously? What about those who want or need to practice data science well but don't need to be expert data scientists? The core activities of data science call for knowing how to pose a good question, how to compile and prepare the data to answer the question, and how to extract, interpret, and convey meaning from the data in answer to the question. Data science skills are often represented (for example, by the NIST Big Data Public Working Group (2015)) as the cross-disciplinary intersection of 3 sets of technical skills that cover these core activities: domain-specific skills for posing a good question and for interpreting and explaining results; computational skills for corralling and structuring data and applying algorithms to it; and data-analytic skills, including communication skills, for extracting, interpreting, and conveying meaning from data.

Domain-specific skills cover any subject about which one might want to use data to answer a question, including public health, epidemiology, medicine, microbiology, toxicology, and anthropology. In practice, different fields often call for different norms of rigor. Epidemiology establishes modes of reasoning about bias and causation. Medicine institutes norms for assessing preventive and therapeutic efficacy and effectiveness. Microbiology and toxicology work out how to detect and measure the presence of a pathogen or toxin in order to ascertain individual cases.

Computational skills cover how to use theory, hardware, and software to represent and work with data of various structures, sizes, shapes, and speeds; to enable transmission and exchange of data and other information among systems and among users; to implement algorithms for working with and analyzing data; and to manage the efficiency of all of these undertakings. How should textual, audiovisual, and other types of information be represented for further computational access and use? Having obtained and stored various types of information, how should they be processed and arranged in preparation for analysis? How can algorithms for working with data make the best use of available computational resources, such as memory and processing time? How can algorithms be implemented to handle increasing volume, speed, and complexity of data while still delivering results within acceptable limits on time and other resources? Computational skills cover or overlap with programming, data wrangling, software engineering, statistical computing, and methods for breaking up high-volume, high-velocity, or otherwise intensive data problems into smaller pieces, processing them, and reassembling the output.
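To make the last idea concrete, the following is a minimal Python sketch of the split, process, and reassemble pattern (often called split-apply-combine), using only the standard library. It assumes the data fit in an ordinary Python list and computes an overall mean from per-chunk partial sums and counts; the function names and the example counts are hypothetical, chosen only for illustration.

```python
from typing import Iterable, Iterator, List, Tuple


def split_into_chunks(values: List[float], chunk_size: int) -> Iterator[List[float]]:
    """Split: yield consecutive slices of at most chunk_size values."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(values), chunk_size):
        yield values[start:start + chunk_size]


def summarize_chunk(chunk: List[float]) -> Tuple[float, int]:
    """Process: reduce one chunk to a partial sum and a count."""
    return sum(chunk), len(chunk)


def combine(partials: Iterable[Tuple[float, int]]) -> float:
    """Reassemble: merge partial sums and counts into an overall mean."""
    total, count = 0.0, 0
    for partial_sum, partial_count in partials:
        total += partial_sum
        count += partial_count
    if count == 0:
        raise ValueError("cannot take the mean of zero values")
    return total / count


def chunked_mean(values: List[float], chunk_size: int = 1000) -> float:
    # Each chunk is summarized independently, so these calls could be
    # distributed across processes or machines; here they run sequentially.
    partials = (summarize_chunk(c) for c in split_into_chunks(values, chunk_size))
    return combine(partials)


if __name__ == "__main__":
    daily_case_counts = [12, 15, 9, 20, 31, 27, 18, 14]  # hypothetical data
    print(chunked_mean(daily_case_counts, chunk_size=3))  # prints 18.25
```

Because each chunk is summarized on its own, the same structure can scale out to parallel workers. Not every summary decomposes this way, though: a median, for example, cannot be recovered from the medians of the chunks, so choosing a decomposable summary is part of the computational design.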

Data-analytic skills, as discussed above, encompass statistical methods, machine learning, and other modes of data analysis. Statistical modeling typically refers to using probability to think about how data might have been generated and then using data to separate a representation of something about the world (signal) from variability or uncertainty about that representation (noise). Statistical methods include simple summaries like means and medians and more complex summaries like tables, regression models, and time-to-event models. Machine learning typically refers to finding patterns within a set of data, like clusters of similar counties or patients or topics in a set of documents, or patterns that relate inputs to outputs based on examples, such as predicting a patient's disease status or prognosis from available insurance claims and billing information. Other data-analytic approaches include causal, geospatial, and econometric methods. These methods often overlap, and they often incorporate probabilistic components without always centering on them.

I link communication skills primarily with data-analytic skills, because an analyst often has primary responsibility for interpreting, representing, and conveying methods and results. These skills include the ability to use verbal narrative, tables, and graphics to explore results, develop a story, and translate results into decisions and actions.

While data science is often represented as the 3-way intersection of domain-specific, computational, and data-analytic skills, it is also instructive to review their pairwise overlaps. The combination of domain-specific and computational skills could encompass domain-specific software development, as with medical or laboratory applications. Domain-specific and data-analytic skills overlap in applied research, as in epidemiological applications. And computational and data-analytic skills overlap in statistical computing, machine learning, and other applications that implement mathematical algorithms and optimization.

We can roughly associate each core activity in data science with technical skill areas.

Pose good questions. Domain knowledge is needed to state and refine a good question. An awareness and understanding of a broad and rich variety of data-analytic methods can also enhance the kinds of questions that one could pose.

Prepare data to address those questions. Domain knowledge informs what to measure or assess, and computational skills inform how to obtain, organize, store, transmit, extract, and transform data. Data-analytic skills support assessments of whether the data can answer the question.

Probe the data through rigorous analysis. Building a formal model depends primarily on data-analytic skills, supported by strong computational skills for implementing the analysis, especially when working with complex data or methods.

Place analytic results in context. Interpreting models depends on the data-analytic skills used to construct them and to critique model-related assumptions, as well as on domain knowledge to place the results in the context of what is already known or perceived about the subject.

Present methods and results. Communication draws on data-analytic skills for correctly describing methods and formal results, as well as on domain knowledge for correctly describing the subject matter and relating results to it.

Preserve the entire life cycle. Predominantly, computational skills support procedures for openness and traceability, including preparation of data and code for restricted or unrestricted sharing.

Of course, it is very likely that every core activity will draw on all 3 types of technical skills.

4.3 Foster nontechnical skills

To practice data science well and to keep up with constant change, it is not enough to focus on technical skills and knowledge. Technical skills cover the know-how for answering good scientific questions rigorously using data, but technical skills have limits and can become obsolete as methods, tools, and technology advance.

Nontechnical skills (sometimes called "soft skills") are personality traits, goals, motivations, and preferences that are valued in an applied domain. For example, collaboration and communication call for interpersonal skills. In addition, many sources (such as Davenport and Patil (2012)) emphasize that those who practice data science should be passionate, curious problem-solvers. Here I pay special attention to traits that support, and even empower, learning from data through its life cycle, centered on analysis and subject to scientific norms. Specifically, I first describe and unpack the traits that flow from a love of knowledge and learning, and then the traits that support the ethical conduct of data science.

4.3.1 Intellectual character

In a progressive culture for data, fostering intellectual character can cultivate responsible learners and inquirers who are better able to keep up with fast-moving methods, tools, and technology. Intellectual virtues flow from a love of knowledge and learning, aiming at “cognitive goods”, like truth and understanding (King 2014; see also Costa and Kallick 2008). In data science, the practitioner seeks understanding mediated through data and the life cycle of data. Intellectual virtues animate scientific practice in general and data science in particular. This subsection draws heavily on the work of Jason Baehr, especially Baehr (2013a), Baehr (2015), and Baehr (2013b).

Baehr (2015) describes 3 dimensions of an intellectual virtue. The first is an ability or skill specific to the virtue that leads to action. For the trait of curiosity, this skill is asking good questions. The second is the motivation or commitment to apply the virtue. With curiosity, the motivation is to ask good questions because of a love of knowledge or learning. The third is the judgment or sensitivity to know when and how to exercise virtuous abilities or skills. With curiosity, the sensitivity concerns when to start, continue, pause, or stop inquiry. In addition, each virtue can be seen as the mean between 2 vices: too little of a good thing and too much of a good thing. Too little curiosity is the vice of indifference, while too much curiosity is the vice of obsession or fixation.

We can identify several intellectual virtues by examining the dispositions associated with stages of inquiry when approaching an objective: starting to learn and heading in the right direction, keeping the inquiry on track, and dealing with obstacles. For each stage of inquiry described below, I list stage-related virtues and use the pipe character (“|”) to delimit each virtue’s corresponding ability or skill, motivation or commitment, judgment or sensitivity, and vices representing too much or too little of the virtue.

Start learning and head in the right direction. A few intellectual virtues relate to how to start learning or start an inquiry and ensure that it heads in the right direction: In addition to curiosity, intellectual autonomy is the ability to think for oneself, and intellectual humility is the ability to admit one’s limitations—to know what you don’t know.

Curiosity: Ask good questions | to learn | discerning when to start, continue, pause, or stop the inquiry | mediating between indifference and fixation.

Intellectual autonomy: Think for oneself | to achieve independent thought or self-assuredness | discerning when to yield to others or differentiate from others | mediating between conformity and radicalism.

Intellectual humility: Admit one’s limitations | to recognize what one is able or unable to do or to locate oneself in the context of others’ interests | discerning when to assert oneself or to stand back | mediating between arrogance and self-deprecation.

Keep the learning process on track. After starting an inquiry, a few intellectual virtues assist the learner in keeping on track: Attentiveness is the ability to engage, to look and listen, and to notice details. Carefulness is the ability to spot and avoid errors. Thoroughness is the ability to go deep in order to gain understanding and to explain.

Intellectual attentiveness: Look and listen | to remain alert to details | discerning when to tune out or to focus more intently | mediating between distractedness and preoccupation.

Intellectual carefulness: Avoid errors | to assure or control the quality of one’s output | discerning when to ease up or to double down on quality control | mediating between sloppiness and perfectionism.

Intellectual thoroughness: Go deep to understand | to ensure sufficiently complete coverage or treatment | discerning when to fill gaps or let well enough alone | mediating between superficiality and meticulousness.

Deal with obstacles. Even on track to learning, one is likely to encounter obstacles. A learner benefits from intellectual virtues that help work through or around obstacles: Open-mindedness helps to think outside the box when confronted with a challenge to solve. Courage helps to be bold and to take intellectual risks. Flexibility helps to adapt as needed. Tenacity or perseverance helps to embrace struggle while working through a challenge.

Open-mindedness: Think outside the box | to consider new or unfamiliar ideas and seek diversity and inclusion | discerning which ideas to dismiss and which to entertain | mediating between narrow-mindedness and gullibility.

Intellectual courage: Take intellectual risks | to allow for bold action despite potential for failure | discerning when to tolerate more or less potential for failure | mediating between cowardice and foolhardiness.

Intellectual flexibility: Adapt as needed | to allow for change, especially for improving outcomes | discerning when and how much to stand firm or alter activity | mediating between intransigence and suggestibility.

Intellectual tenacity: Carry on | to continue toward a learning objective, even when challenged | discerning when to persist and when to stop trying | mediating between fickleness and stubbornness.

Intellectual virtues can conflict with each other. For example, courage can conflict with humility when the drive to take an intellectual risk runs counter to the limitations of one’s abilities (when one’s reach exceeds one’s grasp). To navigate these conflicts, the good learner or thinker is aided by the mediating virtue of practical wisdom. This trait allows the inquirer (phronimos, per Baehr (2013a)) to grasp which intellectual activity is most valuable for attaining one’s goals. Recall that Baehr (2015) identifies one dimension of an intellectual virtue as judgment or sensitivity about when and when not to exercise that virtue. Practical wisdom undergirds this dimension, and it allows the good learner or thinker to take suitable action when intellectual virtues conflict. (See Turri et al. (2021) and Baehr (2013a).)

Even if, as a good learner or thinker, you are motivated to apply intellectual virtues, you still need to develop the abilities and judgment that connect your motivation to right action. Intellectual virtues are developed by practicing them and by critically reflecting on your own actions and dispositions. You get better at courage by practicing courage: by taking intellectual risks and learning from the consequences. You get better at humility by practicing humility: by owning your limitations and not shying away from them. You also develop practical wisdom (to avoid vice and to mediate conflicting virtues) through practice, through guidance, and by seeing it modeled by others. Curricula and other resources, including literature, computing resources, and mentors, can help intentionally and systematically cultivate intellectual virtues. I return to these ideas in the section on learning data science in community.

Just as we associated each core activity with technical skills, we can also associate the critical reflection process with nontechnical skills. Setting expectations corresponds to starting learning and heading in the right direction, which calls for curiosity, autonomy, and humility. Collecting information and comparing expectations with that information corresponds to keeping the learning process on track: attentiveness, carefulness, and thoroughness. And dealing with matched or mismatched expectations and information corresponds to dealing with obstacles: open-mindedness, courage, flexibility, and tenacity.

Intellectual virtues, by stage of inquiry.

Virtue, by stage | What (skill): skill or activity | Why (drive): motivation or commitment | How (practical wisdom): judgment or sensitivity | Mediating between too little / too much

Start learning
Curiosity | Ask good questions | learn about the world | when to start, continue, pause, or stop the inquiry | indifference / fixation
Intellectual autonomy | Think for oneself | achieve independent thought or self-assuredness | when to yield to others or differentiate from others | conformity / radicalism
Intellectual humility | Admit one's limitations | recognize what one is able or unable to do | when to assert oneself or to stand back | arrogance / self-deprecation

Keep learning on track
Intellectual attentiveness | Look and listen | remain alert to details | when to tune out or to focus more intently | distractedness / preoccupation
Intellectual carefulness | Avoid errors | assure or control the quality of one's output | when to ease up or to double down on quality control | sloppiness / perfectionism
Intellectual thoroughness | Go deep to understand | ensure sufficient coverage or treatment | when to fill gaps or let well enough alone | superficiality / meticulousness

Deal with obstacles
Open-mindedness | Think outside the box | consider new or unfamiliar ideas | which ideas to dismiss and which to entertain | narrow-mindedness / gullibility
Intellectual courage | Take intellectual risks | allow for bold action despite potential for failure | how much potential for failure to tolerate | cowardice / foolhardiness
Intellectual flexibility | Adapt as needed | allow for change, especially for improving outcomes | when and how much to stand firm or alter activity | intransigence / suggestibility
Intellectual tenacity | Carry on | continue toward objective, even when challenged | when to persist and when to stop trying | fickleness / stubbornness

4.3.2 Ethics and values

Where intellectual virtues connect a love of knowledge and learning to the practice of asking and answering questions, ethics and values promote behaviors to achieve other goods, including trust, equity, and fairness.

We seek to protect personal privacy, and to balance privacy and utility, with specific behaviors throughout the life cycle of data: posing questions that do not raise undue risk to respondents; obtaining, using, communicating about, and sharing data in ways that limit risks to privacy and confidentiality; interpreting and communicating findings in ways that respect other rights and the welfare of the subjects of analysis; and promoting openness, transparency, and other aspects of data utility to make the overall process, methods, and final products available for scrutiny.

Further considerations concerning ethics and values in the practice of data science stem from the conduct of research involving human subjects, the conduct of public health surveillance, scientific integrity, and public service. Many of these considerations pertain to data, data systems and informatics, and data analysis. They extend beyond privacy and confidentiality to include justifications for gathering information; balancing benefits and harms, burden and utility, and access and security; self-determination and substantive engagement; justice; duties to limit collection and to use what is collected; and the responsibility to avoid fabrication, falsification, and plagiarism. These duties are covered extensively elsewhere and are often implemented through regulation, policy, checklists, and other forms of guidance. (See, for example, CDC's Office of Public Health Ethics and Regulations and Privacy Program.)

In a progressive culture for data, we value data because data help us to learn things about the world and to make informed choices about how we interact with the world. We value innovation and technology insofar as they help us to continue expanding the means for doing good things with data, but we do not seek innovation or technology as ends in themselves. An extensive set of tools gives us the broadest options for doing good things, so we remain open both to the unfamiliar or unconventional and to the familiar or conventional. Based on these values, we practice pragmatic, principled pluralism by exploring, and using wisely and well, all methods that can help achieve technical excellence in learning from, about, and with data. Principled pluralism allows honest disagreement about methods, results, and interpretation.

4.4 Foster community and leadership

Community and leadership form the essence of a progressive culture for data, so much so that I defer the full discussion to the next section (5). In this section, I sketch how community and leadership enable the practice of data science along the following dimensions:

Learning. Community supports learning about data, and learning how to do data science, by centering on learners. Learning can follow formal curricula and be encouraged in structured programs, but substantial portions of learning occur in informal settings. Learners benefit from interactions with peers and mentors. Mentors benefit from meta-mentors. Advocates influence, guide, and support the learning-oriented community.

Doing. Community supports the practice and profession of data science by giving everyone who wants to do good things with data the resources to do so. Data science learner-practitioners with basic or intermediate data skills come from any discipline to do good things with data. Expert practitioners go deep on data science methods and guide practitioners to proceed with rigor and stand behind their work. Managers supervise practitioners and experts to ensure that they have the resources and direction they need to achieve good things with data. Lay advocates, as persons literate in the value of data, work in community with practitioners, experts, and managers and help ensure supportive resources to enable the practice of data science.

Staffing. Community creates and ensures the capacity for data science through staffing and career development by all available means to recruit, retain, organize, and develop learners, doers, and supporters. This includes identifying and building on the data science potential among existing staff, finding and using mechanisms to bring on learners as well as other federal and nonfederal staff, and organizing formal and informal structures for staff to learn, do, manage, and support data science effectively.

Leading. In a progressive culture for data, leadership aims toward and flows from practical wisdom. Leadership is part of the practice of data science, and not separate from it. Leaders include practitioners, experts, managers, and laypersons, regardless of their career stage, job title or series, credential, or location in the hierarchy (subject to some structural constraints in the federal system).

Within and across these dimensions, an individual can carry out more than one role or function. For example, in the doing dimension, a practitioner can serve as both an expert and a manager. The same person can also serve as meta-mentor and advocate in the learning dimension. In the next section, I expand on the roles and functions that align with these dimensions.

References

Baehr J. 2013-05b. Educating for intellectual virtues: From theory to practice. Journal of Philosophy of Education. 47(2):248–262. https://doi.org/10.1111/1467-9752.12023
Baehr J. 2013a. The cognitive demands of intellectual virtue. In: Henning T, Schweikard DP, editors. Knowledge, Virtue, and Action: Putting Epistemic Virtues to Work. 1st Edition. New York: Routledge; p. 99–118. https://doi.org/10.4324/9780203098486
Baehr J. 2015. Cultivating Good Minds: A Philosophical & Practical Guide to Educating for Intellectual Virtues. https://jasonbaehr.gumroad.com/l/IJxPL
Costa AL, Kallick B. 2008-12. Describing the habits of mind. In: Learning and Leading with Habits of Mind: 16 Essential Characteristics for Success. Association for Supervision and Curriculum Development; p. 15–41. https://www.ascd.org/books/learning-and-leading-with-habits-of-mind?chapter=learning-through-reflection-learning-and-leading-with-habits-of-mind
Davenport TH, Patil DJ. 2012-10. Data scientist: The sexiest job of the 21st century. Harvard Business Review. https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century
King N. 2014-09. What are intellectual virtues? Five key features of the intellectual virtues. https://cct.biola.edu/intellectual-virtues/
NIST Big Data Public Working Group. 2015-10. NIST Big Data Interoperability Framework: Volume 1, Definitions. National Institute of Standards and Technology. https://doi.org/10.6028/NIST.SP.1500-1r2
Turri J, Alfano M, Greco J. 2021. Virtue epistemology. In: Zalta EN, editor. The Stanford Encyclopedia of Philosophy. Winter 2021. Metaphysics Research Lab, Stanford University. https://plato.stanford.edu/archives/win2021/entries/epistemology-virtue/