Introduction to DataOps: Bringing Databases Into DevOps

DataOps is the streamlined combination of data development and data operations.  Data development, also known as data engineering, comprises the activities involved with engineering and evolving the data aspects of your technical solutions. Data operations comprises the activities to operate, support, and govern the data aspects of your technical solutions.

Figure 1 depicts the DataOps process loop. It is shown as an infinite Möbius loop to indicate that DataOps is a continuous initiative that will last for the life of your organizational data. Data development, shown in the blue portion of Figure 1, comprises activities to plan, code, build, and test your data assets. Data operations, shown in the yellow portion of Figure 1, comprises activities to release (or, more accurately, to decide to deploy), deploy, operate, and monitor your data.

Figure 1. The DataOps continuous process loop.


Critical DataOps Practices

Figure 1 above indicates what I believe to be key practices supporting DataOps.  These practices are:

  1. Agile data architecture. Data architecture is the foundation of a data strategy that supports your organization’s goals and priorities. Agile data architecture does so in a collaborative and evolutionary (iterative and incremental) manner.
  2. Agile data modeling. Data modeling is the act of exploring data-oriented structures. Agile data modeling is data modeling done in an evolutionary and collaborative manner.
  3. Agile data engineering. The work required to technically implement data assets, including data stores, data tooling, data transmission, and other components.
  4. Automated database regression testing. The validation of a data asset, in particular a database, in an automated manner. This is achieved by creating a suite of automated tests that is invoked via a testing tool.
  5. Continuous database deployment (CDD). Continuous deployment (CD) is the automatic deployment of a solution once it has passed any requisite quality criteria.  CDD is CD of a data store.
  6. Continuous database integration (CDI). Continuous integration (CI) is the automatic invocation of the build process for an asset. CDI is CI of a data store.
  7. Data lineage. Data lineage is the act of fully tracing a data element through the processing steps performed on it from source to destination.
  8. Data security. Data security is the practice of protecting digital information from unauthorized access, corruption or theft throughout its entire lifecycle.
  9. Database refactoring. A database refactoring is a simple change to a database schema that improves its design while retaining both its behavioral and informational semantics.
  10. Lean data governance. The goal of data governance is to ensure the quality, availability, integrity, security, and usability of data within an organization. A lean data governance approach promotes a healthy, collaborative relationship between data professionals and the teams that they’re supporting.
  11. Manual data testing. The validation of a data asset in a non-automated manner.
  12. Operational data quality assurance. The ongoing validation that operational data meets or exceeds the quality criteria set for it.
  13. Test data generation. Tooling and procedures to generate artificial data for testing purposes.
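
To make practices 4 and 9 a little more concrete, here is a minimal sketch in Python using the standard library's sqlite3 module. It is an illustration under stated assumptions, not a prescribed implementation: the customer table, the fname and first_name columns, and all function names are hypothetical. The refactoring adds a better-named column alongside the old one, and the same regression suite runs before and after it to check that the existing data (the informational semantics) is unchanged.

```python
import sqlite3


def build_database():
    """Create an in-memory database with a hypothetical, poorly named column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, fname TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO customer (id, fname) VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace")],
    )
    return conn


def apply_refactoring(conn):
    """Hypothetical refactoring: add first_name and copy the data from fname.

    The old column is kept, so existing queries against fname still work.
    """
    conn.execute("ALTER TABLE customer ADD COLUMN first_name TEXT")
    conn.execute("UPDATE customer SET first_name = fname")


def run_regression_tests(conn):
    """Return a list of failure messages; an empty list means the suite passed."""
    failures = []

    # Test 1: the data that existed before the change must be untouched.
    rows = conn.execute("SELECT id, fname FROM customer ORDER BY id").fetchall()
    if rows != [(1, "Ada"), (2, "Grace")]:
        failures.append("existing customer data changed")

    # Test 2: if the new column exists, it must mirror the old one.
    columns = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
    if "first_name" in columns:
        mismatched = conn.execute(
            "SELECT COUNT(*) FROM customer WHERE first_name IS NOT fname"
        ).fetchone()[0]
        if mismatched:
            failures.append(f"first_name differs from fname in {mismatched} row(s)")

    return failures


if __name__ == "__main__":
    conn = build_database()
    assert run_regression_tests(conn) == []  # baseline passes
    apply_refactoring(conn)
    assert run_regression_tests(conn) == []  # still passes after the change
    print("regression suite passed before and after the refactoring")
```

In practice a suite like this would be invoked by your testing tool of choice, for example as part of continuous database integration, rather than by hand.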

Yes, there are many more data-oriented practices that you will adopt to make Data DevOps work in practice, but the ones listed above are the critical ones in my opinion.
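
Two of the practices listed above, test data generation (13) and operational data quality assurance (12), can also be sketched in a few lines of Python. This is only an illustration: the record layout, the defect_rate parameter, the completeness measure, and the threshold values are all assumptions made for the example, not criteria taken from any particular tool or standard.

```python
import random


def generate_customers(count, seed=42, defect_rate=0.2):
    """Generate artificial customer records for testing.

    A share of records (roughly defect_rate) deliberately lacks an email
    address so that quality checks have something to find.
    """
    rng = random.Random(seed)  # seeded so that test runs are repeatable
    records = []
    for i in range(1, count + 1):
        missing = rng.random() < defect_rate
        email = None if missing else f"customer{i}@example.com"
        records.append({"id": i, "email": email, "age": rng.randint(18, 90)})
    return records


def completeness(records, field):
    """Return the fraction of records with a non-empty value for field."""
    if not records:
        return 1.0
    filled = sum(1 for record in records if record.get(field) not in (None, ""))
    return filled / len(records)


def meets_quality_criteria(records, thresholds):
    """Check each field's completeness against its minimum threshold."""
    return {
        field: completeness(records, field) >= minimum
        for field, minimum in thresholds.items()
    }


if __name__ == "__main__":
    customers = generate_customers(1000)
    # Hypothetical criteria: every record has an age, 95% have an email.
    print(meets_quality_criteria(customers, {"age": 1.0, "email": 0.95}))
```

The same kind of check could be run on a schedule against operational data, with the generated records used to confirm that the check itself catches the defects it is meant to catch.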


Isn’t DataOps Really Data DevOps?

Yes, DataOps would more accurately be called Data DevOps.  DataOps is a much sleeker marketing term, though, and marketing tends to win out over accuracy in practice.  Although I have used the terms “Data DevOps” and “Database DevOps” since around 2018, I’ve decided to abandon them in favour of DataOps.


Related Resources