Who Do You Think You Are?
Who Do You Think You Are? is a British genealogy documentary series in which, in each episode, a celebrity traces their family tree, often making surprising discoveries. Over the last decade, consumer DNA testing companies such as 23andMe, AncestryDNA, MyHeritage DNA, and FamilyTreeDNA have made DNA testing, once accessible only to doctors and detectives, available to anyone curious about where they came from. One main draw of these services is learning more about your genealogy and ancestry, which is welcome as long as you are prepared for surprises. Increasingly, DNA tests are bringing to light illegitimate children, adoptions, cover-ups, and lies that have been concealed for decades. This “DNA-matching” can throw up previously unknown or unacknowledged brothers, sisters, cousins, uncles, and aunts … or even reveal that the man you call dad is not your biological father.
Understanding genes and tracing genealogy is possible because genes hold the details of what makes an individual. The genotype is an organism’s genetic make-up: the set of instructions for how that individual’s body is constructed and functions. For humans, those instructions specify one head, two arms, two legs, two eyes, and so on. The phenotype is the set of observable characteristics that result from those genes being expressed, for example the colour of a person’s eyes, hair, and skin.
Tracing your genealogy and examining your genes can reveal traits about you and your origins, including characteristics that may have been lost, forgotten, undiscovered, or hidden. For anyone seeking answers about themselves, and prepared to accept the findings, DNA testing can provide invaluable information.
The same principles can be applied to understanding data and data movements in IT systems, where metadata plays the role of genotype and phenotype, and data lineage is the equivalent of genealogy. As with genes and genealogy, we are interested in determining information about the data, for example:
- Abandoned/Orphaned – data that has been collected and not used or is forgotten about
- Legitimate/Illegitimate – has data been collected legitimately and is it being used for its intended purposes?
- Mutated – has the data been copied correctly during data movements and transformations?
- Duplicated – are there justifiable reasons why the data is replicated?
- Redundant – is the data valid? Should it be archived?
- Data Mappings – the movement, joining, and splitting of data (analogous to gene reproduction)
- Lineage – tracing data movement (analogous to genealogy)
- Data Transformation – understanding what operations have been performed during data movements (analogous to environmental and external factors that may have affected gene reproduction)
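To make the genealogy analogy concrete, here is a minimal sketch of how lineage might be traced from a set of data mappings. It is not how any particular tool works. The dataset names, the `MAPPINGS` list, and the functions `trace_lineage` and `find_orphans` are all hypothetical, invented purely for illustration.

```python
from collections import defaultdict

# Hypothetical data mappings: (source, target, transformation applied).
MAPPINGS = [
    ("crm.customers", "staging.customers", "copy"),
    ("web.orders", "staging.orders", "copy"),
    ("staging.customers", "warehouse.sales", "join on customer_id"),
    ("staging.orders", "warehouse.sales", "join on customer_id"),
    ("warehouse.sales", "reports.monthly_revenue", "aggregate by month"),
]

# Hypothetical inventory of every dataset known to the organisation.
DATASETS = {
    "crm.customers", "web.orders", "staging.customers", "staging.orders",
    "warehouse.sales", "reports.monthly_revenue", "legacy.survey_2009",
}


def build_parents(mappings):
    """Index the mappings by target, so each dataset knows its 'parents'."""
    parents = defaultdict(list)
    for source, target, transformation in mappings:
        parents[target].append((source, transformation))
    return parents


def trace_lineage(dataset, mappings):
    """Return the set of all upstream ancestors of a dataset."""
    parents = build_parents(mappings)
    ancestors = set()
    stack = [dataset]
    while stack:
        current = stack.pop()
        for source, _transformation in parents.get(current, []):
            if source not in ancestors:
                ancestors.add(source)
                stack.append(source)
    return ancestors


def find_orphans(datasets, mappings):
    """Return datasets that appear in no mapping, as source or target."""
    connected = {s for s, _, _ in mappings} | {t for _, t, _ in mappings}
    return datasets - connected


if __name__ == "__main__":
    print(sorted(trace_lineage("reports.monthly_revenue", MAPPINGS)))
    print(sorted(find_orphans(DATASETS, MAPPINGS)))
```

In this sketch, asking for the lineage of the report walks back through the warehouse and staging tables to the original source systems, much as a family tree is walked back through parents and grandparents. The dataset that no mapping touches is the "orphan".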
Sites such as ancestry.com and ancestryDNA.com, along with government registrars, provide records of births, deaths, and marriages from which a person’s genealogy can be determined. DNA testing services can help establish parentage and reveal genetics-based characteristics and health predispositions. In the same way, data, metadata, and lineage are recorded in:
- A data catalog, an organized inventory of data assets defined by their metadata to help manage data. Data catalogs are used to collect, organize, enrich, and provide access to metadata.
- Data maps, a portfolio of data mappings that define the movement of data.
These data maps and catalogs support data discovery, data lineage and traceability, and data governance. Together, they give data engineers and business users the means to determine the veracity of data and to have confidence in it, answering questions such as “Who Do You Think You Are?”
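As a rough illustration of what a data catalog records, the sketch below models a catalog as a small in-memory inventory of metadata. The `CatalogEntry` and `DataCatalog` classes, their fields, and the sample entries are assumptions made for this example and do not represent any vendor’s product or API.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical metadata held about one data asset."""
    name: str
    owner: str
    purpose: str  # the intended use the data was collected for
    tags: set = field(default_factory=set)


class DataCatalog:
    """A toy in-memory catalog: collect, organize, and look up metadata."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        """Add or replace the metadata for a data asset."""
        self._entries[entry.name] = entry

    def describe(self, name):
        """Return the entry for a data asset, or None if it is unknown."""
        return self._entries.get(name)

    def find_by_tag(self, tag):
        """Return the names of all assets carrying the given tag."""
        return sorted(e.name for e in self._entries.values() if tag in e.tags)


if __name__ == "__main__":
    catalog = DataCatalog()
    catalog.register(CatalogEntry(
        "crm.customers", "sales-ops", "customer relationship management",
        {"personal-data", "source"},
    ))
    catalog.register(CatalogEntry(
        "reports.monthly_revenue", "finance", "monthly revenue reporting",
        {"report"},
    ))
    print(catalog.find_by_tag("personal-data"))
    print(catalog.describe("reports.monthly_revenue").owner)
```

A lookup that returns nothing is itself informative: a data asset with no catalog entry has no recorded owner or purpose, which is the data equivalent of a missing birth certificate.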
If you would like to learn more on how Sandhill Consultants and erwin’s data catalog and data literacy capabilities can help solve your data lineage and provenance mysteries, join our latest webinar ‘DevOps: Shift Left to Reduce Failure By Testing Earlier’ on November 26th.