London, UK - 11/09/2022
I like everything that has a pinch of quantum in it. I don't mean to lecture you about quantum physics, but I love analogies, and I am good at making them up.
"A stable subatomic particle that has a positive charge equal in magnitude to a unit of electron charge and a rest mass of 1.67262 × 10⁻²⁷ kg, which is 1,836 times the mass of an electron." (ref: Britannica online)
Making things a bit more complicated than they look is my thing when I am not doing anything productive; I am just the opposite while working, so forgive me. This is my spare time, and I want to start with this convoluted AI-created picture by craiyon.
The guy in the picture may look so sad in a random elevator (he looks like an older me, too), but don't worry, the story is just picking up pace. Let's analyse this intuitively: the elevator is holding its own weight along with the guy's. It can even carry four more people, so he feels safe, although he looks otherwise.
Approximately what total weight are the elevator ropes carrying?
Basically, the weight of the elevator plus the weight of the guy, right? That is correct. We can go on from here and make some educated guesses, but let me pause and tell you something new.
I trust you know the LHC (Large Hadron Collider), where we have all been waiting to find out how we ended up in this world, right?
Okay! I am writing this story on the 11th of September 2022, and about 20 days ago, scientists' measurements gave some weird results. According to the measurements, one of the particles that make up a proton, called "charm", is heavier than the whole proton! In other words, this guy is heavier than the elevator with him in it.
How is this possible? I am discussing this with my 10-year-old, and he has come up with something like this: if the particle floats in another medium, then its effective mass may be different. The scientists also explain this with quantum physics, so a very good discussion is going on; I am proud of him.
Ok, let's bring it down to the data!
As a fellow data scientist, I see data as marbles everywhere I look. A marble could be standing still or moving. If it is standing still, I check its colour, weight, material and so on. But if it is moving, it is important to know which way and how fast it is moving, and when it started and when it stopped, along with all the other properties. I can find you 100 different data points (features) once the marbles are rolling.
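If you want to roll the marbles in code, here is a minimal Python sketch. The `Marble` class, its field names and the sample values are all made up for illustration; they are assumptions, not a standard schema.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Marble:
    # Data points we can read off a marble that is standing still.
    colour: str
    weight_g: float
    material: str
    # Data points that only exist once the marble moves (hypothetical names).
    direction_deg: Optional[float] = None
    speed_cm_s: Optional[float] = None
    started_at_s: Optional[float] = None
    stopped_at_s: Optional[float] = None

    def has_motion_data(self) -> bool:
        """True when a speed has been recorded for this marble."""
        return self.speed_cm_s is not None

    def features(self) -> dict:
        """Return only the data points that are actually known."""
        return {k: v for k, v in asdict(self).items() if v is not None}


still = Marble(colour="blue", weight_g=5.2, material="glass")
rolling = Marble(colour="red", weight_g=4.8, material="glass",
                 direction_deg=90.0, speed_cm_s=12.5,
                 started_at_s=0.0, stopped_at_s=3.0)

print(len(still.features()), len(rolling.features()))  # 3 7
```

The same marble gives three features while it rests and seven once it rolls, which is the point of the analogy: movement multiplies the data points.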
Let's give it a go from quantum science to data science. It is going to be fun! Do you see how good I am at making things complicated? Anyway, an atom has particles, and one of them is called the "proton", which contains quarks with different weights and characteristics. Strangely enough, one of them, called charm, weighs more than the whole proton!
Here is where we land: once we see the names, attributes and relations of an atom, and specifically of a proton, we are deep into the data model.
So, how about the other things? How do they make up an atom, and why this strange measurement?
This is the part we miss quite a bit. One of the main problems in data science projects is that people don't know how the data behaves in the system. The charm particle is heavier than the proton; who cares! Hydrogen (H) has a single proton and is the lightest element; good! But two H and one O will give you water, and this is an important behaviour which tells you much more about "the business".
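As a rough illustration of names, attributes and relations, the proton could be sketched in Python like this. The class names are hypothetical, the masses are approximate, rounded textbook values in MeV/c², and listing charm next to the usual two up quarks and one down quark follows this story's simplification rather than a full physical description.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Quark:
    name: str
    approx_mass_mev: float  # approximate mass in MeV/c^2 (rounded)


@dataclass
class Proton:
    approx_mass_mev: float
    # Relation: a proton "has" quarks.
    quarks: List[Quark] = field(default_factory=list)

    def heavier_constituents(self) -> List[str]:
        """Names of constituents that outweigh the proton itself."""
        return [q.name for q in self.quarks
                if q.approx_mass_mev > self.approx_mass_mev]


proton = Proton(
    approx_mass_mev=938.3,
    quarks=[
        Quark("up", 2.2),
        Quark("up", 2.2),
        Quark("down", 4.7),
        Quark("charm", 1270.0),  # simplification taken from the story
    ],
)

print(proton.heavier_constituents())  # ['charm']
```

The model only records what exists and how it is related; it flags the oddity (the guy heavier than the elevator) but does not explain it.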
Nevertheless, uncovering the business domain doesn't require quantum physics know-how, nor is it hard; it requires two things:
discovering the data points and the relationship between them
understanding the data in real-world examples to see the relations and behaviour of the data
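These two steps can be sketched with the water example from above. This is only a toy under stated assumptions: the `ELEMENTS` table, the `RULES` table and the `combine` and `total_protons` helpers are hypothetical names invented for this illustration.

```python
from collections import Counter

# Step 1: the data points and their attributes.
ELEMENTS = {
    "H": {"protons": 1},
    "O": {"protons": 8},
}

# Step 2: the behaviour, written as a rule learned from a real-world example.
RULES = {
    frozenset(Counter({"H": 2, "O": 1}).items()): "water",
}


def total_protons(atoms):
    """Sum a plain attribute over the atoms; this is the 'numbers only' view."""
    return sum(ELEMENTS[a]["protons"] for a in atoms)


def combine(atoms):
    """Look up what a group of atoms forms; this is the 'behaviour' view."""
    key = frozenset(Counter(atoms).items())
    return RULES.get(key, "unknown")


print(total_protons(["H", "H", "O"]))  # 10
print(combine(["H", "H", "O"]))        # water
print(combine(["H", "O"]))             # unknown
```

The proton count is a number anyone can crunch, but only the rule table says that this particular combination means water to "the business".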
The domain model notes the charm particle's behaviour: it carries a lot of weight, and if we are successful, the model will explain how it collectively forms a proton with the other particles.
Coming to our challenges as data scientists: when we apply the science, we often explore the business problem only at the surface and define everything as numbers. It is easier to ignore the domain knowledge and focus on the numbers rather than on how the numbers landed there. This is where "data lies", and it shows up later as biases in our models.
So, a data scientist should have the COURAGE to learn from the business stakeholders before crunching numbers. This is the key to understanding the domain and building robust, accurate and unbiased models.
Good luck!
Emrah Gozcu
---
P.S. Once, I made up the word TRUST from the SCRUM values, but I use it for a completely different purpose than SCRUM does. It is not even an abbreviation but something like:
respecT
couRage
focUs
openneSs
commiTment
In this post, I wrote about what I understand by courage; I will try to write about the other values too.