The Picture
A big problem in the data domain today is that we have an explosion of tools, but very few End-2-End solutions, and even fewer turnkey or out-of-the-box ones. Though we may understand and experience the utility of a particular tool, or somehow quantify its value, once this tool becomes part of a bigger "Picture", integrated with other tools in a solution, that utility and value might fade away or become less relevant.
So far in my professional career, no one has been able to answer this "simple" question: "What is the value of your data solution?" All the discussions around this subject revolve around CAPEX, OPEX, and TCO, which are valid points but stressful ones for the CTO, CIO, Enterprise Architects, and other important stakeholders.
Failing to answer this question in a few simple words may lead to different outcomes:
Your data program is poorly financed or, worse, canceled.
Your small data team cannot grow, "because we have a lot of people doing the same stuff in Excel".
Your fancy "modern data stack" solution is put on hold as "it will generate more technical debt".
Your data governance initiative is a "must" only because of the GDPR scarecrow.
If you have an answer to this question, and it can be expressed in one simple sentence, please share it in the comments; I'm always open to learning.
The Story
If you are not yet bored with my little surgical incision into the world's most wanted data unknown, bear with me for a few minutes while I go into detail and tell you a story from my consultancy career.
But first, a short background on what a data solution is, so everyone can follow.
An End-2-End data solution is usually built from several components: a data extraction and integration component, a data repository component, a data transformation component, a data visualization component, a data science component, a data management component, and so forth.
Some of these are seen as platforms fulfilling multiple capabilities, some are dedicated software fulfilling one or at most two capabilities, and some are just plugins or libraries that sit on top of or integrate with the aforementioned components.
For the sake of simplicity, in this article, I will call them "tools", though I feel that some of you reading this will disagree with me.
According to a study by Gartner from a couple of years ago, an End-2-End data solution inside a company is built from at least 10 tools, and putting all these puzzle pieces together is always a challenge.
Now, back to the story…
I was working for a big client, with thousands of employees, with the mandate to optimize and rationalize their current data solution landscape and prepare it to evolve and support the company's digital transformation journey.
I started working, and after one month we were able to identify all the data tools in the company. We found not 10, not 20… but 111 data tools this company was using, or… somehow using! A magic number, I may say!
I informed my boss, and he told me:
"Adrian, let's put all of them in a big picture on our wall, to scare people away when they enter our office!", so the "Picture Data Solution" concept was born.
With the Picture Data Solution in hand, we went to my boss's boss, and after a four-hour discussion, featuring multiple words invoking divinity and several anatomical terms that I cannot repeat here, the final decision was to find a way to simplify, optimize, and standardize the current data solution landscape.
For the next 3 months, I was the "Picasso" of enterprise diagramming tools, struggling to come up with the ideal solution for my client. Version 27 was the lucky one: it reduced the 111 data tools to 23.
My approach to this problem used the TOGAF enterprise architecture standard: you take each tool, decompose it into application components and functions, map those to business processes and capabilities, and at the end you build a matrix where you can spot the overlaps.
In this case we had, for example, 5 data extraction, transformation, and integration tools (ETL tools), 12 relational databases, and 8 visualization tools. From discussions with users, we learned that some of the tools were not used at all, some were used only a couple of times per month or year, and only a few were part of day-to-day operations. That was the core of Version 27 of the Picture Data Solution.
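For the technically curious, here is a minimal sketch of that overlap matrix in Python. The tool names and capability labels are hypothetical, for illustration only, not the client's actual inventory: invert the tool-to-capability mapping, and any capability served by more than one tool becomes a consolidation candidate.

```python
from collections import defaultdict

# Hypothetical inventory: each tool mapped to the capabilities it fulfills.
# In the real exercise, this mapping came from decomposing every tool into
# application components and functions, per TOGAF.
tool_capabilities = {
    "ToolA": {"data_extraction", "data_transformation"},
    "ToolB": {"data_transformation", "data_integration"},
    "ToolC": {"data_visualization"},
    "ToolD": {"data_visualization", "data_science"},
    "ToolE": {"data_extraction", "data_transformation", "data_integration"},
}

# Invert the mapping: capability -> set of tools that provide it.
capability_tools = defaultdict(set)
for tool, caps in tool_capabilities.items():
    for cap in caps:
        capability_tools[cap].add(tool)

# Any capability covered by more than one tool is an overlap,
# and therefore a candidate for consolidation.
for cap, tools in sorted(capability_tools.items()):
    if len(tools) > 1:
        print(f"{cap}: {len(tools)} overlapping tools -> {sorted(tools)}")
```

Combine this with usage data from your users, and the tools to keep, and the ones to retire, start to reveal themselves.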
The big day came, and we presented these findings at the SteerCo meeting, in front of all the important stakeholders in the company. It was the first time they uttered the words "transparency" and "clarity" in relation to data solution costs, as in the meantime we had managed to attach some financial metrics to our picture.
The other outcome of this meeting was somewhat unexpected for me, or maybe I was just not prepared at the time to give a response.
When you have thousands of employees working with some of these 111 data tools, what is the best way to tell them that in 12 to 24 months they will be using only 23 of them?
You have the "lucky ones" who already work with these tools, and hopefully there is no problem here, but you also have the "unlucky ones" who clearly will have a problem.
Do we need to train them? How fast will they learn?
Do we need to move them to different positions or reassign them to other departments?
Fire them? Or will they quit before we fire them?
From the technology perspective:
What is the impact of decommissioning 88 tools, given their dependencies on the remaining 23? Which are the first to go? Will the world end?
Can we replace "n" existing tools with one tool? Will we have the same capabilities? What about usability?
What about the fancy new data tools popping up daily on ProductHunt?
Can we integrate them into the existing Picture Data Solution?
My favorite: can we find the magical tool(s) that will solve our existing problems?
These are sensitive questions with no simple answers, I'm afraid. But any company out there will have to be prepared with answers at some point. Sooner is better.
The Site
Now, back to the present day…
The latest edition of Matt Turck's Data & AI (MAD) Landscape was just released, and I was discussing with friends and colleagues the future of the data landscape and the challenges companies face with this new wave of technologies. We arrived at a few points and "rhetorical" questions, which might sound practical, philosophical, or even "crazy" at the present time:
We have a lot of technologies, but very few solutions.
Most companies will buy tools from the market, but these tools will end up being part of their solution, current or future.
What is the best and fastest way for companies to test and validate these new tools?
Or better, to test and validate them alongside the existing solution, to see how they integrate into the landscape?
Do we even need to buy data tools for the next 2-3 years, or can we just rent them on demand when we have a use case?
With the advancement of DevOps, cloud-native services, and Infrastructure as Code, this is possible today. Not with all the tools yet, but this is the way.
Imagine the following scenario:
You have your use case,
You choose different data solutions to test, with different tools,
You deploy them in minutes, with a few clicks, on your cloud provider of choice,
You run your experiments and validation,
You pick the data solution that you need for your workloads,
You keep it running for a limited time, until your use case produces value for the company,
You save your outcome data to your "Data Temple", and in the end,
You terminate the environment.
You keep fixed only the core components of your data solution, the business-critical ones, which I call the "Data Temple", an architecture pattern I will detail in future articles.
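To make the scenario concrete, here is a minimal sketch of that lifecycle in Python, under the assumption of a hypothetical Infrastructure-as-Code wrapper. None of these class or method names refer to a real library; in practice each step would delegate to Terraform, Pulumi, or your cloud provider's APIs.

```python
from dataclasses import dataclass, field

# Hypothetical, illustrative sketch of the ephemeral-environment lifecycle
# described above; not a real library.

@dataclass
class DataStackExperiment:
    name: str
    tools: list[str]
    resources: list[str] = field(default_factory=list)

    def deploy(self) -> None:
        # In practice: apply an Infrastructure-as-Code template per tool.
        for tool in self.tools:
            self.resources.append(f"{self.name}/{tool}")
            print(f"provisioned {tool} for experiment '{self.name}'")

    def run_validation(self) -> dict:
        # In practice: run your workload and collect cost/quality metrics.
        return {"cost_per_run": 1.0, "meets_use_case": True}

    def save_outcome(self, data_temple: list) -> None:
        # Persist only the valuable output to the fixed "Data Temple" core.
        data_temple.append(f"outcome of {self.name}")

    def terminate(self) -> None:
        # Tear everything down; nothing ephemeral survives the experiment.
        self.resources.clear()
        print(f"experiment '{self.name}' terminated")

# The fixed, business-critical core survives across experiments.
data_temple: list[str] = []

experiment = DataStackExperiment("q3-churn-model", ["warehouse-x", "bi-tool-y"])
experiment.deploy()
if experiment.run_validation()["meets_use_case"]:
    experiment.save_outcome(data_temple)
experiment.terminate()
```

The point of the pattern is in the last two steps: the outcome lands in the fixed "Data Temple" core, and everything ephemeral is destroyed.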
Do you think this will improve your cost metrics?
How much simpler and more transparent will your Picture Data Solution be in this scenario?
How will companies know how to combine these tools into data solutions that work?
That last point was the trigger for us to build the Data Platform Generator website and present the Picture Data Solution concept to the world.
Our idea was simple:
Let's take some technologies from the MAD Landscape and generate simple visual data solution blueprints to inspire other data aficionados in the design process of a data platform.
Also, expose them to different technologies on the market that could be good alternatives to better-known ones.
Create a generator that displays one picture at a time; we have millions of combinations. Can you discover all of them?
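To give a sense of the scale behind "millions of combinations": if a blueprint picks one tool per capability category, the number of possible pictures is the product of the category sizes. Here is a toy calculation in Python, with made-up category counts; the real MAD Landscape categories are far larger.

```python
from math import prod

# Hypothetical counts of candidate tools per capability category.
# Illustrative only; not the actual MAD Landscape numbers.
category_counts = {
    "ingestion": 15,
    "storage": 20,
    "transformation": 12,
    "orchestration": 10,
    "visualization": 18,
}

# One blueprint = one tool per category, so the number of possible
# pictures is the product of the category sizes.
combinations = prod(category_counts.values())
print(f"{combinations:,} possible blueprints")  # 648,000 from this toy inventory
```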
For the first time, we used no-code applications like Bubble and Airtable.
While working on the project, my lovely wife told us about Scott Brinker's Martech landscape, and we lost our breath for a few seconds. Can we integrate MAD & Martech? Do you like M&M?
The End
We officially launched our page today, though we ran into some design issues on mobile devices. After all, we are architects, not web designers.
Well, actually, that is not true: 20 years ago I ran a web development agency! Adrian, you fool!
Are these combinations valid?
We don't know; we are not experts in all the data tools. But if you do, share your knowledge with the world by submitting your comments.
We only sell you the Picture Data Solution concept.
Have fun, be inspired, and we hope you enjoy it!
Don't forget to share it with your friends if you like it!
About the Author
I'm the Founder and Enterprise Data Architect at DataStema, a company based in Luxembourg that unlocks the business value of data by making it easy for companies to automate and deploy effective data solutions.
You can connect with me on LinkedIn and Twitter, or contact me via my official company email. Let me know in your invitation message that you are a reader of my Substack posts.