A data business can be described as a company whose primary concern is the monetization of data. Powerhouses like Google, Amazon and Apple are all examples of data businesses that generate and extract value from data at an extremely large scale. In fact, big data powers much of our modern-day business. But how did these companies become the massive data enterprises they’re known as today?
While the definition of a data business implies the organization owns its data, the raw data often derives from a variety of sources: databases, flat files, web services and more. That data is then processed into a higher-value dataset that the business sells to customers and partners, who use it to design their own innovative products and experiences, much like how travel providers, airports, tech companies and airlines use OAG's data to feed their existing systems.
This technology system — or the data factory as I like to call it — is the key for companies to become successful data enterprises.
Developing The Data Factory
There are some key components to consider when building your data factory. First, and most obvious, ensure there is a commercial need for your data. Tap into your target market’s pain points and desires, and commit to rapid experimentation to establish customer validation and confirm you can convert your information into monetary value. You also need to consider your unique selling point (USP). Understanding your competition — what they’re selling, their weaknesses and their strong points — is critical.
Once you set the guiding principles for your data, it’s time to build your data factory. Over the years, I’ve learned what practices to adopt (and avoid) for maximum impact. Incorporate the following steps to ensure you develop your data factory in the right way.
Normalize Data Sources
Within the data factory, you’re likely to process data from a variety of sources and delivery mechanisms, at varying degrees of quality, and in many different formats such as CSV, delimited text, XML, JSON, proprietary formats or, in our industry, the Standard Schedules Information Manual (SSIM) format. With good technology, you can normalize all these formats into a single canonical structure to expedite processing. Standardizing the data will also eliminate duplicates and help your team with segmentation and analysis.
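A minimal sketch of this idea in Python, assuming a hypothetical canonical schema of three flight fields and two invented feed formats (the field names and mappings here are illustrative, not OAG's actual schema): each source-specific parser maps its records onto the same field names, after which deduplication becomes trivial.

```python
import csv
import io
import json

# Hypothetical canonical schema: every source is mapped onto these names.
CANONICAL_FIELDS = ("flight", "origin", "destination")

def normalize_csv(text: str) -> list[dict]:
    """Parse a CSV feed whose header already uses canonical field names."""
    return [
        {k: row[k].strip() for k in CANONICAL_FIELDS}
        for row in csv.DictReader(io.StringIO(text))
    ]

def normalize_json(text: str) -> list[dict]:
    """Rename a JSON feed's (invented) field names to the canonical ones."""
    mapping = {"fltNo": "flight", "dep": "origin", "arr": "destination"}
    return [
        {mapping[k]: v for k, v in record.items() if k in mapping}
        for record in json.loads(text)
    ]

def deduplicate(records: list[dict]) -> list[dict]:
    """Drop duplicate canonical records while preserving order."""
    seen, unique = set(), []
    for r in records:
        key = tuple(r[k] for k in CANONICAL_FIELDS)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique

csv_feed = "flight,origin,destination\nBA123,LHR,JFK\n"
json_feed = '[{"fltNo": "BA123", "dep": "LHR", "arr": "JFK"}]'
# The same flight arriving via two formats collapses to one record.
records = deduplicate(normalize_csv(csv_feed) + normalize_json(json_feed))
```

Because every downstream step sees only the canonical shape, adding a new source means writing one more `normalize_*` adapter rather than touching the rest of the factory.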
Don’t Process Everything
Data businesses process large volumes of data on a frequent basis, but I’ve often found that the data changes far less than the update cadence suggests. The volume of record-level changes is often high: depending on the frequency of the updates, 30%-60% of records may need to be refreshed. When you scrutinize the data at a field level, however, only a small percentage of fields may actually have changed. To save time, process only the deltas between each update.
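The field-level delta idea can be sketched as follows (a simplified illustration, assuming records keyed by an id and represented as flat dicts): diff each incoming record against the previous snapshot and keep only the fields that differ, skipping untouched records entirely.

```python
def field_deltas(old: dict, new: dict) -> dict:
    """Return only the fields whose values changed between snapshots."""
    return {k: v for k, v in new.items() if old.get(k) != v}

def compute_update(previous: dict[str, dict],
                   incoming: dict[str, dict]) -> dict[str, dict]:
    """For each record id, keep only changed fields; drop unchanged records."""
    update = {}
    for rec_id, new_rec in incoming.items():
        delta = field_deltas(previous.get(rec_id, {}), new_rec)
        if delta:
            update[rec_id] = delta
    return update

previous = {"BA123": {"dep_time": "0900", "gate": "A1"}}
incoming = {"BA123": {"dep_time": "0900", "gate": "B4"}}
# Only the gate changed, so only the gate is reprocessed.
update = compute_update(previous, incoming)
```

A record that arrives byte-identical to its predecessor produces an empty delta and never enters the pipeline, which is where the bulk of the savings comes from.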
Manage Exceptions With a Rules Engine
You need to ensure the quality of your data, so creating rules that catch erroneous data and block it from entering the system is crucial. However, to avoid burying this capability in the technology, you need to provide end-users with the ability to define and amend these rules.
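One way to keep rules out of the application code, sketched here under the assumption that rules can be expressed as named predicates over a record (the rule names and checks are invented examples): store them as data that end users can add to or amend, and have the pipeline report every rule a record violates.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    """A user-definable validation rule: name plus a pass/fail check."""
    name: str
    check: Callable[[dict], bool]  # True means the record passes

def validate(record: dict, rules: list[Rule]) -> list[str]:
    """Return the names of every rule the record violates."""
    return [r.name for r in rules if not r.check(record)]

# Illustrative rules a domain user might define without touching the engine.
rules = [
    Rule("origin-present", lambda r: bool(r.get("origin"))),
    Rule("origin-is-iata-code", lambda r: len(r.get("origin", "")) == 3),
]

violations = validate({"origin": ""}, rules)
```

Because `rules` is just a list, amending the rule set is a data change rather than a code deployment, which is what keeps the capability from being buried in the technology.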
Employ User-Driven Reprocessing
When your rules engine highlights potential errors, you also want to empower your users to override and reprocess data once they have verified it. For example, a typical data feed is expected to change by 30%-60% per update. Your system should have thresholds that flag when the volume of changes breaches those levels. However, a user should then be able to apply domain knowledge and override the threshold when needed.
Ensuring the ability for user-driven reprocessing should be an early consideration when creating your rules engine. Adding this on later can be a painstaking and difficult process.
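A minimal sketch of the threshold-plus-override pattern, assuming the 30%-60% band from the example above and a hypothetical `user_override` flag a verified user can set: an update whose change volume falls outside the expected band is held for review rather than processed, unless a user with domain knowledge releases it.

```python
# Expected change band from the example above: a typical feed changes
# 30%-60% per update, so volumes outside that band look suspicious.
LOW, HIGH = 0.30, 0.60

def review_status(changed: int, total: int, user_override: bool = False) -> str:
    """Hold an update whose change volume breaches the expected band.

    A user who has verified the data can override the hold and let the
    update reprocess anyway (hypothetical flag for illustration).
    """
    ratio = changed / total if total else 0.0
    if LOW <= ratio <= HIGH or user_override:
        return "process"
    return "hold-for-review"
```

Designing the override path into the engine from the start is the cheap version; retrofitting it onto a rules engine that can only reject is the painstaking one.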
Liberate Your Data
Making your data accessible to as many people as possible drives numerous benefits. For example, giving employees outside the technology and data operations departments broad access lets them derive great value from data exploration and improves data literacy across the organization.
Establishing partnerships with the right tech vendors can also help “unshackle” your data and open opportunities for innovation. Having a dynamic data system with increased access enables domain experts, engineers and data scientists to offer fresh perspectives, highlight potential issues ahead of your customers and uncover future opportunities that only business domain experts can identify.
Setting the Data Factory up for Success
While the data factory sets the foundation, it’s impossible to build a world-class data business entirely in a data platform. You need the right software, whether that’s your normalization engine, delta engine, rules engine or something else, and all of it requires the best software engineering practices to develop rapidly and keep ahead of the competition.
First, the selection of your data or cloud technology vendor plays a major role in setting your data business up with the correct platform. The provider you choose should have the right solutions, scale and support to properly address your organization’s needs — but don’t forget to take advantage of their resources to extract the most value.
Secondly, creating efficient pipelines and repeatable processes will power innovation and reliability as well as support the build, validation and release of updates onto your data platform. One framework that I find effective for this is CALMS (culture, automation, lean, measurement and sharing).
Your product is only as good as the people who develop it, which is why you need to invest in your data and software engineers — the people who deal with the data and possess the skills to work on these systems. Devote time to building a team not only with the right tools and skills but also with high levels of engagement and enthusiasm.
Building a world-class data business goes beyond owning unique data and the ability to monetize it. You need world-class technology functionalities capable of continuously maintaining and developing the data factory. With the right data and technology platforms, you’re on your way to success.
This article was originally published by Forbes Technology Council.