2018: The Year of the Self-Learning Data Organization
As 2017 ends, Ramon Chen, Chief Product Officer at Reltio, the creator of data-driven applications, has peered into his crystal ball to decipher what 2018 will bring in data management. Find his predictions below.
2018 will be the year of AI and Machine Learning … again: There have been repeated predictions over the last couple of years touting a potential breakthrough in enterprise use of Artificial Intelligence (AI) and Machine Learning (ML). While there is no shortage of startups – CBInsights published an AI 100 selected from more than 2,000 startups – the reality is that most enterprises have yet to see quantifiable benefits from their investments, and the hype has rightly been labelled as overblown. In fact, many are still reluctant to even start, owing to a combination of skepticism, lack of expertise, and, most of all, lack of confidence in the reliability of their data sets.
In fact, while the headlines will be mostly about AI, most enterprises will need to first focus on IA (Information Augmentation): getting their data organized in a manner that ensures it can be reconciled, refined and related, to uncover relevant insights that support efficient business execution across all departments, while addressing the burden of regulatory compliance.
(Figure: The elements of data strategy. Source: HBR, May–June 2017)
Enterprise data organization, not management, will be the new rallying cry: For over 20 years, the term data management has been viewed as a descriptor, category and function within IT. The term management represented a wide variety of technologies, ranging from physical storage of the data, to handling specific types of data such as Master Data Management (MDM), to concepts such as data lakes and other environments. Business teams have lost patience with the speed and efficiency with which they are able to get their hands on reliable, relevant and actionable data. Many have invested in their own self-service data preparation, visualization and analytics tools, while others have even employed their own data scientists. The common refrain is that data first has to be made reliable and connected with the rest of the enterprise, so that it can be trusted for use in critical business initiatives; isolated initiatives such as MDM and Hadoop-powered data lakes have not been successful at achieving this.
Organizing data across any data type or source, with ongoing contribution and collaboration on limitless attributes, will be the new rallying cry for frustrated business teams as it describes a state of continuous IA (Information Augmentation) that enterprises want to achieve before they can even consider AI as a potential next step.
Data-driven organizations will expect to measure outcomes: While being data-driven continues to be in vogue, companies have had surprisingly little to show in the way of measurable, quantifiable outcomes for their investments in technologies and tools. Certain Total Cost of Ownership (TCO) metrics, such as savings realized from switching to cloud vs. on-premises, are obvious, but there hasn’t been a clear and direct correlation between data management, BI, analytics and the upcoming wave of AI investments. What’s missing is a way of capturing a historical baseline and comparing it to improvements in data quality, generated insights, and the resulting outcomes stemming from actions taken.
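The baseline-and-compare idea above can be made concrete. Below is a minimal, hypothetical sketch in Python: the metric names, threshold values, and helper functions are all illustrative assumptions, not from any specific product. It snapshots a couple of simple data-quality metrics and reports the percentage change against a historical baseline, which is the kind of before/after correlation the paragraph describes.

```python
# Hypothetical sketch: comparing a historical data-quality baseline to
# current metrics so improvements can be tied back to outcomes.
# Metric names and sample values are illustrative only.

def completeness(records, required_fields):
    """Fraction of records with all required fields populated."""
    if not records:
        return 0.0
    ok = sum(1 for r in records if all(r.get(f) for f in required_fields))
    return ok / len(records)

def quality_delta(baseline, current):
    """Percent change for each metric shared by two snapshots."""
    return {
        metric: round(100 * (current[metric] - baseline[metric]) / baseline[metric], 1)
        for metric in baseline
        if metric in current and baseline[metric]
    }

# A baseline captured before a data-organization effort, vs. today.
baseline = {"completeness": 0.72, "dedup_rate": 0.85}
current = {"completeness": 0.91, "dedup_rate": 0.97}
print(quality_delta(baseline, current))
# → {'completeness': 26.4, 'dedup_rate': 14.1}
```

The point is not the metrics themselves but that a recorded baseline turns "our data got better" into a number that can be set against business outcomes.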
Much of this can be attributed to the continued disconnect between analytical environments such as data warehouses and data lakes, where insights are generated, and operational applications, where business execution actually takes place. Today’s Modern Data Management Platforms as a Service (PaaS) seamlessly power data-driven applications that are both analytical and operational. They deliver contextual, goal-based insights and actions that are specific and measurable, allowing outcomes to be correlated, leading to that Return on Investment (ROI) Holy Grail, and forming a foundation for machine learning to drive continuous improvement. As an added bonus, multitenant Modern Data Management PaaS in the Cloud will also begin to provide industry comparables, so companies can finally understand how they rank relative to their peers.
Multi-cloud will be the new normal: With the Cloud Infrastructure as a Service (IaaS) wars heating up, players such as Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure continue to attempt to outdo each other on all vectors, including capabilities, price, and service. With fears of being “Amazoned,” some retailers have even adopted a non-AWS Cloud policy. For most, however, it’s about efficiency and cost. Multi-cloud means choice and the opportunity to leverage the best technology for the business challenges they face. Unfortunately, multi-cloud is not realistic for all; only the largest corporations have the IT teams and expertise to research and test out the latest and greatest from multiple providers. Even those mega-corporations are finding that they have to stick to a single IaaS Cloud partner to focus their efforts.
Today’s Modern Data Management PaaS are naturally multi-cloud, seamlessly keeping up with the best components and services that solve business problems. Acting as technology portfolio managers for large and small companies who want to focus on nimble and agile business execution, these platforms are democratizing the notion of multi-cloud for everyone’s benefit.
Companies will execute offensive data-driven strategies, and should expect to get defense for free: Effective May 25, 2018, the European General Data Protection Regulation (GDPR) will force organizations to meet a standard of managing data that many won’t be able to fulfill. They must evaluate how they’re collecting, storing, updating, and purging customer data across all functional areas and operational applications, to support “the right to be forgotten.” And they must make sure they continue to have valid consent to engage with the customer and capture their data.
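The two obligations described above – purging a customer everywhere ("the right to be forgotten") and verifying that consent is still valid before engaging – can be sketched in a few lines. This is a hypothetical illustration, assuming a simplified hub with made-up store names standing in for functional systems (CRM, billing, marketing); it is not any vendor's implementation.

```python
# Hypothetical sketch of two GDPR obligations: purging a customer
# across all stores ("right to be forgotten") and checking that
# consent is still valid. Store names are illustrative only.

from datetime import date

class CustomerDataHub:
    def __init__(self):
        # Each "store" stands in for a functional system holding customer data.
        self.stores = {"crm": {}, "billing": {}, "marketing": {}}
        self.consents = {}  # customer_id -> consent expiry date

    def record_consent(self, customer_id, expires):
        self.consents[customer_id] = expires

    def has_valid_consent(self, customer_id, today):
        """Only engage a customer whose consent has not lapsed."""
        expiry = self.consents.get(customer_id)
        return expiry is not None and today <= expiry

    def forget(self, customer_id):
        """Purge the customer from every store and drop their consent."""
        for store in self.stores.values():
            store.pop(customer_id, None)
        self.consents.pop(customer_id, None)

hub = CustomerDataHub()
hub.stores["crm"]["c1"] = {"name": "Alice"}
hub.record_consent("c1", date(2018, 12, 31))
hub.forget("c1")
print(hub.stores["crm"])  # → {}
```

The hard part in practice is not this logic but knowing every place a customer's data lives – which is exactly why the article argues that organized, connected data must come first.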
Meeting regulations such as GDPR often comes at a high price of doing business, not just for European companies but also for multinational corporations in an increasingly global landscape. Companies seeking quick fixes often end up licensing specialized technology to meet such regulations, while others resign themselves to paying whatever fines may be levied, having determined that the cost to fix their data outweighs the penalties that might be incurred. With security and data breaches also making high-profile headlines in 2017, it has become an increasingly tough environment in which to do business: the very data that companies have collected in the hopes of executing offensive data-driven strategies weighs heavily on them, crushing their ability to be agile.
As previously outlined, organizing data for the benefit of machine learning or other initiatives results in clean, reliable data that is connected and forms a trusted foundation. A natural byproduct is a defensive data strategy, with the ability to meet regulations such as GDPR and to ensure compliant, secure access by all parties to sensitive data. This is an amazing two-fer from which regulatory teams and CDOs can both benefit.
Whatever the industry or business need, organizing data in 2018 should be a top priority for companies big and small.