Bryan Plaster, VP Customer Success at Trifacta, tells us how to break out of data silos and use Centres of Excellence to revolutionise R&D
New advances in electronic medical records (EMRs), wearable technology, and other Internet of Things devices are creating huge quantities of new data sources for researchers to leverage. Pharmaceutical organisations already have the potential to acquire an impressive amount of data as they grow and mature, which can often reach the petabyte range.
There’s no lack of data to collect in healthcare, and yet the time required to analyse it can be challenging for pharma researchers, particularly because of the complexity and often-siloed nature of the data they are dealing with. The largest life science and pharmaceutical companies today have existed for decades, endured several acquisitions, and have many data silos to show for it. Getting the data needed for any given initiative can require navigating across hundreds of them.
From an R&D perspective, this leaves researchers and scientists struggling to gather the right data across data silos or forgoing historic data altogether. Part of that challenge also derives from the variance of data standards and the wide-ranging types of data that have been collected – from internal clinical trial data to external healthcare data.
To combat their own data silo challenges, we’ve seen many companies, such as GlaxoSmithKline, invest in a dedicated analytics centre of excellence (CoE) to centralise and standardise historical data and make better use of it in drug development. Concentrating big data knowledge and best practices allows organisations to better enable business users with the right tools while demonstrating realistic and attainable goals.
Technical experts are essential to a successful CoE, but the initiative ultimately needs involvement from business users as well. Creating a culture of data inclusion within the organisation allows as many users as possible to drive more business value and obtain a higher ROI. Technical experts often don’t have the capacity to understand the inner workings of different departments and business units as closely as the employees working in them. Business users must be able to drive their own data projects to derive the best insights, and they don’t want to wait in a standard technical project backlog to obtain their data.
Best practices for a CoE show that this collaboration between business and IT is formed from a maturing set of processes that grows at the same rate as the success of your initiative. The top CoEs are based on a standard called the Capability Maturity Model, which allows areas of focus across both business and IT to follow a maturation path of increasing levels. For example, a data modernisation initiative could be divided into the areas of architecture, methodology, organisation and use cases, and each area would be rated at its own level to assess and show this collaboration.
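As a rough illustration of rating each area at its own level, the short Python sketch below records one maturity level per area and picks out the areas that lag behind. The scores are invented for illustration, and the five level names are an assumption borrowed from the commonly cited Capability Maturity Model scale rather than anything specified here.

```python
# Assumed five-level scale; the names follow the commonly cited CMM levels.
LEVELS = {1: "Initial", 2: "Repeatable", 3: "Defined", 4: "Managed", 5: "Optimising"}

# Hypothetical assessment of a data modernisation initiative (illustrative scores only).
assessment = {
    "architecture": 3,
    "methodology": 2,
    "organisation": 2,
    "use cases": 4,
}


def weakest_areas(scores):
    """Return the areas rated at the lowest level, in alphabetical order."""
    lowest = min(scores.values())
    return sorted(area for area, level in scores.items() if level == lowest)


for area, level in assessment.items():
    print(f"{area}: level {level} ({LEVELS[level]})")

print("Focus next on:", ", ".join(weakest_areas(assessment)))
```

Reviewing the lowest-rated areas first is only one possible way to use such an assessment; a real CoE would agree its own criteria for each level between business and IT.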
In a CoE model, the convergence of business and IT often happens during the data preparation process. Even after data has been made accessible, business users need to be able to prepare it for their own specific needs, such as filtering out information that they don’t require or creating new columns to fit certain standards. However, this is a challenging task: the data produced within different areas of the organisation can be recorded in different formats and collected through different interfaces. Consolidation is time-consuming and accuracy is at risk, particularly when using manual tools.
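To make these preparation steps concrete, here is a minimal Python sketch that uses only the standard library. It filters out records and fields that are not needed, parses dates recorded in two different formats, and creates a new standardised weight column. The field names, sample values and site names are hypothetical, and this is not a depiction of how any particular data preparation product works.

```python
from datetime import datetime

# Hypothetical extracts from two source systems that follow different conventions.
site_a = [
    {"patient_id": "A-001", "visit_date": "2018-03-14", "weight_kg": "72.5", "notes": "n/a"},
    {"patient_id": "A-002", "visit_date": "2018-03-15", "weight_kg": "", "notes": "n/a"},
]
site_b = [
    {"patient_id": "B-101", "visit_date": "14/03/2018", "weight_lb": "160.0", "notes": "n/a"},
]

LB_PER_KG = 2.20462


def standardise(record, date_format):
    """Map one raw record onto a shared schema; return None if no weight was recorded."""
    if record.get("weight_kg"):
        weight_kg = float(record["weight_kg"])
    elif record.get("weight_lb"):
        weight_kg = float(record["weight_lb"]) / LB_PER_KG  # derive the standard column
    else:
        return None  # filter out records that cannot be used
    visit = datetime.strptime(record["visit_date"], date_format).date()
    # Only the fields needed downstream are kept; free-text notes are dropped.
    return {
        "patient_id": record["patient_id"],
        "visit_date": visit.isoformat(),
        "weight_kg": round(weight_kg, 1),
    }


prepared = []
for source, date_format in ((site_a, "%Y-%m-%d"), (site_b, "%d/%m/%Y")):
    for record in source:
        row = standardise(record, date_format)
        if row is not None:
            prepared.append(row)

for row in prepared:
    print(row)
```

Even in this toy case, every source needs its own date format and unit handling. That is the kind of repetitive work that becomes slow and error-prone when it is done by hand across hundreds of silos.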
Our research found that Excel continues to be the primary tool for data preparation – 37 per cent of data analysts and 30 per cent of IT professionals use it more than other tools to prepare data. The reality is that 92 per cent of data analysts would choose to add value by focusing on another type of analytic activity rather than data preparation, yet 65 per cent are spending at least half their time preparing data for analytic use.
The use of new technologies to streamline data preparation can have a huge impact. Data preparation is important because it is the first step in getting data ready for machine learning (ML) algorithms. ML learns from the data it is provided with, and the more clean, high-quality data it has access to, the better it can learn and the richer the results it can give researchers. Machine learning can also provide faster access to, and clearer representations of, complex data, accelerating clinical researchers’ understanding of what is in the data and how it can be used.
If pharmaceutical companies are unable to prepare the volumes of data required by the ML algorithms, the models may not be robust enough to provide accurate results, which pushes the companies further from meeting necessary deadlines.
However, the benefits of the technology are overshadowed by the current shortage of people trained in data science and machine learning. Without the right resources and team structures in place, pharmaceutical companies will struggle further to advance their efforts. They must therefore empower the individuals who are familiar with the data, and with how it relates to business objectives, to transform it themselves and drive their own work. Those who need to work closely with the data should be able to access and use it, working towards these objectives collectively.
To both speed up data preparation and meet the architectural needs of the CoE, pharmaceutical companies are looking to data preparation platforms. The intuitive experience meets the needs of business users, while allowing for governance and oversight from IT. Leading pharma customers use Trifacta’s data preparation platform to reduce the time required to structure, clean, enrich, validate, and publish the data needed across the organisation, and to enable collaboration between the CoE and business units. Top industry analysts refer to data preparation as the combination of faster time to insight and improved trust. The catalyst is human-computer interaction that learns with you (machine learning), which is important for a scalable data management strategy.
Image source: Gartner: Market Guide for Data Preparation, December 2017
How important is the data preparation process? To put this into perspective, Forrester reports that data analysts spend up to 80 per cent of their time preparing data for analysis. Since implementing the Trifacta data preparation solution within its CoE, GSK has been able to speed this process up dramatically, which has reduced the time spent on clinical trial design. GSK is now moving even closer to its vision of cutting drug development time in half.
With the race to be first to market with revolutionary new drugs, and with the volume of data available to life science and pharma organisations growing, building or optimising an analytics CoE is a good strategy for competitive advantage. But collecting and preparing the data ready for analysis is only the first step in the process towards a CoE. Only by expanding access to the wider pool of business users can the organisation generate the best possible value. With data in their hands, organisations should be excited to see what could be achieved.