Commentary
As the data management landscape continues to shift with AI and other technological advancements, it is essential to maintain good data practices and ensure data integrity in pharmaceutical research and development.
Data integrity is an ongoing concern across all research and development (R&D) organizations, no matter what part of the research lifecycle they’re navigating. These concerns extend beyond the potential for delayed timelines or cost overruns into something bigger: Establishing a culture of quality, ensuring product efficacy and patient safety, and being a trusted brand, partner, or provider.
Prioritizing Data Integrity in the Lab
Good data practices throughout the R&D process can positively impact data integrity in the lab. Companies must be able to defend the fidelity and confidentiality of all records and data generated throughout a product’s entire lifecycle, starting with the earliest points in research, including raw data, metadata, and transformed data. To do this, companies must have the right processes and technologies in place to ensure proper:
These factors—each challenging in their own right—are all intertwined, adding to the complexity of upholding good data practices in the modern lab.
A Shifting Data Management Landscape
As R&D organizations digitize their data to make analytics at scale possible, best practices for data management must also evolve. Teams must have clear strategies for identifying and mitigating threats to data integrity, including technological, managerial, and external risks. However, this is no small task. In fact, in pharmaceuticals, the FDA has reported an increase in data integrity violations in recent years.1
Data integrity is at risk in many cases because the complexity of R&D data, processes, and technologies presents numerous opportunities for good data practices to go awry. The most common types of warnings and violations cited by the FDA include data loss, missing metadata, non-contemporaneous collection or backdating, data deletion and copying, sample elimination or reprocessing, poorly investigated out-of-specification results, data access and security issues, and inadequate or disabled audit trails.2 Missteps like these at any point in the R&D process can undermine the validity of the research as a whole.
Data integrity and security breaches could lead to incorrect or irreproducible research results, with implications for patient safety and product efficacy, or generate violations that might cause a drug to be rejected at submission or pulled from the market later.3
Multimodal R&D
Companies hoping to drive innovation are diversifying their R&D efforts and working across different areas of science with novel modalities. As a result, data are pouring from wide-ranging sources via different means and in different formats. An organization or institution may have several different internal research groups collecting data from thousands of pieces of specialty equipment or instruments; in parallel, it could also be undertaking complex post-acquisition or legacy-data migration activities, all while working with multiple external CROs who have their own distinct systems and processes.
All these different data come from teams that work not only across different modalities and specialty areas of science, but also across different locations globally, each with its own compliance standards and regulations. This incredible volume and diversity of multimodal R&D data create lab integration and data management challenges that can risk compromising data integrity and security. Many companies are struggling to keep pace with a vast volume of diverse data and metadata needed to inform decision making throughout the R&D process.
Collaboration
Ensuring the success of R&D at scale means improving data flow between research groups so they can build from their collective knowledge. The importance of data sharing in advancing science was recently underscored by the National Institutes of Health, which established new data management and sharing policies in 2023 to help confirm findings, encourage reuse, and spur innovation.4
Whether it’s chemists and biologists collaborating on chemically modified biologics or internal and external partners working on projects across modalities and diseases, teamwork is more important than ever. Unfortunately, it’s not always easy. Many R&D groups, who have long worked in relative isolation, are now required to collaborate and share data, which requires shifts in mindset and culture. It also requires a governance and execution shift. Bespoke and insulated research teams don’t have the systems and processes in place to share and hand off well-annotated data while at the same time controlling access, tracking changes, and ensuring good data practices are followed by all participants and collaborators.
For many companies, it’s hard to facilitate efficient and secure data sharing that doesn’t compromise data integrity. Even the most erudite collaborators interact with instruments, software, workflows, and data types in ways that don’t align with each other, which complicates collaboration. Structured and unstructured data end up scattered across multiple repositories and media rather than held in a secure, centralized, standardized data pool that appropriate collaborators can access and that is governed by a well-defined data governance framework.
Data sharing challenges are growing so common that they’ve prompted calls to establish better data management standards. One well-known example is the Findable, Accessible, Interoperable, and Reusable (FAIR) guiding principles for scientific data management, which promote technology and processes that make data usable by both humans and machines.5 Becoming FAIR compliant requires changes in the format, model, and storage of data, as well as in the ways that instruments, software, and systems are integrated. While this can seem overwhelming, the changes can be made incrementally; it’s not an all-or-nothing proposition. Whether a company is building a comprehensive FAIR-compliant informatics ecosystem or adopting a data analysis and graphing solution that embraces FAIR data principles, moving toward FAIR-aligned methods can pay dividends in time savings, reproducibility of research, improved knowledge sharing, and AI-readiness.
Artificial Intelligence
As AI arrives in R&D, organizations and institutions will need data infrastructures to capture and manage the proprietary data that will differentiate their research in a world where AI is everywhere. For many universities and health companies, becoming AI-ready means first adopting technology and process changes to support exponential growth in data volumes, elimination of data silos, integration of bespoke software and systems, and normalization of data.
The ultimate goal is that any data created and captured throughout the R&D process will be trustworthy, well-structured, correlated, shareable, and model-ready. While achieving these aligned data standards is uniquely challenging in scientific R&D because of the complexity of the workflows, data types, software, and systems, it is, nonetheless, essential. Global compliance regulations are currently being updated to guide the use of AI and machine learning in medical and general research.6,7,8
In March 2024, the European Union passed an overarching Artificial Intelligence Act. This landmark law aims to protect human health, safety, and fundamental rights as AI is increasingly relied upon for innovation across a broad spectrum of industries, academia, government, and civil organizations.9 Now is the time for companies to ensure that their existing systems and processes support the regulatory and ethical challenges of using AI in research, including assurance of data integrity, security, traceability, and bias limitation.
Good Data Practices
Alignment of data management and integrity is vital to long-term research success and to preparing for the automated, connected, and collaborative future of research. Fortunately, today's scientists have a wide range of tools to easily manage, search, and visualize their R&D data, with the future being led by solutions that can unite all the applications that produce and analyze data within one secure data-management platform.