What Is This Protocol About? (Introduction)
- This protocol describes a computational pipeline for discovering biomedical knowledge from biomedical knowledge graphs (BKGs).
- It leverages graph learning techniques and artificial intelligence (AI) to mine and interpret data.
- We demonstrate how the protocol can be used for drug repurposing (finding new uses for existing drugs) specifically for Parkinson’s disease (PD).
What Are Biomedical Knowledge Graphs (BKGs)?
- BKGs are networks that store large amounts of biomedical information: entities such as diseases, drugs, genes, and symptoms are the nodes, and the relationships between them are the edges.
- They help to organize and visualize complex data in a way that makes it easier to find connections and patterns.
- By using AI and graph learning, researchers can discover new insights from these connections.
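
As a minimal sketch of the idea above, a knowledge graph can be held as a list of (head, relation, tail) triples and indexed for lookup. The entity and relation names below are hypothetical placeholders, not records from any real BKG.

```python
from collections import defaultdict

# Hypothetical toy triples (head, relation, tail); not real BKG records.
triples = [
    ("Drug_A", "treats", "Disease_X"),
    ("Drug_A", "targets", "Gene_1"),
    ("Gene_1", "associated_with", "Disease_Y"),
    ("Disease_Y", "presents", "Symptom_S"),
]

# Index outgoing edges by head entity so neighbours can be looked up quickly.
outgoing = defaultdict(list)
for head, relation, tail in triples:
    outgoing[head].append((relation, tail))

# Follow two hops from Drug_A to surface an indirect drug-disease connection.
two_hop = [
    (r1, mid, r2, tail)
    for r1, mid in outgoing.get("Drug_A", [])
    for r2, tail in outgoing.get(mid, [])
]
print(two_hop)
```

The two-hop result links Drug_A to Disease_Y through Gene_1, which is the kind of indirect connection that graph learning tries to exploit at scale.
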
Steps to Set Up the Protocol
- 1. Data Collection: Start by downloading the required data files from a repository on GitHub.
- 2. Install Python and Necessary Packages: Use Anaconda to set up a Python environment and install packages like NumPy, pandas, PyTorch, and DGL-KE for graph learning.
- 3. Data Preprocessing: The BKG data must be processed to extract “triplets” of the form (head entity, relation, tail entity), for example a drug linked to a disease or a gene. This is done using specific Python scripts.
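
The preprocessing step can be sketched with pandas as follows. This is an illustration under stated assumptions, not the protocol's own script: the table layout, the column names (`drug_id`, `disease_id`, `treats`, `palliates`), and the `Drug_Disease` prefix are all hypothetical.

```python
import io
import pandas as pd

# Hypothetical relation table; column names are assumptions for illustration.
raw_csv = io.StringIO(
    "drug_id,disease_id,treats,palliates\n"
    "D1,DIS1,1,0\n"
    "D2,DIS1,0,1\n"
    "D3,DIS2,0,0\n"
)
relations = pd.read_csv(raw_csv)


def extract_triplets(df, head_col, tail_col, relation_cols, prefix):
    """Turn each flagged relation column into (head, relation, tail) rows."""
    frames = []
    for rel in relation_cols:
        # Keep only the entity pairs for which this relation is flagged with 1.
        hits = df.loc[df[rel] == 1, [head_col, tail_col]].copy()
        hits.columns = ["head", "tail"]
        hits["relation"] = f"{prefix}_{rel}"
        frames.append(hits[["head", "relation", "tail"]])
    return pd.concat(frames, ignore_index=True).drop_duplicates()


triplets = extract_triplets(
    relations, "drug_id", "disease_id", ["treats", "palliates"], "Drug_Disease"
)
# Tab-separated head/relation/tail text, one triplet per line, no header.
print(triplets.to_csv(sep="\t", index=False, header=False))
```

Rows with no flagged relation (here D3) produce no triplet, so the output contains only asserted connections.
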
How Does the Knowledge Graph Embedding Work?
- Embedding is the process of converting the entities and relationships in a knowledge graph into numerical representations (“vectors”) that machine learning models can process.
- This helps AI algorithms to “understand” and “learn” from the graph data.
- The protocol uses four different embedding models: TransE, TransR, ComplEx, and DistMult. Each model scores the plausibility of a triplet with a different function, so each represents the data in a different way.
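
To make the difference between the four models concrete, here is a minimal NumPy sketch of their standard scoring functions as commonly defined in the literature. It is not the DGL-KE implementation (which adds details such as a margin term), and the vectors are random placeholders rather than trained embeddings.

```python
import numpy as np


def transe_score(h, r, t):
    # Translation: a plausible triplet has h + r close to t (higher is better).
    return -float(np.linalg.norm(h + r - t, ord=2))


def transr_score(h, r, t, M_r):
    # Project entities into the relation space with M_r, then translate.
    return -float(np.linalg.norm(M_r @ h + r - M_r @ t, ord=2))


def distmult_score(h, r, t):
    # Trilinear product; symmetric in h and t by construction.
    return float(np.sum(h * r * t))


def complex_score(h, r, t):
    # Complex-valued embeddings; the conjugate on t allows asymmetric relations.
    return float(np.real(np.sum(h * r * np.conj(t))))


rng = np.random.default_rng(0)
h, r, t = rng.normal(size=(3, 4))  # placeholder 4-dimensional embeddings

print("TransE  :", transe_score(h, r, t))
print("TransR  :", transr_score(h, r, t, np.eye(4)))  # identity projection
print("DistMult:", distmult_score(h, r, t))

hc = np.array([1 + 1j])
rc = np.array([0 + 1j])
tc = np.array([1 + 0j])
print("ComplEx :", complex_score(hc, rc, tc), complex_score(tc, rc, hc))
```

Swapping head and tail leaves the DistMult score unchanged but changes the ComplEx score, which is one reason the models can rank the same candidate triplets differently.
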
Training the Models
- After preprocessing the data, each embedding model is trained on the data using a command line interface.
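- The training process adjusts the model’s learnable parameters (the entity and relation embeddings) to improve its accuracy. Hyperparameters such as the learning rate and hidden dimension are chosen beforehand and control how training proceeds.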
- The training might take a few hours, depending on the complexity of the data and the power of the computer you are using.
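
The training commands can be assembled programmatically, one per model. The sketch below builds `dglke_train` invocations using flag names from the DGL-KE command line as I understand it; the dataset name, file names, paths, and all hyperparameter values are hypothetical placeholders, so check them against `dglke_train --help` and the protocol's own settings before running anything.

```python
import shlex

# Model identifiers as named by DGL-KE; using the L2 variant of TransE is an assumption.
MODELS = ["TransE_l2", "TransR", "ComplEx", "DistMult"]


def build_train_command(model_name, data_path="./data", save_path="./ckpts",
                        hidden_dim=400, lr=0.05, gamma=12.0,
                        batch_size=1024, neg_sample_size=256, max_step=10000):
    """Assemble a dglke_train invocation as an argument list (placeholder values)."""
    return [
        "dglke_train",
        "--model_name", model_name,
        "--dataset", "iBKH",
        "--data_path", data_path,
        # Hypothetical file names for the train/validation/test triplet splits.
        "--data_files", "train.tsv", "valid.tsv", "test.tsv",
        "--format", "raw_udd_hrt",  # raw user-defined data in head-relation-tail order
        "--hidden_dim", str(hidden_dim),
        "--lr", str(lr),
        "--gamma", str(gamma),
        "--batch_size", str(batch_size),
        "--neg_sample_size", str(neg_sample_size),
        "--max_step", str(max_step),
        "--save_path", save_path,
    ]


commands = [build_train_command(m) for m in MODELS]
for cmd in commands:
    print(shlex.join(cmd))
```

Each printed line can be pasted into a shell, or the argument list can be passed to `subprocess.run(cmd, check=True)` once DGL-KE is installed.
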
Using the Model for Drug Repurposing
- The main task of this protocol is to identify existing drugs that could be repurposed to treat Parkinson’s disease (PD).
- The pipeline predicts drugs that might treat or alleviate PD by analyzing the relationships between drugs and diseases in the graph.
- Once the drugs are predicted, they are ranked based on their potential to treat PD, and the top candidates are identified.
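
The prediction and ranking step can be sketched as follows, assuming a TransE-style score and two hypothetical relation types ("treats" and "palliates"). The embeddings are random stand-ins for trained vectors, and the drug names are placeholders; one drug is deliberately constructed to be a perfect match so the ranking is easy to follow.

```python
import numpy as np

rng = np.random.default_rng(42)
dim = 8

# Random stand-ins for trained embeddings (hypothetical names and values).
drug_names = ["Drug_A", "Drug_B", "Drug_C", "Drug_D", "Drug_E"]
drug_emb = rng.normal(size=(len(drug_names), dim))
pd_emb = rng.normal(size=dim)  # embedding of the Parkinson's disease entity
relation_emb = {
    "treats": rng.normal(size=dim),
    "palliates": rng.normal(size=dim),
}

# Make Drug_C an exact TransE match for (Drug_C, treats, PD) for demonstration.
drug_emb[2] = pd_emb - relation_emb["treats"]


def rank_drugs(drug_emb, relation_vecs, disease_vec, names, top_k=3):
    """Score every (drug, relation, disease) triplet and keep the best per drug."""
    scores = np.stack([
        -np.linalg.norm(drug_emb + r - disease_vec, axis=1)  # TransE-style score
        for r in relation_vecs
    ])
    best = scores.max(axis=0)          # best relation for each drug
    order = np.argsort(-best)[:top_k]  # highest score first
    return [(names[i], float(best[i])) for i in order]


ranked = rank_drugs(drug_emb, list(relation_emb.values()), pd_emb, drug_names)
for name, score in ranked:
    print(f"{name}\t{score:.3f}")
```

In a real run the scores would come from each trained model, and the resulting lists could be compared or combined across models.
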
Visualizing Results
- After generating drug repurposing predictions, the protocol visualizes the connections between PD and predicted drug candidates in a “contextual subnetwork”.
- This helps to see how each drug is related to PD through different entities like genes and other diseases.
- The subnetwork is visualized with Neo4j, a graph database, which is used to display the shortest paths between PD and the drug candidates.
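
The shortest-path idea behind the contextual subnetwork can be illustrated in plain Python with a breadth-first search. This sketch stands in for, and does not use, Neo4j; the triples and entity names are hypothetical, and edges are treated as undirected.

```python
from collections import defaultdict, deque

# Hypothetical toy triples; not real BKG records.
triples = [
    ("Drug_A", "targets", "Gene_1"),
    ("Gene_1", "associated_with", "Parkinson_disease"),
    ("Drug_A", "treats", "Disease_Y"),
    ("Disease_Y", "resembles", "Disease_Z"),
    ("Disease_Z", "resembles", "Parkinson_disease"),
    ("Drug_B", "targets", "Gene_2"),
]


def build_adjacency(triples):
    """Undirected adjacency sets built from (head, relation, tail) triples."""
    adj = defaultdict(set)
    for head, _, tail in triples:
        adj[head].add(tail)
        adj[tail].add(head)
    return adj


def shortest_path(adj, source, target):
    """Breadth-first search; returns one shortest path or None if unreachable."""
    if source == target:
        return [source]
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in sorted(adj.get(node, ())):
            if neighbour in parent:
                continue
            parent[neighbour] = node
            if neighbour == target:
                # Walk back through the parents to rebuild the path.
                path = [neighbour]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


adj = build_adjacency(triples)
print(shortest_path(adj, "Parkinson_disease", "Drug_A"))
print(shortest_path(adj, "Parkinson_disease", "Drug_B"))
```

Drug_A is reached through Gene_1 in two hops even though a longer route through related diseases also exists, while Drug_B has no path to PD in this toy graph.
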
Expected Outcomes
- By following these steps, researchers can generate new knowledge in the form of testable hypotheses, such as potential new treatments for Parkinson’s disease.
- The process can be adapted to other diseases or biomedical tasks, such as predicting disease-risk genes or identifying drug-drug interactions.
Limitations
- The knowledge graph used in this protocol (iBKH) is not complete and may lack certain types of biomedical data (like proteins or mutations).
- The accuracy of the results depends on the quality of the data and the performance of the graph learning algorithms.
- Further research is needed to incorporate additional models and data to improve the pipeline.