Protocol to implement a computational pipeline for biomedical discovery based on a biomedical knowledge graph
Michael Levin Research Paper Summary


What Is This Protocol About? (Introduction)

  • This protocol describes a computational pipeline that discovers biomedical knowledge from biomedical knowledge graphs (BKGs).
  • It leverages graph learning techniques and artificial intelligence (AI) to mine and interpret data.
  • We demonstrate how the protocol can be used for drug repurposing (finding new uses for existing drugs) specifically for Parkinson’s disease (PD).

What Are Biomedical Knowledge Graphs (BKGs)?

  • BKGs are networks that store vast amounts of biomedical information, like how diseases, drugs, genes, and symptoms are related.
  • They help to organize and visualize complex data in a way that makes it easier to find connections and patterns.
  • By using AI and graph learning, researchers can discover new insights from these connections.

Steps to Set Up the Protocol

  • 1. Data Collection: Start by downloading the required data files from a repository on GitHub.
  • 2. Install Python and Necessary Packages: Use Anaconda to set up a Python environment and install packages like NumPy, pandas, PyTorch, and DGL-KE for graph learning.
  • 3. Data Preprocessing: The BKG data must be processed to extract “triplets”. Each triplet records one connection in the form (head entity, relation, tail entity), for example a drug, the relation “treats”, and a disease. This is done using Python scripts.
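
The preprocessing step can be sketched as follows. This is a minimal illustration, not the protocol's own script: the table layout, the column names, the drug names, and the function extract_triplets are all assumptions made for the example. It reads a small relation table and emits one (head, relation, tail) triplet for each flagged relation.

```python
import csv
import io

# Hypothetical relation table: one row per drug-disease pair, with a 0/1 flag
# column for each relation type. Names and layout are illustrative only.
RAW = """drug,disease,treats,palliates
DrugA,Parkinson disease,1,0
DrugB,Parkinson disease,0,1
DrugC,Alzheimer disease,1,1
"""


def extract_triplets(table_text, head_col, tail_col, relation_cols):
    """Return a (head, relation, tail) tuple for every relation flagged with 1."""
    triplets = []
    for row in csv.DictReader(io.StringIO(table_text)):
        for rel in relation_cols:
            if row[rel] == "1":
                triplets.append((row[head_col], rel, row[tail_col]))
    return triplets


triplets = extract_triplets(RAW, "drug", "disease", ["treats", "palliates"])

# Print tab-separated head/relation/tail lines, one triplet per line.
for head, relation, tail in triplets:
    print(f"{head}\t{relation}\t{tail}")
```

A row with several flagged relations yields several triplets, so the number of triplets can exceed the number of rows in the table.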

How Does the Knowledge Graph Embedding Work?

  • Embedding converts the entities and relations in a knowledge graph into numerical vectors that a machine can work with.
  • These vectors let AI algorithms learn from the graph, for example by scoring how plausible a given connection is.
  • The protocol uses four embedding models: TransE, TransR, ComplEx, and DistMult. Each model scores a connection between two entities in a different way, so each learns a different representation of the same data.
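
To make the difference between the four models concrete, the sketch below writes out their standard textbook scoring functions for a single triplet (h, r, t), where a higher score means a more plausible connection. It is a plain-Python illustration, not the implementation used in the protocol; the function names are chosen for this example, and library implementations typically add details such as a margin term or a different norm.

```python
import math


def transe(h, r, t):
    """TransE: the relation is a translation, so h + r should land near t."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def project(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def transr(h, r, t, m_r):
    """TransR: project both entities into the relation's space, then translate."""
    return transe(project(m_r, h), r, project(m_r, t))


def distmult(h, r, t):
    """DistMult: element-wise three-way product, summed."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


def complex_score(h, r, t):
    """ComplEx: DistMult over complex vectors, using the conjugate of the tail."""
    return sum(hi * ri * ti.conjugate() for hi, ri, ti in zip(h, r, t)).real


identity = [[1.0, 0.0], [0.0, 1.0]]
print(transe([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]))
print(transr([1.0, 0.0], [0.0, 1.0], [1.0, 1.0], identity))
print(distmult([1.0, 2.0], [0.5, 1.0], [2.0, 3.0]))
print(complex_score([1 + 1j], [1j], [1 - 1j]))
```

DistMult is symmetric in h and t, so it cannot tell a relation from its reverse, whereas TransE, TransR, and ComplEx can. This is one reason to train several models rather than rely on one.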

Training the Models

  • After preprocessing the data, each embedding model is trained on the data using a command line interface.
  • Training fits the embedding vectors to the data. Settings such as the learning rate and the hidden (embedding) dimension are hyperparameters: they are chosen before training and tuned to improve the model’s accuracy.
  • The training might take a few hours, depending on the complexity of the data and the power of the computer you are using.
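
As a rough sketch of what command-line training can look like, the helper below assembles a dglke_train command (the training entry point of DGL-KE) for each of the four models. Everything specific here is an assumption for illustration: the helper name, the directory names, the file names train.tsv, valid.tsv and test.tsv, and all hyperparameter values are placeholders, not the settings used in the protocol. The script only prints the commands; it does not run them.

```python
import shlex


def build_dglke_train_command(model_name, data_path, save_path,
                              hidden_dim=400, lr=0.1, max_step=10000):
    """Assemble a dglke_train command as an argument list.

    All default values are placeholders for illustration, not tuned settings.
    """
    return [
        "dglke_train",
        "--model_name", model_name,
        "--dataset", "iBKH",
        "--data_path", data_path,
        # raw_udd_hrt: user-defined tab-separated files ordered head, relation, tail
        "--format", "raw_udd_hrt",
        "--data_files", "train.tsv", "valid.tsv", "test.tsv",
        "--save_path", save_path,
        "--hidden_dim", str(hidden_dim),
        "--lr", str(lr),
        "--max_step", str(max_step),
    ]


# TransE_l2 is DGL-KE's name for TransE with the L2 norm.
commands = {
    model: build_dglke_train_command(model, "./data", f"./ckpts/{model}")
    for model in ["TransE_l2", "TransR", "ComplEx", "DistMult"]
}

for command in commands.values():
    print(shlex.join(command))
```

Building the command as a list keeps each argument separate, so it could be passed to subprocess.run without shell quoting issues once the paths and values have been set for a real run.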

Using the Model for Drug Repurposing

  • The main task of this protocol is to identify existing drugs that could be repurposed to treat Parkinson’s disease (PD).
  • The pipeline predicts drugs that might treat or alleviate PD by using the trained embeddings to score how plausible a link between each drug and PD is.
  • Once the drugs are predicted, they are ranked based on their potential to treat PD, and the top candidates are identified.
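
The prediction and ranking step can be illustrated with a toy example. The sketch below assumes a TransE-style score and uses made-up two-dimensional embeddings; the drug names, the vectors, and the relation name "treats" are hypothetical, and a real run would load the embeddings produced by training instead.

```python
import math

# Toy embeddings, invented for illustration. Real embeddings come from the
# trained models and have hundreds of dimensions.
entity_emb = {
    "Parkinson disease": [1.0, 1.0],
    "DrugA": [0.9, 0.1],
    "DrugB": [0.0, 0.0],
    "DrugC": [1.0, -2.0],
}
relation_emb = {"treats": [0.0, 1.0]}


def transe_score(h, r, t):
    """TransE-style score: closer to zero means a more plausible triplet."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_drugs(disease, relation, candidates):
    """Score each (drug, relation, disease) triplet and sort best-first."""
    scored = [
        (drug, transe_score(entity_emb[drug], relation_emb[relation],
                            entity_emb[disease]))
        for drug in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank_drugs("Parkinson disease", "treats", ["DrugC", "DrugA", "DrugB"])
for position, (drug, score) in enumerate(ranking, start=1):
    print(position, drug, round(score, 3))
```

The top of the ranked list gives the candidates to examine further; the scores themselves are only meaningful relative to one another within the same model.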

Visualizing Results

  • After generating drug repurposing predictions, the protocol visualizes the connections between PD and predicted drug candidates in a “contextual subnetwork”.
  • This helps to see how each drug is related to PD through different entities like genes and other diseases.
  • The network is visualized using a graph database called Neo4j, which helps to display the shortest paths between PD and the drug candidates.
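
The idea behind the shortest-path view can be shown without a database. The sketch below is a stand-in for the Neo4j step, not a reproduction of it: it runs a breadth-first search over a tiny invented graph, and every entity and relation name in it is hypothetical.

```python
from collections import deque

# Invented example edges as (head, relation, tail) triplets.
EDGES = [
    ("DrugA", "targets", "GeneX"),
    ("GeneX", "associated_with", "Parkinson disease"),
    ("DrugA", "treats", "DiseaseY"),
    ("DiseaseY", "resembles", "DiseaseW"),
    ("DiseaseW", "resembles", "Parkinson disease"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map, ignoring relation types."""
    adjacency = {}
    for head, _relation, tail in edges:
        adjacency.setdefault(head, set()).add(tail)
        adjacency.setdefault(tail, set()).add(head)
    return adjacency


def shortest_path(adjacency, source, target):
    """Breadth-first search; returns the list of nodes on one shortest path."""
    if source not in adjacency or target not in adjacency:
        return None
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in sorted(adjacency[node]):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


adjacency = build_adjacency(EDGES)
path = shortest_path(adjacency, "DrugA", "Parkinson disease")
print(" -> ".join(path))
```

Here the drug reaches PD through a gene in two hops, while the route through other diseases takes three, so the gene route is the one reported. Collecting such paths for each top-ranked drug gives the entities that make up a contextual subnetwork.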

Expected Outcomes

  • By following these steps, researchers can generate new knowledge, like identifying potential new treatments for Parkinson’s disease.
  • The process can be adapted to other diseases or biomedical tasks, such as predicting disease-risk genes or identifying drug-drug interactions.

Limitations

  • The knowledge graph used in this protocol (iBKH) is not complete and may lack certain types of biomedical data (like proteins or mutations).
  • The accuracy of the results depends on the quality of the data and the performance of the graph learning algorithms.
  • Further research is needed to incorporate additional models and data to improve the pipeline.
