Convergent Evolution: The Co-Revolution of AI & Biology with Prof Michael Levin & Dr. Leo Pio Lopez – YouTube Bioelectricity Podcast Notes

Introduction: AI & Biology Convergence

  • The podcast explores the intersection of AI, biology, computer science, and philosophy, focusing on a recent paper by Levin and Lopez.
  • The paper integrates multiple biological datasets (genes, drugs, diseases) into a unified network model, demonstrating a novel link between GABA and cancer.

Multimodal Data Integration and Network Embedding

  • Challenge: Integrating diverse “omics” data (gene, drug, disease interactions) into a unified representation.
  • Solution: Developed a “universal multi-layer network” approach, enabling the combination of various data types and scales. Existing, independent datasets for genes (protein–protein interactions), drugs (shared uses/combinations), and diseases (symptom commonality) were reconciled into a single, cross-referenceable dataset.
  • Network Embedding: Translated network nodes (genes, drugs, diseases) into vectors using a similarity measure based on “random walk with restart.”
  • Random Walk with Restart: An algorithm that explores the network outward from a seed node, producing a probability distribution that represents similarity to other nodes. This distills relationships (connections, probabilities of joint walks) from multiple data sources into a simplified representation of data proximities.
  • Machine Learning Application: Embeddings make network data usable by standard machine-learning methods (which expect vector inputs), facilitating tasks like link prediction.
  • Process: links are sampled and random walks (of different lengths?) are run; walks that cross a sampled link (in this sense “linking” its endpoints) pull those nodes closer together in the embedding, so that embedding proximity becomes predictive of path similarity.
  • Training and validation: 70% of known links used for training, 30% held out for evaluating novel link predictions between nodes (e.g., “which drug is predicted for a given disease?”); a minimal sketch of the pipeline follows this list.
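
A minimal, illustrative sketch of this pipeline (not the authors’ implementation): random walk with restart over a toy adjacency matrix yields one visiting-probability vector per node, and held-out links are scored by cosine similarity between those vectors. The toy graph, restart probability, and split values below are assumptions for illustration only.

```python
import random

import numpy as np

def rwr_embeddings(adj: np.ndarray, restart: float = 0.5,
                   tol: float = 1e-8, max_iter: int = 1000) -> np.ndarray:
    """Random walk with restart: for each seed node, iterate the walk to a
    steady-state visiting-probability vector; the stacked rows act as embeddings."""
    n = adj.shape[0]
    col_sums = adj.sum(axis=0, keepdims=True)
    col_sums[col_sums == 0] = 1.0
    W = adj / col_sums                      # column-stochastic transition matrix
    emb = np.zeros((n, n))
    for seed in range(n):
        e = np.zeros(n)
        e[seed] = 1.0                       # restart distribution
        p = e.copy()
        for _ in range(max_iter):
            p_next = (1 - restart) * (W @ p) + restart * e
            if np.abs(p_next - p).sum() < tol:
                p = p_next
                break
            p = p_next
        emb[seed] = p
    return emb

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

# Toy heterogeneous graph: nodes 0-2 "genes", 3-4 "drugs", 5 a "disease".
edges = [(0, 1), (1, 2), (0, 3), (2, 4), (4, 5), (1, 5)]
n = 6

# Crude 70/30 split of known links: embed using the training edges only,
# then check whether the held-out links score highly.
random.seed(0)
random.shuffle(edges)
split = int(0.7 * len(edges))
train_edges, test_edges = edges[:split], edges[split:]

A_train = np.zeros((n, n))
for i, j in train_edges:
    A_train[i, j] = A_train[j, i] = 1.0

emb = rwr_embeddings(A_train)
for i, j in test_edges:
    print(f"held-out link ({i},{j}): similarity = {cosine(emb[i], emb[j]):.3f}")
```

A real pipeline would typically train a classifier on the embeddings to score candidate links; cosine similarity here is just a stand-in for that step.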

GABA-Cancer Link and Validation

  • Prediction: The model predicted a significant link between the neurotransmitter GABA (specifically the GABA-A receptor) and cancer (melanoma).
  • Mechanism: The GABA-A receptor is a chloride channel that helps set a cell’s electrical state; perturbing GABA signaling can disrupt cell–cell communication (see the short sketch after this list).
  • Experimental Validation: Treating melanocytes (pigment cells) with memantine (a GABA-A agonist) induced a melanoma-like phenotype.
  • Unlike typical tumors: *no primary tumor*; all melanocytes converted at the same time.
  • Significance: Demonstrates that cancer initiation can occur *without genetic damage*, solely by altering cell–cell electrical communication (a physiological change, *not* a genetic one).
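
For context on how a chloride channel can set a cell’s electrical state, a standard Nernst-equation sketch (textbook physiology, not taken from the paper; the chloride concentrations are illustrative assumptions):

```latex
% Chloride reversal potential (Nernst equation), z = -1 for Cl^-
E_{\mathrm{Cl}} = \frac{RT}{zF}\,\ln\frac{[\mathrm{Cl}^-]_{\text{out}}}{[\mathrm{Cl}^-]_{\text{in}}}
\approx -26.7\,\mathrm{mV}\times\ln\frac{110\,\mathrm{mM}}{10\,\mathrm{mM}}
\approx -64\,\mathrm{mV}\quad(\text{at }37^{\circ}\mathrm{C})
```

Opening GABA-A channels drives the membrane potential toward E_Cl, so changing receptor activity or chloride gradients shifts a cell’s resting potential and, with it, its electrical signaling to neighbors.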

AI for Biology: Black Box Modeling and Data Limitations

  • Grand Challenge of Biology: Understanding complex biological interactions (the “black box”). Current drugs act only when and after the mechanism is known; ML seeks *preemptive*, mechanism-agnostic methods.
  • Black box challenge: most interventions rely on knowing *exactly* where in a mechanism one wants to act, rather than on a *model* that predicts interactions and cascading causal paths (downstream mechanisms).
  • Emergent Models: Some AI models (like Evo) learn higher-order concepts directly from raw data (e.g., DNA), potentially showing less “inductive bias”.
  • Levin: “Our [paper/approach] is data-driven” … aggregating a lot of “big data” and seeking correlations/predictions to study further, extracting “theory” out of them (data first, theories second).
  • Bootstrapping Knowledge: AI-driven predictions, validated by experiments, could iteratively refine biological knowledge.
  • Data Limitation: The lack of standardized, easily integrable, outcome-focused data is a major hurdle; what exists instead is an abundance of small-scale, micro-level data on individual components (such as single cells).
  • Need for larger-scale data: beyond raw DNA and protein interactions, biology lacks large-scale collections documenting overall body states and how they change (for example, changes in the location or number of *bones*) in response to various experimental perturbations.
  • A “bioinformatics of *shape*”: anatomical and functional outcome measurements recorded in response to a change or perturbation, yielding cause-and-effect pairs (given *this* stimulus, *this* change follows). The amount of such public data is currently only “in the hundreds.”
  • Bioelectrical Data Missing: Current datasets largely lack crucial bioelectrical data, essential for cell communication.
  • Publication challenges: reusing methodology and introduction paragraphs from papers you’ve already published is considered self-plagiarism, which works *against* standardization (which needs consistent, low-variation text for parsing reasons).

Multiscale Competency, Intelligence, and Communication

  • Collective Intelligence: Biology exhibits collective intelligence across scales, with problem-solving (not just complexity) at each level (pathways, cells, tissues, organs).
  • Pathways alone exhibit learning: habituation, sensitization, and Pavlovian conditioning (a toy habituation sketch follows this list).
  • Beyond Micromanagement: AI’s role may be less about precise, “bottom-up” pathway manipulation and more about “top-down” communication with biological systems. “Train a System” instead of ‘tweaking/clamping/forcing/fighting/hacking’ it.
  • Cognitive tools will be more powerful for communicating with these systems (cognitively rich and problem-solving, though not intelligent *in the way we are*) than any attempt to micromanage the whole network.
  • Dog and horse training analogies: successful use of “non-biomedical” intelligence. Not micromanaging and fiddling with biology, but *communication/learning* via training, instead of “understanding what all the neurons do.”
  • Theory of Mind: AI could help build a “theory of mind” of biological systems, understanding their “proto-cognitive” properties (goals, preferences). This would revolutionize how we study them.
  • The goal is not “better” micromanagement of molecules. The real power lies in using these learning algorithms as “communication devices” and “*translators*”: “finding unbiased patterns and becom[ing] a translation tool.”
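
A toy sketch of pathway-level habituation (an illustrative model, not from the podcast): a pathway whose responsiveness decreases with repeated identical stimulation and partially recovers during rest, which is the defining signature of habituation.

```python
def habituating_pathway(stimuli, sensitivity=1.0, decay=0.6, recovery=0.1):
    """Toy pathway: responsiveness drops with each repeated stimulus
    (habituation) and slowly recovers when the stimulus is absent."""
    responses = []
    for s in stimuli:
        responses.append(sensitivity * s)
        if s > 0:
            sensitivity *= decay                            # response decrement
        else:
            sensitivity = min(1.0, sensitivity + recovery)  # spontaneous recovery
    return responses

# Ten identical pulses, a rest period, then one more pulse.
stimuli = [1.0] * 10 + [0.0] * 5 + [1.0]
print([round(r, 3) for r in habituating_pathway(stimuli)])
# Responses shrink during repetition and rebound partially after the rest.
```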

Data-Limited Models vs. Humans

  • Wellness prediction challenge (analogous to next-token prediction): take micro-level data (“your Strava runs,” food intake, etc.) plus a subjective well-being measure, and predict future wellness. This turns out to be difficult:
    • Paradox: delivering a wellness prediction (e.g., that you will be in a good mood) might itself immediately make someone depressed; the prediction can change the outcome.
    • What are we actually optimizing? We “aren’t actually sure what to optimize,” because long-run and short-run values conflict. This calls for “multi-reward designs”: a “scalar reward [is] not enough”; at a minimum, the long-run vs. short-run trade-off must be represented (a toy sketch follows this list).
  • Human limitations: “split-brain patients” show multiple agents within “the same person”: “there exist multiple different opinions [..] and sometimes conflicting beliefs.” Not one solid person, but multiple and varied.
  • Current computing layers vs. biology: each computational layer assumes the other layers *won’t fail or vary* and will behave with *precision*. That makes the stack fragile; the assumption of no degradation in lower layers limits the ability to handle dynamic, adaptive change.
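
A toy sketch of the multi-reward point above (purely illustrative assumptions: made-up short-term and long-term reward values), showing how collapsing both objectives into one weighted scalar hides the trade-off, whereas a vector-valued reward keeps it visible:

```python
from dataclasses import dataclass

@dataclass
class Outcome:
    short_term: float   # e.g., mood right after an intervention
    long_term: float    # e.g., health months later

def scalarize(o: Outcome, w_short: float = 0.5, w_long: float = 0.5) -> float:
    """Collapse both objectives into one number; the chosen weights silently
    decide how the short-run vs. long-run conflict gets resolved."""
    return w_short * o.short_term + w_long * o.long_term

# Two hypothetical interventions with opposite trade-offs.
a = Outcome(short_term=0.9, long_term=0.2)   # feels great now, worse later
b = Outcome(short_term=0.3, long_term=0.8)   # unpleasant now, better later

print(scalarize(a), scalarize(b))                        # 0.55 vs 0.55: the conflict vanishes
print(scalarize(a, 0.2, 0.8), scalarize(b, 0.2, 0.8))    # different weights flip the ranking
# A multi-reward design keeps (short_term, long_term) as a vector and surfaces
# the trade-off instead of baking it into fixed weights.
```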

Implications for AI Architecture and Alignment

  • Evolution created problem-solving machines operating *in various spaces* (behavioral, physiological, anatomical, etc.).
  • Multiscale Architectures: Current AI (like transformers) often lacks the multiscale architecture of biological systems. Because biological components are unreliable, each layer must *interpret* (rather than literally execute) the signals it receives, which gives rise to more “problem solving” across those spaces.
  • Bio-Inspired AI: Future AI may need to embrace unreliable components, redundancy, and communication between layers with different competencies.
    • Cancer and robots: “[why don’t] robots get cancer?” The AI itself can “go off and go do things.”
  • Giving up control and modularity: in biology the body wants easily controllable modules to act on, BUT an opposing drive exists; if things become too “easy to control,” the system becomes easily “hackable” by diseases (etc.).
  • There is no way for us to reach “*alignment*,” because *we (humans)* display fundamental disagreement across time and place; a single alignment goal “doesn’t exist for humans.”
  • Robustness Through Noise: Biological systems thrive despite noise and defects; AI might learn from this resilience (e.g., dropout in neural networks is a step in this direction; a minimal sketch follows this list).
    • Self-modeling helps (it pushes a system to be *simpler and easier* to predict). It brings “robustness,” “generalization,” and “other” benefits, much as having more jobs (“multitasking”) and training on perturbed inputs (“augmentation”) help a system stay functional despite small noise or small changes.
  • Alignment Challenges: The concept of “alignment” is problematic due to inherent disagreements within individual humans and between different groups. The direction is towards AI systems that are *more organic and more integrated*.
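
A minimal sketch of the dropout idea mentioned above (standard inverted dropout, not specific to the podcast): randomly silencing units during training forces the network to tolerate unreliable components instead of depending on any single one.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations: np.ndarray, p_drop: float = 0.5, training: bool = True) -> np.ndarray:
    """Inverted dropout: zero out random units during training and rescale the
    survivors so the expected activation stays the same at inference time."""
    if not training or p_drop == 0.0:
        return activations
    mask = rng.random(activations.shape) >= p_drop    # keep each unit with prob 1 - p_drop
    return activations * mask / (1.0 - p_drop)

h = np.array([0.2, 1.5, -0.7, 0.9])
print(dropout(h))                   # some units silenced, survivors scaled up
print(dropout(h, training=False))   # unchanged when not training
```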

Future of Digital Life, Agency, and Goals

  • Digital Life: Exploring the potential (and risks) of digital life and an “ecology of AI” is crucial.
  • The role of AI tools: to enable communication and relationships with the diverse forms of intelligence all around us; they act as prosthetics and as tools of recognition.
  • Prosthetics and Outsourcing: Humans have *already* outsourced significant aspects of life, and AI will continue this trend, raising ethical questions about which parts of our lives we keep for ourselves versus hand over to technology (willpower, relationship guidance, …).
    • This already exists! “Toothbrush[es], education” are outsourced. “The idea of giving up our core competencies [..] this already happened.”
  • Painting positive futures: We have to get clearer, as a group, about the futures we actually want, not just negative-avoiding goals (the current framing of “AI safety”). The more everybody specifies the positive vision, the “closer we are to achiev[ing]” it.
  • Agency and Goals: Understanding the origins and management of goals in collective systems is a fundamental and possibly existential challenge.
