TourSun Synbio is committed to developing lightweight, controllable artificial intelligence models that reduce the difficulty, development cycle, and cost of creating new synthetic biology products for biomedicine. Focusing on optimizing the design and manufacturing of large-molecule pharmaceuticals and on enzyme engineering, TourSun Synbio studies rapid and effective schemes for modifying large molecules in order to overcome key bottlenecks in traditional protein engineering. The company has established partnerships with esteemed institutions such as Shanghai Jiao Tong University, the National Center for Applied Mathematics in Shanghai, the Institute of Natural Sciences, the Chongqing Artificial Intelligence Research Institute, and the Artificial Intelligence Biomedicine Center of Zhangjiang Institute for Advanced Study. Through these collaborations, the company aims to integrate industry, academia, and research and to turn laboratory findings into practical industrial applications. One of its notable achievements is an AI platform for the directed evolution of enzymes and for protein sequence generation, which helps biologists and chemists modify proteins to suit specific application scenarios and provides a valuable tool for research and development in synthetic biology.
Often described as the "chips" of the biotechnology industry, enzymes play an important biocatalytic role in fields such as chemical engineering, pharmaceuticals, and energy. TourSun Synbio develops advanced computational methods that draw on protein sequence semantics, coevolutionary information, the geometric microenvironments of amino acids, molecular dynamics, and other sources of information. With these methods it builds artificial intelligence models for the directed evolution of enzymes, designs artificial enzymes with high activity, selectivity, and yield, and supports the development of downstream industries. TourSun Synbio also focuses on the design of large-molecule drugs such as new antibody drugs, COVID-19 vaccines, next-generation immune checkpoint regulators, multifunctional antibodies, antibodies targeting G protein-coupled receptors, and antibody-drug conjugates. Efficient and reliable antibody design is a core industrial technology for large-molecule drug design and an important direction in computational protein design. In its deep learning models, the company uses mathematical methods such as diffusion models and stochastic differential equations to simulate antigen-antibody docking, which supports both the design of antibody sequences and the prediction of antibody structures.
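Diffusion models of this kind rest on a forward process that gradually corrupts a structure into noise and a learned reverse process that undoes it. The sketch below shows only the forward half, assuming a variance-preserving SDE with a constant β and treating atom coordinates as a plain array; the function name, β, and the random coordinates are illustrative assumptions, not TourSun Synbio's models.

```python
import numpy as np

def vp_sde_forward(x0, beta=20.0, n_steps=1000, seed=0):
    """Euler-Maruyama simulation of the variance-preserving SDE
    dx = -0.5 * beta * x dt + sqrt(beta) dW for t in [0, 1].

    A constant beta is assumed for simplicity; score-based models
    usually let beta vary with t."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        drift = -0.5 * beta * x * dt
        noise = np.sqrt(beta * dt) * rng.standard_normal(x.shape)
        x = x + drift + noise
    return x

# Hypothetical stand-in for antibody atom coordinates (1000 atoms x 3, in angstroms).
coords = np.random.default_rng(1).normal(scale=10.0, size=(1000, 3))
noised = vp_sde_forward(coords)
print(noised.shape)                 # (1000, 3)
print(noised.mean(), noised.std())  # roughly 0 and 1: the structure is now Gaussian noise
```

A generative model is trained to run this process in reverse, denoising step by step from Gaussian noise back to a plausible structure; in a docking setting, that reverse process would additionally be conditioned on the antigen.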
ProteinEngine (PE) is an integrated AI platform for protein design. It connects a range of tools through API calls and assigns large language models distinct roles for delegating tasks, solving them, and reporting results, with the aim of improving the performance of deep learning on protein engineering tasks. The platform is highly modular and extensible and provides powerful AI acceleration for protein design. It has been deployed on WeChat, where authorized users can open the portal and converse with PE in natural language. PE covers a wide range of protein design needs, including folding, inverse folding, de novo sequence design, and antibody-antigen docking prediction.
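The division of labor among the LLM roles can be pictured as a plan, execute, and report loop. This sketch is hypothetical throughout: the tool names, the stub functions, and the lookup table that stands in for an LLM planner are assumptions for illustration, not ProteinEngine's actual API.

```python
from typing import Callable, Dict, List

# Hypothetical tool stubs standing in for real services reached via API calls.
def inverse_fold(backbone: str) -> str:
    return "MKTAYIAKQR"  # dummy designed sequence

def fold(sequence: str) -> str:
    return f"predicted_structure({sequence})"

TOOLS: Dict[str, Callable[[str], str]] = {"inverse_fold": inverse_fold, "fold": fold}

def plan(task: str) -> List[str]:
    """Planner role (an LLM in a real system): map a task to a tool chain."""
    chains = {
        "predict_structure": ["fold"],
        "design_sequence": ["inverse_fold"],
        "design_and_validate": ["inverse_fold", "fold"],
    }
    return chains[task]

def execute(chain: List[str], payload: str) -> List[str]:
    """Executor role: call each tool in order, piping each output into the next."""
    outputs = []
    for name in chain:
        payload = TOOLS[name](payload)
        outputs.append(payload)
    return outputs

def report(chain: List[str], outputs: List[str]) -> str:
    """Reporter role: summarize the results for the user."""
    return "\n".join(f"{name}: {out}" for name, out in zip(chain, outputs))

chain = plan("design_and_validate")
print(report(chain, execute(chain, "backbone.pdb")))
```

Keeping tools behind a name-to-function registry is what makes such a design modular: adding a capability means registering one more entry rather than changing the planner or reporter.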
GraDe-IF is a generative AI model for protein sequence design that produces biologically meaningful and reliable new sequences that fold into a specified target structure. GraDe-IF is a protein graph denoising diffusion model: a given protein backbone guides the diffusion over amino acid types, and an amino acid substitution matrix is encoded as biological prior knowledge, so that the model generates diverse sequences for a given backbone structure. Because it integrates equivariant graph neural networks, the method generates sequences with high reliability and accuracy and achieves a particularly high recovery rate in conserved regions.
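One way a substitution matrix can act as a prior is by shaping the forward (corruption) step of a discrete diffusion over residue types. The sketch assumes transitions of the form (1 - β_t)I + β_t P, with P a row-wise softmax of substitution scores; the four-residue score submatrix (in the style of BLOSUM62), the temperature, and the β schedule are illustrative choices, not GraDe-IF's published settings, and the backbone-conditioned denoiser is omitted.

```python
import numpy as np

# Illustrative BLOSUM62-style scores for four residues (A, D, E, K).
RESIDUES = ["A", "D", "E", "K"]
SCORES = np.array([
    [4, -2, -1, -1],
    [-2, 6, 2, -1],
    [-1, 2, 5, 1],
    [-1, -1, 1, 5],
], dtype=float)

def substitution_prior(scores, temperature=2.0):
    """Row-wise softmax: turns similarity scores into substitution probabilities."""
    logits = scores / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

def transition(beta_t, prior):
    """One forward step: keep the residue with prob 1 - beta_t, else resample via prior."""
    return (1.0 - beta_t) * np.eye(len(prior)) + beta_t * prior

prior = substitution_prior(SCORES)
betas = np.linspace(0.05, 0.5, 10)
q_bar = np.eye(len(RESIDUES))
for b in betas:
    q_bar = q_bar @ transition(b, prior)  # cumulative q(x_t | x_0) after each step

d = RESIDUES.index("D")
print({r: round(float(p), 3) for r, p in zip(RESIDUES, q_bar[d])})
```

Because D and E score as similar, a corrupted D is more likely to become E than A, so the noise itself respects biochemical similarity instead of replacing residues uniformly at random.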
LGN is a lightweight and generalizable geometric deep learning framework for studying multi-site directed evolution of proteins. It matches or exceeds the pre-training performance of existing methods at 1% or less of their computational cost, which lets users fine-tune the model more precisely for specific proteins and obtain better protein design ideas.
This article explores the stability mechanism of thermophilic cytochrome P450, a family of heme proteins widely found in bacteria, animals, and humans. It found that thermophilic cytochrome P450 is highly flexible in its native folded state, which lets it gain heat resistance by reducing the entropic driving force for unfolding. This discovery offers a new, entropy-based approach to the rational design of heat-resistant proteins.
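Standard two-state folding thermodynamics (textbook background, not taken from the article) shows why a smaller unfolding entropy helps. Here U and N denote the unfolded and native states:

```latex
\Delta G_{\mathrm{unf}}(T) = \Delta H_{\mathrm{unf}} - T\,\Delta S_{\mathrm{unf}},
\qquad
\Delta S_{\mathrm{unf}} = S_{\mathrm{U}} - S_{\mathrm{N}}

\Delta G_{\mathrm{unf}}(T_m) = 0
\;\Longrightarrow\;
T_m = \frac{\Delta H_{\mathrm{unf}}}{\Delta S_{\mathrm{unf}}}
```

A more flexible native state has a larger S_N, which lowers ΔS_unf; if ΔH_unf stays roughly unchanged (a simplifying assumption), the melting temperature T_m rises.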
SESNet explores an effective encoding of amino acid microenvironments and achieves good predictions for multi-site mutations, especially higher-order mutations involving more than four sites. Beyond protein sequence and structure information, its training also incorporates the local and global evolutionary context of homologous sequences, and it uses unsupervised learning for data augmentation and model pre-training.
ACMP is derived from interacting particle systems and reformulates message passing with an attraction-repulsion mechanism: nodes with similar features attract each other and nodes with dissimilar features repel, while an external potential keeps the system stable. This approach avoids over-smoothing, a common problem in graph representation learning, and can effectively extract key features and separate out interfering information on heterophilic graphs such as those representing biomolecules.
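ACMP (Allen-Cahn message passing) takes its external potential from the Allen-Cahn equation. This is a minimal sketch of one explicit update, assuming the dynamics dx_i/dt = α Σ_j (s_ij - β)(x_j - x_i) + δ x_i (1 - x_i²) on a fixed graph with a given similarity matrix s. In practice the similarities would be learned attention scores, and the function and parameter names here are illustrative.

```python
import numpy as np

def acmp_step(x, adj, sim, beta=0.5, alpha=0.1, delta=1.0, dt=0.1):
    """One explicit Euler step of attraction-repulsion message passing.

    x:   (n, d) node features
    adj: (n, n) 0/1 adjacency matrix
    sim: (n, n) pairwise similarity in [0, 1]
    Neighbors with sim > beta attract (x_i moves toward x_j); sim < beta repel.
    The double-well term delta * x * (1 - x**2) pulls features toward +/-1,
    which keeps the repulsive dynamics bounded.
    """
    coef = adj * (sim - beta)                 # signed interaction weights
    diff = x[None, :, :] - x[:, None, :]      # diff[i, j] = x_j - x_i
    interaction = alpha * np.einsum("ij,ijd->id", coef, diff)
    potential = delta * x * (1.0 - x**2)
    return x + dt * (interaction + potential)

# Chain 0-1-2: nodes 0 and 1 are similar, nodes 1 and 2 are dissimilar.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
sim = np.array([[1.0, 0.9, 0.1], [0.9, 1.0, 0.1], [0.1, 0.1, 1.0]])
x = np.random.default_rng(0).uniform(-0.5, 0.5, size=(3, 4))
for _ in range(500):
    x = acmp_step(x, adj, sim)
print(np.abs(x).max())  # stays bounded despite the repulsive edge
```

Setting delta=0 removes the potential; repulsion then acts as negative diffusion and features can grow without limit, which is why the external potential is needed for stability.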
This work designs a broadly applicable spectral graph neural network architecture that uses low-pass and high-pass filters to decompose an input graph signal into multi-resolution, multi-scale representations, yielding both global features and local information. The global features capture the overall, denoised structure of the graph signal, while the local information is crucial for characterizing specific parts of the graph in certain applications, such as the CDR regions of antibodies, key mutation sites in enzyme engineering, and tumor cells in medical images.
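A minimal sketch of a low-pass/high-pass split on a graph, assuming a two-band tight frame built from a cosine/sine pair on the Laplacian spectrum (so that g_low² + g_high² = 1). The filter pair, the toy path graph, and the spike are illustrative; the paper's own filters and multi-level design are not reproduced.

```python
import numpy as np

def laplacian(adj):
    """Combinatorial graph Laplacian L = D - A."""
    return np.diag(adj.sum(axis=1)) - adj

def two_band_filters(adj):
    """Low-/high-pass spectral filters with g_low^2 + g_high^2 = 1 (a tight frame)."""
    lam, U = np.linalg.eigh(laplacian(adj))
    lam = np.clip(lam, 0.0, None)            # remove tiny negative round-off
    theta = np.pi * lam / (2 * lam.max())    # map the spectrum onto [0, pi/2]
    low = U @ np.diag(np.cos(theta)) @ U.T   # passes smooth (global) components
    high = U @ np.diag(np.sin(theta)) @ U.T  # passes oscillating (local) components
    return low, high

# Toy example: path graph with a smooth trend plus a local spike at node 7.
n = 20
adj = np.zeros((n, n))
idx = np.arange(n - 1)
adj[idx, idx + 1] = adj[idx + 1, idx] = 1.0
signal = np.linspace(0.0, 1.0, n)
signal[7] += 5.0

W_low, W_high = two_band_filters(adj)
coarse, detail = W_low @ signal, W_high @ signal
reconstructed = W_low.T @ coarse + W_high.T @ detail  # tight frame: exact reconstruction
print(np.allclose(reconstructed, signal))  # True
```

A full eigendecomposition costs O(n³), so practical spectral GNNs usually approximate such filters with polynomials of the Laplacian instead.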
Traditional graph neural networks cannot capture the multi-level interactions present in many complex systems. To overcome this limitation, this paper proposes the message passing simplicial network (MPSN), which uses simplicial Weisfeiler-Lehman (SWL) coloring to distinguish non-isomorphic simplicial complexes. The paper proves that MPSN is as powerful as SWL and no less powerful than the 3-WL test. MPSN outperforms GNNs in trajectory prediction and large-scale molecular prediction.
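Expressivity claims like these are measured against the Weisfeiler-Lehman hierarchy, and 1-WL color refinement bounds what standard message-passing GNNs can distinguish. The toy illustration below is not MPSN itself: it shows that 1-WL gives a six-cycle and two disjoint triangles identical color histograms, while lifting each graph to its clique complex separates them by their number of 2-simplices. The function names are illustrative.

```python
from collections import Counter
from itertools import combinations

def wl_histogram(adj, rounds=3):
    """1-WL color refinement. Colors are nested tuples, so histograms
    from different graphs can be compared directly."""
    colors = {v: () for v in adj}
    for _ in range(rounds):
        colors = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
    return Counter(colors.values())

def count_triangles(adj):
    """Number of 2-simplices (filled triangles) in the graph's clique complex."""
    return sum(
        1
        for a, b, c in combinations(sorted(adj), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )

hexagon = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}

print(wl_histogram(hexagon) == wl_histogram(two_triangles))  # True: 1-WL cannot tell them apart
print(count_triangles(hexagon), count_triangles(two_triangles))  # 0 2
```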
This article examines why graph neural networks struggle with long-range interactions, a difficulty it attributes to the strong coupling between the computational graph and the structure of the input graph. The article extends recent theoretical results on simplicial complexes to regular cell complexes, which generalize both simplicial complexes and graphs and can flexibly incorporate richer topological information. This extension yields a family of graph "lifting" transformations, each of which leads to a unique hierarchical message passing procedure. The resulting CW Networks (CWNs) are no less expressive than the 3-WL test and can outperform the best GNNs. The approach is well suited to modeling higher-order topological signals and to compressing the distances between nodes, and it achieves outstanding results that surpass GNNs, particularly in molecular graph prediction.
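A toy illustration of how a ring-based lifting compresses node distances, assuming every simple cycle up to a chosen length becomes a 2-cell. It is deliberately simplified: each ring is added as one extra vertex joined to its atoms and distance is counted as plain graph hops, whereas CWNs pass messages along boundary and upper-adjacency relations between cells. The brute-force cycle search only suits small molecules, and all names are illustrative.

```python
from collections import deque

def find_rings(adj, max_len=6):
    """Node sets of simple cycles of length 3..max_len (small graphs only).
    Each cycle is discovered from its smallest node to avoid duplicates."""
    rings = set()

    def dfs(start, node, path):
        for nb in adj[node]:
            if nb == start and len(path) >= 3:
                rings.add(frozenset(path))
            elif nb not in path and nb > start and len(path) < max_len:
                dfs(start, nb, path + [nb])

    for s in adj:
        dfs(s, s, [s])
    return rings

def lift_rings(adj, max_len=6):
    """Attach one extra 'ring cell' vertex to every atom of each detected ring."""
    lifted = {v: set(nbrs) for v, nbrs in adj.items()}
    for k, ring in enumerate(sorted(find_rings(adj, max_len), key=sorted)):
        cell = ("ring", k)
        lifted[cell] = set(ring)
        for v in ring:
            lifted[v].add(cell)
    return lifted

def hops(adj, src, dst):
    """Breadth-first-search distance from src to dst (None if unreachable)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        if v == dst:
            return dist[v]
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return None

ring10 = {i: {(i - 1) % 10, (i + 1) % 10} for i in range(10)}
print(hops(ring10, 0, 5))                          # 5 hops along the bonds
print(hops(lift_rings(ring10, max_len=10), 0, 5))  # 2 hops via the shared ring cell
```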
FMP is a graph neural network that combines message passing with a multi-scale wavelet framework to capture non-linear and long-range relationships between nodes. During propagation it integrates feature representations from multi-hop neighbors at a lower computational cost, and because its Dirichlet energy does not decay as propagation proceeds, it avoids over-smoothing. The method can also rewire the graph to avoid over-squashing, the representation bottleneck that arises when information from many nodes is excessively compressed.
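The sketch below shows only the quantity FMP controls, not FMP itself. It measures Dirichlet energy under the common definition E(X) = tr(XᵀLX) with the symmetric normalized Laplacian, and shows that repeatedly applying a plain averaging step makes this energy collapse, which is the over-smoothing that a non-decaying energy is meant to prevent. The toy graph, features, and step count are illustrative.

```python
import numpy as np

def sym_norm_laplacian(adj):
    """L = I - D^{-1/2} A D^{-1/2}; assumes no isolated nodes."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]

def dirichlet_energy(x, lap):
    """E(X) = trace(X^T L X): zero when features are 'flat' over the graph."""
    return float(np.trace(x.T @ lap @ x))

# Path graph with 5 nodes and random 3-dimensional node features.
adj = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
lap = sym_norm_laplacian(adj)
x = np.random.default_rng(0).normal(size=(5, 3))

smooth = np.eye(5) - 0.5 * lap  # a plain low-pass aggregation step
energies = [dirichlet_energy(x, lap)]
for _ in range(50):
    x = smooth @ x
    energies.append(dirichlet_energy(x, lap))
print(energies[0], energies[-1])  # the final energy is orders of magnitude smaller
```

Since the eigenvalues of L lie in [0, 2], every non-constant component is multiplied by a factor below 1 at each step, so the energy can only shrink; after enough layers the node features become nearly indistinguishable.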
Associate Professor, Institute of Natural Sciences, School of Mathematical Sciences, Department of Computer Science and Technology, and Zhangjiang IAS, Shanghai Jiao Tong University; Adjunct Lecturer/PhD at UNSW; Former Research Scientist at MPI; Research on GNNs, Geometric Deep Learning, and Protein Design
PhD at Johns Hopkins University
MBA from Tsinghua University; Former Investment Director of Sina Group; Assistant to the Chairman of a US-listed company
Member of Academia Europaea; Member of the European Academy of Sciences; Director of the Institute of Natural Sciences, Shanghai Jiao Tong University; Co-Director of the Shanghai National Center for Applied Mathematics; Former Head of the Department of Mathematics, Vilas Distinguished Achievement Professor, University of Wisconsin-Madison; Former Head of the School of Mathematical Sciences, Shanghai Jiao Tong University; Research on Multiscale Equations, Kinetic Equations, and Fundamentals of Graph Neural Network Algorithms
Member of Academia Europaea; Professor of Computational Biology and AI at the University of Cambridge; Fellow of ELLIS; Professor at the Cambridge Center for AI in Medicine; Research on GNN Modeling and Medical AI; Main Inventor of Graph Attention Networks
Researcher at the School of Life Sciences, Shanghai Jiao Tong University; Postdoctoral Fellow at the University of Illinois at Urbana-Champaign; Ph.D. in Microbiology from the Chinese Academy of Sciences; Committee Member of Synthetic Biology, Shanghai Society for Biotechnology; Committee Member of Synthetic Biology, China Association of Medicinal Biotechnology; Research on Synthetic Biology and Cell Factories