1   The Challenge
Developing effective therapeutics for prevalent diseases requires deep insight into molecular, genetic, and cellular factors. This knowledge is scattered across diverse sources, which makes data integration and analysis a major challenge.
 2   Our Solution
Here, we present CROssBARv2, a system built on a heterogeneous knowledge graph (KG) to facilitate systems biology and drug discovery/repurposing. CROssBARv2 collects large-scale biological data from 32 data sources and stores it in a Neo4j-based graph database.
 3   Content
CROssBARv2 consists of 2,709,502 nodes and 12,688,124 relationships between 14 node types (i.e., protein, gene, organism, domain, biological process, molecular function, cellular component, drug, compound, disease, pathway, phenotype, EC number, and side effect).
 4   Easy Access
We developed a large language model (LLM) interface that translates natural language questions into Neo4j's Cypher query language and turns the query results back into natural language, so that answers are grounded in the KG rather than in LLM hallucinations. CROssBARv2 can also be accessed through our GraphQL API and the Neo4j Browser.
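The round trip described above (question to Cypher, Cypher to graph rows, rows to answer) can be sketched as follows. This is a minimal illustration, not the CROssBARv2 implementation: the three callables are hypothetical placeholders for an LLM call, a Neo4j session, and a second LLM call, and the read-only guard is a deliberately simple assumption.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

NO_RESULT_MESSAGE = "No matching records were found in the knowledge graph."


def answer_question(
    question: str,
    to_cypher: Callable[[str], str],
    run_cypher: Callable[[str], List[Row]],
    to_text: Callable[[str, List[Row]], str],
) -> str:
    """Answer a question using only rows returned by the graph.

    All three callables are hypothetical: to_cypher would wrap an LLM,
    run_cypher a database session, and to_text a second LLM call.
    """
    query = to_cypher(question)

    # Simplistic guard (an assumption of this sketch): accept only
    # queries that start with MATCH, so generated text cannot write.
    if not query.lstrip().upper().startswith("MATCH"):
        raise ValueError("Only read-only MATCH queries are allowed")

    rows = run_cypher(query)

    # With no rows there is nothing to ground an answer in, so the
    # language model is not asked to produce one.
    if not rows:
        return NO_RESULT_MESSAGE

    return to_text(question, rows)
```

Because the final answer is built only from returned rows, an empty result yields a fixed message instead of free-form generated text.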
CROssBARv2 is expected to contribute to life sciences research by supporting (i) the discovery of biological mechanisms at the molecular level and (ii) the development of effective therapeutic strategies.
# | Node | Identifier | Source Database
1 | Protein | UniProt | UniProt
2 | Gene | Entrez | UniProt
3 | Organism | NCBI Tax ID | UniProt
4 | Drug | DrugBank | DrugBank
5 | Compound | ChEMBL | ChEMBL
6 | Side effect | MedDRA | SIDER, OffSIDES, ADReCS
7 | Disease | MONDO | MONDO
8 | Phenotype | HPO | HPO
9 | Domain | InterPro | InterPro
10 | Pathway | KEGG, Reactome | KEGG, Reactome
11 | EC number | EC number | Expasy ENZYME
12 | Molecular function | GO | GOA, Expasy ENZYME
13 | Cellular component | GO | GOA
14 | Biological process | GO | GOA
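For scripting, the table above can be held as a small lookup structure. The sketch below transcribes the 14 rows into a Python dictionary. The dictionary layout and the helper name `node_types_from` are illustrative choices, not part of CROssBARv2.

```python
from typing import Dict, List

# Node type -> identifier namespace and source databases, as listed in
# the table. The structure itself is an illustrative assumption.
NODE_TYPES: Dict[str, Dict[str, object]] = {
    "Protein": {"identifier": "UniProt", "sources": ["UniProt"]},
    "Gene": {"identifier": "Entrez", "sources": ["UniProt"]},
    "Organism": {"identifier": "NCBI Tax ID", "sources": ["UniProt"]},
    "Drug": {"identifier": "DrugBank", "sources": ["DrugBank"]},
    "Compound": {"identifier": "ChEMBL", "sources": ["ChEMBL"]},
    "Side effect": {"identifier": "MedDRA",
                    "sources": ["SIDER", "OffSIDES", "ADReCS"]},
    "Disease": {"identifier": "MONDO", "sources": ["MONDO"]},
    "Phenotype": {"identifier": "HPO", "sources": ["HPO"]},
    "Domain": {"identifier": "InterPro", "sources": ["InterPro"]},
    "Pathway": {"identifier": "KEGG, Reactome",
                "sources": ["KEGG", "Reactome"]},
    "EC number": {"identifier": "EC number", "sources": ["Expasy ENZYME"]},
    "Molecular function": {"identifier": "GO",
                           "sources": ["GOA", "Expasy ENZYME"]},
    "Cellular component": {"identifier": "GO", "sources": ["GOA"]},
    "Biological process": {"identifier": "GO", "sources": ["GOA"]},
}


def node_types_from(source: str) -> List[str]:
    """Return the node types populated from a source, in table order."""
    return [name for name, info in NODE_TYPES.items()
            if source in info["sources"]]
```

For example, `node_types_from("UniProt")` lists the protein, gene, and organism node types, which share UniProt as their source.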
Interact with the CROssBARv2 database using natural language. Through the Graph Explorer, you can navigate direct relationships between entities and retrieve structured facts and connections from the graph. The Semantic Search feature, powered by embeddings, helps you discover biologically meaningful patterns by identifying similarities between entities that go beyond direct graph links.
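Embedding-based similarity of the kind mentioned above is commonly computed with cosine similarity. The sketch below shows that general technique only. How CROssBARv2 computes or stores its embeddings is not stated here, and the entity names and vectors in the usage comment are made up for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(
    query_id: str,
    embeddings: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank all other entities by similarity to the query entity."""
    query = embeddings[query_id]
    scored = [
        (other, cosine_similarity(query, vector))
        for other, vector in embeddings.items()
        if other != query_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors, invented for this example (not real CROssBARv2 data).
toy = {
    "entity_a": [1.0, 0.0],
    "entity_b": [0.9, 0.1],
    "entity_c": [0.0, 1.0],
}
nearest = most_similar("entity_a", toy, top_k=1)
```

Two entities can score as similar here even when no edge connects them in the graph, which is what lets this kind of search surface patterns beyond direct links.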
Access the CROssBARv2 database programmatically using a flexible GraphQL interface. You can build custom, nested queries to retrieve precisely the data you need. It's ideal for integrating CROssBARv2 into your analytical workflows or applications, and supports seamless development with tools like Apollo Studio.
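A GraphQL call is usually an HTTP POST carrying a JSON body with `query` and `variables` keys. The sketch below builds such a request with the Python standard library. The endpoint URL is a placeholder, and the field names in `EXAMPLE_QUERY` (`proteins`, `id`, `limit`) are hypothetical. Consult the published CROssBARv2 schema for the real ones.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Placeholder endpoint: replace with the actual CROssBARv2 GraphQL URL.
GRAPHQL_URL = "https://example.org/graphql"

# Hypothetical query: the field and argument names are assumptions.
EXAMPLE_QUERY = """
query ($limit: Int) {
  proteins(limit: $limit) {
    id
  }
}
"""


def build_graphql_request(
    query: str,
    variables: Optional[Dict[str, Any]] = None,
    url: str = GRAPHQL_URL,
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a GraphQL query."""
    payload = {"query": query, "variables": variables or {}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Accept": "application/json"},
        method="POST",
    )


request = build_graphql_request(EXAMPLE_QUERY, {"limit": 5})
# To send it against a real endpoint:
#   with urllib.request.urlopen(request) as response:
#       result = json.load(response)
```

Keeping request construction separate from sending makes the payload easy to inspect before pointing it at a live endpoint.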
Explore the CROssBARv2 knowledge graph visually through the Neo4j Browser interface. This interactive tool lets you run Cypher queries, visualize nodes and relationships, and investigate the structure of the graph in detail.
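The kinds of Cypher queries one might run while exploring the graph can be sketched as Python strings. The first query uses only generic Cypher and counts nodes per label. The second assumes that node labels match the spellings in `ALLOWED_LABELS` and that nodes carry an `id` property. Both are assumptions, not the confirmed CROssBARv2 schema.

```python
from typing import Any, Dict, Tuple

# Generic Cypher: count nodes per label combination.
COUNT_BY_LABEL = (
    "MATCH (n) "
    "RETURN labels(n) AS labels, count(*) AS total "
    "ORDER BY total DESC"
)

# Hypothetical label spellings, taken from the 14 node types.
ALLOWED_LABELS = {
    "Protein", "Gene", "Organism", "Drug", "Compound", "Side effect",
    "Disease", "Phenotype", "Domain", "Pathway", "EC number",
    "Molecular function", "Cellular component", "Biological process",
}


def neighbors_query(
    label: str, node_id: str, limit: int = 25
) -> Tuple[str, Dict[str, Any]]:
    """Build a parameterized query for the direct neighbors of a node.

    Labels cannot be passed as Cypher parameters, so the label is
    checked against a whitelist before being placed in the query text.
    """
    if label not in ALLOWED_LABELS:
        raise ValueError(f"Unknown node label: {label!r}")
    query = (
        f"MATCH (n:`{label}` {{id: $node_id}})-[r]-(m) "
        "RETURN type(r) AS relationship, labels(m) AS neighbor_labels, "
        "m.id AS neighbor_id "
        "LIMIT $limit"
    )
    return query, {"node_id": node_id, "limit": limit}


# P04637 is the UniProt accession of human p53, used as an example ID.
query, params = neighbors_query("Protein", "P04637", limit=10)
```

Passing the identifier and limit as parameters, rather than formatting them into the string, keeps user-supplied values out of the query text.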
1 Biological Data Science Lab, Dept. of Computer Engineering & AI Engineering, Hacettepe University, Ankara, Turkey
2 Dept. of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, Ankara, Turkey
For further inquiries, please contact us.