CROssBARv2: A Unified Biomedical Knowledge Graph for Heterogeneous Data Representation and LLM-Driven Exploration

 1   The Challenge

Developing effective therapeutics for prevalent diseases requires deep insight into molecular, genetic, and cellular factors, yet this knowledge is scattered across diverse sources, which makes data integration and analysis difficult.

 2   Our Solution

Here, we present CROssBARv2, a system built on a heterogeneous knowledge graph (KG) to facilitate systems biology and drug discovery/repurposing. CROssBARv2 collects large-scale biological data from 34 data sources and stores them in a Neo4j-based graph database.

 3   Content

CROssBARv2 consists of 2,709,502 nodes and 12,688,124 relationships between 14 node types (i.e., protein, gene, organism, domain, biological process, molecular function, cellular component, drug, compound, disease, pathway, phenotype, EC number, and side effect).

 4   Easy Access

We developed a large language model (LLM) interface that translates natural language questions into Neo4j's Cypher query language and converts the query results back into natural language, so that answers are drawn from information in the KG rather than from LLM hallucinations. CROssBARv2 can also be accessed through our GraphQL API and the Neo4j Browser.
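The round trip described above (question to Cypher, Cypher to records, records to answer) can be sketched as follows. This is a minimal illustration under stated assumptions, not the CROssBARv2 implementation: the three callables stand in for the LLM and the database, the `Drug`, `Protein`, and `TARGETS` names are hypothetical, and the read-only check is one possible safeguard that the source does not describe.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

# Keywords that would modify the graph; a generated query containing one is rejected.
WRITE_KEYWORDS = {"CREATE", "MERGE", "DELETE", "SET", "REMOVE", "DROP"}


def is_read_only(cypher: str) -> bool:
    """Return True if no write keyword appears as a standalone token."""
    tokens = cypher.upper().replace("(", " ").replace(")", " ").split()
    return not any(token in WRITE_KEYWORDS for token in tokens)


def answer_question(
    question: str,
    to_cypher: Callable[[str], str],            # assumed: LLM call, question -> Cypher
    run_query: Callable[[str], List[Row]],      # assumed: executes Cypher against the KG
    to_text: Callable[[str, List[Row]], str],   # assumed: LLM call, records -> answer
) -> str:
    """Answer a question using only records returned by the knowledge graph."""
    cypher = to_cypher(question)
    if not is_read_only(cypher):
        raise ValueError("generated query is not read-only")
    rows = run_query(cypher)
    if not rows:
        # Nothing retrieved: say so instead of letting the model guess.
        return "No matching records were found in the knowledge graph."
    return to_text(question, rows)


# Stand-ins for the LLM and the database (hypothetical labels and values).
def demo_to_cypher(question: str) -> str:
    return "MATCH (d:Drug)-[:TARGETS]->(p:Protein) RETURN d.name AS drug LIMIT 5"


def demo_run_query(cypher: str) -> List[Row]:
    return [{"drug": "drug_1"}, {"drug": "drug_2"}]


def demo_to_text(question: str, rows: List[Row]) -> str:
    names = ", ".join(str(row["drug"]) for row in rows)
    return f"Found {len(rows)} record(s): {names}"


demo_answer = answer_question(
    "Which drugs target proteins?", demo_to_cypher, demo_run_query, demo_to_text
)
```

Passing the three steps in as functions keeps the sketch independent of any particular LLM provider or database driver; in practice `run_query` would wrap a Neo4j session.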

CROssBARv2 is expected to contribute to life sciences research, particularly in (i) the discovery of biological mechanisms at the molecular level and (ii) the development of effective therapeutic strategies.

CROssBARv2 Knowledge Graph

Unique Databases: 34
Node Types: 14 (e.g., genes, proteins, diseases)
Relationship Types: 33 (e.g., associations, interactions)
Nodes: 2,709,502
Relationships: 12,688,124

[Chart: Biological entity (node) types]

Chat with CROssBARv2

Interact with the CROssBARv2 database using natural language. Through the Graph Explorer, you can navigate direct relationships between entities, retrieving structured facts and connections from the graph. The Semantic Search feature, powered by embeddings, identifies similarities between entities and so helps discover biologically meaningful patterns that go beyond direct graph links.
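To illustrate how embedding-based search can surface related entities that share no direct edge, here is a minimal sketch that ranks entities by cosine similarity. The entity names and three-dimensional vectors are toy values invented for illustration; the embedding model, vector dimensions, and similarity measure actually used by CROssBARv2 are not stated in this text, so cosine similarity is an assumption.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(
    query_id: str, embeddings: Dict[str, Sequence[float]], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Return the top_k entities most similar to query_id, best match first."""
    query_vector = embeddings[query_id]
    scored = [
        (entity_id, cosine_similarity(query_vector, vector))
        for entity_id, vector in embeddings.items()
        if entity_id != query_id  # skip the query entity itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (hypothetical entities and values).
toy_embeddings = {
    "protein_A": [1.0, 0.0, 0.0],
    "protein_B": [0.9, 0.1, 0.0],
    "disease_X": [0.0, 1.0, 0.0],
}

nearest = most_similar("protein_A", toy_embeddings, top_k=1)
```

Here `nearest` holds `protein_B` as the closest match to `protein_A`, even though the toy data contains no explicit relationship between them. A production system would normally use a vector index rather than this linear scan.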

Programmatic Access with Apollo GraphQL API

Access the CROssBARv2 database programmatically using a flexible GraphQL interface. You can build custom, nested queries to retrieve precisely the data you need. It's ideal for integrating CROssBARv2 into your analytical workflows or applications, and supports seamless development with tools like Apollo Studio.
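As a rough sketch of programmatic access, the snippet below builds a standard GraphQL-over-HTTP POST request using only the Python standard library. The endpoint URL is a placeholder, and the `proteins` field, its `where` filter, and the `id` and `name` fields are hypothetical; consult the published schema (for example through Apollo Studio) for the real field names.

```python
import json
import urllib.request
from typing import Any, Dict

# Placeholder endpoint; replace with the actual CROssBARv2 GraphQL URL.
GRAPHQL_URL = "https://example.org/graphql"

# Hypothetical query: field and argument names are assumptions, not the real schema.
PROTEIN_QUERY = """
query ProteinsByName($name: String!) {
  proteins(where: {name: $name}) {
    id
    name
  }
}
"""


def build_request(
    query: str, variables: Dict[str, Any], url: str = GRAPHQL_URL
) -> urllib.request.Request:
    """Package a GraphQL query and its variables as a JSON POST request."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> Dict[str, Any]:
    """Send the request and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Building the request does not contact the server; call send(request) to execute it.
request = build_request(PROTEIN_QUERY, {"name": "example protein"})
```

Keeping values in `variables` rather than formatting them into the query text avoids quoting problems and lets the same query string be reused.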

Manually explore with Neo4j Browser

Explore the CROssBARv2 knowledge graph visually through the Neo4j Browser interface. This interactive tool lets you run Cypher queries, visualize nodes and relationships, and investigate the structure of the graph in detail.
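For readers new to Cypher, the following sketch assembles a simple neighbourhood query of the kind one might run in the Neo4j Browser. The `name` property and the `Disease` label used in the example are assumptions about the graph schema. Node labels cannot be passed as ordinary Cypher parameters, so the helper validates the label before inserting it into the query text and passes the remaining values as parameters.

```python
import re
from typing import Dict, Tuple

# Labels are interpolated into the query text, so restrict them to plain identifiers.
_LABEL_PATTERN = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def neighbours_query(
    label: str, name: str, limit: int = 25
) -> Tuple[str, Dict[str, object]]:
    """Build a Cypher query listing the direct neighbours of one named node."""
    if not _LABEL_PATTERN.match(label):
        raise ValueError(f"invalid node label: {label!r}")
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    query = (
        f"MATCH (n:{label} {{name: $name}})-[r]-(m) "
        "RETURN type(r) AS relationship, labels(m) AS neighbour_labels, "
        "m.name AS neighbour "
        "LIMIT $limit"
    )
    return query, {"name": name, "limit": limit}


# Example with a hypothetical label and node name.
cypher, params = neighbours_query("Disease", "asthma", limit=10)
```

The resulting query text can be pasted into the Neo4j Browser after setting the parameters there, or passed together with `params` to a database driver.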

CROssBARv2 Team

Bünyamin Şen (1,2), Erva Ulusoy (1,2), Melih Darcan (1), Mert Ergün (1), Tunca Doğan (1,2)

(1) Biological Data Science Lab, Dept. of Computer Engineering & AI Engineering, Hacettepe University, Ankara, Turkey

(2) Dept. of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, Ankara, Turkey

For further inquiries, please contact us.