Public Cancer Data Bank
A metadata-first registry of open oncology datasets — sources, datasets, drugs, combinations, trials, and ingestion jobs.
The Data Bank is a curated catalog of public oncology resources. For each resource it records who hosts it, what it contains, which license governs it, what access level applies, and which cancer types it covers. The goal is to make the public landscape easy to navigate without re-hosting anything the project has no right to redistribute.
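A catalog entry is essentially a metadata record. The sketch below is a minimal illustration, assuming a Python backend. The `DatasetEntry` fields, the two-tier `AccessLevel` enum and the `group_by_cancer_type` helper are all hypothetical names chosen to mirror the attributes listed above; they are not the Data Bank's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class AccessLevel(Enum):
    """Hypothetical access tiers; the real catalog may use different labels."""
    OPEN = "open"
    CONTROLLED = "controlled"


@dataclass(frozen=True)
class DatasetEntry:
    """One catalog row: metadata about a resource, never its data files."""
    name: str
    host: str                        # who hosts it, e.g. "GDC/TCGA"
    contents: str                    # short description of what it contains
    license: str                     # license string as published by the host
    access: AccessLevel
    cancer_types: tuple[str, ...] = ()


def group_by_cancer_type(entries):
    """Map each cancer type to the names of the datasets that cover it."""
    groups = defaultdict(list)
    for entry in entries:
        for cancer in entry.cancer_types:
            groups[cancer].append(entry.name)
    return dict(groups)
```

A dataset that covers several cancer types appears under each of them, which is one plausible way a per-cancer view could be built.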
What you'll find
- Sources — 15 seeded (NCI CRDC, GDC/TCGA, cBioPortal, DepMap/CCLE, GDSC, PRISM, DrugComb, NCI ALMANAC, CPTAC/PDC, IDC, GEO, ClinicalTrials.gov, SEER, Open Targets, PubMed).
- Datasets, grouped by cancer type ('Cancers' tab).
- Drugs and combinations.
- Clinical trials.
- Ingestion jobs — every connector run records a job with status and a log.
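One way to guarantee that every connector run leaves a job behind is to wrap the run so that success and failure both end in a recorded status. This is a minimal sketch, assuming a Python backend; the status values, the `IngestionJob` class and `run_connector` are hypothetical, and `fetch` stands in for any connector function that returns a list of metadata rows.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class JobStatus(Enum):
    # Hypothetical status values; the real job table may use others.
    QUEUED = "queued"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class IngestionJob:
    connector: str
    status: JobStatus = JobStatus.QUEUED
    log: list[str] = field(default_factory=list)

    def note(self, message: str) -> None:
        """Append a UTC-timestamped line to the job log."""
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        self.log.append(f"{stamp} {message}")

    def start(self) -> None:
        self.status = JobStatus.RUNNING
        self.note("started")

    def finish(self, ok: bool, detail: str = "") -> None:
        self.status = JobStatus.SUCCEEDED if ok else JobStatus.FAILED
        self.note(f"{self.status.value} {detail}".strip())


def run_connector(name, fetch):
    """Run `fetch` and always return (job, rows), even when fetch raises."""
    job = IngestionJob(connector=name)
    job.start()
    try:
        rows = fetch()
    except Exception as exc:  # record the failure instead of losing it
        job.finish(ok=False, detail=repr(exc))
        return job, []
    job.finish(ok=True, detail=f"{len(rows)} records")
    return job, rows
```

Catching the exception inside the wrapper is the design choice that matters here: a crashed connector still produces a `failed` job with the error in its log, rather than no record at all.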
Live connectors
Three connectors can be run on demand from the admin pages: cBioPortal (multi-cancer studies), GDC/TCGA (project summaries for ~7 projects), and ClinicalTrials.gov (pan-cancer trial metadata). Each run produces an ingestion job record.
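A project-summary connector for GDC can stay metadata-first by requesting only a few descriptive fields. The sketch below assumes the public GDC REST endpoint `https://api.gdc.cancer.gov/projects` with its `fields` and `size` query parameters and a response shaped like `{"data": {"hits": [...]}}`; verify these against the GDC API documentation. The field selection and all function names are hypothetical, not the Data Bank's actual connector.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public GDC endpoint for project metadata (assumed; check the GDC API docs).
GDC_PROJECTS_URL = "https://api.gdc.cancer.gov/projects"

# Hypothetical choice: only these descriptive fields are requested and kept.
SUMMARY_FIELDS = ("project_id", "name", "primary_site")


def gdc_projects_url(size: int = 100) -> str:
    """Build a request URL that asks only for SUMMARY_FIELDS."""
    query = urlencode({"fields": ",".join(SUMMARY_FIELDS), "size": size})
    return f"{GDC_PROJECTS_URL}?{query}"


def parse_project_summaries(payload: dict) -> list[dict]:
    """Reduce a GDC-style response to summary rows, dropping any other keys."""
    hits = payload.get("data", {}).get("hits", [])
    return [{key: hit.get(key) for key in SUMMARY_FIELDS} for hit in hits]


def fetch_project_summaries(size: int = 100, timeout: float = 30.0) -> list[dict]:
    """Perform the network request; nothing is fetched at import time."""
    with urlopen(gdc_projects_url(size), timeout=timeout) as response:
        return parse_project_summaries(json.load(response))
```

Separating URL building and parsing from the network call keeps the parsing logic testable offline against a saved response.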
Hard rule
Controlled-access patient-level data is NEVER mirrored without IRB approval and data-use agreements. Connectors are metadata-first by design.
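The hard rule can be enforced as a fail-closed check at ingestion time: metadata is always kept, while a data payload is attached only when mirroring is permitted. This is a minimal sketch under stated assumptions; the `Clearance` record, the `"open"`/`"controlled"` labels and the `payload` key are hypothetical, and license checks for open data are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clearance:
    """Hypothetical record of the approvals held for one dataset."""
    irb_approved: bool = False
    dua_signed: bool = False


def may_mirror(access_level: str, clearance: Clearance) -> bool:
    """Anything not explicitly "open" needs both IRB approval and a signed DUA."""
    if access_level == "open":
        return True
    return clearance.irb_approved and clearance.dua_signed


def ingest(record: dict, clearance: Clearance) -> dict:
    """Always keep metadata; attach the payload only if mirroring is allowed."""
    metadata = {k: v for k, v in record.items() if k != "payload"}
    # A missing access label is treated as controlled (fail closed).
    access = record.get("access", "controlled")
    if "payload" in record and may_mirror(access, clearance):
        metadata["payload"] = record["payload"]
    return metadata
```

Defaulting unknown or missing access levels to "controlled" means a mislabeled record loses its payload rather than leaking it.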