dcm2parquet
Key Investigators
- Vamsi Thiriveedhi (BWH, USA)
- Andrey Fedorov (BWH, USA)
- Steve Pieper (Isomics, Inc., USA)
Presenter location: In-person
Project Description
As of now, we are not aware of any tool other than Google Cloud’s Healthcare API that can extract the DICOM headers of a dataset and make the metadata queryable via SQL. We aim to explore whether this can be done ‘in-house’, so that researchers who cannot upload their data to Google DICOM stores, or who do not have access to Google Cloud, can benefit. To that end, we found that duckdb, a fast in-process analytical database, can query highly complex nested data. Using duckdb and pydicom, our goal is to extract DICOM headers in a way that is similar to the BigQuery export feature of Google Cloud’s Healthcare API.
Objective
- Convert DICOM headers to parquet while preserving the nesting
- Figure out a way to dynamically update the schema and apply the necessary data manipulations
- Make the tool available on Hugging Face by integrating with idc-index, to seamlessly experiment with existing data in IDC
Approach and Plan
- Create a function to extract metadata at the series level first, assuming the schema is consistent within a series.
- Identify which columns and fields in the nested hierarchy have an inconsistent schema across a dataset, and choose the most exhaustive datatype. For example, between a string and an array of strings, the string datatype will be promoted to an array. Fill missing columns and fields with nulls.
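The schema reconciliation step in the plan above can be sketched in plain Python. This is a minimal illustration and not the project's implementation: the function name `harmonize` and the sample records are hypothetical, and each record is assumed to be an already-extracted header represented as a nested dict (for instance, one dict per series). Scalars are wrapped into one-element lists wherever any record holds a list for the same field, nested items are reconciled recursively, and missing fields are filled with `None`.

```python
from typing import Any


def harmonize(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Give every record the same keys and compatible value shapes (hypothetical helper)."""
    # Column order follows first appearance across all records.
    keys: list[str] = []
    for rec in records:
        for key in rec:
            if key not in keys:
                keys.append(key)

    # Every column starts out as null (None) in every output row.
    out: list[dict[str, Any]] = [dict.fromkeys(keys) for _ in records]

    for key in keys:
        values = [rec.get(key) for rec in records]
        present = [v for v in values if v is not None]

        if any(isinstance(v, list) for v in present):
            # Scalar vs. array conflict: promote scalars to one-element arrays.
            column = [
                None if v is None else (v if isinstance(v, list) else [v])
                for v in values
            ]
            # If the array holds nested items (dicts), reconcile them together.
            items = [item for v in column if v is not None for item in v]
            if items and all(isinstance(item, dict) for item in items):
                fixed = iter(harmonize(items))
                column = [
                    None if v is None else [next(fixed) for _ in v]
                    for v in column
                ]
        elif present and all(isinstance(v, dict) for v in present):
            # Nested struct: reconcile its fields recursively.
            fixed = iter(harmonize(present))
            column = [None if v is None else next(fixed) for v in values]
        else:
            column = values

        for row, value in zip(out, column):
            row[key] = value
    return out


# Hypothetical per-series headers with inconsistent schemas.
series = [
    {"Modality": "CT", "ImageType": "ORIGINAL",
     "ProcedureCodeSequence": [{"CodeValue": "A"}]},
    {"Modality": "MR", "ImageType": ["DERIVED", "SECONDARY"], "EchoTime": 4.2,
     "ProcedureCodeSequence": [{"CodeValue": "B", "CodeMeaning": "x"}]},
]
rows = harmonize(series)
```

After this pass every row has the same keys in the same order, so the rows could be handed to a columnar writer without per-row schema differences. Conflicts between other types (for example, integer versus string) are not handled here and would need their own promotion rules.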
Progress and Next Steps
- Able to extract metadata at the series level without any problems. The app reflects the progress made up to this point.
- Next, inspired by how BigQuery displays the schema, we aim to replicate that display. After that, we will compare the common columns between image series and determine whether any data manipulation is necessary.
Illustrations
The app is hosted as a Hugging Face Space at https://huggingface.co/spaces/vkt1414/dcm2parquet
Background and References