Development/refinement of the idc-index python interface to Imaging Data Commons
Key Investigators
- Andrey Fedorov (BWH, USA)
- Vamsi Thiriveedhi (BWH, USA)
- Daniela Schacherer (Fraunhofer MEVIS, Germany)
- Leonard Nürnberg (Maastricht University, Netherlands)
- Steve Pieper (Isomics Inc, USA)
- Jean-Christophe Fillion-Robin (Kitware Inc, USA)
- Chris Bridge (MGH, USA)
Presenter location: Remote
Project Description
idc-index is a lightweight Python package that bundles a mini-index of the data available in Imaging Data Commons (IDC) together with the s5cmd download tool. With this package, one can search basic attributes of IDC data, build a subset, and download the corresponding files without logging in and without setting up any prerequisites specific to either Google or AWS. It is as easy as the example below:
$ pip install 'idc-index==0.2.11'
from idc_index import index
client = index.IDCClient()
client.download_from_selection(collection_id="nsclc_radiomics", downloadDir="./my_copy")
Its basic functionality is demonstrated in this tutorial: https://github.com/ImagingDataCommons/IDC-Tutorials/blob/master/notebooks/labs/idc_rsna2023.ipynb.
SlicerIDCBrowser already relies on idc-index for searching and downloading data from IDC.
Objective
- Raise awareness about idc-index
- Improve functionality
- Collect feedback to prioritize future developments
Approach and Plan
- collect feedback about what functionality would be useful to add or how to refine the API
- discuss capabilities that would be needed to support digital pathology use cases
- refine organization of the underlying index and exposed metadata attributes
- finish setting up GitHub actions to simplify updates and python package publishing
- documentation
- discuss with Python packaging experts the recommended practice for handling attachments/binary dependencies in a Python package (see https://github.com/ImagingDataCommons/idc-index/issues/3 and https://github.com/ImagingDataCommons/idc-index/issues/27)
Progress and Next Steps
- Refinement and testing to fix regressions in 0.2.9
- Discussed with Leo
- Discussed the utility of the package with @pieper. Feedback: “Speaking for myself, this exercise made me wish we had some API documentation for idc-index. Also, is there a way to report progress during the download? Also, some better error messages would help. I tried pasting the collection name from the portal as the collection_id and I got a Python error about a manifest not existing. I had to use the collection query to figure out what the mapping rule is. It would be nice if the idc-index methods could include a mapping so that either version of the collection string is accepted. Otherwise it worked well though and this is definitely a nice way to access the data!”
- @pieper was curious whether it is possible to retrieve instance-level URLs from a SeriesInstanceUID. @vkt1414 created a demo notebook (https://colab.research.google.com/drive/1va1xHMe1pgqZqp7RpI1VxqBKBOiGD-TW?usp=sharing), and the functionality was added to the package as a new API endpoint
- Need to have documentation (relevant discussion: https://github.com/encode/httpx/discussions/1220)
- Added API for getting instance-level URLs and viewer URLs
- Started working on the documentation
- JC is contributing a PR to refactor and improve the packaging and GitHub Actions setup: https://github.com/ImagingDataCommons/idc-index/pull/32
- Discussed how to improve API with Leo and Steve; need to document specific usage scenarios of what the users would like to achieve, and use those to drive revisions of the API
- Discussed the scope of support for slide microscopy metadata queries. Need to investigate how best to represent these, since they are instance-level attributes, while idc-index is currently series-based. Attributes discussed:
  - ContainerID
  - PixelSpacing
  - Rows, Columns
  - TotalPixelMatrixRows, TotalPixelMatrixColumns
  - ImageType
  - TransferSyntaxUID
  - SpecimenDescriptionSequence > PrimaryAnatomicStructureSequence (PASS) > code scheme, value, ...
  - SpecimenUID
  - several others under PASS
  - SpecimenPrepStepContentItemSequence > coding terms
  - OpticalPathSequence
  - IlluminationTypeCodeSequence
  - IlluminationColorCodeSequence
  - Wavelength
  - PyramidUID
  - PyramidLabel
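The feedback from @pieper asks for either form of the collection string to be accepted. Below is a minimal sketch of what such a mapping could look like. It is not part of idc-index: the helper names `normalize_collection_id` and `resolve_collection_id` are hypothetical, the portal-style name `NSCLC-Radiomics` is used only for illustration, and the mapping rule (lowercase, with runs of non-alphanumeric characters replaced by an underscore) is an assumption that happens to be consistent with the `nsclc_radiomics` id used in the example above.

```python
from __future__ import annotations

import re


def normalize_collection_id(name: str) -> str:
    """Map a portal-style collection name to an index-style collection_id.

    ASSUMPTION: the id is the lowercased name with each run of
    non-alphanumeric characters replaced by a single underscore.
    """
    return re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")


def resolve_collection_id(name: str, known_ids: set[str]) -> str:
    """Return a valid collection_id, or raise an error that explains the mapping.

    `known_ids` stands in for whatever list of collection ids the index holds.
    """
    candidate = normalize_collection_id(name)
    if candidate not in known_ids:
        raise ValueError(
            f"Unknown collection {name!r} (normalized to {candidate!r}). "
            f"Example of a valid id: {sorted(known_ids)[:1]}"
        )
    return candidate


# Both spellings resolve to the same id.
known = {"nsclc_radiomics"}
print(resolve_collection_id("NSCLC-Radiomics", known))
print(resolve_collection_id("nsclc_radiomics", known))
```

Raising a `ValueError` that shows the normalized candidate would also address the request for clearer error messages than a missing-manifest failure.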
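One possible way to reconcile instance-level slide microscopy attributes with a series-based index is to store per-series summaries of the instance values. The sketch below illustrates that idea on toy data only. The records, UIDs, summary column names, and the scalar `PixelSpacing` in millimetres are all made up for illustration, and this is not how idc-index represents these attributes.

```python
from collections import defaultdict

# Toy instance-level records: one pyramid with three levels, plus a single-level series.
# ASSUMPTION: PixelSpacing is simplified to one scalar (mm) per instance.
instances = [
    {"SeriesInstanceUID": "1.2.3", "PixelSpacing": 0.00025,
     "TotalPixelMatrixColumns": 40000, "TotalPixelMatrixRows": 30000},
    {"SeriesInstanceUID": "1.2.3", "PixelSpacing": 0.001,
     "TotalPixelMatrixColumns": 10000, "TotalPixelMatrixRows": 7500},
    {"SeriesInstanceUID": "1.2.3", "PixelSpacing": 0.004,
     "TotalPixelMatrixColumns": 2500, "TotalPixelMatrixRows": 1875},
    {"SeriesInstanceUID": "1.2.4", "PixelSpacing": 0.0005,
     "TotalPixelMatrixColumns": 20000, "TotalPixelMatrixRows": 15000},
]


def summarize_series(records):
    """Collapse instance-level records into one summary row per series."""
    by_series = defaultdict(list)
    for record in records:
        by_series[record["SeriesInstanceUID"]].append(record)

    summary = {}
    for series_uid, group in by_series.items():
        summary[series_uid] = {
            "instance_count": len(group),
            # Smallest spacing corresponds to the highest-resolution level.
            "min_pixel_spacing_mm": min(r["PixelSpacing"] for r in group),
            "max_total_pixel_matrix_columns": max(r["TotalPixelMatrixColumns"] for r in group),
            "max_total_pixel_matrix_rows": max(r["TotalPixelMatrixRows"] for r in group),
        }
    return summary


series_summary = summarize_series(instances)
for uid, row in series_summary.items():
    print(uid, row)
```

A summary like this keeps the index series-based while still allowing queries such as "series with a base-level spacing below a threshold". Attributes that vary in ways a min/max cannot capture (for example ImageType) would need a different representation.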
Background and References