NSF MRI: High Performance Digital Pathology Using Big Data and Machine Learning

What's New

(20180523) A sample of our digital pathology database, v0.0.0, is now available. This sample contains 6 patients and 3 tumor types (breast, gastrointestinal, urology).

(20180521) The app server that hosts Leica/Aperio's eSlide Manager software has been successfully installed.

(20180515) We passed a major milestone by completing the scanning of more than 3,000 pathology slides. These digital slides will be crucial in training our deep-learning system, as well as in creating our database. Learn more here.

Read More

Project Summary

In this NSF-funded project, we are developing a digital imaging system that uses big data and machine learning algorithms to automatically characterize pathology slides. We have developed a sustainable facility to rapidly collect automatically annotated whole slide images. This project is producing the data resources necessary to support the development of high performance deep learning models.

Over 10M slides are read each year in the U.S. alone. Tapping into even a fraction of this data will allow significant advancement of the science. Healthcare providers and machine learning researchers will be able to access an open-source, high-quality, searchable archive of clinical data. More information on this project can be found here.

A Cost-Effective Image Management Platform

This NSF Major Research Instrumentation (MRI) grant supported the purchase of a Leica Aperio AT2 scanner, the platform we use to convert pathology slides to digital images. This scanner can produce 50 high-quality, losslessly compressed TIFF images per hour.

We have also developed a very cost-effective petabyte file store, built from off-the-shelf components, to hold our database of 1M pathology images. To learn more about the clustered computing environment we are developing to support this research program, read this overview.
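
The throughput and storage figures quoted in this section lend themselves to a quick back-of-envelope check. The sketch below is only illustrative: it assumes the scanner runs continuously at the quoted 50 images per hour, that one petabyte means 10^15 bytes, and that the whole store is available for images. None of these assumptions is stated by the project, and the function names are hypothetical.

```python
# Back-of-envelope estimates from the figures quoted on this page.
# Assumptions (not from the project): continuous scanning at the quoted
# rate, 1 PB = 10**15 bytes, and the entire store dedicated to images.

IMAGES_PER_HOUR = 50          # quoted scanner throughput
STORE_BYTES = 10**15          # one petabyte, decimal definition (assumed)
TARGET_IMAGES = 1_000_000     # quoted database size


def scan_hours(n_images, rate=IMAGES_PER_HOUR):
    """Hours of continuous scanning needed to produce n_images."""
    return n_images / rate


def bytes_per_image(store_bytes=STORE_BYTES, n_images=TARGET_IMAGES):
    """Average storage budget per image if the store is filled evenly."""
    return store_bytes / n_images


if __name__ == "__main__":
    print(f"3,000 slides: {scan_hours(3_000):.0f} scanner hours")
    print(f"1M images: {scan_hours(TARGET_IMAGES):,.0f} scanner hours")
    print(f"Budget per image: {bytes_per_image() / 1e9:.1f} GB")
```

Under these assumptions, the 3,000-slide milestone corresponds to about 60 hours of scanner time, and a one-petabyte store leaves an average budget of roughly 1 GB per image across 1M images.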