Name | PAMONO Dataset Documentation [v2.0] | ||
---|---|---|---|
Description |
===================================================================

0. TERMINOLOGY

The terms "virus", "particle" and "nano-object" are used synonymously in the file names, directory names and data described in this documentation. All these terms denote the same objects of interest, that is, the true positive nano-objects that are sought in the data.

1. DIRECTORY/FILE STRUCTURE

Each .zip file contains one dataset, i.e. the data recorded during one PAMONO experiment. This encompasses real data, as measured by the PAMONO sensor, and synthetic data with a ground truth segmentation/classification. Synthetic data is based on real data in combination with the signal model described in [1] (German) and in Chapter 4 of [2] (English).

1.1 DATASETS

The names of the .zip files describe the most important properties of the contained dataset. Each name consists of the particle size in nanometers (nm), the date on which the experiment took place, and an optional identifier in case of multiple experiments on one day. The mapping from the names used in Table 7.1 of [2] to the .zip file names used here is:

1.2 REAL DATA

The directory named "real" inside the base directory of each dataset contains data recorded by the PAMONO sensor. It is divided into the directories "background" and "particles".

The "background" directory contains measurements taken before particles were injected into the liquid and hence shows only background, artifacts and noise. This data serves as the particle-free background in the synthesis according to [1,2].

The "particles" directory contains measurements taken after particles were injected into the liquid and hence shows particle adhesions. This is the real data that is ultimately to be analyzed.

1.3 SYNTHETIC DATA

The directory named "synthetic" inside the base directory of each dataset contains synthetic images (.png) annotated with ground truth (.csv).
A small set of template particles extracted from the "real/particles" images was synthesized on the "real/background" images. The background images are split into three temporally coherent batches of equal length, stored in the folders "1", "2" and "3". These are intended to be used as training, validation and test data, respectively. Because these roles can be permuted, the folders are named with numbers rather than "training", "validation" and "test".

2. DIRECTORIES AND FILES IN THE "synthetic" FOLDERS

2.1 ".png" FILES IN DIRECTORIES "1", "2" and "3"

These ".png" files are the synthesized images, i.e. virus templates taken from real data, synthesized on real background, artifacts and noise, according to the sensor model proposed in [1] and in Chapter 4 of [2].

2.2 "NanoSynthMLPolygonFormFactors.csv" FILE

This file contains the synthetic ground truth segmentation of all particles appearing in the image files from 2.1. This segmentation contains only particle positives, so a classification is implicitly given as well.

The semicolon ";" is used as the column separator in "NanoSynthMLPolygonFormFactors.csv". The first line contains the column names, the remaining lines contain the data. The most important columns are (in the order in which they appear in the file):

label: class label of the example in the current line. All labels are "virus" here because the synthetic ground truth contains only viruses and no false positive detections. As stated in 0., the term "virus" is synonymous with the terms "particle", "nano-object" and "true positive detection".

fileName: name of the ".png" file in which the particle was found. Note that this name is an absolute path and might need to be adapted to your local directory structure.

x*, y*: x and y coordinates of the manually segmented polygon delineating the particle adhesion.
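The file layout described in 2.2 can be read with a few lines of Python. The following is a minimal sketch and not part of the dataset: the function names and the sample row are hypothetical. It assumes that the columns denoted "x*" and "y*" are named "x" or "y" followed by an integer index, that coordinates are written with a decimal point, and that only the last component of the absolute "fileName" is needed to locate the image locally.

```python
import csv
import io
import re
from pathlib import Path

# Assumption: polygon columns are named "x" or "y" plus an index (e.g. "x0", "y0").
COORD_COLUMN = re.compile(r"^([xy])(\d+)$")


def local_name(file_name, local_dir):
    """Map the absolute fileName stored in the CSV to a path below local_dir."""
    # Keep only the last path component, whichever separator was used.
    base = re.split(r"[\\/]", file_name.strip())[-1]
    return Path(local_dir) / base


def read_ground_truth(stream, local_dir):
    """Read semicolon-separated ground truth rows from an open text stream."""
    rows = []
    for record in csv.DictReader(stream, delimiter=";"):
        xs, ys = {}, {}
        for column, value in record.items():
            match = COORD_COLUMN.match(column or "")
            if match and value not in (None, ""):
                target = xs if match.group(1) == "x" else ys
                target[int(match.group(2))] = float(value)
        # Pair x and y values that share an index, in index order.
        polygon = [(xs[i], ys[i]) for i in sorted(xs) if i in ys]
        rows.append({
            "label": record["label"],
            "path": local_name(record["fileName"], local_dir),
            "polygon": polygon,
        })
    return rows


if __name__ == "__main__":
    # Made-up sample row for illustration only; real files contain more columns.
    sample = io.StringIO(
        "label;fileName;x0;y0;x1;y1\n"
        "virus;/some/absolute/dir/img_0001.png;1.0;2.0;3.5;4.5\n"
    )
    for row in read_ground_truth(sample, "synthetic/1"):
        print(row["label"], row["path"], row["polygon"])
```

For a real dataset, the stream would be the opened "NanoSynthMLPolygonFormFactors.csv" file, and local_dir the local folder holding the corresponding ".png" files. Columns that do not match the assumed naming pattern are ignored by this sketch.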
The manually segmented polygon is typically larger than machine-detected polygons because it is intended to cover the entire region affected by the particle signal and its nearest surroundings. For this reason, these polygons (and features derived from them) should not be used to learn a model that classifies machine-detected data. Instead, the polygons in this file serve as a reference against which machine-detected polygons can be matched in order to measure detection quality.

2.3 DIRECTORY "particles_component"

For synthetic data, a perfectly separated particle component is available, which is stored in the subdirectory "particles_component". It contains only the target particles to be detected, without any background, artifacts, noise or other effects impeding analysis. Hence this is the pure signal of interest. In Equation 4.1 of [2], this signal corresponds to the T component, i.e. the target signal.

2.4 DIRECTORY "background_component"

This directory contains the data complementary to the directory "particles_component": solely the irrelevant background, artifacts and noise, without the particles.

The images described in 2.1 are composed according to Equation 4.1 in [2], using the images from 2.4 as the B(ackground), A(rtifacts) and N(oise) components, modulated with the images from 2.3 as the T(arget particles) component.

===================================================================

[1] Siedhoff, D., Libuschewski, P., Weichert, F., Zybin, A., Marwedel, P., Müller, H. (2014). "Modellierung und Optimierung eines Biosensors zur Detektion viraler Strukturen" In: Bildverarbeitung für die Medizin 2014 (pp. 108-113). Springer Berlin Heidelberg.

[2] Siedhoff, D. (2016). "A parameter-optimizing model-based approach to the analysis of low-SNR image sequences for biological virus detection" PhD thesis. TU Dortmund University. DOI: http://dx.doi.org/10.17877/DE290R-17272

[3] Siedhoff, D., Fichtenberger, H., Libuschewski, P., Weichert, F., Sohler, C., Müller, H.
(2014). "Signal/Background Classification of Time Series for Biological Virus Detection" In: Pattern Recognition. Ed. by X. Jiang, J. Hornegger, and R. Koch. Vol. 8753. Lecture Notes in Computer Science. Springer Berlin Heidelberg. URL: https://link.springer.com/chapter/10.1007/978-3-319-11752-2_31 |
||
License | This documentation.txt is made available under the Open Database License: Any rights in individual contents of the database are licensed under the Database Contents License: http://opendatacommons.org/licenses/odbl/ | ||
Contact | Dr. Siedhoff, Dominic | ||
SFB Part Project |
SFB876-B2
|
||
DatasetFile | documentation.txt (7 KB) | ||
Publication |
|