|
Example of a Data Analysis Report
This report describes the generation of a classifier for differentiating
between cancerous and normal tissue samples (cell lysates). The classifier was
generated from 15 spectra in each group and was tested in a blinded manner on
50 test samples. The individual sections are pre-processing, classifier
generation, and testing.
Data Pre-Processing
This section describes the parameters used (parameter table), summarizes the
individual spectra (spectrum info table), and provides visual guides: gel
plots and graphs of group averages and differences.
The values of the parameters used in the preprocessing of the spectra.
| Parameter Name | Parameter Value |
| Window Width Background | (22000,50000):(42000,150000) Ppm |
| Window Width Noise | (22000,100000):(42000,300000) Ppm |
| Peak Width | (3000,4000):(10000,1500):(22000,2200):(42000,8000) Ppm |
| Background Subtraction | Not Performed |
| Normalization Values | (normalize True, type TIC, value 100) |
| S/N Cut-off | 2 |
| Alignment | Not Performed |
Window width background:
The window widths used to estimate the background, given as position, value,
and units. For details, ask one of our analysts.
Window width noise:
The window widths used to estimate the noise strength, given as position,
value, and units. For details, ask one of our analysts.
Peak width:
The window widths used to locate peaks, given as position, value, and units.
Background subtraction:
Flag (On/Off) indicating whether the background was subtracted.
S/N cut-off:
The signal-to-noise ratio cut-off above which peaks are detected.
Alignment:
Flag (On/Off) indicating whether spectra were aligned, together with the
alignment accuracy.
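The position/value notation used in the parameter table can be read mechanically. The following is a minimal sketch, assuming each `(position,value)` pair gives an M/Z position and a width expressed in parts per million of that position. The function names `parse_width_spec` and `ppm_to_mz` are hypothetical, and the report does not say how widths are interpolated between positions, so that step is not modeled here.

```python
import re

def parse_width_spec(spec):
    """Parse '(pos,value):(pos,value) Unit' into ([(pos, value), ...], unit)."""
    pairs = [(float(p), float(v))
             for p, v in re.findall(r"\(\s*([\d.]+)\s*,\s*([\d.]+)\s*\)", spec)]
    # Whatever follows the last ')' is treated as the unit label.
    unit = spec.rsplit(")", 1)[-1].strip()
    return pairs, unit

def ppm_to_mz(position, width_ppm):
    """Convert a width in parts per million of `position` into M/Z units.

    Assumption: ppm is taken relative to the M/Z position of the pair.
    """
    return position * width_ppm / 1e6

pairs, unit = parse_width_spec("(22000,50000):(42000,150000) Ppm")
for position, width_ppm in pairs:
    print(position, unit, ppm_to_mz(position, width_ppm))
```

Under this reading, the background window at M/Z 22000 would be 1100 M/Z units wide.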
This table contains summary information on a spectrum-by-spectrum basis.
| Int.BG | Int.Ns | Norm. | Peaks | File | Group |
| -.894 | 72.379 | .001 | 162 | spectrum1-1.txt | Cancer |
| -.358 | 62.871 | .001 | 197 | spectrum1-2.txt | Cancer |
| -1.028 | 65.003 | .001 | 161 | spectrum1-3.txt | Cancer |
| -1.05 | 68.341 | .001 | 183 | spectrum1-4.txt | Cancer |
| -.743 | 70.865 | .001 | 150 | spectrum1-5.txt | Cancer |
| -.931 | 62.423 | .001 | 175 | spectrum1-6.txt | Cancer |
| -.444 | 59.669 | .001 | 192 | spectrum1-7.txt | Cancer |
| -1.743 | 96.621 | .001 | 143 | spectrum1-8.txt | Cancer |
| -2.793 | 79.728 | .001 | 166 | spectrum1-9.txt | Cancer |
| .612 | 63.925 | .001 | 180 | spectrum1-11.txt | Cancer |
| .441 | 70.639 | .001 | 157 | spectrum1-11.txt | Cancer |
| 1.515 | 54.466 | .001 | 172 | spectrum1-12.txt | Cancer |
| -.277 | 67.324 | .001 | 168 | spectrum1-13.txt | Cancer |
| -.784 | 67.602 | .001 | 165 | spectrum1-14.txt | Cancer |
| 1.171 | 71.786 | .001 | 155 | spectrum1-15.txt | Cancer |
| .353 | 65.777 | .001 | 160 | spectrum2-1.txt | Normal |
| 2.229 | 67.219 | .001 | 162 | spectrum2-2.txt | Normal |
| .986 | 62.651 | .001 | 171 | spectrum2-3.txt | Normal |
| -.001 | 69.592 | .001 | 167 | spectrum2-4.txt | Normal |
| 1.311 | 52.014 | .001 | 181 | spectrum2-5.txt | Normal |
| 1.409 | 56.287 | .001 | 179 | spectrum2-6.txt | Normal |
| 1.465 | 57.981 | .001 | 171 | spectrum2-7.txt | Normal |
| 2.467 | 65.329 | .001 | 158 | spectrum2-8.txt | Normal |
| .257 | 63.79 | .001 | 161 | spectrum2-9.txt | Normal |
| 2.011 | 57.055 | .001 | 173 | spectrum2-10.txt | Normal |
| -.122 | 74.346 | .001 | 160 | spectrum2-11.txt | Normal |
| 1.473 | 62.412 | .001 | 168 | spectrum2-12.txt | Normal |
| 1.054 | 61.719 | .001 | 181 | spectrum2-13.txt | Normal |
| -.026 | 72.49 | .001 | 164 | spectrum2-14.txt | Normal |
Int.BG:
The total background integrated over the whole M/Z range.
Int.Ns:
The total estimated noise integrated over the whole M/Z range.
Norm.:
The normalization factor used to normalize all spectra to a common total ion
current.
Peaks:
The number of peaks found in the spectrum.
File:
The name of the file.
Group:
The group label.
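The Norm. column and the "type TIC, value 100" parameter can be illustrated with a short calculation. This is a minimal sketch, assuming the total ion current is the plain sum of the intensities and that the factor is the target value divided by that sum. The report does not spell out the formula, and the function names and intensities are made up for illustration.

```python
def tic_normalization_factor(intensities, target=100.0):
    """Return the factor that scales a spectrum to a total ion current of `target`.

    Assumption: total ion current = sum of all intensities in the spectrum.
    """
    total = sum(intensities)
    if total <= 0:
        raise ValueError("total ion current must be positive")
    return target / total

def normalize(intensities, target=100.0):
    """Scale every intensity by the TIC normalization factor."""
    factor = tic_normalization_factor(intensities, target)
    return [value * factor for value in intensities]

spectrum = [120.0, 300.0, 80.0, 500.0]  # made-up intensities, sum = 1000
factor = tic_normalization_factor(spectrum)
scaled = normalize(spectrum)
print(factor, sum(scaled))
```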
A simultaneous depiction of all spectra used in a given run. The horizontal
axis corresponds to M/Z, and the spectra are stacked along the vertical axis.
The grey scale corresponds to the intensity (counts); darker corresponds to
larger values. The reports show two gel plots, one with the detected peaks as
green lines and one without. A horizontal red line separates the groups.
Gel plot (with peaks):
Gel plot:
Each report contains a graph of the medians of each group in the pairwise
comparison.

Group Cancer in blue, group Normal in red.
Plot of Group Differences:
The difference of the group medians as a function of M/Z is a first
indication of the location and amplitude of potential biomarkers.
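The median difference described here can be computed bin by bin. The following is a minimal sketch, assuming all spectra share a common M/Z grid so that intensities can be compared position by position. The function names, the sign convention (first group minus second), and the toy data are illustrative assumptions.

```python
from statistics import median

def group_median(spectra):
    """Median intensity at each M/Z bin across the spectra of one group."""
    return [median(column) for column in zip(*spectra)]

def median_difference(group_a, group_b):
    """Per-bin difference of the two group medians (group_a minus group_b)."""
    return [a - b for a, b in zip(group_median(group_a), group_median(group_b))]

# Toy data: three spectra per group, three M/Z bins each.
cancer = [[1.0, 4.0, 2.0], [2.0, 6.0, 2.0], [3.0, 5.0, 1.0]]
normal = [[1.0, 1.0, 2.0], [2.0, 2.0, 3.0], [2.0, 1.0, 4.0]]
diff = median_difference(cancer, normal)
print(diff)
```

Bins where the difference is far from zero are the candidate biomarker locations.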
This section contains a table of features (putative biomarkers) used in the
classifier, the results of cross-validation, and graphs of all features used
in the classifier. A complete list of all potential markers is part of the
full report, but has been omitted here for brevity.
Features are defined as peaks that are common to more than four files. The
width of a feature is the misalignment error plus the peak width at the
feature's M/Z value. The value of a feature is defined as the integral of the
background-subtracted spectrum over the width of the feature.
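The feature value can be sketched as a numerical integral. This is a minimal sketch, assuming the feature width is centered on the feature's M/Z value and using the trapezoidal rule on the sample points inside that window. Neither choice is stated in the report, and `feature_value` is a hypothetical name.

```python
def feature_value(mz, intensity, center, width):
    """Trapezoidal integral of `intensity` over [center - width/2, center + width/2].

    Only sample points inside the window are used; the window edges are not
    interpolated. `intensity` is assumed to be background-subtracted already.
    """
    lo, hi = center - width / 2.0, center + width / 2.0
    points = [(x, y) for x, y in zip(mz, intensity) if lo <= x <= hi]
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += 0.5 * (y0 + y1) * (x1 - x0)
    return area

mz = [100.0, 101.0, 102.0, 103.0, 104.0]
intensity = [0.0, 2.0, 4.0, 2.0, 0.0]
value = feature_value(mz, intensity, center=102.0, width=2.0)
print(value)
```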
| MZ | ID | Cancer Avg | Normal Avg | Wil p | T p | CV1 | CV2 |
| 6664 | 75 | .67 | .326 | .000007 | .000416 | .488 | .192 |
| 6667 | 76 | .606 | .31 | .000043 | .001456 | .526 | .202 |
| 7601 | 89 | .146 | .38 | .000238 | .000046 | .624 | .433 |
| 7608 | 90 | .192 | .493 | .000238 | .000004 | .541 | .355 |
| 7618 | 91 | .293 | .716 | .000043 | .0000003 | .434 | .296 |
| 7629 | 92 | .436 | .969 | .000007 | .0000001 | .354 | .252 |
| 7644 | 93 | .659 | 1.185 | .000043 | .0004 | .263 | .194 |
| 8440 | 107 | .746 | .108 | .000001 | .002594 | 1.001 | .284 |
| 8563 | 108 | .732 | .167 | .000001 | .000001 | .483 | .405 |
| 8580 | 109 | .592 | .185 | .0000001 | .0000001 | .361 | .354 |
| 10113 | 127 | .34 | 1.078 | .000238 | .000019 | .542 | .486 |
| 10142 | 128 | .257 | .708 | .000007 | .000002 | .386 | .39 |
| 10173 | 129 | -.007 | .16 | .000238 | .000131 | -9.206 | .809 |
| 10522 | 133 | .234 | .065 | .000043 | .000009 | .486 | .62 |
| 10839 | 135 | .702 | .316 | .000238 | .010559 | .768 | .24 |
| 10887 | 137 | .304 | .151 | .000238 | .000724 | .476 | .396 |
| 15189 | 178 | .419 | 1.145 | .000043 | .000109 | .773 | .467 |
| 15263 | 179 | 1.093 | 2.758 | .000001 | .0000009 | .325 | .243 |
| 15481 | 185 | .263 | .671 | .000238 | .000009 | .775 | .31 |
| 16313 | 190 | .099 | .053 | .000238 | .001597 | .423 | .549 |
| 17207 | 194 | .184 | .087 | .000043 | .000989 | .533 | .314 |
| 17261 | 195 | .121 | .059 | .000043 | .000002 | .283 | .352 |
| 18435 | 199 | .123 | .066 | .000238 | .000032 | .317 | .337 |
| 25239 | 213 | .151 | .339 | .000007 | .0004 | .398 | .219 |
| 25468 | 214 | .128 | .251 | .000043 | .000001 | .421 | .212 |
| 26635 | 215 | .243 | .487 | .000007 | .0000001 | .333 | .192 |
| 33497 | 223 | .184 | .364 | .000043 | .003986 | .925 | .389 |
| 38578 | 228 | .078 | .201 | .000238 | .000024 | .735 | .371 |
| 38870 | 229 | .109 | .231 | .000238 | .000021 | .539 | .308 |
MZ:
The M/Z position of a feature.
ID:
An ID tag used for referencing.
Cancer Avg:
The average of the feature in group 1 (Cancer).
Normal Avg:
The average of the feature in group 2 (Normal).
Wil p:
The p-value of this feature obtained by applying a Wilcoxon rank test of the
null hypothesis that both groups are equal, disregarding all correlations with
other features.
T p:
The p-value of this feature obtained by applying a t-test of the null
hypothesis that both groups are equal, assuming Gaussian distributions with
equal variances and disregarding all correlations with other features.
CV1:
The coefficient of variation for values in group 1.
CV2:
The coefficient of variation for values in group 2.
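Two of the per-feature statistics above can be written down in a few lines. The following is a minimal sketch, assuming the coefficient of variation is the sample standard deviation divided by the mean and that the t-test is the standard two-sample test with pooled variance. The report does not state which standard deviation estimator it uses. Turning the t statistic into a p-value needs the Student t distribution with n1 + n2 - 2 degrees of freedom, which is left out here.

```python
from math import sqrt
from statistics import mean, stdev, variance

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumed definition)."""
    return stdev(values) / mean(values)

def pooled_t_statistic(a, b):
    """Two-sample t statistic assuming equal variances (pooled variance)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1.0 / na + 1.0 / nb))

group1 = [1.0, 2.0, 3.0]  # toy feature values, not taken from the report
group2 = [4.0, 5.0, 6.0]
cv1 = coefficient_of_variation(group1)
t = pooled_t_statistic(group1, group2)
print(cv1, t)
```

Because the mean sits in the denominator, a coefficient of variation becomes large, and negative, when the group average is slightly below zero. That is consistent with the CV1 of -9.206 for feature 129, whose Cancer average is -.007.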
This table summarizes leave-one-out cross-validation runs. The classifier is a
probabilistic version of a k-nearest-neighbor (KNN) classifier. The algorithm
returns a probability for each possible class label. If the difference between
the highest probability and the next highest is smaller than the parameter
p-diff, the spectrum is labeled as undefined, which counts as a miss in
cross-validation.
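The p-diff rule can be illustrated with a small classifier. This is a minimal sketch and not the report's implementation: it assumes Euclidean distance and takes the class probability to be the fraction of the k nearest neighbors carrying that label. The report does not say how its probabilities are estimated, so real results may differ from this simple vote fraction. All names and training points are hypothetical.

```python
from collections import Counter
from math import dist

UNDEFINED = "UNDEFINED"

def knn_probabilities(train, query, k):
    """Class probabilities as vote fractions among the k nearest neighbors.

    `train` is a list of (feature_vector, label) pairs. Vote fractions are an
    assumption; the report does not describe its probability estimate.
    """
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    labels = {label for _, label in train}
    return {label: votes[label] / k for label in labels}

def classify(train, query, k, p_diff):
    """Return the most probable label, or UNDEFINED if the margin is below p_diff."""
    ranked = sorted(knn_probabilities(train, query, k).items(),
                    key=lambda kv: kv[1], reverse=True)
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] < p_diff:
        return UNDEFINED
    return ranked[0][0]

train = [((0.0, 0.0), "Cancer"), ((0.1, 0.2), "Cancer"), ((0.2, 0.1), "Cancer"),
         ((1.0, 1.0), "Normal"), ((0.9, 1.1), "Normal"), ((1.1, 0.9), "Normal")]
clear = classify(train, (0.1, 0.1), k=3, p_diff=0.12)
tied = classify(train, (0.5, 0.5), k=6, p_diff=0.12)
print(clear, tied)
```

In the second call all six neighbors vote, the two probabilities are equal, and the margin of zero falls below p-diff, so the query is labeled undefined.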
PDiff: 0.12 Cancer Size: 15 Normal Size: 15
| k | Errors | Misses | G1 Errors | G1 Misses | G2 Errors | G2 Misses |
| 5 | 0 | 2 | 0 | 1 | 0 | 1 |
| 13 | 1 | 0 | 0 | 0 | 1 | 0 |
| 3 | 1 | 0 | 1 | 0 | 0 | 0 |
| 9 | 1 | 1 | 0 | 0 | 1 | 1 |
| 11 | 1 | 1 | 0 | 0 | 1 | 1 |
| 7 | 1 | 2 | 0 | 1 | 1 | 1 |
| 1 | 2 | 0 | 2 | 0 | 0 | 0 |
p-diff:
The minimum required difference in class probabilities.
k:
The number of nearest neighbors used in the classifier.
Errors:
The number of cross-validation errors.
Misses:
The number of spectra labeled as undefined.
G1 Errors:
The number of errors in group 1.
G1 Misses:
The number of undefined spectra in group 1.
G2 Errors:
The number of errors in group 2.
G2 Misses:
The number of undefined spectra in group 2.
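The error and miss counts in the cross-validation table come from a leave-one-out loop, which can be sketched as follows. This is a minimal sketch under stated assumptions: the `classify` argument is any function that returns a label or `UNDEFINED`, and `nearest_neighbour` is a hypothetical one-dimensional stand-in, not the probabilistic KNN classifier used in the report.

```python
from collections import Counter

UNDEFINED = "UNDEFINED"

def leave_one_out(samples, classify):
    """Tally errors and misses per true group.

    `samples` is a list of (features, label) pairs; `classify(train, features)`
    returns a label or UNDEFINED. Each sample is classified by a model that
    sees every other sample but not the sample itself.
    """
    errors, misses = Counter(), Counter()
    for i, (features, truth) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        predicted = classify(train, features)
        if predicted == UNDEFINED:
            misses[truth] += 1
        elif predicted != truth:
            errors[truth] += 1
    return errors, misses

def nearest_neighbour(train, features):
    """Hypothetical stand-in classifier: label of the closest 1-D training value."""
    return min(train, key=lambda item: abs(item[0] - features))[1]

samples = [(0.0, "Cancer"), (0.2, "Cancer"), (0.9, "Cancer"),
           (1.0, "Normal"), (1.05, "Normal"), (1.3, "Normal")]
errors, misses = leave_one_out(samples, nearest_neighbour)
print(dict(errors), dict(misses))
```

In the table's terms, `errors["Cancer"]` and `misses["Cancer"]` correspond to G1 Errors and G1 Misses, and the Normal entries to G2.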
For each pairwise classification we show graphs of each feature, plotting the
group medians over the feature width together with their 25th and 75th
percentiles.
Features used in classifier, plots of medians: Cancer in blue, Normal in red
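The median and percentile band in these plots can be computed per M/Z point across the spectra of one group. This is a minimal sketch, assuming the spectra share a common M/Z grid and using the "inclusive" interpolation method of Python's `statistics.quantiles`. The report does not say which percentile convention it uses, and the function names and data are illustrative.

```python
from statistics import quantiles

def quartile_summary(values):
    """Return (25th percentile, median, 75th percentile) of `values`."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q2, q3

def median_band(spectra):
    """Per-bin (q1, median, q3) across the spectra of one group."""
    return [quartile_summary(column) for column in zip(*spectra)]

# Toy data: five spectra, two M/Z bins inside a feature width.
spectra = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]
band = median_band(spectra)
print(band)
```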
In this example we were given 50 test spectra and applied the above classifier
blindly to them. The user then sent the real labels for each of these test
spectra. The results are summarized below: the classifier made no incorrect
assignments on the unknown samples and labeled 4 of the 50 spectra as
undefined.
| File | True group | Assigned label |
| spectrum1-16.txt | Cancer | Cancer |
| spectrum1-17.txt | Cancer | Cancer |
| spectrum1-18.txt | Cancer | UNDEFINED |
| spectrum1-19.txt | Cancer | Cancer |
| spectrum1-20.txt | Cancer | Cancer |
| spectrum1-21.txt | Cancer | Cancer |
| spectrum1-21.txt | Cancer | UNDEFINED |
| spectrum1-22.txt | Cancer | UNDEFINED |
| spectrum1-23.txt | Cancer | Cancer |
| spectrum1-24.txt | Cancer | Cancer |
| spectrum1-25.txt | Cancer | Cancer |
| spectrum1-26.txt | Cancer | Cancer |
| spectrum1-27.txt | Cancer | Cancer |
| spectrum1-28.txt | Cancer | Cancer |
| spectrum1-29.txt | Cancer | Cancer |
| spectrum1-30.txt | Cancer | Cancer |
| spectrum1-31.txt | Cancer | Cancer |
| spectrum1-32.txt | Cancer | Cancer |
| spectrum1-33.txt | Cancer | Cancer |
| spectrum1-34.txt | Cancer | Cancer |
| spectrum1-35.txt | Cancer | Cancer |
| spectrum1-36.txt | Cancer | Cancer |
| spectrum1-37.txt | Cancer | Cancer |
| spectrum1-38.txt | Cancer | Cancer |
| spectrum1-39.txt | Cancer | Cancer |
| spectrum2-15.txt | Normal | Normal |
| spectrum2-16.txt | Normal | Normal |
| spectrum2-17.txt | Normal | Normal |
| spectrum2-18.txt | Normal | Normal |
| spectrum2-19.txt | Normal | Normal |
| spectrum2-20.txt | Normal | Normal |
| spectrum2-21.txt | Normal | Normal |
| spectrum2-21.txt | Normal | Normal |
| spectrum2-22.txt | Normal | Normal |
| spectrum2-23.txt | Normal | Normal |
| spectrum2-24.txt | Normal | Normal |
| spectrum2-25.txt | Normal | Normal |
| spectrum2-26.txt | Normal | Normal |
| spectrum2-27.txt | Normal | Normal |
| spectrum2-28.txt | Normal | Normal |
| spectrum2-29.txt | Normal | Normal |
| spectrum2-30.txt | Normal | Normal |
| spectrum2-31.txt | Normal | Normal |
| spectrum2-32.txt | Normal | Normal |
| spectrum2-33.txt | Normal | Normal |
| spectrum2-34.txt | Normal | Normal |
| spectrum2-35.txt | Normal | Normal |
| spectrum2-36.txt | Normal | UNDEFINED |
| spectrum2-37.txt | Normal | Normal |
| spectrum2-38.txt | Normal | Normal |
Summary
of the Results for the test case:
Total:
errors – 0, undefined 4 (8%), correct assignment – 92%
Cancer:
errors – 0, undefined 3 (12%), correct assignment – 88%
Normal:
errors – 0, undefined 1 (4%), correct assignment – 96%
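The percentages in this summary follow directly from the label counts in the test table. A minimal sketch of the tally, with a hypothetical `summarize` helper; the counts (25 spectra per group, 3 undefined Cancer, 1 undefined Normal, no wrong labels) are read off the table:

```python
from collections import Counter

UNDEFINED = "UNDEFINED"

def summarize(results):
    """Count errors, undefined, and correct labels from (true, assigned) pairs.

    Returns {category: (count, percent of all results)}.
    """
    counts = Counter()
    for truth, assigned in results:
        if assigned == UNDEFINED:
            counts["undefined"] += 1
        elif assigned == truth:
            counts["correct"] += 1
        else:
            counts["errors"] += 1
    total = len(results)
    return {key: (counts[key], 100.0 * counts[key] / total)
            for key in ("errors", "undefined", "correct")}

# Counts taken from the test table: 25 spectra per group.
cancer = [("Cancer", "Cancer")] * 22 + [("Cancer", UNDEFINED)] * 3
normal = [("Normal", "Normal")] * 24 + [("Normal", UNDEFINED)] * 1
for name, rows in (("Total", cancer + normal), ("Cancer", cancer), ("Normal", normal)):
    print(name, summarize(rows))
```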
Biodesix has focused on applying signal processing, statistical analysis,
and an in-depth understanding of the physics of mass spectrometers to build a bio-analysis platform
that can be used to develop diagnostic "classifiers" for use by researchers, drug developers and clinicians.
These classifiers can reproducibly, reliably, and with clinically significant accuracy, predict and evaluate
normal biologic or pathogenic states or the probable pharmacological responses to a therapeutic intervention.
For more details and inquiries about our services, call 970.870.9041
|