Example of a Data Analysis Report
This report describes the generation of a classifier for differentiating between cancerous and normal samples from tissue (cell lysates). The classifier was generated from 15 spectra in each group, and was tested in a blinded manner on 50 test samples. The individual sections are pre-processing, classifier generation, and testing.

Data Pre-Processing

This section describes the parameters used (parameter table), gives summary information on the individual spectra (spectrum info table), and provides visual guides in the form of gel plots and graphs of group averages and differences.

Parameter Table
The values of the parameters used in the preprocessing of the spectra.

Parameter Name            Parameter Value
Window Width Background   (22000,50000):(42000,150000) Ppm
Window Width Noise        (22000,100000):(42000,300000) Ppm
Peak Width                (3000,4000):(10000,1500):(22000,2200):(42000,8000) Ppm
Background Subtraction    Not Performed
Normalization Values      (normalize True, type TIC, value 100)
S/N Cut-off               2
Alignment                 Not Performed


Window width background: The values of the window widths used to estimate the background: position, value, units. For details, ask one of our analysts.

Window width noise: The values of the window widths used to estimate the noise strength: position, value, units. For details, ask one of our analysts.

Peak width: The values of the window widths used to locate peaks: position, value, units.

Background subtraction: Flag (On/Off) indicating whether background was subtracted.

S/N cut-off: The value of the cut-off signal-to-noise ratio above which peaks are detected.

Alignment: Flag (On/Off) indicating whether spectra were aligned, together with the alignment accuracy.
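
The window and peak-width parameters above are given as position/value pairs. As a rough illustration only, the sketch below parses such a string and evaluates the width at an arbitrary M/Z position. The function names, the linear interpolation between positions, and the reading of Ppm as parts per million of the M/Z position are all assumptions made for this example; they are not statements about the actual pre-processing software.

```python
import re


def parse_breakpoints(text):
    """Parse '(pos,val):(pos,val) Unit' into ([(pos, val), ...], unit)."""
    pairs = [(float(p), float(v))
             for p, v in re.findall(r"\(([^,()]+),([^,()]+)\)", text)]
    unit = text.rsplit(")", 1)[-1].strip()
    return pairs, unit


def width_at(mz, pairs):
    """Width at a given M/Z.

    Assumption: linear interpolation between the listed positions,
    constant outside the listed range.
    """
    pairs = sorted(pairs)
    if mz <= pairs[0][0]:
        return pairs[0][1]
    if mz >= pairs[-1][0]:
        return pairs[-1][1]
    for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]):
        if x0 <= mz <= x1:
            return y0 + (y1 - y0) * (mz - x0) / (x1 - x0)


def ppm_to_mz(mz, ppm):
    """Assumption: a width in ppm is relative to the M/Z position itself."""
    return mz * ppm / 1e6


# Peak Width entry from the parameter table.
pairs, unit = parse_breakpoints(
    "(3000,4000):(10000,1500):(22000,2200):(42000,8000) Ppm")
print(unit, width_at(6500, pairs), ppm_to_mz(3000, 4000))
```

Under these assumptions, a peak width of 4000 ppm at M/Z 3000 corresponds to an absolute width of about 12 M/Z units.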

Spectrum Info Table
This table contains summary information on a spectrum-by-spectrum basis.

Int.BG    Int.Ns    Norm.   Peaks   File               Group
-.894     72.379    .001    162     spectrum1-1.txt    Cancer
-.358     62.871    .001    197     spectrum1-2.txt    Cancer
-1.028    65.003    .001    161     spectrum1-3.txt    Cancer
-1.05     68.341    .001    183     spectrum1-4.txt    Cancer
-.743     70.865    .001    150     spectrum1-5.txt    Cancer
-.931     62.423    .001    175     spectrum1-6.txt    Cancer
-.444     59.669    .001    192     spectrum1-7.txt    Cancer
-1.743    96.621    .001    143     spectrum1-8.txt    Cancer
-2.793    79.728    .001    166     spectrum1-9.txt    Cancer
.612      63.925    .001    180     spectrum1-11.txt   Cancer
.441      70.639    .001    157     spectrum1-11.txt   Cancer
1.515     54.466    .001    172     spectrum1-12.txt   Cancer
-.277     67.324    .001    168     spectrum1-13.txt   Cancer
-.784     67.602    .001    165     spectrum1-14.txt   Cancer
1.171     71.786    .001    155     spectrum1-15.txt   Cancer
.353      65.777    .001    160     spectrum2-1.txt    Normal
2.229     67.219    .001    162     spectrum2-2.txt    Normal
.986      62.651    .001    171     spectrum2-3.txt    Normal
-.001     69.592    .001    167     spectrum2-4.txt    Normal
1.311     52.014    .001    181     spectrum2-5.txt    Normal
1.409     56.287    .001    179     spectrum2-6.txt    Normal
1.465     57.981    .001    171     spectrum2-7.txt    Normal
2.467     65.329    .001    158     spectrum2-8.txt    Normal
.257      63.79     .001    161     spectrum2-9.txt    Normal
2.011     57.055    .001    173     spectrum2-10.txt   Normal
-.122     74.346    .001    160     spectrum2-11.txt   Normal
1.473     62.412    .001    168     spectrum2-12.txt   Normal
1.054     61.719    .001    181     spectrum2-13.txt   Normal
-.026     72.49     .001    164     spectrum2-14.txt   Normal


Int. BG: The value of the total background integrated over the whole M/Z range.

Int. Ns: The value of the total estimated noise integrated over the whole M/Z range.

Norm.: The value of the normalization factor used to normalize all spectra to a common total ion current.

Peaks: The number of peaks found in the spectrum.

File: The name of the file.

Group: The group label.
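
As an illustration of the normalization factor, the following minimal sketch scales a spectrum so that its total ion current (TIC) equals a common target. It assumes that the TIC is the plain sum of all intensities and that the "value 100" entry of the parameter table is the target; the function name tic_normalize and the sample intensities are hypothetical.

```python
def tic_normalize(intensities, target=100.0):
    """Scale a spectrum so that its intensities sum to `target`.

    Assumption: the total ion current is the plain sum of the intensities.
    Returns the scaled intensities and the normalization factor.
    """
    tic = sum(intensities)
    if tic == 0:
        raise ValueError("cannot normalize a spectrum with zero total ion current")
    factor = target / tic
    return [x * factor for x in intensities], factor


# Hypothetical raw spectrum with a total ion current of 100000 counts.
raw = [20000.0, 30000.0, 50000.0]
normalized, factor = tic_normalize(raw, target=100.0)
print(factor, sum(normalized))
```

Under these assumptions, a factor of about .001 would correspond to a raw total ion current on the order of 100000 counts.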


Gel plots

A simultaneous depiction of all spectra used in a given run. The horizontal axis corresponds to the M/Z axis, and the spectra are stacked along the vertical axis. The grey scale corresponds to the intensity (counts); darker shades correspond to larger values. The reports show two gel plots, one with the detected peaks marked as green lines and one without. A horizontal red line separates the groups.

 

Gel plot (with peaks):

 

 

Gel plot:

 

 


Plot of Group Averages

Each report contains a graph of the medians of each group in the pairwise comparison.

 

Group Cancer in blue, group Normal in red.


Plot of Group Differences:

The difference of the group medians as a function of M/Z is a first indication for the location and amplitude of potential biomarkers.
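
The group medians and their difference can be sketched as follows. This is only an illustration: it assumes all spectra have already been sampled on a common M/Z grid, and the function names and toy intensities are hypothetical.

```python
from statistics import median


def group_median(spectra):
    """Pointwise median across spectra that share one M/Z grid."""
    return [median(column) for column in zip(*spectra)]


def median_difference(group_a, group_b):
    """Difference of the two group medians at every M/Z position."""
    return [a - b for a, b in zip(group_median(group_a), group_median(group_b))]


# Toy intensities: three spectra per group, three M/Z positions each.
cancer = [[1, 5, 2], [2, 6, 2], [3, 7, 2]]
normal = [[1, 1, 4], [1, 2, 5], [1, 3, 6]]
print(median_difference(cancer, normal))
```

Positions where the difference is large in magnitude relative to the spread within each group are the natural places to look first.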

 

 

Classifier Generation

This section contains a table of the features (putative biomarkers) used in the classifier, the results of cross-validation, and graphs of all features used in the classifier. A complete list of all potential markers is part of the full report, but is omitted here for brevity.

Features:

Features are defined as peaks that are present in more than four files. The width of a feature is the misalignment error plus the peak width at the feature's M/Z value. The value of a feature is defined as the integral of the background-subtracted spectrum over the width of the feature.
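
The feature definition can be illustrated with a minimal sketch. It is not the procedure used for this report: grouping peaks by a simple gap tolerance, taking the mean position as the feature position, and integrating with the trapezoidal rule are assumptions, and all names and sample values are hypothetical.

```python
def find_features(peaks_by_file, tol, min_files=5):
    """Group peak positions from all files and keep groups seen in at
    least `min_files` distinct files (that is, in more than four files).

    Assumption: consecutive peaks closer than `tol` belong to one feature.
    Returns the mean M/Z position of every retained group.
    """
    tagged = sorted((mz, name)
                    for name, peaks in peaks_by_file.items() for mz in peaks)
    clusters, current = [], []
    for mz, name in tagged:
        if current and mz - current[-1][0] > tol:
            clusters.append(current)
            current = []
        current.append((mz, name))
    if current:
        clusters.append(current)
    features = []
    for cluster in clusters:
        if len({name for _, name in cluster}) >= min_files:
            features.append(sum(mz for mz, _ in cluster) / len(cluster))
    return features


def feature_value(mz_axis, intensities, center, width):
    """Trapezoidal integral of the spectrum over the feature width."""
    lo, hi = center - width / 2, center + width / 2
    pts = [(m, y) for m, y in zip(mz_axis, intensities) if lo <= m <= hi]
    return sum((m1 - m0) * (y0 + y1) / 2
               for (m0, y0), (m1, y1) in zip(pts, pts[1:]))


# Hypothetical peak lists: one peak near 1000 in five files, one near 2000 in two.
peaks_by_file = {
    "f1": [1000.0, 2000.0], "f2": [1000.5], "f3": [999.5, 2000.3],
    "f4": [1000.2], "f5": [999.8],
}
print(find_features(peaks_by_file, tol=2.0))
print(feature_value([0, 1, 2, 3, 4], [0, 2, 2, 2, 0], center=2, width=2))
```

Here `width` stands for the feature width described above (misalignment error plus peak width), which the caller is assumed to supply.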

MZ      ID    Cancer Avg   Normal Avg   Wil p      T p        CV1      CV2
6664    75    .67          .326         .000007    .000416    .488     .192
6667    76    .606         .31          .000043    .001456    .526     .202
7601    89    .146         .38          .000238    .000046    .624     .433
7608    90    .192         .493         .000238    .000004    .541     .355
7618    91    .293         .716         .000043    .0000003   .434     .296
7629    92    .436         .969         .000007    .0000001   .354     .252
7644    93    .659         1.185        .000043    .0004      .263     .194
8440    107   .746         .108         .000001    .002594    1.001    .284
8563    108   .732         .167         .000001    .000001    .483     .405
8580    109   .592         .185         .0000001   .0000001   .361     .354
10113   127   .34          1.078        .000238    .000019    .542     .486
10142   128   .257         .708         .000007    .000002    .386     .39
10173   129   -.007        .16          .000238    .000131    -9.206   .809
10522   133   .234         .065         .000043    .000009    .486     .62
10839   135   .702         .316         .000238    .010559    .768     .24
10887   137   .304         .151         .000238    .000724    .476     .396
15189   178   .419         1.145        .000043    .000109    .773     .467
15263   179   1.093        2.758        .000001    .0000009   .325     .243
15481   185   .263         .671         .000238    .000009    .775     .31
16313   190   .099         .053         .000238    .001597    .423     .549
17207   194   .184         .087         .000043    .000989    .533     .314
17261   195   .121         .059         .000043    .000002    .283     .352
18435   199   .123         .066         .000238    .000032    .317     .337
25239   213   .151         .339         .000007    .0004      .398     .219
25468   214   .128         .251         .000043    .000001    .421     .212
26635   215   .243         .487         .000007    .0000001   .333     .192
33497   223   .184         .364         .000043    .003986    .925     .389
38578   228   .078         .201         .000238    .000024    .735     .371
38870   229   .109         .231         .000238    .000021    .539     .308


MZ: The M/Z position of a feature.

ID: An ID tag used for referencing.

Cancer Avg: The average of the feature in group 1 (Cancer).

Normal Avg: The average of the feature in group 2 (Normal).

Wil p: The p-value of this feature obtained by applying a Wilcoxon rank-sum test, disregarding all correlations with other features, to test the hypothesis that both groups are equal.

T p: The p-value of this feature obtained by applying a t-test, disregarding all correlations with other features, to test the hypothesis that both groups are equal, assuming Gaussian distributions with equal variances.

CV1: The coefficient of variation for values in group 1.

CV2: The coefficient of variation for values in group 2.
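
The per-feature statistics can be sketched with the Python standard library. This is an illustration under stated assumptions, not the code used for the report: the coefficient of variation is taken as the sample standard deviation divided by the mean, the t statistic uses the pooled (equal-variance) form, and the Wilcoxon rank-sum test is represented by the equivalent Mann-Whitney U statistic. Converting the statistics into p-values requires the corresponding reference distributions (for example via scipy.stats.ttest_ind and scipy.stats.mannwhitneyu), which is left out here. All function names and sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumed definition)."""
    return stdev(values) / mean(values)


def pooled_t_statistic(a, b):
    """Two-sample t statistic assuming equal variances in both groups."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


def mann_whitney_u(a, b):
    """Number of pairs with a > b, counting ties as one half."""
    return sum((x > y) + 0.5 * (x == y) for x in a for y in b)


# Hypothetical feature values for two small groups.
group1 = [1.0, 2.0, 3.0]
group2 = [4.0, 5.0, 6.0]
print(coefficient_of_variation(group1))
print(pooled_t_statistic(group1, group2))
print(mann_whitney_u(group1, group2))
```

With this definition, a group mean close to zero inflates the coefficient of variation, and a negative mean makes it negative, which is one possible reading of the CV1 entry for feature 129.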


K-table

This table summarizes leave-one-out cross-validation runs using a probabilistic KNN classifier.

As a classifier we use a probabilistic version of a k-nearest neighbor classifier. The algorithm returns a probability for each possible class label. If the difference between the highest class probability and the next highest is smaller than the parameter p-diff, the spectrum is labeled as undefined and counts as a miss in cross-validation.

p-diff: 0.12, Cancer size: 15, Normal size: 15

k     Errors   Misses   G1 Errors   G1 Misses   G2 Errors   G2 Misses
5     0        2        0           1           0           1
13    1        0        0           0           1           0
3     1        0        1           0           0           0
9     1        1        0           0           1           1
11    1        1        0           0           1           1
7     1        2        0           1           1           1
1     2        0        2           0           0           0


p-diff: The minimum required difference in class probabilities.

k: The number of nearest neighbors used in the classifier.

Errors: The number of cross-validation errors.

Misses: The number of spectra labeled as undefined.

G1 Errors: The number of errors in group 1.

G1 Misses: The number of undefined spectra in group 1.

G2 Errors: The number of errors in group 2.

G2 Misses: The number of undefined spectra in group 2.
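
The following minimal sketch shows how a p-diff threshold and leave-one-out cross-validation fit together. The report does not state how the class probabilities are computed; the sketch simply uses the fraction of the k nearest neighbors carrying each label, with Euclidean distance, as a stand-in. Plain vote fractions cannot produce a miss at k = 5 with p-diff 0.12 (the smallest possible difference is 0.2), so the estimator behind the table above is presumably different. All names and sample data are hypothetical.

```python
from collections import Counter
from math import dist


def knn_probabilities(train, query, k):
    """train: list of (feature_vector, label) pairs.

    Assumption: the class probability is the fraction of the k nearest
    neighbors (Euclidean distance) that carry the label.
    """
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return {label: count / k for label, count in votes.items()}


def classify(train, query, k, p_diff):
    """Return the most probable label, or 'UNDEFINED' if the two highest
    class probabilities differ by less than p_diff."""
    probs = knn_probabilities(train, query, k)
    ranked = sorted(probs.values(), reverse=True)
    second = ranked[1] if len(ranked) > 1 else 0.0
    if ranked[0] - second < p_diff:
        return "UNDEFINED"
    return max(probs, key=probs.get)


def leave_one_out(data, k, p_diff):
    """Classify each sample using all others; return (errors, misses)."""
    errors = misses = 0
    for i, (vector, label) in enumerate(data):
        assigned = classify(data[:i] + data[i + 1:], vector, k, p_diff)
        if assigned == "UNDEFINED":
            misses += 1
        elif assigned != label:
            errors += 1
    return errors, misses


# Hypothetical two-feature data with two well-separated groups.
data = [((0.0, 0.0), "Cancer"), ((0.0, 1.0), "Cancer"),
        ((1.0, 0.0), "Cancer"), ((1.0, 1.0), "Cancer"),
        ((10.0, 10.0), "Normal"), ((10.0, 11.0), "Normal"),
        ((11.0, 10.0), "Normal"), ((11.0, 11.0), "Normal")]
print(leave_one_out(data, k=3, p_diff=0.12))
```

Running leave_one_out for several values of k yields one row of an Errors/Misses table per k.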


 

 

Graphs of features

For each pairwise classification we show graphs of each feature by plotting the group medians over the feature width together with their 25th and 75th percentiles.

Features used in the classifier, plots of medians: Cancer in blue, Normal in red.

 


 

 

Blind testing

In this example we were given 50 test spectra and applied the above classifier to them blindly. The user then sent the true labels for each of these test spectra. The results are summarized below: no test spectrum was misclassified, and 4 of the 50 were labeled as undefined.

File               True group   Assigned label
spectrum1-16.txt   Cancer       Cancer
spectrum1-17.txt   Cancer       Cancer
spectrum1-18.txt   Cancer       UNDEFINED
spectrum1-19.txt   Cancer       Cancer
spectrum1-20.txt   Cancer       Cancer
spectrum1-21.txt   Cancer       Cancer
spectrum1-21.txt   Cancer       UNDEFINED
spectrum1-22.txt   Cancer       UNDEFINED
spectrum1-23.txt   Cancer       Cancer
spectrum1-24.txt   Cancer       Cancer
spectrum1-25.txt   Cancer       Cancer
spectrum1-26.txt   Cancer       Cancer
spectrum1-27.txt   Cancer       Cancer
spectrum1-28.txt   Cancer       Cancer
spectrum1-29.txt   Cancer       Cancer
spectrum1-30.txt   Cancer       Cancer
spectrum1-31.txt   Cancer       Cancer
spectrum1-32.txt   Cancer       Cancer
spectrum1-33.txt   Cancer       Cancer
spectrum1-34.txt   Cancer       Cancer
spectrum1-35.txt   Cancer       Cancer
spectrum1-36.txt   Cancer       Cancer
spectrum1-37.txt   Cancer       Cancer
spectrum1-38.txt   Cancer       Cancer
spectrum1-39.txt   Cancer       Cancer
spectrum2-15.txt   Normal       Normal
spectrum2-16.txt   Normal       Normal
spectrum2-17.txt   Normal       Normal
spectrum2-18.txt   Normal       Normal
spectrum2-19.txt   Normal       Normal
spectrum2-20.txt   Normal       Normal
spectrum2-21.txt   Normal       Normal
spectrum2-21.txt   Normal       Normal
spectrum2-22.txt   Normal       Normal
spectrum2-23.txt   Normal       Normal
spectrum2-24.txt   Normal       Normal
spectrum2-25.txt   Normal       Normal
spectrum2-26.txt   Normal       Normal
spectrum2-27.txt   Normal       Normal
spectrum2-28.txt   Normal       Normal
spectrum2-29.txt   Normal       Normal
spectrum2-30.txt   Normal       Normal
spectrum2-31.txt   Normal       Normal
spectrum2-32.txt   Normal       Normal
spectrum2-33.txt   Normal       Normal
spectrum2-34.txt   Normal       Normal
spectrum2-35.txt   Normal       Normal
spectrum2-36.txt   Normal       UNDEFINED
spectrum2-37.txt   Normal       Normal
spectrum2-38.txt   Normal       Normal

Summary of the Results for the test case:

Total: errors 0, undefined 4 (8%), correct assignment 92%

Cancer: errors 0, undefined 3 (12%), correct assignment 88%

Normal: errors 0, undefined 1 (4%), correct assignment 96%
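
The summary figures follow directly from the counts in the table of test results. The short sketch below reproduces the arithmetic; the function name summarize is hypothetical, and the input lists only restate the counts from that table (25 spectra per group, with 3 and 1 undefined).

```python
def summarize(results):
    """results: list of (true_group, assigned_label) pairs."""
    total = len(results)
    undefined = sum(1 for _, assigned in results if assigned == "UNDEFINED")
    errors = sum(1 for true, assigned in results
                 if assigned not in ("UNDEFINED", true))
    correct = total - undefined - errors
    return {
        "errors": errors,
        "undefined": undefined,
        "undefined_pct": 100 * undefined / total,
        "correct_pct": 100 * correct / total,
    }


# Counts taken from the table of blind-test results.
cancer = [("Cancer", "Cancer")] * 22 + [("Cancer", "UNDEFINED")] * 3
normal = [("Normal", "Normal")] * 24 + [("Normal", "UNDEFINED")] * 1

for name, rows in (("Total", cancer + normal), ("Cancer", cancer), ("Normal", normal)):
    print(name, summarize(rows))
```

Note that undefined spectra are counted separately from errors, so the percentage of correct assignments and the percentage of undefined spectra add up to 100% whenever there are no errors.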

 


Biodesix has focused on applying signal processing, statistical analysis, and an in-depth understanding of the physics of mass spectrometers to build a bio-analysis platform that can be used to develop diagnostic "classifiers" for use by researchers, drug developers and clinicians. These classifiers can reproducibly, reliably, and with clinically significant accuracy, predict and evaluate normal biologic or pathogenic states or the probable pharmacological responses to a therapeutic intervention.

For more details and inquiries about our services, call 970.870.9041 or e-mail us at info@biodesix.com
 

READ THIS

Click on the icon above to read our June 6th JNCI publication.