2  Data processing in TIMSImaging and rank-sum test

2.1 Introcution

In this part, we process the MALDI-TIMS-MS1 dataset of bacterial-fungal co-culture, then find features with high intensity in the microbioal region, and export the processed data in the open imzML format.

import timsimaging

# enable visualization in the Jupyter notebook
from bokeh.io import show, output_notebook
output_notebook()
# disable FutureWarning
import warnings
warnings.filterwarnings('ignore')
Loading BokehJS ...

2.2 Load MALDI-TIMS-TOF raw data

bruker_d_folder_name = r"D:\dataset\Laura_Gordon\250321_JB182_Pen12.d"
dataset = timsimaging.spectrum.MSIDataset(bruker_d_folder_name)
dataset
MSIDataset with 12173 pixels
        mz range: 99.999-1100.005
        mobility range: 0.400-1.800
        

2.3 Understanding experiment setting

As the TIC image shows, there are 4 regions: G.arilaitensis + P.solitum co-culuture(top), P.solitum(bottom left), G.arilaitensis(bottom middle) and the media/matrix(bottom right)

dataset.image()

2.4 Peak processing

The first step is to extract features. Due to the heterogeneity of regions, we set sampling_ratio to 1 so that all pixels are used for mean spectrum calculation.

results = dataset.process(sampling_ratio=1, frequency_threshold=0.05, intensity_threshold=0.001, tolerance=3, window_size=[30, 7], visualize=True)
Computing mean spectrum...
Traversing graph...
Finding local maxima...
Summarizing...

Here we get 784 features in total, each of them corresponds to an ion image.

table, _ = timsimaging.plotting.feature_list(results["peak_list"])
show(table)
show(results["viz"])

2.5 Differential analysis using Wilcoxon rank-sum test

For precursor targets in following MS2 experiments, we want to select ions relevant to cell culture and exclude matrix ions. Specifically, a desired precursor should be associate with the microbioal culture region, with minimum intensity in the matrix region. We referred the paper below, which uses a non-spatial Wilcoxon rank-sum test.
Rauser, S., C. Marquardt, B. Balluff, S.-O. Deininger, C. Albers, E. Belau, R. Hartmer, D. Suckau, K. Specht, M. P. Ebert, M. Schmitt, M. Aubele, H. Höfler and A. Walch (2010). “Classification of HER2 Receptor Status in Breast Cancer Tissues by MALDI Imaging Mass Spectrometry.” Journal of Proteome Research 9(4): 1854-1863.

import numpy as np
from scipy.stats import mannwhitneyu
intensity_array = results["intensity_array"]

Here we use the same ROI setting as the original paper: the media/matrix region as group 1, while all other region as group 2, treat pixels as samples and test if there is a signifcant intensity difference between two groups.

dataset.set_ROI("matrix", xmin=200, ymin=100)
matrix = intensity_array.loc[dataset.rois["matrix"]]
cultures = intensity_array.loc[np.setdiff1d(intensity_array.index, dataset.rois["matrix"])]

Apply RMS normalization to each group before the test:

# RMS normalization
rms = np.sqrt(np.mean(np.square(intensity_array), axis=1))
intensity_array_norm = intensity_array.div(rms, axis=0)

matrix = intensity_array_norm.loc[dataset.rois["matrix"]]
cultures = intensity_array_norm.loc[np.setdiff1d(intensity_array.index, dataset.rois["matrix"])]

Compute the statistics and fold change.

stat, p = mannwhitneyu(cultures, matrix)
stats = results["peak_list"].copy()
stats["culture_mean"] = np.mean(cultures, axis=0).to_numpy()
stats["matrix_mean"] = np.mean(matrix, axis=0).to_numpy()
stats["log2foldchange"] = np.log2(np.mean(cultures, axis=0)/np.mean(matrix, axis=0)).to_numpy()
stats["neg_log10_pvalue"] = -np.log10(p)
stats
mz_values mobility_values total_intensity culture_mean matrix_mean log2foldchange neg_log10_pvalue
26 172.040229 0.627523 2306.307730 0.059480 0.068297 -0.199415 3.851708
41 187.055019 0.614947 1350.618829 0.042400 0.027598 0.619482 13.634440
45 189.070660 0.618844 3353.983734 0.101712 0.074469 0.449772 13.680900
47 190.050757 0.964792 884.310605 0.025086 0.027849 -0.150782 4.633008
75 201.059263 0.651327 1019.046168 0.033806 0.020592 0.715197 15.615291
... ... ... ... ... ... ... ...
6602 1079.112242 1.511486 4353.308634 0.138639 0.126743 0.129429 1.366319
6614 1088.066520 1.557763 1397.371889 0.044244 0.015218 1.539681 62.449167
6618 1088.067282 1.464957 1019.910129 0.033066 0.011269 1.553035 51.979610
6623 1089.070123 1.484550 835.687998 0.026375 0.009408 1.487155 49.901634
6636 1094.083050 1.511502 3821.645773 0.119431 0.066028 0.855034 45.896223

784 rows × 7 columns

2.6 Visualize results in a volcano plot

from bokeh.plotting import figure,show
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.transform import factor_cmap

Now we can make a volcano plot. The features to be selected is on the top right, that present high signal-to-noise ratio and high significance score.

f = figure(
    title="Differential abundance",
    match_aspect=True,
    toolbar_location="right",
    x_axis_label="log2foldchange",
    y_axis_label="neg_log10_pvalue",
    x_range=(-10,10)
)
source = ColumnDataSource(stats)
volcano = f.scatter(x="log2foldchange",
          y="neg_log10_pvalue",
          source = source)
hover = HoverTool(renderers=[volcano], tooltips=[
            ("m/z", "@mz_values{0.0000}"),
            ("1/K0", "@mobility_values{0.0000}"),
            ("index", "$index"),
        ],)
f.add_tools(hover)
show(f)

2.7 Corroborate differential features in literature

Here is the features reported in the literature, using the “find discriminating” funtion in SCiLS Lab. We retrieve them in the feature list and look where they are in the volcano plot.

target = np.array([
    417.2348,
    425.2625,
    428.2594,
    475.2532,
    532.3086,
    588.3737,
    615.3198,
    627.3479,
    655.2726,
])

All those features were detected by TIMSImaging, with minimal m/z differences.

indices = []
for m in target:
    mz_tol = m * 50 * 1e-6
    index = np.nonzero(stats['mz_values'].between(m-mz_tol, m+mz_tol))[0]
    indices.append(index[0])
stats.iloc[indices]
mz_values mobility_values total_intensity culture_mean matrix_mean log2foldchange neg_log10_pvalue
1757 417.245748 0.948254 2343.955229 0.079047 0.000924 6.418199 152.039042
1860 425.261619 0.975195 3731.217038 0.117055 0.002412 5.600804 129.732154
1900 428.261729 0.960216 3326.308223 0.111457 0.001042 6.741408 172.427855
2509 475.254336 1.006788 2615.506695 0.085422 0.002050 5.381051 154.746667
3280 532.309080 1.081939 1673.549659 0.056178 0.004467 3.652642 102.566166
4028 588.373038 1.151386 1445.407788 0.048771 0.001496 5.026967 95.400457
4378 615.320932 1.144504 1024.922287 0.036958 0.001301 4.828456 115.224511
4544 627.353146 1.177115 958.086092 0.031236 0.001508 4.372415 132.199016
4843 655.273642 1.272451 2910.554342 0.091324 0.002353 5.278412 140.796247
stats["target"] = "No"
stats["target"].iloc[indices] = "Yes"

Now we can view the position of these precursors in the volcano plot:

#func = lambda df: (df.log2foldchange>4)&(df.neg_log10_pvalue>50)&(df.matrix_mean<1000)
#source.add(np.where(func(stats), "Orange", "Steelblue"), name="color")
source = ColumnDataSource(stats)
a = figure(
    title="Differential abundance",
    match_aspect=True,
    toolbar_location="right",
    x_axis_label="log2foldchange",
    y_axis_label="neg_log10_pvalue",
    x_range=(-10,10)
)
volc = a.scatter(x="log2foldchange",
          y="neg_log10_pvalue",
                 
          color = factor_cmap("target", ["Steelblue", "Orange"], ["No", "Yes"]),
          #color = 'color',
          source = source)
hover = HoverTool(renderers=[volcano], tooltips=[
            ("m/z", "@mz_values{0.0000}"),
            ("1/K0", "@mobility_values{0.0000}"),
            ("index", "$index"),
        ],)
a.add_tools(hover)
show(a)

Similar to the internal function in SCiLS, Wilcoxon rank-sum test outputs significant p values for these features. Deisotoping and applying an intensity filtration would result a more concrete precursor candidate list.

2.8 Export the processed data in imzML format

Finally, export processed data for further differential analysis in R.

timsimaging.spectrum.export_imzML(dataset, path=r"D:\dataset\laura_gordon", peaks=results)