2 Data processing in TIMSImaging and rank-sum test

2.1 Introcution

In this part, we process the MALDI-TIMS-MS1 dataset of bacterial-fungal co-culture, then find features with high intensity in the microbioal region, and export the processed data in the open imzML format.

import timsimaging

# enable visualization in the Jupyter notebook
from bokeh.io import show, output_notebook
output_notebook()
# disable FutureWarning
import warnings
warnings.filterwarnings('ignore')

Loading BokehJS ...

2.2 Load MALDI-TIMS-TOF raw data

bruker_d_folder_name = r"D:\dataset\Laura_Gordon\250321_JB182_Pen12.d"
dataset = timsimaging.spectrum.MSIDataset(bruker_d_folder_name)
dataset

MSIDataset with 12173 pixels
        mz range: 99.999-1100.005
        mobility range: 0.400-1.800

2.3 Understanding experiment setting

As the TIC image shows, there are 4 regions: G.arilaitensis + P.solitum co-culuture(top), P.solitum(bottom left), G.arilaitensis(bottom middle) and the media/matrix(bottom right)

dataset.image()

2.4 Peak processing

The first step is to extract features. Due to the heterogeneity of regions, we set sampling_ratio to 1 so that all pixels are used for mean spectrum calculation.

results = dataset.process(sampling_ratio=1, frequency_threshold=0.05, intensity_threshold=0.001, tolerance=3, window_size=[30, 7], visualize=True)

Computing mean spectrum...
Traversing graph...
Finding local maxima...
Summarizing...

Here we get 784 features in total, each of them corresponds to an ion image.

table, _ = timsimaging.plotting.feature_list(results["peak_list"])

show(table)

show(results["viz"])

2.5 Differential analysis using Wilcoxon rank-sum test

For precursor targets in following MS2 experiments, we want to select ions relevant to cell culture and exclude matrix ions. Specifically, a desired precursor should be associate with the microbioal culture region, with minimum intensity in the matrix region. We referred the paper below, which uses a non-spatial Wilcoxon rank-sum test.
Rauser, S., C. Marquardt, B. Balluff, S.-O. Deininger, C. Albers, E. Belau, R. Hartmer, D. Suckau, K. Specht, M. P. Ebert, M. Schmitt, M. Aubele, H. Höfler and A. Walch (2010). “Classification of HER2 Receptor Status in Breast Cancer Tissues by MALDI Imaging Mass Spectrometry.” Journal of Proteome Research 9(4): 1854-1863.

import numpy as np
from scipy.stats import mannwhitneyu

intensity_array = results["intensity_array"]

Here we use the same ROI setting as the original paper: the media/matrix region as group 1, while all other region as group 2, treat pixels as samples and test if there is a signifcant intensity difference between two groups.

dataset.set_ROI("matrix", xmin=200, ymin=100)
matrix = intensity_array.loc[dataset.rois["matrix"]]
cultures = intensity_array.loc[np.setdiff1d(intensity_array.index, dataset.rois["matrix"])]

Apply RMS normalization to each group before the test:

# RMS normalization
rms = np.sqrt(np.mean(np.square(intensity_array), axis=1))
intensity_array_norm = intensity_array.div(rms, axis=0)

matrix = intensity_array_norm.loc[dataset.rois["matrix"]]
cultures = intensity_array_norm.loc[np.setdiff1d(intensity_array.index, dataset.rois["matrix"])]

Compute the statistics and fold change.

stat, p = mannwhitneyu(cultures, matrix)
stats = results["peak_list"].copy()
stats["culture_mean"] = np.mean(cultures, axis=0).to_numpy()
stats["matrix_mean"] = np.mean(matrix, axis=0).to_numpy()
stats["log2foldchange"] = np.log2(np.mean(cultures, axis=0)/np.mean(matrix, axis=0)).to_numpy()
stats["neg_log10_pvalue"] = -np.log10(p)
stats

	mz_values	mobility_values	total_intensity	culture_mean	matrix_mean	log2foldchange	neg_log10_pvalue
26	172.040229	0.627523	2306.307730	0.059480	0.068297	-0.199415	3.851708
41	187.055019	0.614947	1350.618829	0.042400	0.027598	0.619482	13.634440
45	189.070660	0.618844	3353.983734	0.101712	0.074469	0.449772	13.680900
47	190.050757	0.964792	884.310605	0.025086	0.027849	-0.150782	4.633008
75	201.059263	0.651327	1019.046168	0.033806	0.020592	0.715197	15.615291
...	...	...	...	...	...	...	...
6602	1079.112242	1.511486	4353.308634	0.138639	0.126743	0.129429	1.366319
6614	1088.066520	1.557763	1397.371889	0.044244	0.015218	1.539681	62.449167
6618	1088.067282	1.464957	1019.910129	0.033066	0.011269	1.553035	51.979610
6623	1089.070123	1.484550	835.687998	0.026375	0.009408	1.487155	49.901634
6636	1094.083050	1.511502	3821.645773	0.119431	0.066028	0.855034	45.896223

784 rows × 7 columns

2.6 Visualize results in a volcano plot

from bokeh.plotting import figure,show
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.transform import factor_cmap

Now we can make a volcano plot. The features to be selected is on the top right, that present high signal-to-noise ratio and high significance score.

f = figure(
    title="Differential abundance",
    match_aspect=True,
    toolbar_location="right",
    x_axis_label="log2foldchange",
    y_axis_label="neg_log10_pvalue",
    x_range=(-10,10)
)
source = ColumnDataSource(stats)
volcano = f.scatter(x="log2foldchange",
          y="neg_log10_pvalue",
          source = source)
hover = HoverTool(renderers=[volcano], tooltips=[
            ("m/z", "@mz_values{0.0000}"),
            ("1/K0", "@mobility_values{0.0000}"),
            ("index", "$index"),
        ],)
f.add_tools(hover)
show(f)

2.7 Corroborate differential features in literature

Here is the features reported in the literature, using the “find discriminating” funtion in SCiLS Lab. We retrieve them in the feature list and look where they are in the volcano plot.

target = np.array([
    417.2348,
    425.2625,
    428.2594,
    475.2532,
    532.3086,
    588.3737,
    615.3198,
    627.3479,
    655.2726,
])

All those features were detected by TIMSImaging, with minimal m/z differences.

indices = []
for m in target:
    mz_tol = m * 50 * 1e-6
    index = np.nonzero(stats['mz_values'].between(m-mz_tol, m+mz_tol))[0]
    indices.append(index[0])
stats.iloc[indices]

	mz_values	mobility_values	total_intensity	culture_mean	matrix_mean	log2foldchange	neg_log10_pvalue
1757	417.245748	0.948254	2343.955229	0.079047	0.000924	6.418199	152.039042
1860	425.261619	0.975195	3731.217038	0.117055	0.002412	5.600804	129.732154
1900	428.261729	0.960216	3326.308223	0.111457	0.001042	6.741408	172.427855
2509	475.254336	1.006788	2615.506695	0.085422	0.002050	5.381051	154.746667
3280	532.309080	1.081939	1673.549659	0.056178	0.004467	3.652642	102.566166
4028	588.373038	1.151386	1445.407788	0.048771	0.001496	5.026967	95.400457
4378	615.320932	1.144504	1024.922287	0.036958	0.001301	4.828456	115.224511
4544	627.353146	1.177115	958.086092	0.031236	0.001508	4.372415	132.199016
4843	655.273642	1.272451	2910.554342	0.091324	0.002353	5.278412	140.796247

stats["target"] = "No"
stats["target"].iloc[indices] = "Yes"

Now we can view the position of these precursors in the volcano plot:

#func = lambda df: (df.log2foldchange>4)&(df.neg_log10_pvalue>50)&(df.matrix_mean<1000)
#source.add(np.where(func(stats), "Orange", "Steelblue"), name="color")
source = ColumnDataSource(stats)
a = figure(
    title="Differential abundance",
    match_aspect=True,
    toolbar_location="right",
    x_axis_label="log2foldchange",
    y_axis_label="neg_log10_pvalue",
    x_range=(-10,10)
)
volc = a.scatter(x="log2foldchange",
          y="neg_log10_pvalue",
                 
          color = factor_cmap("target", ["Steelblue", "Orange"], ["No", "Yes"]),
          #color = 'color',
          source = source)
hover = HoverTool(renderers=[volcano], tooltips=[
            ("m/z", "@mz_values{0.0000}"),
            ("1/K0", "@mobility_values{0.0000}"),
            ("index", "$index"),
        ],)
a.add_tools(hover)
show(a)

Similar to the internal function in SCiLS, Wilcoxon rank-sum test outputs significant p values for these features. Deisotoping and applying an intensity filtration would result a more concrete precursor candidate list.

2.8 Export the processed data in imzML format

Finally, export processed data for further differential analysis in R.

timsimaging.spectrum.export_imzML(dataset, path=r"D:\dataset\laura_gordon", peaks=results)