import timsimaging
# enable visualization in the Jupyter notebook
from bokeh.io import show, output_notebook
output_notebook()
# disable FutureWarning
import warnings
warnings.filterwarnings('ignore')2 Data processing in TIMSImaging and rank-sum test
2.1 Introcution
In this part, we process the MALDI-TIMS-MS1 dataset of bacterial-fungal co-culture, then find features with high intensity in the microbioal region, and export the processed data in the open imzML format.
2.2 Load MALDI-TIMS-TOF raw data
bruker_d_folder_name = r"D:\dataset\Laura_Gordon\250321_JB182_Pen12.d"
dataset = timsimaging.spectrum.MSIDataset(bruker_d_folder_name)
datasetMSIDataset with 12173 pixels
mz range: 99.999-1100.005
mobility range: 0.400-1.800
2.3 Understanding experiment setting
As the TIC image shows, there are 4 regions: G.arilaitensis + P.solitum co-culuture(top), P.solitum(bottom left), G.arilaitensis(bottom middle) and the media/matrix(bottom right)
dataset.image()2.4 Peak processing
The first step is to extract features. Due to the heterogeneity of regions, we set sampling_ratio to 1 so that all pixels are used for mean spectrum calculation.
results = dataset.process(sampling_ratio=1, frequency_threshold=0.05, intensity_threshold=0.001, tolerance=3, window_size=[30, 7], visualize=True)Computing mean spectrum...
Traversing graph...
Finding local maxima...
Summarizing...
Here we get 784 features in total, each of them corresponds to an ion image.
table, _ = timsimaging.plotting.feature_list(results["peak_list"])show(table)show(results["viz"])2.5 Differential analysis using Wilcoxon rank-sum test
For precursor targets in following MS2 experiments, we want to select ions relevant to cell culture and exclude matrix ions. Specifically, a desired precursor should be associate with the microbioal culture region, with minimum intensity in the matrix region. We referred the paper below, which uses a non-spatial Wilcoxon rank-sum test.
Rauser, S., C. Marquardt, B. Balluff, S.-O. Deininger, C. Albers, E. Belau, R. Hartmer, D. Suckau, K. Specht, M. P. Ebert, M. Schmitt, M. Aubele, H. Höfler and A. Walch (2010). “Classification of HER2 Receptor Status in Breast Cancer Tissues by MALDI Imaging Mass Spectrometry.” Journal of Proteome Research 9(4): 1854-1863.
import numpy as np
from scipy.stats import mannwhitneyuintensity_array = results["intensity_array"]Here we use the same ROI setting as the original paper: the media/matrix region as group 1, while all other region as group 2, treat pixels as samples and test if there is a signifcant intensity difference between two groups.
dataset.set_ROI("matrix", xmin=200, ymin=100)
matrix = intensity_array.loc[dataset.rois["matrix"]]
cultures = intensity_array.loc[np.setdiff1d(intensity_array.index, dataset.rois["matrix"])]Apply RMS normalization to each group before the test:
# RMS normalization
rms = np.sqrt(np.mean(np.square(intensity_array), axis=1))
intensity_array_norm = intensity_array.div(rms, axis=0)
matrix = intensity_array_norm.loc[dataset.rois["matrix"]]
cultures = intensity_array_norm.loc[np.setdiff1d(intensity_array.index, dataset.rois["matrix"])]Compute the statistics and fold change.
stat, p = mannwhitneyu(cultures, matrix)
stats = results["peak_list"].copy()
stats["culture_mean"] = np.mean(cultures, axis=0).to_numpy()
stats["matrix_mean"] = np.mean(matrix, axis=0).to_numpy()
stats["log2foldchange"] = np.log2(np.mean(cultures, axis=0)/np.mean(matrix, axis=0)).to_numpy()
stats["neg_log10_pvalue"] = -np.log10(p)
stats| mz_values | mobility_values | total_intensity | culture_mean | matrix_mean | log2foldchange | neg_log10_pvalue | |
|---|---|---|---|---|---|---|---|
| 26 | 172.040229 | 0.627523 | 2306.307730 | 0.059480 | 0.068297 | -0.199415 | 3.851708 |
| 41 | 187.055019 | 0.614947 | 1350.618829 | 0.042400 | 0.027598 | 0.619482 | 13.634440 |
| 45 | 189.070660 | 0.618844 | 3353.983734 | 0.101712 | 0.074469 | 0.449772 | 13.680900 |
| 47 | 190.050757 | 0.964792 | 884.310605 | 0.025086 | 0.027849 | -0.150782 | 4.633008 |
| 75 | 201.059263 | 0.651327 | 1019.046168 | 0.033806 | 0.020592 | 0.715197 | 15.615291 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 6602 | 1079.112242 | 1.511486 | 4353.308634 | 0.138639 | 0.126743 | 0.129429 | 1.366319 |
| 6614 | 1088.066520 | 1.557763 | 1397.371889 | 0.044244 | 0.015218 | 1.539681 | 62.449167 |
| 6618 | 1088.067282 | 1.464957 | 1019.910129 | 0.033066 | 0.011269 | 1.553035 | 51.979610 |
| 6623 | 1089.070123 | 1.484550 | 835.687998 | 0.026375 | 0.009408 | 1.487155 | 49.901634 |
| 6636 | 1094.083050 | 1.511502 | 3821.645773 | 0.119431 | 0.066028 | 0.855034 | 45.896223 |
784 rows × 7 columns
2.6 Visualize results in a volcano plot
from bokeh.plotting import figure,show
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.transform import factor_cmapNow we can make a volcano plot. The features to be selected is on the top right, that present high signal-to-noise ratio and high significance score.
f = figure(
title="Differential abundance",
match_aspect=True,
toolbar_location="right",
x_axis_label="log2foldchange",
y_axis_label="neg_log10_pvalue",
x_range=(-10,10)
)
source = ColumnDataSource(stats)
volcano = f.scatter(x="log2foldchange",
y="neg_log10_pvalue",
source = source)
hover = HoverTool(renderers=[volcano], tooltips=[
("m/z", "@mz_values{0.0000}"),
("1/K0", "@mobility_values{0.0000}"),
("index", "$index"),
],)
f.add_tools(hover)
show(f)2.7 Corroborate differential features in literature
Here is the features reported in the literature, using the “find discriminating” funtion in SCiLS Lab. We retrieve them in the feature list and look where they are in the volcano plot.
target = np.array([
417.2348,
425.2625,
428.2594,
475.2532,
532.3086,
588.3737,
615.3198,
627.3479,
655.2726,
])All those features were detected by TIMSImaging, with minimal m/z differences.
indices = []
for m in target:
mz_tol = m * 50 * 1e-6
index = np.nonzero(stats['mz_values'].between(m-mz_tol, m+mz_tol))[0]
indices.append(index[0])
stats.iloc[indices]| mz_values | mobility_values | total_intensity | culture_mean | matrix_mean | log2foldchange | neg_log10_pvalue | |
|---|---|---|---|---|---|---|---|
| 1757 | 417.245748 | 0.948254 | 2343.955229 | 0.079047 | 0.000924 | 6.418199 | 152.039042 |
| 1860 | 425.261619 | 0.975195 | 3731.217038 | 0.117055 | 0.002412 | 5.600804 | 129.732154 |
| 1900 | 428.261729 | 0.960216 | 3326.308223 | 0.111457 | 0.001042 | 6.741408 | 172.427855 |
| 2509 | 475.254336 | 1.006788 | 2615.506695 | 0.085422 | 0.002050 | 5.381051 | 154.746667 |
| 3280 | 532.309080 | 1.081939 | 1673.549659 | 0.056178 | 0.004467 | 3.652642 | 102.566166 |
| 4028 | 588.373038 | 1.151386 | 1445.407788 | 0.048771 | 0.001496 | 5.026967 | 95.400457 |
| 4378 | 615.320932 | 1.144504 | 1024.922287 | 0.036958 | 0.001301 | 4.828456 | 115.224511 |
| 4544 | 627.353146 | 1.177115 | 958.086092 | 0.031236 | 0.001508 | 4.372415 | 132.199016 |
| 4843 | 655.273642 | 1.272451 | 2910.554342 | 0.091324 | 0.002353 | 5.278412 | 140.796247 |
stats["target"] = "No"
stats["target"].iloc[indices] = "Yes"Now we can view the position of these precursors in the volcano plot:
#func = lambda df: (df.log2foldchange>4)&(df.neg_log10_pvalue>50)&(df.matrix_mean<1000)
#source.add(np.where(func(stats), "Orange", "Steelblue"), name="color")
source = ColumnDataSource(stats)
a = figure(
title="Differential abundance",
match_aspect=True,
toolbar_location="right",
x_axis_label="log2foldchange",
y_axis_label="neg_log10_pvalue",
x_range=(-10,10)
)
volc = a.scatter(x="log2foldchange",
y="neg_log10_pvalue",
color = factor_cmap("target", ["Steelblue", "Orange"], ["No", "Yes"]),
#color = 'color',
source = source)
hover = HoverTool(renderers=[volcano], tooltips=[
("m/z", "@mz_values{0.0000}"),
("1/K0", "@mobility_values{0.0000}"),
("index", "$index"),
],)
a.add_tools(hover)
show(a)Similar to the internal function in SCiLS, Wilcoxon rank-sum test outputs significant p values for these features. Deisotoping and applying an intensity filtration would result a more concrete precursor candidate list.
2.8 Export the processed data in imzML format
Finally, export processed data for further differential analysis in R.
timsimaging.spectrum.export_imzML(dataset, path=r"D:\dataset\laura_gordon", peaks=results)