{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0",
   "metadata": {},
   "source": [
    "# An Example XPS Analysis\n",
    "\n",
    "In this example analysis, we are going to look at how some elements of PyARPES fit together. We'll use the `example_data.nano_xps` dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import arpes\n",
    "from arpes.io import example_data\n",
    "from arpes.plotting.dos import plot_core_levels\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "xps = example_data.nano_xps\n",
    "\n",
    "fig, axes = plt.subplots(1, 2, figsize=(12, 4))\n",
    "xps.sum(\"eV\").S.plot(ax=axes[0])\n",
    "plot_core_levels(xps.sum([\"x\", \"y\"]), ax=axes[1])\n",
    "plt.tight_layout()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2",
   "metadata": {},
   "source": [
    "## Decomposition Analysis\n",
    "\n",
    "Let's look at this XPS data by performing a PCA-based decomposition across the spatial axes. This, in combination with manual exploration, will give us some sense of what's going on across the sample surface."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3",
   "metadata": {},
   "outputs": [],
   "source": [
    "from arpes.analysis.decomposition import pca_along\n",
    "import xarray as xr\n",
    "\n",
    "n_components = 5\n",
    "data, pca = pca_along(xps.spectrum, [\"x\", \"y\"], n_components=n_components)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4",
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, ax = plt.subplots(n_components, 2, figsize=(8, 4 * n_components))\n",
    "\n",
    "for component in range(n_components):\n",
    "    data.isel(components=component).S.plot(ax=ax[component, 0])\n",
    "    ax[component, 0].set_title(f\"Component {component}\")\n",
    "\n",
    "    xr.DataArray(pca.components_[component], {\"eV\": xps.eV.values}, [\"eV\"]).plot(\n",
    "        ax=ax[component, 1]\n",
    "    )\n",
    "\n",
    "plt.tight_layout()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5",
   "metadata": {},
   "source": [
    "Let's parse this. Each row above has a spatial map of the coefficient in the decomposition (left) and the XPS spectrum corresponding to that component (right). Very intense locations in the spatial maps correspond to places where the spectrum on the right is a good representation to the data (red) or to its negative (blue).\n",
    "\n",
    "Based on the images, we can see that the first four components (0 through 3) explain almost all the variation in the data.\n",
    "\n",
    "The first component corresponds to a $\\text{WS}_2$ 4f core level spectrum, so regions of high intensity here indicate the presence of $\\text{WS}_2$. \n",
    "\n",
    "The next component has a wide background, but it still contains the imprint of the core level peaks. What's going on here? Note that in the `Component 0` image, the area surrounding the $\\text{WS}_2$ has a negative coefficient, but if we look at the average below, corresponding to an area to the left of the $\\text{WS}_2$ patch, there are no peaks. The decomposition has learned that it needs to add some core-level like peaks back into the data in order to recover these regions. Clearly, this is not an ideal decomposition, and it points to the need to interpret your analysis. There are other decompositions we could perform which have better biases for spectroscopic data.\n",
    "\n",
    "Moving to the third component we see some internal structure on the $\\text{WS}_2$ sample. By looking at the corresponding PCA component, this appears to represent a core level shift toward lower binding energy.\n",
    "\n",
    "The fourth and final component we will interpret (`Component 3`) again explains internal structure on the $\\text{WS}_2$. Looking at the corresponding spectrum, it looks like these regions correspond to wider core levels as measured by photoemission."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6",
   "metadata": {},
   "source": [
    "### Selecting data using the PCA decomposition\n",
    "\n",
    "Suppose we want to continue analysis using only the core levels for the $\\text{WS}_2$ region, as identified by `Component 0`. We can do this by masking and selecting the data before averaging."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7",
   "metadata": {},
   "outputs": [],
   "source": [
    "ws2_mask = data.isel(components=0) > 500\n",
    "substrate_mask = data.isel(components=1) > 1000\n",
    "\n",
    "fig, ax = plt.subplots(2, 2, figsize=(10, 10))\n",
    "ws2_mask.S.plot(ax=ax[0, 0])\n",
    "substrate_mask.S.plot(ax=ax[0, 1])\n",
    "\n",
    "xps.spectrum.where(ws2_mask).mean([\"x\", \"y\"]).S.plot(ax=ax[1, 0])\n",
    "xps.spectrum.where(substrate_mask).mean([\"x\", \"y\"]).S.plot(ax=ax[1, 1])\n",
    "\n",
    "ax[0, 0].set_title(\"Component 0 Mask\")\n",
    "ax[0, 1].set_title(\"Component 1 Mask\")\n",
    "ax[1, 0].set_title(\"Component 0 Masked XPS\")\n",
    "ax[1, 1].set_title(\"Component 1 Masked XPS\")\n",
    "\n",
    "plt.tight_layout()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8",
   "metadata": {},
   "source": [
    "### Improving our masks\n",
    "\n",
    "We can see there's still some $\\text{WS}_2$ contamination in the `substrate_mask`. We can improve this by masking according to both of the first two components."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9",
   "metadata": {},
   "outputs": [],
   "source": [
    "ws2_mask = data.isel(components=0) > 500\n",
    "substrate_mask = (data.isel(components=1) > 1000) & ~ws2_mask\n",
    "\n",
    "fig, ax = plt.subplots(2, 2, figsize=(10, 10))\n",
    "ws2_mask.S.plot(ax=ax[0, 0])\n",
    "substrate_mask.S.plot(ax=ax[0, 1])\n",
    "\n",
    "xps.spectrum.where(ws2_mask).mean([\"x\", \"y\"]).S.plot(ax=ax[1, 0])\n",
    "xps.spectrum.where(substrate_mask).mean([\"x\", \"y\"]).S.plot(ax=ax[1, 1])\n",
    "\n",
    "ax[0, 0].set_title(\"Component 0 Mask\")\n",
    "ax[0, 1].set_title(\"Component 1 Mask\")\n",
    "ax[1, 0].set_title(\"Component 0 Masked XPS\")\n",
    "ax[1, 1].set_title(\"Component 1 Masked XPS\")\n",
    "\n",
    "plt.tight_layout()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "10",
   "metadata": {},
   "source": [
    "### Looking at the wider vs narrower peak regions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "11",
   "metadata": {},
   "outputs": [],
   "source": [
    "from arpes.config import use_tex\n",
    "\n",
    "use_tex(enable=False)\n",
    "\n",
    "ws2_mask = data.isel(components=0) > 500\n",
    "wide_peak_mask = (data.isel(components=3) > 500) & ws2_mask\n",
    "\n",
    "fig, ax = plt.subplots(1, 2, figsize=(10, 5))\n",
    "wide_peak_mask.S.plot(ax=ax[0])\n",
    "\n",
    "xps.spectrum.where(ws2_mask & ~wide_peak_mask).mean([\"x\", \"y\"]).S.plot(ax=ax[1], label=\"ws2_mask\")\n",
    "xps.spectrum.where(wide_peak_mask).mean([\"x\", \"y\"]).S.plot(ax=ax[1], label=\"wide_peak_mask\")\n",
    "\n",
    "ax[1].legend()\n",
    "\n",
    "ax[0].set_title(\"Wide Peak Mask\")\n",
    "ax[1].set_title(\"XPS Comparison\")\n",
    "\n",
    "plt.tight_layout()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "12",
   "metadata": {},
   "source": [
    "These peaks look very similar, but sure enough the ones captured by `wide_peak_mask` are wider, especially to high binding energy.\n",
    "\n",
    "Let's look at refining this analysis now with some curve fitting.\n",
    "\n",
    "## Curve Fitting\n",
    "\n",
    "First off, let's get a general model working using a single XPS curve."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "13",
   "metadata": {},
   "outputs": [],
   "source": [
    "from lmfit.models import GaussianModel, LinearModel\n",
    "\n",
    "test_curve = xps.spectrum.where(ws2_mask & ~wide_peak_mask).mean([\"x\", \"y\"]).sel(eV=slice(-36, -31))\n",
    "\n",
    "test_model = LinearModel() + GaussianModel(prefix=\"a_\") + GaussianModel(prefix=\"b_\")\n",
    "\n",
    "target = test_curve - test_curve.min()\n",
    "\n",
    "x = target.coords[\"eV\"].values\n",
    "roi1 = x < 45\n",
    "roi2 = x > -33.5\n",
    "\n",
    "params = GaussianModel(prefix=\"a_\").guess(target.values[roi1], target.coords[\"eV\"].values[roi1])\n",
    "params.update(\n",
    "    GaussianModel(prefix=\"b_\").guess(target.values[roi2], target.coords[\"eV\"].values[roi2])\n",
    ")\n",
    "\n",
    "\n",
    "result = target.S.modelfit(\"eV\", test_model, params=params)\n",
    "\n",
    "result.modelfit_results.item().plot()\n",
    "result.modelfit_results.item()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "14",
   "metadata": {},
   "source": [
    "This looks reasonably good, but we can improve the simplicity and quality of the model if we just remove an estimated background first. Let's see how that looks"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "15",
   "metadata": {},
   "outputs": [],
   "source": [
    "from lmfit.models import GaussianModel\n",
    "from arpes.analysis.shirley import remove_shirley_background\n",
    "from arpes.fits.utilities import result_to_hints\n",
    "\n",
    "mask = ws2_mask | (data.isel(components=3) > 800)\n",
    "test_curve = xps.spectrum.where(mask).mean([\"x\", \"y\"]).sel(eV=slice(-36, -31))\n",
    "test_curve = remove_shirley_background(test_curve)\n",
    "\n",
    "test_model = GaussianModel(prefix=\"a_\") + GaussianModel(prefix=\"b_\")\n",
    "x = test_curve.coords[\"eV\"].values\n",
    "roi_a = x < -33.5\n",
    "roi_b = x > -33.5\n",
    "\n",
    "params = GaussianModel(prefix=\"a_\").guess(test_curve.values[roi_a], x[roi_a])\n",
    "params.update(GaussianModel(prefix=\"b_\").guess(test_curve.values[roi_b], x[roi_b]))\n",
    "\n",
    "result = test_curve.S.modelfit(\"eV\", test_model, params=params)\n",
    "\n",
    "result.modelfit_results.item().plot()\n",
    "result.modelfit_results.item()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "16",
   "metadata": {},
   "source": [
    "Looking at the residual this is quite a bit better. Now let's perform fitting across the entire $\\text{WS}_2$ region."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "17",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# subtract the Shirley background, calculated independently at each (x,y)\n",
    "bkg_removed_xps = remove_shirley_background(xps.spectrum.sel(eV=slice(-36, -31)))\n",
    "\n",
    "# first mask\n",
    "masked_xps = bkg_removed_xps.where(mask)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "18",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Performs ~500 curve fits... make a selection if you don't want to wait a few seconds\n",
    "fit_results = masked_xps.fillna(0).S.modelfit(\"eV\", test_model, params=params, progress=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "19",
   "metadata": {},
   "source": [
    "### Interepting the fit quality\n",
    "\n",
    "Let's have a look at the five worst fits now."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "20",
   "metadata": {},
   "outputs": [],
   "source": [
    "worst = fit_results.modelfit_results.F.worst_fits().values\n",
    "\n",
    "for i in range(5):\n",
    "    fig = plt.figure(figsize=(4, 4))\n",
    "    worst[i].plot(fig=fig)\n",
    "    for ax in fig.axes:\n",
    "        ax.set_title(\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "21",
   "metadata": {},
   "source": [
    "The first three fits have quality issues, because the background looks not to be appropriately captured and the peak shape is different from the rest. Three problematic fits out of several hundred is not so bad, so we can continue.\n",
    "\n",
    "For proper analysis you might want to exclude these points."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "22",
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, ax = plt.subplots(1, 2, figsize=(12, 5))\n",
    "fit_results.modelfit_results.F.p(\"a_amplitude\").T.S.plot(ax=ax[0])\n",
    "fit_results.modelfit_results.F.p(\"b_amplitude\").T.S.plot(ax=ax[1], vmin=0)\n",
    "plt.tight_layout()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "23",
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, ax = plt.subplots(1, 2, figsize=(12, 5))\n",
    "fit_results.modelfit_results.F.p(\"a_center\").T.S.plot(ax=ax[0], vmin=-34.8, vmax=-34.1)\n",
    "fit_results.modelfit_results.F.p(\"b_center\").T.S.plot(ax=ax[1], vmin=-32.7, vmax=-32.05)\n",
    "plt.tight_layout()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "24",
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, ax = plt.subplots(1, 2, figsize=(12, 5))\n",
    "fit_results.modelfit_results.F.p(\"a_sigma\").T.S.plot(ax=ax[0], vmax=0.22, vmin=0.14)\n",
    "fit_results.modelfit_results.F.p(\"b_sigma\").T.S.plot(ax=ax[1], vmax=0.22, vmin=0.14)\n",
    "plt.tight_layout()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "25",
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.scatter(\n",
    "    fit_results.modelfit_results.F.p(\"a_amplitude\").values.ravel(),\n",
    "    fit_results.modelfit_results.F.p(\"b_amplitude\").values.ravel(),\n",
    "    color=(0, 0, 0, 0.05),\n",
    ")\n",
    "plt.gca().set_aspect(1)\n",
    "plt.gca().set_xlabel(\"Lower Peak Intensity\")\n",
    "plt.gca().set_ylabel(\"Upper Peak Intensity\")\n",
    "plt.gca().set_xlim([0, 1600])\n",
    "plt.gca().set_ylim([0, 2400])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "26",
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.scatter(\n",
    "    fit_results.modelfit_results.F.p(\"a_center\").values.ravel(),\n",
    "    fit_results.modelfit_results.F.p(\"b_center\").values.ravel(),\n",
    "    color=(0, 0, 0, 0.05),\n",
    ")\n",
    "plt.gca().set_aspect(1)\n",
    "plt.gca().set_xlabel(\"Lower Peak BE\")\n",
    "plt.gca().set_ylabel(\"Upper Peak BE\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "27",
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.scatter(\n",
    "    fit_results.modelfit_results.F.p(\"a_center\").values.ravel(),\n",
    "    fit_results.modelfit_results.F.p(\"a_sigma\").values.ravel(),\n",
    "    color=(0, 0, 0, 0.1),\n",
    ")\n",
    "plt.gca().set_aspect(1)\n",
    "plt.gca().set_xlabel(\"Lower Peak BE\")\n",
    "plt.gca().set_ylabel(\"Upper Peak Width\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "28",
   "metadata": {},
   "source": [
    "The two cohorts in each of the above plots are the piece of $\\text{WS}_2$ on substrate and on plain Si respectively, confirming earlier expectations from the PCA analysis."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "29",
   "metadata": {},
   "source": [
    "## Exercises\n",
    "\n",
    "1. Perform a selection of the data according to the fitting values and plot the corresponding mask. What does the collection of points for which **Lower Peak BE** > -34.4eV look like?\n",
    "2. Perform a selection of the data for the decompositions above and plot their corresponding average XPS curves. How do these results compare to the PCA results we found before?\n",
    "3. What might you conclude about the sample and experimental conditions given the extracted peak width map?"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "py312",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}