Performances of 2D integration vs 1D integration#

This is dependant on:

  • Number of azimuthal bins

  • Pixel splitting

  • Algorithm

  • Implementation (i.e. programming language)

  • Hardware used

Thus there is no general answer. But here is a quick benchmark to evaluate the penality on performances:

[1]:
import sys
import os
import time
import numpy
import fabio
import pyFAI
from pyFAI.test.utilstest import UtilsTest
import pyFAI.method_registry
import pyFAI.integrator.azimuthal
print(f"Python version: {sys.version}")
print(f"PyFAI version: {pyFAI.version}")
start_time = time.perf_counter()
Python version: 3.13.1 | packaged by conda-forge | (main, Jan 13 2025, 09:53:10) [GCC 13.3.0]
PyFAI version: 2025.4.0-dev0
[3]:
print(len(pyFAI.method_registry.IntegrationMethod.list_available()))
81
[4]:
ai = pyFAI.load(UtilsTest.getimage("Pilatus1M.poni"))
img = fabio.open(UtilsTest.getimage("Pilatus1M.edf")).data
ai
[4]:
Detector Pilatus 1M      PixelSize= 172µm, 172µm         BottomRight (3)
Wavelength= 1.000000e-10 m
SampleDetDist= 1.583231e+00 m   PONI= 3.341702e-02, 4.122778e-02 m      rot1=0.006487  rot2=0.007558  rot3=0.000000 rad
DirectBeamDist= 1583.310 mm     Center: x=179.981, y=263.859 pix        Tilt= 0.571° tiltPlanRotation= 130.640° 𝛌= 1.000Å
[5]:
%%time
#Tune those parameters to match your needs:
kw1 = {"data": img, "npt":1000}
kw2 = {"data": img, "npt_rad":1000}
#Actual benchmark:
res = {}
for k,v in pyFAI.method_registry.IntegrationMethod._registry.items():
    print(k)
    if k.dim == 1:
        res[k] = %timeit -o ai.integrate1d(method=v, **kw1)
    else:
        res[k] = %timeit -o ai.integrate2d(method=v, **kw2)
Method(dim=1, split='no', algo='histogram', impl='python', target=None)
32.2 ms ± 153 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Method(dim=2, split='no', algo='histogram', impl='python', target=None)
99.7 ms ± 314 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Method(dim=1, split='no', algo='histogram', impl='cython', target=None)
12.4 ms ± 17.9 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='no', algo='histogram', impl='cython', target=None)
21.1 ms ± 159 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Method(dim=1, split='bbox', algo='histogram', impl='cython', target=None)
26.9 ms ± 41.4 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Method(dim=2, split='bbox', algo='histogram', impl='cython', target=None)
35.7 ms ± 208 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Method(dim=1, split='full', algo='histogram', impl='cython', target=None)
152 ms ± 639 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Method(dim=2, split='full', algo='histogram', impl='cython', target=None)
215 ms ± 104 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=2, split='pseudo', algo='histogram', impl='cython', target=None)
370 ms ± 1.93 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='no', algo='csr', impl='cython', target=None)
7.6 ms ± 455 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='no', algo='csr', impl='cython', target=None)
8.34 ms ± 1e+03 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='bbox', algo='csr', impl='cython', target=None)
8.23 ms ± 1.57 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='bbox', algo='csr', impl='cython', target=None)
8.48 ms ± 399 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='no', algo='csr', impl='python', target=None)
10.2 ms ± 21.2 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='no', algo='csr', impl='python', target=None)
15.1 ms ± 28 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='bbox', algo='csr', impl='python', target=None)
13.4 ms ± 20.8 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='bbox', algo='csr', impl='python', target=None)
18.2 ms ± 29.6 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='no', algo='csc', impl='cython', target=None)
7.98 ms ± 14.1 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='no', algo='csc', impl='cython', target=None)
10.5 ms ± 59.3 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='bbox', algo='csc', impl='cython', target=None)
10.6 ms ± 9.6 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='bbox', algo='csc', impl='cython', target=None)
14.8 ms ± 39.5 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='no', algo='csc', impl='python', target=None)
11.5 ms ± 251 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='no', algo='csc', impl='python', target=None)
14.6 ms ± 122 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='bbox', algo='csc', impl='python', target=None)
14.9 ms ± 23.6 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='bbox', algo='csc', impl='python', target=None)
21.9 ms ± 50.3 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='bbox', algo='lut', impl='cython', target=None)
7.47 ms ± 482 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='bbox', algo='lut', impl='cython', target=None)
11.8 ms ± 224 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='no', algo='lut', impl='cython', target=None)
8.75 ms ± 2.52 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='no', algo='lut', impl='cython', target=None)
7.67 ms ± 341 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='full', algo='lut', impl='cython', target=None)
7.89 ms ± 474 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='full', algo='lut', impl='cython', target=None)
12.6 ms ± 674 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='full', algo='csr', impl='cython', target=None)
8.12 ms ± 1.78 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='full', algo='csr', impl='cython', target=None)
8.72 ms ± 1.29 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='full', algo='csr', impl='python', target=None)
12.6 ms ± 34.3 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='full', algo='csr', impl='python', target=None)
17.1 ms ± 73.3 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='full', algo='csc', impl='cython', target=None)
10.4 ms ± 89.5 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='full', algo='csc', impl='cython', target=None)
14.4 ms ± 99.1 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='full', algo='csc', impl='python', target=None)
14.9 ms ± 238 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='full', algo='csc', impl='python', target=None)
22.7 ms ± 1.57 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='no', algo='histogram', impl='opencl', target=(0, 0))
9.88 ms ± 166 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=2, split='no', algo='histogram', impl='opencl', target=(0, 0))
2.68 ms ± 41.2 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='no', algo='histogram', impl='opencl', target=(0, 1))
12.6 ms ± 206 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=2, split='no', algo='histogram', impl='opencl', target=(0, 1))
4.14 ms ± 38.5 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='no', algo='histogram', impl='opencl', target=(1, 0))
/users/kieffer/.venv/py313/lib/python3.13/site-packages/pyopencl/cache.py:420: CompilerWarning: Non-empty compiler output encountered. Set the environment variable PYOPENCL_COMPILER_OUTPUT=1 to see more.
  prg.build(options_bytes, [devices[i] for i in to_be_built_indices])
WARNING:pyFAI.opencl.azim_hist:Your OpenCL compiler wrongly claims it support 64-bit atomics. Degrading to 32 bits atomics!
11.1 ms ± 497 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=2, split='no', algo='histogram', impl='opencl', target=(1, 0))
WARNING:pyFAI.opencl.azim_hist:Your OpenCL compiler wrongly claims it support 64-bit atomics. Degrading to 32 bits atomics!
7.08 ms ± 698 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='bbox', algo='csr', impl='opencl', target=(0, 0))
661 μs ± 837 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Method(dim=2, split='bbox', algo='csr', impl='opencl', target=(0, 0))
2.59 ms ± 59.1 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='no', algo='csr', impl='opencl', target=(0, 0))
621 μs ± 3.02 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Method(dim=2, split='no', algo='csr', impl='opencl', target=(0, 0))
2.55 ms ± 1.57 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='bbox', algo='csr', impl='opencl', target=(0, 1))
1.2 ms ± 29.7 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=2, split='bbox', algo='csr', impl='opencl', target=(0, 1))
6.09 ms ± 39.1 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='no', algo='csr', impl='opencl', target=(0, 1))
1.04 ms ± 295 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Method(dim=2, split='no', algo='csr', impl='opencl', target=(0, 1))
6.01 ms ± 23.8 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='bbox', algo='csr', impl='opencl', target=(1, 0))
2.78 ms ± 101 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=2, split='bbox', algo='csr', impl='opencl', target=(1, 0))
82 ms ± 193 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='no', algo='csr', impl='opencl', target=(1, 0))
/users/kieffer/.venv/py313/lib/python3.13/site-packages/pyopencl/cache.py:496: CompilerWarning: Non-empty compiler output encountered. Set the environment variable PYOPENCL_COMPILER_OUTPUT=1 to see more.
  _create_built_program_from_source_cached(
/users/kieffer/.venv/py313/lib/python3.13/site-packages/pyopencl/cache.py:500: CompilerWarning: Non-empty compiler output encountered. Set the environment variable PYOPENCL_COMPILER_OUTPUT=1 to see more.
  prg.build(options_bytes, devices)
2.47 ms ± 366 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=2, split='no', algo='csr', impl='opencl', target=(1, 0))
81.4 ms ± 1.45 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='full', algo='csr', impl='opencl', target=(0, 0))
663 μs ± 1.44 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Method(dim=2, split='full', algo='csr', impl='opencl', target=(0, 0))
2.59 ms ± 78 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='full', algo='csr', impl='opencl', target=(0, 1))
1.18 ms ± 986 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Method(dim=2, split='full', algo='csr', impl='opencl', target=(0, 1))
6.1 ms ± 35.6 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='full', algo='csr', impl='opencl', target=(1, 0))
2.68 ms ± 122 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=2, split='full', algo='csr', impl='opencl', target=(1, 0))
82.5 ms ± 80 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='bbox', algo='lut', impl='opencl', target=(0, 0))
3.32 ms ± 50.7 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=2, split='bbox', algo='lut', impl='opencl', target=(0, 0))
300 ms ± 4.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='no', algo='lut', impl='opencl', target=(0, 0))
1.66 ms ± 64.2 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=2, split='no', algo='lut', impl='opencl', target=(0, 0))
180 ms ± 5.29 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='bbox', algo='lut', impl='opencl', target=(0, 1))
3.12 ms ± 27.2 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=2, split='bbox', algo='lut', impl='opencl', target=(0, 1))
298 ms ± 5.47 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='no', algo='lut', impl='opencl', target=(0, 1))
1.8 ms ± 26.9 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=2, split='no', algo='lut', impl='opencl', target=(0, 1))
178 ms ± 665 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='bbox', algo='lut', impl='opencl', target=(1, 0))
3.5 ms ± 293 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=2, split='bbox', algo='lut', impl='opencl', target=(1, 0))
205 ms ± 567 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='no', algo='lut', impl='opencl', target=(1, 0))
2.72 ms ± 166 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=2, split='no', algo='lut', impl='opencl', target=(1, 0))
175 ms ± 816 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='full', algo='lut', impl='opencl', target=(0, 0))
2.71 ms ± 53.5 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=2, split='full', algo='lut', impl='opencl', target=(0, 0))
293 ms ± 1.43 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='full', algo='lut', impl='opencl', target=(0, 1))
2.75 ms ± 149 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=2, split='full', algo='lut', impl='opencl', target=(0, 1))
294 ms ± 1.43 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='full', algo='lut', impl='opencl', target=(1, 0))
3.81 ms ± 546 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=2, split='full', algo='lut', impl='opencl', target=(1, 0))
207 ms ± 901 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
CPU times: user 1h 2min 12s, sys: 16.2 s, total: 1h 2min 29s
Wall time: 5min 40s
[10]:
print("-"*80)
print(f"{'Split':5s} | {'Algo':9s} | {'Impl':6s}| {'1d (ms)':8s} | {'2d (ms)':8s} | {'ratio':6s} | Device")
print("-"*80)
for k in res:
    if k.dim == 1:
        k1 = k
        k2 = k._replace(dim=2)
        if k2 in res:
            print(f"{k1.split:5s} | {k1.algo:9s} | {k1.impl:6s}| {res[k1].best*1000:8.3f} | {res[k2].best*1000:8.3f} | {res[k2].best/res[k1].best:6.1f} | ",
                    end="")
        if k.target:
            print(pyFAI.method_registry.IntegrationMethod._registry.get(k).target_name)
        else:
            print()
print("-"*80)
--------------------------------------------------------------------------------
Split | Algo      | Impl  | 1d (ms)  | 2d (ms)  | ratio  | Device
--------------------------------------------------------------------------------
no    | histogram | python|   31.905 |   99.215 |    3.1 |
no    | histogram | cython|   12.397 |   20.965 |    1.7 |
bbox  | histogram | cython|   26.785 |   35.607 |    1.3 |
full  | histogram | cython|  150.790 |  214.578 |    1.4 |
no    | csr       | cython|    7.120 |    7.573 |    1.1 |
bbox  | csr       | cython|    7.213 |    8.136 |    1.1 |
no    | csr       | python|   10.184 |   15.043 |    1.5 |
bbox  | csr       | python|   13.421 |   18.183 |    1.4 |
no    | csc       | cython|    7.973 |   10.450 |    1.3 |
bbox  | csc       | cython|   10.585 |   14.801 |    1.4 |
no    | csc       | python|   11.299 |   14.427 |    1.3 |
bbox  | csc       | python|   14.845 |   21.879 |    1.5 |
bbox  | lut       | cython|    6.978 |   11.554 |    1.7 |
no    | lut       | cython|    6.939 |    7.410 |    1.1 |
full  | lut       | cython|    7.070 |   11.674 |    1.7 |
full  | csr       | cython|    7.035 |    8.100 |    1.2 |
full  | csr       | python|   12.507 |   16.947 |    1.4 |
full  | csc       | cython|   10.292 |   14.216 |    1.4 |
full  | csc       | python|   14.767 |   21.966 |    1.5 |
no    | histogram | opencl|    9.648 |    2.646 |    0.3 | NVIDIA CUDA / NVIDIA RTX A5000
no    | histogram | opencl|   12.399 |    4.108 |    0.3 | NVIDIA CUDA / Quadro P2200
no    | histogram | opencl|   10.550 |    6.308 |    0.6 | Intel(R) OpenCL / AMD Ryzen Threadripper PRO 3975WX 32-Cores
bbox  | csr       | opencl|    0.660 |    2.559 |    3.9 | NVIDIA CUDA / NVIDIA RTX A5000
no    | csr       | opencl|    0.618 |    2.547 |    4.1 | NVIDIA CUDA / NVIDIA RTX A5000
bbox  | csr       | opencl|    1.173 |    6.071 |    5.2 | NVIDIA CUDA / Quadro P2200
no    | csr       | opencl|    1.040 |    5.996 |    5.8 | NVIDIA CUDA / Quadro P2200
bbox  | csr       | opencl|    2.647 |   81.763 |   30.9 | Intel(R) OpenCL / AMD Ryzen Threadripper PRO 3975WX 32-Cores
no    | csr       | opencl|    2.178 |   80.642 |   37.0 | Intel(R) OpenCL / AMD Ryzen Threadripper PRO 3975WX 32-Cores
full  | csr       | opencl|    0.661 |    2.546 |    3.9 | NVIDIA CUDA / NVIDIA RTX A5000
full  | csr       | opencl|    1.179 |    6.078 |    5.2 | NVIDIA CUDA / Quadro P2200
full  | csr       | opencl|    2.504 |   82.328 |   32.9 | Intel(R) OpenCL / AMD Ryzen Threadripper PRO 3975WX 32-Cores
bbox  | lut       | opencl|    3.277 |  297.263 |   90.7 | NVIDIA CUDA / NVIDIA RTX A5000
no    | lut       | opencl|    1.621 |  175.541 |  108.3 | NVIDIA CUDA / NVIDIA RTX A5000
bbox  | lut       | opencl|    3.098 |  292.775 |   94.5 | NVIDIA CUDA / Quadro P2200
no    | lut       | opencl|    1.781 |  176.326 |   99.0 | NVIDIA CUDA / Quadro P2200
bbox  | lut       | opencl|    3.215 |  204.729 |   63.7 | Intel(R) OpenCL / AMD Ryzen Threadripper PRO 3975WX 32-Cores
no    | lut       | opencl|    2.484 |  174.308 |   70.2 | Intel(R) OpenCL / AMD Ryzen Threadripper PRO 3975WX 32-Cores
full  | lut       | opencl|    2.683 |  289.591 |  107.9 | NVIDIA CUDA / NVIDIA RTX A5000
full  | lut       | opencl|    2.669 |  290.825 |  109.0 | NVIDIA CUDA / Quadro P2200
full  | lut       | opencl|    3.250 |  205.200 |   63.1 | Intel(R) OpenCL / AMD Ryzen Threadripper PRO 3975WX 32-Cores
--------------------------------------------------------------------------------
[9]:
print(f"Total runtime: {time.perf_counter()-start_time:.3f}s")
Total runtime: 618.791s
[ ]: