Performance of 2D integration vs 1D integration

The cost of 2D integration relative to 1D integration depends on:

* Number of azimuthal bins
* Pixel splitting
* Algorithm
* Implementation (i.e. programming language)
* Hardware used
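The first factor can be made concrete with simple arithmetic: a 2D integration fills one output bin per (azimuthal, radial) pair, so the output grows linearly with the number of azimuthal bins. The sketch below uses the 1000 radial bins of the benchmark on this page; the 360 azimuthal bins are an assumed value for illustration, since the benchmark does not set that number explicitly.

```python
# Toy arithmetic: how the number of output bins grows from 1D to 2D.
npt_rad = 1000   # radial bins, as used in the benchmark on this page
npt_azim = 360   # ASSUMPTION: azimuthal bin count, chosen for illustration

bins_1d = npt_rad             # one bin per radial position
bins_2d = npt_rad * npt_azim  # one bin per (azimuthal, radial) pair
growth = bins_2d // bins_1d

print(f"1D bins: {bins_1d}, 2D bins: {bins_2d}, growth: x{growth}")
```

The amount of input data is the same in both cases, so the measured slowdown is usually far smaller than this growth factor; how much smaller depends on the other items in the list.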

Thus there is no general answer. But here is a quick benchmark to evaluate the performance penalty:


[1]:
import sys
import os
import time
import numpy

os.environ["PYOPENCL_COMPILER_OUTPUT"] = "0"
start_time = time.perf_counter()
[2]:
import fabio
import pyFAI
from pyFAI.test.utilstest import UtilsTest
import pyFAI.method_registry
import pyFAI.integrator.azimuthal
print(f"Python version: {sys.version}")
print(f"PyFAI version: {pyFAI.version}")

Python version: 3.13.1 | packaged by conda-forge | (main, Jan 13 2025, 09:53:10) [GCC 13.3.0]
PyFAI version: 2025.11.0-dev0
[3]:
print("Number of ways to perform integration:", len(pyFAI.method_registry.IntegrationMethod.list_available()))
Number of ways to perform integration: 95
[4]:
ai = pyFAI.load(UtilsTest.getimage("Pilatus1M.poni"))
img = fabio.open(UtilsTest.getimage("Pilatus1M.edf")).data
ai
[4]:
Detector Pilatus 1M      PixelSize= 172µm, 172µm         BottomRight (3)
Wavelength= 1.000000 Å
SampleDetDist= 1.583231e+00 m   PONI= 3.341702e-02, 4.122778e-02 m      rot1=0.006487  rot2=0.007558  rot3=0.000000 rad
DirectBeamDist= 1583.310 mm     Center: x=179.981, y=263.859 pix        Tilt= 0.571° tiltPlanRotation= 130.640° λ= 1.000Å
[5]:
%%time
# Tune these parameters to match your needs:
kw1 = {"data": img, "npt": 1000}
kw2 = {"data": img, "npt_rad": 1000}
# Actual benchmark:
res = {}
for k,v in pyFAI.method_registry.IntegrationMethod._registry.items():
    print(k)
    if k.dim == 1:
        res[k] = %timeit -o ai.integrate1d(method=v, **kw1)
    else:
        res[k] = %timeit -o ai.integrate2d(method=v, **kw2)
Method(dim=1, split='no', algo='histogram', impl='python', target=None)
31.2 ms ± 703 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Method(dim=2, split='no', algo='histogram', impl='python', target=None)
114 ms ± 479 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Method(dim=1, split='no', algo='histogram', impl='cython', target=None)
11.3 ms ± 52.3 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='no', algo='histogram', impl='cython', target=None)
16.9 ms ± 237 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='bbox', algo='histogram', impl='cython', target=None)
26.1 ms ± 91.5 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Method(dim=2, split='bbox', algo='histogram', impl='cython', target=None)
32.3 ms ± 139 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Method(dim=1, split='full', algo='histogram', impl='cython', target=None)
155 ms ± 436 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Method(dim=2, split='full', algo='histogram', impl='cython', target=None)
265 ms ± 2.57 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=2, split='pseudo', algo='histogram', impl='cython', target=None)
344 ms ± 2.09 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='no', algo='csr', impl='cython', target=None)
8.97 ms ± 783 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='no', algo='csr', impl='cython', target=None)
9.92 ms ± 1.14 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='bbox', algo='csr', impl='cython', target=None)
9.95 ms ± 959 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='bbox', algo='csr', impl='cython', target=None)
11.8 ms ± 2.24 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='no', algo='csr', impl='python', target=None)
9.9 ms ± 40.3 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='no', algo='csr', impl='python', target=None)
14.3 ms ± 41.7 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='bbox', algo='csr', impl='python', target=None)
12.9 ms ± 31.6 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='bbox', algo='csr', impl='python', target=None)
17.4 ms ± 8.08 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='no', algo='csc', impl='cython', target=None)
8.08 ms ± 14.4 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='no', algo='csc', impl='cython', target=None)
10.6 ms ± 15.9 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='bbox', algo='csc', impl='cython', target=None)
10.3 ms ± 9.4 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='bbox', algo='csc', impl='cython', target=None)
13.9 ms ± 18.1 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='no', algo='csc', impl='python', target=None)
11 ms ± 7.41 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='no', algo='csc', impl='python', target=None)
14.5 ms ± 8.64 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='bbox', algo='csc', impl='python', target=None)
14.8 ms ± 10.9 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='bbox', algo='csc', impl='python', target=None)
21.9 ms ± 469 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='bbox', algo='lut', impl='cython', target=None)
10.8 ms ± 1.52 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='bbox', algo='lut', impl='cython', target=None)
22.9 ms ± 2.46 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='no', algo='lut', impl='cython', target=None)
9.79 ms ± 1.96 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='no', algo='lut', impl='cython', target=None)
12 ms ± 3.11 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='full', algo='lut', impl='cython', target=None)
11.7 ms ± 2.63 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='full', algo='lut', impl='cython', target=None)
11.7 ms ± 302 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='full', algo='csr', impl='cython', target=None)
10.5 ms ± 1.82 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='full', algo='csr', impl='cython', target=None)
8.32 ms ± 460 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='full', algo='csr', impl='python', target=None)
13.1 ms ± 44 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='full', algo='csr', impl='python', target=None)
17.5 ms ± 622 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='full', algo='csc', impl='cython', target=None)
10.4 ms ± 23 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='full', algo='csc', impl='cython', target=None)
14.1 ms ± 63.4 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='full', algo='csc', impl='python', target=None)
15.1 ms ± 188 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='full', algo='csc', impl='python', target=None)
22.2 ms ± 284 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='no', algo='histogram', impl='opencl', target=(0, 0))
9.02 ms ± 18.7 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='no', algo='histogram', impl='opencl', target=(0, 0))
2.73 ms ± 5.73 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='no', algo='histogram', impl='opencl', target=(0, 1))
8.63 ms ± 7.6 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='no', algo='histogram', impl='opencl', target=(0, 1))
4.2 ms ± 4.08 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='no', algo='histogram', impl='opencl', target=(1, 0))
1 error generated.
WARNING:pyFAI.opencl.azim_hist:Your OpenCL compiler wrongly claims it support 64-bit atomics. Degrading to 32 bits atomics!
1 error generated.
WARNING:pyFAI.opencl.azim_hist:Your OpenCL compiler wrongly claims it support 64-bit atomics. Degrading to 32 bits atomics!
14.6 ms ± 676 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=2, split='no', algo='histogram', impl='opencl', target=(1, 0))
/users/kieffer/.venv/py313/lib/python3.13/site-packages/pyopencl/cache.py:496: CompilerWarning: Non-empty compiler output encountered. Set the environment variable PYOPENCL_COMPILER_OUTPUT=1 to see more.
  _create_built_program_from_source_cached(
/users/kieffer/.venv/py313/lib/python3.13/site-packages/pyopencl/cache.py:500: CompilerWarning: Non-empty compiler output encountered. Set the environment variable PYOPENCL_COMPILER_OUTPUT=1 to see more.
  prg.build(options_bytes, devices)
WARNING:pyFAI.opencl.azim_hist:Your OpenCL compiler wrongly claims it support 64-bit atomics. Degrading to 32 bits atomics!
10.4 ms ± 707 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='no', algo='histogram', impl='opencl', target=(2, 0))
WARNING:pyFAI.opencl.azim_hist:Your OpenCL compiler wrongly claims it support 64-bit atomics. Degrading to 32 bits atomics!
10.9 ms ± 444 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='no', algo='histogram', impl='opencl', target=(2, 0))
6.79 ms ± 271 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='bbox', algo='csr', impl='opencl', target=(0, 0))
704 μs ± 992 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Method(dim=2, split='bbox', algo='csr', impl='opencl', target=(0, 0))
2.82 ms ± 71.3 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='no', algo='csr', impl='opencl', target=(0, 0))
663 μs ± 776 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Method(dim=2, split='no', algo='csr', impl='opencl', target=(0, 0))
2.57 ms ± 11.3 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='bbox', algo='csr', impl='opencl', target=(0, 1))
1.21 ms ± 365 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Method(dim=2, split='bbox', algo='csr', impl='opencl', target=(0, 1))
6.09 ms ± 9.21 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='no', algo='csr', impl='opencl', target=(0, 1))
1.07 ms ± 618 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Method(dim=2, split='no', algo='csr', impl='opencl', target=(0, 1))
6.03 ms ± 7.27 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='bbox', algo='csr', impl='opencl', target=(1, 0))
4.02 ms ± 88.6 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='bbox', algo='csr', impl='opencl', target=(1, 0))
7.3 ms ± 343 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='no', algo='csr', impl='opencl', target=(1, 0))
2.81 ms ± 81.1 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='no', algo='csr', impl='opencl', target=(1, 0))
6.32 ms ± 21.1 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='bbox', algo='csr', impl='opencl', target=(2, 0))
3.48 ms ± 1.06 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=2, split='bbox', algo='csr', impl='opencl', target=(2, 0))
89.8 ms ± 6.59 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='no', algo='csr', impl='opencl', target=(2, 0))
2.26 ms ± 156 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=2, split='no', algo='csr', impl='opencl', target=(2, 0))
82.9 ms ± 1.82 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='full', algo='csr', impl='opencl', target=(0, 0))
706 μs ± 980 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Method(dim=2, split='full', algo='csr', impl='opencl', target=(0, 0))
2.61 ms ± 75.6 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='full', algo='csr', impl='opencl', target=(0, 1))
1.21 ms ± 681 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Method(dim=2, split='full', algo='csr', impl='opencl', target=(0, 1))
6.11 ms ± 41.6 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='full', algo='csr', impl='opencl', target=(1, 0))
4.21 ms ± 18.2 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='full', algo='csr', impl='opencl', target=(1, 0))
7.74 ms ± 144 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='full', algo='csr', impl='opencl', target=(2, 0))
2.92 ms ± 174 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=2, split='full', algo='csr', impl='opencl', target=(2, 0))
87.2 ms ± 5.18 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='bbox', algo='lut', impl='opencl', target=(0, 0))
3.16 ms ± 1.89 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='bbox', algo='lut', impl='opencl', target=(0, 0))
304 ms ± 5.37 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='no', algo='lut', impl='opencl', target=(0, 0))
1.6 ms ± 2.17 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Method(dim=2, split='no', algo='lut', impl='opencl', target=(0, 0))
186 ms ± 1.78 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='bbox', algo='lut', impl='opencl', target=(0, 1))
3.13 ms ± 1.27 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='bbox', algo='lut', impl='opencl', target=(0, 1))
310 ms ± 8.06 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='no', algo='lut', impl='opencl', target=(0, 1))
1.8 ms ± 4.74 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Method(dim=2, split='no', algo='lut', impl='opencl', target=(0, 1))
183 ms ± 5.52 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='bbox', algo='lut', impl='opencl', target=(1, 0))
4.99 ms ± 38.2 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='bbox', algo='lut', impl='opencl', target=(1, 0))
167 ms ± 1.72 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='no', algo='lut', impl='opencl', target=(1, 0))
3.59 ms ± 27.4 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='no', algo='lut', impl='opencl', target=(1, 0))
137 ms ± 2.87 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='bbox', algo='lut', impl='opencl', target=(2, 0))
3.74 ms ± 152 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='bbox', algo='lut', impl='opencl', target=(2, 0))
210 ms ± 573 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='no', algo='lut', impl='opencl', target=(2, 0))
2.63 ms ± 82.9 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='no', algo='lut', impl='opencl', target=(2, 0))
183 ms ± 5.45 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='full', algo='lut', impl='opencl', target=(0, 0))
2.61 ms ± 1.74 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='full', algo='lut', impl='opencl', target=(0, 0))
300 ms ± 1.99 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='full', algo='lut', impl='opencl', target=(0, 1))
2.75 ms ± 110 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='full', algo='lut', impl='opencl', target=(0, 1))
309 ms ± 1.67 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='full', algo='lut', impl='opencl', target=(1, 0))
4.4 ms ± 58.9 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='full', algo='lut', impl='opencl', target=(1, 0))
171 ms ± 5.43 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='full', algo='lut', impl='opencl', target=(2, 0))
4.33 ms ± 177 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='full', algo='lut', impl='opencl', target=(2, 0))
217 ms ± 5.71 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
CPU times: user 1h 50min 21s, sys: 8min 2s, total: 1h 58min 24s
Wall time: 7min 57s
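The `%timeit -o` magic used above only exists inside IPython or Jupyter. As a rough sketch of the same measurement in a plain Python script, the standard `timeit` module can be used instead. The two histogram functions below are toy pure-Python stand-ins written for this illustration, not pyFAI code, and the helper `best_time` is a hypothetical name; it keeps the fastest repeat, similar in spirit to the `best` attribute that the summary table on this page relies on.

```python
import random
import timeit

random.seed(0)
N = 20_000
radius = [random.random() for _ in range(N)]   # stand-in for radial position
azimuth = [random.random() for _ in range(N)]  # stand-in for azimuthal angle
signal = [random.random() for _ in range(N)]   # stand-in for pixel intensity


def hist1d(npt=100):
    """Accumulate the signal into npt radial bins."""
    out = [0.0] * npt
    for r, s in zip(radius, signal):
        out[min(int(r * npt), npt - 1)] += s
    return out


def hist2d(npt_rad=100, npt_azim=36):
    """Accumulate the signal into npt_azim x npt_rad bins."""
    out = [[0.0] * npt_rad for _ in range(npt_azim)]
    for r, a, s in zip(radius, azimuth, signal):
        i = min(int(a * npt_azim), npt_azim - 1)
        j = min(int(r * npt_rad), npt_rad - 1)
        out[i][j] += s
    return out


def best_time(func, repeat=5, number=3):
    """Best time per call in seconds over `repeat` runs of `number` calls."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


t1 = best_time(hist1d)
t2 = best_time(hist2d)
print(f"1d: {t1 * 1000:.3f} ms, 2d: {t2 * 1000:.3f} ms, ratio: {t2 / t1:.1f}")
```

To time real integrations this way, the callables passed to `best_time` would wrap the `ai.integrate1d(...)` and `ai.integrate2d(...)` calls from the cell above.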
[6]:
print("-"*80)
print(f"{'Split':5s} | {'Algo':9s} | {'Impl':6s}| {'1d (ms)':8s} | {'2d (ms)':8s} | {'ratio':6s} | Device")
print("-"*80)
for k in res:
    if k.dim == 1:
        k1 = k
        k2 = k._replace(dim=2)
        # Only report methods that were timed in both 1D and 2D
        if k2 in res:
            print(f"{k1.split:5s} | {k1.algo:9s} | {k1.impl:6s}| {res[k1].best*1000:8.3f} | {res[k2].best*1000:8.3f} | {res[k2].best/res[k1].best:6.1f} | ",
                    end="")
            if k.target:
                print(pyFAI.method_registry.IntegrationMethod._registry.get(k).target_name)
            else:
                print()
print("-"*80)
--------------------------------------------------------------------------------
Split | Algo      | Impl  | 1d (ms)  | 2d (ms)  | ratio  | Device
--------------------------------------------------------------------------------
no    | histogram | python|   30.568 |  113.243 |    3.7 |
no    | histogram | cython|   11.217 |   16.766 |    1.5 |
bbox  | histogram | cython|   26.019 |   32.139 |    1.2 |
full  | histogram | cython|  154.695 |  262.684 |    1.7 |
no    | csr       | cython|    7.843 |    8.266 |    1.1 |
bbox  | csr       | cython|    8.210 |    9.176 |    1.1 |
no    | csr       | python|    9.819 |   14.269 |    1.5 |
bbox  | csr       | python|   12.830 |   17.338 |    1.4 |
no    | csc       | cython|    8.066 |   10.549 |    1.3 |
bbox  | csc       | cython|   10.278 |   13.857 |    1.3 |
no    | csc       | python|   10.941 |   14.487 |    1.3 |
bbox  | csc       | python|   14.771 |   21.639 |    1.5 |
bbox  | lut       | cython|    7.971 |   17.975 |    2.3 |
no    | lut       | cython|    7.551 |    8.131 |    1.1 |
full  | lut       | cython|    7.938 |   11.485 |    1.4 |
full  | csr       | cython|    8.561 |    8.016 |    0.9 |
full  | csr       | python|   13.001 |   17.103 |    1.3 |
full  | csc       | cython|   10.342 |   14.047 |    1.4 |
full  | csc       | python|   15.001 |   21.900 |    1.5 |
no    | histogram | opencl|    9.006 |    2.720 |    0.3 | NVIDIA CUDA / NVIDIA RTX A5000
no    | histogram | opencl|    8.617 |    4.192 |    0.5 | NVIDIA CUDA / Quadro P2200
no    | histogram | opencl|   13.982 |    9.701 |    0.7 | Portable Computing Language / cpu-haswell-AMD Ryzen Threadripper PRO 3975WX 32-Cores
no    | histogram | opencl|   10.142 |    6.392 |    0.6 | Intel(R) OpenCL / AMD Ryzen Threadripper PRO 3975WX 32-Cores
bbox  | csr       | opencl|    0.703 |    2.755 |    3.9 | NVIDIA CUDA / NVIDIA RTX A5000
no    | csr       | opencl|    0.662 |    2.562 |    3.9 | NVIDIA CUDA / NVIDIA RTX A5000
bbox  | csr       | opencl|    1.209 |    6.076 |    5.0 | NVIDIA CUDA / Quadro P2200
no    | csr       | opencl|    1.071 |    6.021 |    5.6 | NVIDIA CUDA / Quadro P2200
bbox  | csr       | opencl|    3.898 |    6.985 |    1.8 | Portable Computing Language / cpu-haswell-AMD Ryzen Threadripper PRO 3975WX 32-Cores
no    | csr       | opencl|    2.748 |    6.285 |    2.3 | Portable Computing Language / cpu-haswell-AMD Ryzen Threadripper PRO 3975WX 32-Cores
bbox  | csr       | opencl|    2.612 |   81.907 |   31.4 | Intel(R) OpenCL / AMD Ryzen Threadripper PRO 3975WX 32-Cores
no    | csr       | opencl|    2.134 |   81.539 |   38.2 | Intel(R) OpenCL / AMD Ryzen Threadripper PRO 3975WX 32-Cores
full  | csr       | opencl|    0.705 |    2.575 |    3.7 | NVIDIA CUDA / NVIDIA RTX A5000
full  | csr       | opencl|    1.212 |    6.083 |    5.0 | NVIDIA CUDA / Quadro P2200
full  | csr       | opencl|    4.192 |    7.483 |    1.8 | Portable Computing Language / cpu-haswell-AMD Ryzen Threadripper PRO 3975WX 32-Cores
full  | csr       | opencl|    2.750 |   81.969 |   29.8 | Intel(R) OpenCL / AMD Ryzen Threadripper PRO 3975WX 32-Cores
bbox  | lut       | opencl|    3.163 |  300.582 |   95.0 | NVIDIA CUDA / NVIDIA RTX A5000
no    | lut       | opencl|    1.601 |  184.015 |  114.9 | NVIDIA CUDA / NVIDIA RTX A5000
bbox  | lut       | opencl|    3.131 |  304.496 |   97.3 | NVIDIA CUDA / Quadro P2200
no    | lut       | opencl|    1.801 |  180.383 |  100.2 | NVIDIA CUDA / Quadro P2200
bbox  | lut       | opencl|    4.943 |  164.535 |   33.3 | Portable Computing Language / cpu-haswell-AMD Ryzen Threadripper PRO 3975WX 32-Cores
no    | lut       | opencl|    3.549 |  132.949 |   37.5 | Portable Computing Language / cpu-haswell-AMD Ryzen Threadripper PRO 3975WX 32-Cores
bbox  | lut       | opencl|    3.475 |  208.892 |   60.1 | Intel(R) OpenCL / AMD Ryzen Threadripper PRO 3975WX 32-Cores
no    | lut       | opencl|    2.546 |  178.412 |   70.1 | Intel(R) OpenCL / AMD Ryzen Threadripper PRO 3975WX 32-Cores
full  | lut       | opencl|    2.608 |  297.720 |  114.2 | NVIDIA CUDA / NVIDIA RTX A5000
full  | lut       | opencl|    2.698 |  306.706 |  113.7 | NVIDIA CUDA / Quadro P2200
full  | lut       | opencl|    4.294 |  163.228 |   38.0 | Portable Computing Language / cpu-haswell-AMD Ryzen Threadripper PRO 3975WX 32-Cores
full  | lut       | opencl|    4.053 |  212.131 |   52.3 | Intel(R) OpenCL / AMD Ryzen Threadripper PRO 3975WX 32-Cores
--------------------------------------------------------------------------------
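The table is easier to read once the rows are ranked. The following sketch copies four rows from the table above (the timings are taken verbatim, the `Row` tuple and variable names are made up for this illustration) and sorts them by the 2D/1D ratio, then picks the fastest 2D method among them.

```python
from collections import namedtuple

# Hypothetical container for one table row; times are in milliseconds.
Row = namedtuple("Row", "split algo impl t1d_ms t2d_ms device")

# Values copied from the table above (device is empty for non-OpenCL rows).
rows = [
    Row("no", "csr", "cython", 7.843, 8.266, ""),
    Row("no", "csr", "opencl", 0.662, 2.562, "NVIDIA RTX A5000"),
    Row("no", "lut", "opencl", 1.601, 184.015, "NVIDIA RTX A5000"),
    Row("no", "histogram", "opencl", 9.006, 2.720, "NVIDIA RTX A5000"),
]

# Rank by the 2D penalty: a ratio below 1 means 2D was faster than 1D.
by_ratio = sorted(rows, key=lambda r: r.t2d_ms / r.t1d_ms)
for r in by_ratio:
    print(f"{r.algo:9s} {r.impl:6s} ratio={r.t2d_ms / r.t1d_ms:6.1f} {r.device}")

# A low ratio does not imply the fastest 2D integration in absolute terms.
fastest_2d = min(rows, key=lambda r: r.t2d_ms)
print("Fastest 2D among these rows:", fastest_2d.algo, fastest_2d.impl)
```

Among these four rows, the OpenCL histogram has the lowest ratio while the OpenCL CSR method is the fastest in absolute 2D time, so the ratio and the absolute timing should be read together. These numbers come from a single machine and should not be generalized to other hardware.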
[7]:
print(f"Total runtime: {time.perf_counter()-start_time:.3f}s")
Total runtime: 478.807s