.. _tutorial: Tutorial =============== *torchquad* is a dedicated module for numerical integration in arbitrary dimensions. This tutorial gives a more detailed look at its functionality and explores some performance considerations. Minimal working example ----------------------- .. code:: ipython3 # To avoid copying things to GPU memory, # ideally allocate everything in torch on the GPU # and avoid non-torch function calls import torch from torchquad import MonteCarlo, set_up_backend # Enable GPU support if available and set the floating point precision set_up_backend("torch", data_type="float32") # The function we want to integrate, in this example # f(x0,x1) = sin(x0) + e^x1 for x0=[0,1] and x1=[-1,1] # Note that the function needs to support multiple evaluations at once (first # dimension of x here) # Expected result here is ~3.2698 def some_function(x): return torch.sin(x[:, 0]) + torch.exp(x[:, 1]) # Declare an integrator; # here we use the simple, stochastic Monte Carlo integration method mc = MonteCarlo() # Compute the function integral by sampling 10000 points over domain integral_value = mc.integrate( some_function, dim=2, N=10000, integration_domain=[[0, 1], [-1, 1]], backend="torch", ) To set the default logger verbosity, change the ``TORCHQUAD_LOG_LEVEL`` environment variable; for example ``export TORCHQUAD_LOG_LEVEL=DEBUG``. A :ref:`later section ` in this tutorial shows how to choose a different numerical backend. For information on using multiple GPUs, see the :ref:`Multi-GPU Usage ` section. Detailed Introduction --------------------- The main problem with higher-dimensional numerical integration is that the computation simply becomes too costly if the dimensionality, *n*, is large, as the number of evaluation points increases exponentially - this problem is known as the *curse of dimensionality*. This especially affects grid-based methods, but is, to some degree, also present for Monte Carlo methods, which also require larger numbers of points for convergence in higher dimensions. At the time, *torchquad* offers the following integration methods for abritrary dimensionality. +--------------+----------------------------------------------------+------------+ | Name | How it works | Spacing | | | | | +==============+====================================================+============+ | Trapezoid | Creates a linear interpolant between two |br| | Equal | | rule | neighbouring points | | +--------------+----------------------------------------------------+------------+ | Simpson's | Creates a quadratic interpolant between |br| | Equal | | rule | three neighbouring point | | +--------------+----------------------------------------------------+------------+ | Boole's | Creates a more complex interpolant between |br| | Equal | | rule | five neighbouring points | | +--------------+----------------------------------------------------+------------+ | Gaussian | Uses orthogonal polynomials to generate a grid | Unequal | | Quadrature | of sample points along with corresponding weights. | | | | A `GaussLegendre` implementation is provided | | | | as is a base `Gaussian` class that can be extended | | | | e.g., to other polynomials. | | +--------------+----------------------------------------------------+------------+ | Monte Carlo | Randomly chooses points at which the |br| | Random | | | integrand is evaluated | | +--------------+----------------------------------------------------+------------+ | VEGAS | Adaptive multidimensional Monte Carlo |br| | Stratified | | Enhanced | integration (VEGAS with adaptive stratified | |br| | | |br| (VEGAS+)| |br| sampling) | sampling | +--------------+----------------------------------------------------+------------+ .. |br| raw:: html
Outline ------- This tutorial is a guide for new users to *torchquad* and is structured in the following way: 1. Example integration in one dimension (1-D) with PyTorch 2. Example integration in ten dimensions (10-D) with PyTorch 3. Some accuracy / runtime comparisons with scipy 4. Information on how to select a numerical backend 5. Example showing how gradients can be obtained w.r.t. the integration domain with PyTorch 6. Methods to speed up the integration 7. Multidimensional/Vectorized Integrands 8. Parametric Integration with Variable Domains 9. Custom Integrators Feel free to test the code on your own computer as we go along. Imports ------- Now let's get started! First, the general imports: .. code:: ipython3 import scipy import numpy as np # For benchmarking import time from scipy.integrate import nquad # For plotting import matplotlib.pyplot as plt # To avoid copying things to GPU memory, # ideally allocate everything in torch on the GPU # and avoid non-torch function calls import torch torch.set_printoptions(precision=10) # Set displayed output precision to 10 digits from torchquad import set_up_backend # Necessary to enable GPU support from torchquad import Trapezoid, Simpson, Boole, MonteCarlo, VEGAS # The available integrators from torchquad.utils.set_precision import set_precision import torchquad .. code:: ipython3 # Use this to enable GPU support and set the floating point precision set_up_backend("torch", data_type="float32") One-dimensional integration ------------------------------------------------ To make it easier to understand the methods used in this notebook, we will start with an example in one dimension. If you are new to these methods or simply want a clearer picture, feel free to check out Patrick Walls' `nice Python introduction `__ to the `Trapezoid rule `__ and `Simpson's rule `__ in one dimension. Similarly, `Tirthajyoti Sarkar `__ has made a nice visual explanation of `Monte Carlo integration in Python `__. Let ``f(x)`` be the function :math:`f(x) = e^{x} \cdot x^{2}`. Over the domain :math:`[0,2]`, the integral of ``f(x)`` is :math:`\int_{0}^{2} f(x) dx = \int_{0}^{2} e^x \cdot x^2 dx = 2(e^{2} - 1) = 12.7781121978613004544...` Let's declare the function and a simple function to print the absolute error, as well as remember the correct result. .. code:: ipython3 def f(x): return torch.exp(x) * torch.pow(x, 2) def print_error(result, solution): print("Results:", result.item()) print(f"Abs. Error: {(torch.abs(result - solution).item()):.8e}") print(f"Rel. Error: {(torch.abs((result - solution) / solution).item()):.8e}") solution = 2 * (torch.exp(torch.tensor([2.0])) - 1) **Note that we are using the torch versions of functions like ``exp`` to ensure that all variables are and stay on the GPU.** **Also, note:** the unit imaginary number :math:`i` is written as ``j`` in Python. Let's plot the function briefly. .. code:: ipython3 points = torch.linspace(0, 2, 100) # Note that for plotting we have to move the values to the CPU first plt.plot(points.cpu(), f(points).cpu()) plt.xlabel("$x$", fontsize=14) plt.ylabel("f($x$)", fontsize=14) Let's define the integration domain, set the precision to double, and initialize the integrator - let's start with the trapezoid rule. .. code:: ipython3 # Integration domain is a list of lists to allow arbitrary dimensionality. integration_domain = [[0, 2]] # Initialize a trapezoid solver tp = Trapezoid() Now we are all set to compute the integral. Let's try it with just 101 sample points for now. .. code:: ipython3 result = tp.integrate(f, dim=1, N=101, integration_domain=integration_domain) print_error(result, solution) .. parsed-literal:: **Output**: Results: 12.780082702636719 Abs. Error: 1.97029114e-03 Rel. Error: 1.54192661e-04 This is quite close already, as 1-D integrals are comparatively easy. Let's see what type of value we get for different integrators. .. code:: ipython3 simp = Simpson() result = simp.integrate(f, dim=1, N=101, integration_domain=integration_domain) print_error(result, solution) .. parsed-literal:: **Output:** Results: 12.778112411499023 Abs. Error: 0.00000000e+00 Rel. Error: 0.00000000e+00 .. code:: ipython3 mc = MonteCarlo() result = mc.integrate(f, dim=1, N=101, integration_domain=integration_domain) print_error(result, solution) .. parsed-literal:: **Output:** Results: 13.32831859588623 Abs. Error: 5.50206184e-01 Rel. Error: 4.30584885e-02 .. code:: ipython3 vegas = VEGAS() result = vegas.integrate(f, dim=1, N=101, integration_domain=integration_domain) print_error(result, solution) .. parsed-literal:: **Output:** Results: 21.83991813659668 Abs. Error: 9.06180573e+00 Rel. Error: 7.09166229e-01 Notably, Simpson's method is already sufficient for a perfect solution here with 101 points. Monte Carlo methods do not perform so well; they are more suited to higher-dimensional integrals. VEGAS currently requires a larger number of samples to function correctly (as it performs several iterations). Let's step things up now and move to a ten-dimensional problem. High-dimensional integration ---------------------------- Now, we will investigate the following ten-dimensional problem: Let ``f_2`` be the function :math:`f_{2}(x) = \sum_{i=1}^{10} \sin(x_{i})`. Over the domain :math:`[0,1]^{10}`, the integral of ``f_2`` is :math:`\int_{0}^{1} \dotsc \int_{0}^{1} \sum_{i=1}^{10} \sin(x_{i}) = 20 \sin^{2}(1/2) = 4.59697694131860282599063392557 \dotsc` Plotting this is tricky, so let's directly move to the integrals. .. code:: ipython3 def f_2(x): return torch.sum(torch.sin(x), dim=1) solution = 20 * (torch.sin(torch.tensor([0.5])) * torch.sin(torch.tensor([0.5]))) Let's start with just 3 points per dimension, i.e., :math:`3^{10}=59,049` sample points. **Note**: *torchquad* currently only supports equal numbers of points per dimension. We are working on giving the user more flexibility on this point. .. code:: ipython3 # Integration domain is a list of lists to allow arbitrary dimensionality integration_domain = [[0, 1]] * 10 N = 3 ** 10 .. code:: ipython3 tp = Trapezoid() # Initialize a trapezoid solver result = tp.integrate(f_2, dim=10, N=N, integration_domain=integration_domain) print_error(result, solution) .. parsed-literal:: **Output:** Results: 4.500804901123047 Abs. Error: 9.61723328e-02 Rel. Error: 2.09207758e-02 .. code:: ipython3 simp = Simpson() # Initialize Simpson solver result = simp.integrate(f_2, dim=10, N=N, integration_domain=integration_domain) print_error(result, solution) .. parsed-literal:: **Output:** Results: 4.598623752593994 Abs. Error: 1.64651871e-03 Rel. Error: 3.58174206e-04 .. code:: ipython3 boole = Boole() # Initialize Boole solver result = boole.integrate(f_2, dim=10, N=N, integration_domain=integration_domain) print_error(result,solution) .. parsed-literal:: **Output:** Results: 4.596974849700928 Abs. Error: 2.38418579e-06 Rel. Error: 5.18642082e-07 .. code:: ipython3 mc = MonteCarlo() result = mc.integrate(f_2, dim=10, N=N, integration_domain=integration_domain, seed=42) print_error(result, solution) .. parsed-literal:: **Output:** Results: 4.598303318023682 Abs. Error: 1.32608414e-03 Rel. Error: 2.88468727e-04 .. code:: ipython3 vegas = VEGAS() result = vegas.integrate(f_2, dim=10, N=N, integration_domain=integration_domain) print_error(result, solution) .. parsed-literal:: **Output:** Results: 4.598696708679199 Abs. Error: 1.71947479e-03 Rel. Error: 3.74044670e-04 Note that the Monte Carlo methods are much more competitive in this case. The bad convergence properties of the trapezoid method are visible while Simpson's and Boole's rule are still OK given the comparatively smooth integrand. If you have been repeating the examples from this tutorial on your own computer, you might get ``RuntimeError: CUDA out of memory`` if you have a small GPU. In that case, you could also try to reduce the number of sample points (e.g., 3 per dimension). You can really see the curse of dimensionality fully at play here, since :math:`5^{10}=9,765,625` but :math:`3^{10}=59,049`, reducing the number of sample points by a factor of 165. Note, however, that Boole's method cannot work for only 3 points per dimension, so the number of sample points is therefore automatically increased to 5 per dimension for this method. Comparison with scipy --------------------- Let's explore how *torchquad*'s performance compares to scipy, the go-to tool for numerical integration. A more detailed exploration of this topic might be done as a side project at a later time. For simplicity, we will stick to a 5-D version of the :math:`\sin(x)` of the previous section. Let's declare it with numpy and torch. NumPy arrays will remain on the CPU and torch tensor on the GPU. .. code:: ipython3 dimension = 5 integration_domain = [[0, 1]] * dimension ground_truth = 2 * dimension * np.sin(0.5) * np.sin(0.5) def f_3(x): return torch.sum(torch.sin(x), dim=1) def f_3_np(*x): return np.sum(np.sin(x)) Now let's evaluate the integral using the scipy function ``nquad``. .. code:: ipython3 start = time.time() opts = {"limit": 10, "epsabs": 1, "epsrel": 1} result, _, details = nquad(f_3_np, integration_domain, opts=opts, full_output=True) end = time.time() print("Results:", result) print("Abs. Error:", np.abs(result - ground_truth)) print(details) print(f"Took {(end - start) * 1000.0:.3f} ms") .. parsed-literal:: **Output:** Results: 2.2984884706593016 Abs. Error: 0.0 {'neval': 4084101} Took 33067.629 ms Using scipy, we get the result in about 33 seconds on the authors' machine (this might take shorter or longer on your machine). The integral was computed with ``nquad``, which on the inside uses the highly adaptive `QUADPACK `__ algorithm. In any event, *torchquad* can reach the same accuracy much, much quicker by utilizing the GPU. .. code:: ipython3 N = 37 ** dimension simp = Simpson() # Initialize Simpson solver start = time.time() result = simp.integrate(f_3, dim=dimension, N=N, integration_domain=integration_domain) end = time.time() print_error(result, ground_truth) print("neval=", N) print(f"Took {(end - start) * 1000.0:.3f} ms") If you tried this yourself and ran out of CUDA memory, simply decrease :math:`N` (this will, however, lead to a loss of accuracy). Note that we use more evaluation points (:math:`37^{5}=69,343,957` for *torchquad* vs. :math:`4,084,101` for scipy), given the comparatively simple algorithm. Anyway, the decisive factor for this specific problem is runtime. A comparison with regard to function evaluations is difficult, as ``nquad`` provides no support for a fixed number of evaluations. This may follow in the future. The results from using Simpson's rule in *torchquad* is: .. parsed-literal:: **Output:** Results: 2.2984883785247803 Abs. Error: 0.00000000e+00 Rel. Error: 0.00000000e+00 neval= 69343957 Took 162.147 ms In our case, *torchquad* with Simpson's rule was more than 300 times faster than ``scipy.integrate.nquad``. We will add more elaborate integration methods over time; however, this tutorial should already showcase the advantages of numerical integration on the GPU. Reasonably, one might prefer Monte Carlo integration methods for a 5-D problem. We might add this comparison to the tutorial in the future. .. _tutorial_backend_selection: Using different backends with torchquad --------------------------------------- This section shows how to select a different numerical backend for the quadrature. Let's change the minimal working example so that it uses Tensorflow instead of PyTorch: .. code:: ipython3 import tensorflow as tf from torchquad import MonteCarlo, set_up_backend # Enable Tensorflow's NumPy behaviour and set the floating point precision set_up_backend("tensorflow", data_type="float32") # The integrand function rewritten for Tensorflow instead of PyTorch def some_function(x): return tf.sin(x[:, 0]) + tf.exp(x[:, 1]) mc = MonteCarlo() # Set the backend argument to "tensorflow" instead of "torch" integral_value = mc.integrate( some_function, dim=2, N=10000, integration_domain=[[0, 1], [-1, 1]], backend="tensorflow", ) As the name suggests, the ``set_up_backend`` function configures a numerical backend so that it works with torchquad and it optionally sets the floating point precision. For Tensorflow this means in our code it enables `NumPy behaviour `__ and configures torchquad so that it uses float32 precision when initialising Tensors for Tensorflow. More details about :func:`torchquad.set_up_backend` can be found in its documentation. To calculate an integral with Tensorflow we changed the ``backend`` argument of the ``integrate`` method. An alternative way to select Tensorflow as backend is to set the ``integration_domain`` argument to a ``tf.Tensor`` instead of a list. The other code changes we did, for example rewriting the integrand, are not directly related to torchquad. To use NumPy or JAX we would analogously need to change the two backend arguments to ``"numpy"`` resp. ``"jax"`` and rewrite the integrand function. Computing gradients with respect to the integration domain ---------------------------------------------------------- *torchquad* allows fully automatic differentiation. In this tutorial, we will show how to extract the gradients with respect to the integration domain with the PyTorch backend. We selected the composite Trapezoid rule and the Monte Carlo method to showcase that getting gradients is possible for both deterministic and stochastic methods. .. code:: ipython3 import torch from torchquad import MonteCarlo, Trapezoid, set_up_backend def test_function(x): """V shaped test function.""" return 2 * torch.abs(x) set_up_backend("torch", data_type="float64") # Number of function evaluations N = 10000 # Calculate a gradient with the MonteCarlo integrator # Define the integrator integrator_mc = MonteCarlo() # Integration domain domain = torch.tensor([[-1.0, 1.0]]) # Enable the creation of a computational graph for gradient calculation. domain.requires_grad = True # Calculate the 1-D integral by using the previously defined test_function # with MonteCarlo; set a RNG seed to get reproducible results result_mc = integrator_mc.integrate( test_function, dim=1, N=N, integration_domain=domain, seed=0 ) # Compute the gradient with a backward pass result_mc.backward() gradient_mc = domain.grad # Calculate a gradient analogously with the composite Trapezoid integrator integrator_tp = Trapezoid() domain = torch.tensor([[-1.0, 1.0]]) domain.requires_grad = True result_tp = integrator_tp.integrate( test_function, dim=1, N=N, integration_domain=domain ) result_tp.backward() gradient_tp = domain.grad # Show the results print(f"Gradient result for MonteCarlo: {gradient_mc}") print(f"Gradient result for Trapezoid: {gradient_tp}") The code above calculates the integral for a 1-D test-function ``test_function()`` in the [-1,1] domain and prints the gradients with respect to the integration domain. The command ``domain.requires_grad = True`` enables the creation of a computational graph, and it shall be called before calling the ``integrate(...)`` method. Gradients computation is, then, performed calling ``result.backward()``. The output of the print statements is as follows: .. parsed-literal:: Gradient result for MonteCarlo: tensor([[-1.9828, 2.0196]]) Gradient result for Trapezoid: tensor([[-2.0000, 2.0000]]) Speedups for repeated quadrature -------------------------------- Compiling the integrate method `````````````````````````````` To speed up the quadrature in situations where it is executed often with the same number of points ``N``, dimensionality ``dim``, and shape of the ``integrand`` (see :ref:`the next section ` for more information on integrands), we can JIT-compile the performance-relevant parts of the integrate method: .. code:: ipython3 import time import torch from torchquad import Boole, set_up_backend def example_integrand(x): return torch.sum(torch.sin(x), dim=1) set_up_backend("torch", data_type="float32") N = 912673 dim = 3 integrator = Boole() domains = [torch.tensor([[-1.0, y]] * dim) for y in range(5)] # Integrate without compilation times_uncompiled = [] for integration_domain in domains: t0 = time.perf_counter() integrator.integrate(example_integrand, dim, N, integration_domain) times_uncompiled.append(time.perf_counter() - t0) # Integrate with partial compilation integrate_jit_compiled_parts = integrator.get_jit_compiled_integrate( dim, N, backend="torch" ) times_compiled_parts = [] for integration_domain in domains: t0 = time.perf_counter() integrate_jit_compiled_parts(example_integrand, integration_domain) times_compiled_parts.append(time.perf_counter() - t0) # Integrate with everything compiled times_compiled_all = [] integrate_compiled = None for integration_domain in domains: t0 = time.perf_counter() if integrate_compiled is None: integrate_compiled = torch.jit.trace( lambda dom: integrator.integrate(example_integrand, dim, N, dom), (integration_domain,), ) integrate_compiled(integration_domain) times_compiled_all.append(time.perf_counter() - t0) print(f"Uncompiled times: {times_uncompiled}") print(f"Partly compiled times: {times_compiled_parts}") print(f"All compiled times: {times_compiled_all}") speedups = [ (1.0, tu / tcp, tu / tca) for tu, tcp, tca in zip(times_uncompiled, times_compiled_parts, times_compiled_all) ] print(f"Speedup factors: {speedups}") This code shows two ways of compiling the integration. In the first case, we use ``integrator.get_jit_compiled_integrate``, which internally uses ``torch.jit.trace`` to compile performance-relevant code parts except the integrand evaluation. In the second case we directly compile ``integrator.integrate``. The function created in the first case may be a bit slower, but it works even if the integrand cannot be compiled and we can re-use it with other integrand functions. The compilations happen in the first iteration of the for loops and in the following iterations the previously compiled functions are re-used. With JAX and Tensorflow it is also possible to compile the integration. In comparison to compilation with PyTorch, we would need to use ``jax.jit`` or ``tf.function`` instead of ``torch.jit.trace`` to compile the whole integrate method. ``get_jit_compiled_integrate`` automatically uses the compilation function which fits to the numerical backend. There is a special case with JAX and MonteCarlo: If a function which executes the integrate method is compiled with ``jax.jit``, the random number generator's current PRNGKey value needs to be an input and output of this function so that MonteCarlo generates different random numbers in each integration. torchquad's RNG class has methods to set and get this PRNGKey value. The disadvantage of compilation is the additional time required to compile or re-compile the code, so if the integrate method is executed only a few times or certain arguments, e.g. ``N``, change often, the program may be slower overall. Reusing sample points ````````````````````` With the MonteCarlo and composite Newton Cotes integrators it is possible to execute the methods for sample point calculation, integrand evaluation and result calculation separately. This can be helpful to obtain a speedup in situations where integration happens very often with the same ``integration_domain`` and ``N`` arguments. However, separate sample point calculation has some disadvantages: - The code is more complex. - The memory required for the grid points is not released after each integration. - With MonteCarlo the same sample points would be used for each integration, which corresponds to a fixed seed. Here is an example where we integrate two functions with Boole and use the same sample points for both functions: .. code:: ipython3 import torch from torchquad import Boole def integrand1(x): return torch.sin(x[:, 0]) + torch.exp(x[:, 1]) def integrand2(x): return torch.prod(torch.cos(x), dim=1) # The integration domain, dimensionality and number of evaluations # For the calculate_grid method we need a Tensor and not a list. integration_domain = torch.Tensor([[0.0, 1.0], [-1.0, 1.0]]) dim = 2 N = 9409 # Initialize the integrator integrator = Boole() # Calculate sample points and grid information for the result calculation grid_points, hs, n_per_dim = integrator.calculate_grid(N, integration_domain) # Integrate the first integrand with the sample points function_values, _ = integrator.evaluate_integrand(integrand1, grid_points) integral1 = integrator.calculate_result(function_values, dim, n_per_dim, hs, integration_domain) # Integrate the second integrand with the same sample points function_values, _ = integrator.evaluate_integrand(integrand2, grid_points) integral2 = integrator.calculate_result(function_values, dim, n_per_dim, hs, integration_domain) print(f"Quadrature results: {integral1}, {integral2}") .. _multi_dim_integrand: Multidimensional/Vectorized Integrands -------------------------------------- If you wish to evaluate many different integrands over the same domain, it may be faster to pass in a vectorized formulation if possible. Our inspiration for this came from scipy's own vectorization capabilities e.g., from its ``fixed_quad`` `method `__. As an example, here we evaluate a similar integrand many times for different values of ``a`` and ``b``. This is an example that could be sped up by a vectorized evaluation of all integrals: .. code:: ipython3 def parametrized_integrand(x, a, b): return torch.sqrt(torch.cos(torch.sin((a + b) * x))) a_params = torch.arange(40) b_params = torch.arange(10, 20) integration_domain = torch.Tensor([[0, 1]]) simp = Simpson() result = torch.stack([torch.Tensor([simp.integrate(lambda x: parametrized_integrand(x, a, b), dim=1, N=101, integration_domain=integration_domain) for a in a_params]) for b in b_params]) Now let's see how to do this a bit more simply, and in a way that provides signficant speedup as the size of the integrand's ``grid`` grows: .. code:: ipython3 grid = torch.stack([torch.Tensor([a + b for a in a_params]) for b in b_params]) def integrand(x): return torch.sqrt(torch.cos(torch.sin(torch.einsum("i,jk->ijk", x.flatten(), grid)))) result_vectorized = simp.integrate(integrand, dim=1, N=101, integration_domain=integration_domain) torch.all(torch.isclose(result_vectorized, result)) # True! .. note:: VEGAS does not support multi-dimensional integrands. If you would like this, please consider opening an issue or PR. Parametric Integration with Variable Domains -------------------------------------------- Sometimes you need to perform multiple integrations where both the integrand and the integration domain depend on parameters. This is particularly useful in applications where you need to compute integrals for many different parameter values simultaneously. For example, you might want to compute: .. math:: I(a, b) = \\int_{a}^{b} f(x, a, b) dx for multiple values of :math:`a` and :math:`b` simultaneously. Currently, torchquad doesn't have built-in support for parametric domains, but you can extend the existing integrators to handle this case. Below is an example of how to create a custom integrator that supports batch 1D integration with variable domains: .. code:: ipython3 import torch from loguru import logger from autoray import numpy as anp from autoray import infer_backend from torchquad import Gaussian class Batch1DIntegrator(Gaussian): """Custom integrator for batch 1D integration with variable domains. This integrator can compute multiple integrals with different domains in a single call, providing significant speedup over sequential computation. """ def __init__(self, *args, **kwargs): super().__init__(*args, **kwargs) self.disable_integration_domain_check = True def _resize_roots(self, integration_domain, roots): """Resize roots for batched integration domains. Args: integration_domain: Shape [batch_size, 2] for multiple domains roots: Shape [N] - the Gaussian quadrature nodes Returns: Resized roots with shape [batch_size, N] """ if integration_domain.ndim == 1: # Single domain case - use parent implementation return super()._resize_roots(integration_domain, roots) # Batch case assert roots.ndim == 1 assert integration_domain.ndim == 2 assert integration_domain.shape[-1] == 2 roots = roots.to(integration_domain.device) # Extract bounds for all domains a = integration_domain[:, 0:1] # Shape [batch_size, 1] b = integration_domain[:, 1:2] # Shape [batch_size, 1] # Broadcast and transform roots for each domain roots_expanded = roots.unsqueeze(0) # [1, N] # Transform from [-1, 1] to [a, b] for each domain out = ((b - a) / 2) * roots_expanded + ((a + b) / 2) # [batch_size, N] return out def integrate(self, fn, dim, N, integration_domain=None, backend="torch"): """Integrate function over multiple domains in a single call. Args: fn: Function to integrate dim: Must be 1 for this implementation N: Number of quadrature points integration_domain: Shape [batch_size, 2] for batch integration backend: Must be "torch" Returns: Tensor of shape [batch_size] with integral results """ assert dim == 1 assert backend == "torch" if integration_domain.ndim == 1: integration_domain = integration_domain.reshape(1, 2) batch_size = integration_domain.shape[0] # Get Gaussian quadrature points and weights N = self._adjust_N(dim=1, N=N) roots = self._roots(N, backend, integration_domain.requires_grad) weights = self._weights(N, dim, backend) # Resize roots for all domains at once grid_points = self._resize_roots(integration_domain, roots) # [batch_size, N] # Evaluate integrand at all points # Flatten for function evaluation: [batch_size * N, 1] points_flat = grid_points.reshape(-1, 1) function_values = fn(points_flat) # [batch_size * N] # Reshape back to [batch_size, N] function_values = function_values.reshape(batch_size, N) # Apply weights and sum for each domain weighted_values = function_values * weights.unsqueeze(0) # Scale by domain width and sum domain_widths = (integration_domain[:, 1] - integration_domain[:, 0]) / 2 results = domain_widths * weighted_values.sum(dim=1) return results Now let's see a concrete example of using this for parametric integration: .. code:: ipython3 # Example 1: Compute multiple integrals in ONE call # I(a) = integral from 0 to a of x^2 dx = a^3/3 # for a = 1, 2, 3, 4, 5 def integrand(x): # x has shape [batch_size * N, 1] where N is the number of quadrature points return x[:, 0] ** 2 # Create multiple integration domains upper_bounds = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0]) domains = torch.stack([torch.zeros_like(upper_bounds), upper_bounds], dim=1) print(f"Integration domains shape: {domains.shape}") print(f"Domains:\n{domains}") # Initialize the batch integrator batch_integrator = Batch1DIntegrator() # Compute ALL integrals in ONE call - this is the key difference! results = batch_integrator.integrate(integrand, dim=1, N=50, integration_domain=domains) # Analytical solution: a^3/3 analytical = upper_bounds ** 3 / 3 print(f"\nResults shape: {results.shape}") print(f"Numerical results: {results}") print(f"Analytical results: {analytical}") print(f"Absolute errors: {torch.abs(results - analytical)}") Output: .. parsed-literal:: Integration domains shape: torch.Size([5, 2]) Domains: tensor([[0., 1.], [0., 2.], [0., 3.], [0., 4.], [0., 5.]]) Results shape: torch.Size([5]) Numerical results: tensor([ 0.3333, 2.6667, 9.0000, 21.3333, 41.6667]) Analytical results: tensor([ 0.3333, 2.6667, 9.0000, 21.3333, 41.6667]) Absolute errors: tensor([9.9341e-09, 7.9473e-08, 1.7764e-14, 6.3578e-07, 1.2716e-06]) The key advantage of this approach is that all integrals are computed in a single vectorized operation, which can provide significant speedups: .. code:: ipython3 # Performance comparison - batch vs sequential import time from torchquad import GaussLegendre # Many domains n_domains = 500 many_upper_bounds = torch.linspace(0.1, 5.0, n_domains) many_domains = torch.stack([torch.zeros(n_domains), many_upper_bounds], dim=1) # Batch computation start = time.time() batch_results = batch_integrator.integrate(integrand, dim=1, N=50, integration_domain=many_domains) batch_time = time.time() - start # Sequential computation for comparison standard_integrator = GaussLegendre() start = time.time() sequential_results = [] for i in range(n_domains): result = standard_integrator.integrate( integrand, dim=1, N=50, integration_domain=[[0.0, many_upper_bounds[i].item()]] ) sequential_results.append(result) sequential_time = time.time() - start print(f"Computed {n_domains} integrals:") print(f"Batch time: {batch_time:.4f} seconds") print(f"Sequential time: {sequential_time:.4f} seconds") print(f"Speedup: {sequential_time/batch_time:.2f}x") Output: .. parsed-literal:: Computed 500 integrals: Batch time: 0.0010 seconds Sequential time: 0.2289 seconds Speedup: 228.90x This approach can be extended to more complex scenarios where both the integrand and the domain depend on parameters. The key insight is that by properly vectorizing the computation, we can achieve significant performance improvements over sequential integration. .. note:: This implementation is specifically for 1D integrals. Extending it to higher dimensions would require more careful handling of the grid generation and result calculation. .. _tutorial_multi_gpu: Multi-GPU Usage --------------- While torchquad doesn't have a built-in device parameter for selecting specific GPUs, you can effectively use multiple GPUs using standard PyTorch practices and environment variables. Using CUDA_VISIBLE_DEVICES ``````````````````````````` The recommended way to control which GPU torchquad uses is through the ``CUDA_VISIBLE_DEVICES`` environment variable: .. code:: bash # Use only GPU 0 export CUDA_VISIBLE_DEVICES=0 python your_integration_script.py # Use only GPU 1 export CUDA_VISIBLE_DEVICES=1 python your_integration_script.py # Use GPUs 0 and 2 export CUDA_VISIBLE_DEVICES=0,2 python your_integration_script.py This approach has several advantages: - **Clean separation**: Each process sees only the specified GPU(s) - **No code changes**: Your torchquad code remains unchanged - **Standard practice**: This is the recommended approach in the PyTorch community - **Process isolation**: Different processes can use different GPUs without interference Parallel Processing with Multiple GPUs ``````````````````````````````````````` For compute-intensive workloads that can be parallelized, you can spawn multiple processes, each using a different GPU: .. code:: ipython3 import multiprocessing as mp import os import torch from torchquad import MonteCarlo, set_up_backend def run_integration_on_gpu(gpu_id, integration_params, result_queue): """Run integration on a specific GPU""" # Set the GPU for this process os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_id) # Initialize torchquad set_up_backend("torch", data_type="float32") # Your integration code here mc = MonteCarlo() result = mc.integrate( integration_params['fn'], dim=integration_params['dim'], N=integration_params['N'], integration_domain=integration_params['domain'], backend="torch" ) result_queue.put((gpu_id, result.item())) def parallel_integration_example(): """Example of parallel integration across multiple GPUs""" # Define your integration parameters def integrand(x): return torch.sin(x[:, 0]) + torch.exp(x[:, 1]) integration_params = { 'fn': integrand, 'dim': 2, 'N': 100000, 'domain': [[0, 1], [-1, 1]] } # Check available GPUs available_gpus = list(range(torch.cuda.device_count())) if not available_gpus: print("No CUDA GPUs available") return print(f"Using GPUs: {available_gpus}") # Create processes for each GPU processes = [] result_queue = mp.Queue() for gpu_id in available_gpus: p = mp.Process( target=run_integration_on_gpu, args=(gpu_id, integration_params, result_queue) ) p.start() processes.append(p) # Collect results results = {} for _ in available_gpus: gpu_id, result = result_queue.get() results[gpu_id] = result # Wait for all processes to complete for p in processes: p.join() print("Results from each GPU:") for gpu_id, result in sorted(results.items()): print(f" GPU {gpu_id}: {result:.6f}") return results Use Cases for Multi-GPU Integration ```````````````````````````````````` 1. **Parameter Sweeps**: Run the same integration with different parameters on different GPUs 2. **Different Integration Methods**: Compare multiple integration methods simultaneously 3. **Monte Carlo with Different Seeds**: Run multiple Monte Carlo integrations with different random seeds for error estimation 4. **Batch Processing**: Process multiple independent integration problems in parallel Example: Monte Carlo Error Estimation `````````````````````````````````````` .. code:: ipython3 import subprocess import numpy as np def monte_carlo_error_estimation(): """Estimate integration error using multiple independent Monte Carlo runs""" # Script content for each GPU process script_template = ''' import os import torch from torchquad import MonteCarlo, set_up_backend os.environ['CUDA_VISIBLE_DEVICES'] = '{gpu_id}' set_up_backend("torch", data_type="float32") def integrand(x): return torch.sin(x[:, 0]) + torch.exp(x[:, 1]) mc = MonteCarlo() result = mc.integrate( integrand, dim=2, N=50000, integration_domain=[[0, 1], [-1, 1]], seed={seed}, backend="torch" ) print(result.item()) ''' num_gpus = torch.cuda.device_count() runs_per_gpu = 5 results = [] processes = [] for gpu_id in range(num_gpus): for run in range(runs_per_gpu): seed = gpu_id * runs_per_gpu + run + 1000 script = script_template.format(gpu_id=gpu_id, seed=seed) # Launch subprocess process = subprocess.Popen( ['python', '-c', script], stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True ) processes.append(process) # Collect results for process in processes: stdout, stderr = process.communicate() if process.returncode == 0: results.append(float(stdout.strip())) else: print(f"Error in subprocess: {stderr}") # Calculate statistics results = np.array(results) mean_result = np.mean(results) std_error = np.std(results) / np.sqrt(len(results)) print(f"Monte Carlo Results from {len(results)} runs:") print(f" Mean: {mean_result:.6f}") print(f" Standard Error: {std_error:.6f}") print(f" 95% Confidence Interval: [{mean_result - 1.96*std_error:.6f}, {mean_result + 1.96*std_error:.6f}]") return mean_result, std_error Best Practices for Multi-GPU Usage ``````````````````````````````````` 1. **Use CUDA_VISIBLE_DEVICES**: This is the cleanest way to control GPU selection 2. **Process-based parallelism**: Use ``multiprocessing`` rather than threading for true parallelism 3. **Memory management**: Each GPU process will have its own memory space 4. **Load balancing**: Distribute work evenly across available GPUs 5. **Error handling**: Handle cases where specific GPUs might be unavailable or busy .. warning:: Avoid using ``torch.cuda.set_device()`` within torchquad applications, as this can interfere with torchquad's internal device management. Always use ``CUDA_VISIBLE_DEVICES`` instead. Custom Integrators ------------------ It is of course possible to extend our provided Integrators, perhaps for a special class of functions or for a new algorithm. .. code:: ipython3 import scipy from torchquad import Gaussian from autoray import numpy as anp class GaussHermite(Gaussian): """ Gauss Hermite quadrature rule in torch, for integrals of the form :math:`\\int_{-\\infty}^{+\\infty} e^{-x^{2}} f(x) dx`. It will correctly integrate polynomials of degree :math:`2n - 1` or less over the interval :math:`[-\\infty, \\infty]` with weight function :math:`f(x) = e^{-x^2}`. See https://en.wikipedia.org/wiki/Gauss%E2%80%93Hermite_quadrature """ def __init__(self): super().__init__() self.name = "Gauss-Hermite" self._root_fn = scipy.special.roots_hermite @staticmethod def _apply_composite_rule(cur_dim_areas, dim, hs, domain): """Apply "composite" rule for gaussian integrals cur_dim_areas will contain the areas per dimension """ # We collapse dimension by dimension for _ in range(dim): cur_dim_areas = anp.sum(cur_dim_areas, axis=len(cur_dim_areas.shape) - 1) return cur_dim_areas gh=GaussHermite() integral=gh.integrate(lambda x: 1-x,dim=1,N=200) #integral from -inf to inf of np.exp(-(x**2))*(1-x) # Computed integral was 1.7724538509055168. # analytic result = sqrt(pi)