Juda, Przemyslaw
Search results
Comparison of three recent discrete stochastic inversion methods and influence of the prior choice
2024, Juda, Przemyslaw, Straubhaar, Julien, Renard, Philippe
Groundwater flow depends on subsurface heterogeneity, which often calls for categorical fields to represent different geological facies. Knowledge about the subsurface is, however, limited and often provided indirectly by state variables, such as hydraulic heads or contaminant concentrations. In such cases, solving a categorical inverse problem is an important step in subsurface modeling. In this work, we present and compare three recent inverse frameworks: Posterior Population Expansion (PoPEx), Ensemble Smoother with Multiple Data Assimilation (ESMDA), and DREAM-ZS (a Markov chain Monte Carlo sampler). PoPEx and ESMDA are used with multiple-point statistics (MPS) as the geostatistical engine, and DREAM-ZS is used with a Wasserstein generative adversarial network (WGAN). The three inversion methods are tested on a synthetic example of a pumping test in a fluvial channelized aquifer. Moreover, the inverse problem is solved three times with each method, each time using a different training image, to check the performance of the methods with different geological priors. To assess the quality of the results, we propose a framework based on the continuous ranked probability score (CRPS), which compares single true values with predictive distributions. All methods performed well when using the training image used to create the reference, but their performance degraded with the alternative training images. PoPEx produced the fewest geological artifacts but converged rather slowly. ESMDA initially converged very fast and then reached a plateau, unlike the other methods. DREAM-ZS was overly confident in placing some incorrect geological features but outperformed the other methods in terms of convergence.
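As an illustration of how the CRPS compares a single true value with a predictive distribution, the following minimal sketch uses the standard ensemble estimator (mean absolute error minus half the mean pairwise spread). The function name and the sample values are hypothetical and are not taken from the paper, whose exact scoring framework may differ.

```python
def crps_ensemble(samples, observation):
    """Empirical CRPS of an ensemble forecast against one observed value.

    Lower is better; for a one-member ensemble it equals the absolute error.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one ensemble member")
    # Mean absolute error between ensemble members and the observation.
    accuracy = sum(abs(x - observation) for x in samples) / n
    # Mean absolute difference over all ordered pairs of members (spread term).
    spread = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return accuracy - 0.5 * spread


# Hypothetical example: simulated hydraulic heads versus one reference head.
heads = [10.2, 10.5, 9.8, 10.1]
score = crps_ensemble(heads, 10.0)
```

A sharp ensemble centred on the true value scores close to zero, while an ensemble that is confident but wrong is penalised by the first term.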
Ice volume and basal topography estimation using geostatistical methods and GPR measurements: Application on the Tsanfleuron and Scex Rouge glacier, Swiss Alps
2021-7, Néven, Alexis, Dall Alba, Valentin, Juda, Przemyslaw, Straubhaar, Julien, Renard, Philippe
Ground Penetrating Radar (GPR) is nowadays widely used for determining glacier thickness. However, this method provides thickness data only along the acquisition lines, and therefore interpolation has to be made between them. Depending on the interpolation strategy, calculated ice volumes can differ and can lack an accurate error estimation. Furthermore, glacial basal topography is often characterized by complex geomorphological features, which can be hard to reproduce using classical interpolation methods, especially when the conditioning data are sparse or when the morphological features are too complex. This study investigates the applicability of multiple-point statistics (MPS) simulations to interpolate glacier bedrock topography using GPR measurements. In 2018, a dense GPR data set was acquired on the Tsanfleuron Glacier (Switzerland). The results obtained with the direct sampling MPS method are compared against those obtained with kriging and sequential Gaussian simulations (SGS) on both a synthetic data set (with known reference volume and bedrock topography) and the real data underlying the Tsanfleuron Glacier. Using the MPS-modelled bedrock, the ice volume for the Scex Rouge and Tsanfleuron Glaciers is estimated to be 113.9 ± 1.6 million m³. The direct sampling approach, unlike SGS and kriging, allowed not only an accurate volume estimation but also the generation of a set of realistic bedrock simulations. The complex karstic geomorphological features are reproduced and can be used to significantly improve, for example, the precision of subglacial flow estimation.
Using Generative Adversarial Networks as a Fast Forward Operator for Hydrogeological Inverse Problems
2020-4, Dagasin, Yasin, Juda, Przemyslaw, Renard, Philippe
Subsurface characterization using inverse techniques constitutes one of the fundamental elements of hydrogeological modeling applications. Available methods to solve inverse problems rely on a forward operator that predicts state variables for a given set of subsurface parameters. As the number of model parameters to be estimated increases, forward operations incur a significant computational demand. In this paper, we investigate the use of conditional generative adversarial networks (cGAN) as an emulator for the forward operator in the context of a hydrogeological inverse problem. In particular, we investigate whether the cGAN can replace the forward operator used in the adaptive importance sampling method posterior population expansion (PoPEx) with reasonable accuracy and feasible computational requirements. The cGAN model, trained on channelized geological structures, was able to reproduce the state variables corresponding to a given parameter field. Hence, its integration in PoPEx yielded satisfactory results. In terms of computational demand, the use of the cGAN as a surrogate forward model reduces the required computational time by up to 80% for the problem defined in the study. However, the training time required to create a model seems to be the major drawback of the method.
Discrete stochastic inversion: getting closer to hydrogeological applications
2022, Juda, Przemyslaw
Stochastic discrete inversion methods capture geological realism and quantify uncertainty, two aspects that are crucial for groundwater management and for the application of stochastic methods in hydrogeology. However, these methods present two major practical challenges: the choice of a correct prior representation and a high computational cost. This thesis addresses these challenges to facilitate future applications of discrete stochastic inversion to hydrogeological data.
Strategies are presented for prior selection in the context of geostatistical simulations, and in particular multiple-point statistics. When conditioning data are available, a cross-validation framework for categorical variables can be used with scoring rules. The framework can be used for tuning any parameter of geostatistical simulations, for example, choosing the training image for multiple-point statistics. A test case representing a simplified model of the Roussillon plain aquifer confirms the validity of the framework. Another tool presented in this thesis is the Direct Sampling Best Candidate (DSBC) algorithm, which has fewer algorithmic parameters than the Direct Sampling (DS) algorithm. It nevertheless retains all the advantages of DS while simplifying the choice of parameters, which is often made before the inversion. For the test cases that we studied, the simulation quality of DSBC was better than that of DS for conditional simulations, and slightly worse, but satisfactory, for unconditional simulations.
As for improving the computational performance of the inversion, machine learning algorithms are proposed to speed up posterior population expansion (PoPEx). With random forest and AdaBoost, PoPEx speed-up factors of around two were observed when applied to synthetic tracer test data. These machine learning techniques have the potential to be used for other Monte Carlo inversions. Another solution for improving PoPEx convergence and uncertainty quantification was also developed: a tempered likelihood. It alleviates the need to reduce the dimensionality of the data before inversion (as suggested by previous studies on PoPEx) and mitigates the problem of a very sharp likelihood function. The final point of the thesis is a comparison of three recent discrete inversion methods: PoPEx, ensemble smoother with multiple data assimilation (ESMDA), and DREAM-ZS. A synthetic but realistic pumping test case showed that all three methods perform fairly well, provided that a correct prior is used. However, the choice of the prior is essential, and with wrong priors, represented by different training images, the performance of the methods is strongly affected. The performance was measured with probabilistic scores on assimilated data and on the 10-day groundwater protection zone.
An Attempt to Boost Posterior Population Expansion Using Fast Machine Learning Algorithms
2021-3, Juda, Przemyslaw, Renard, Philippe
In hydrogeology, inverse techniques have become indispensable to characterize subsurface parameters and their uncertainty. When modeling heterogeneous, geologically realistic discrete model spaces, such as categorical fields, Monte Carlo methods are needed to properly sample the solution space. Inversion algorithms use a forward operator, such as a numerical groundwater solver. The forward operator is often the bottleneck responsible for the high computational cost of Monte Carlo sampling schemes. Even though efficient sampling methods (for example Posterior Population Expansion, PoPEx) have been developed, they need significant computing resources. It is therefore desirable to speed up such methods. As only a few models generated by the sampler have a significant likelihood, we propose to predict the significance of generated models by means of machine learning. Only models labeled as significant are passed to the forward solver; the others are rejected. This work compares the performance of AdaBoost, Random Forest, and a convolutional neural network as classifiers integrated with the PoPEx framework. During initial iterations of the algorithm, the forward solver is always executed and subsurface models along with their likelihoods are stored. Then, the machine learning schemes are trained on the available data. We demonstrate the technique using a simulation of a tracer test in a fluvial aquifer. The geology is modeled by the multiple-point statistical approach; the field contains four geological facies, with associated permeability, porosity, and specific storage values. MODFLOW is used for groundwater flow and transport simulation. The solution of the inverse problem is used to estimate the 10-day protection zone around the pumping well. The estimated speed-ups with Random Forest and AdaBoost were higher than with the convolutional neural network.
To validate the approach, computing times of the inversion with and without machine learning schemes were measured, and the error against the reference solution was calculated. For the same mean error, accelerated PoPEx achieved a speed-up factor of up to 2 with respect to standard PoPEx.
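The gating idea described in this abstract can be sketched as follows. This is a simplified illustration, not the PoPEx implementation: the function names, the likelihood threshold, the single training step after the warm-up, and the toy nearest-neighbour classifier (standing in for AdaBoost or Random Forest) are all assumptions made for the example.

```python
import math


def nearest_neighbour_classifier(train_models, train_labels):
    """Toy stand-in classifier: copy the label of the closest stored model."""
    def predict(model):
        nearest = min(range(len(train_models)),
                      key=lambda i: abs(train_models[i] - model))
        return train_labels[nearest]
    return predict


def gated_forward_runs(models, forward_likelihood, fit_classifier,
                       n_warmup, threshold):
    """Call the forward solver only on models predicted to be significant."""
    evaluated = []   # (model, likelihood) pairs that went through the solver
    rejected = 0     # models discarded without a forward run
    classifier = None
    for i, model in enumerate(models):
        if i < n_warmup:
            # Warm-up phase: always run the solver and store the result.
            evaluated.append((model, forward_likelihood(model)))
            continue
        if classifier is None:
            # Train once on the stored models (a simplifying assumption).
            labels = [lik >= threshold for _, lik in evaluated]
            classifier = fit_classifier([m for m, _ in evaluated], labels)
        if classifier(model):
            evaluated.append((model, forward_likelihood(model)))
        else:
            rejected += 1
    return evaluated, rejected


# Hypothetical scalar "models" with a likelihood peaked at 1.0.
def toy_likelihood(m):
    return math.exp(-(m - 1.0) ** 2)


runs, skipped = gated_forward_runs(
    [0.0, 1.0, 2.0, 3.0, 0.9, 2.9], toy_likelihood,
    nearest_neighbour_classifier, n_warmup=4, threshold=0.5)
```

In this toy run, the candidate 0.9 is passed to the solver because its nearest stored model was significant, while 2.9 is rejected without a forward run. Any saving in computing time comes from such skipped runs, at the risk of discarding models the classifier mislabels.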
A parsimonious parametrization of the Direct Sampling algorithm for multiple-point statistical simulations
2022, Juda, Przemyslaw, Renard, Philippe, Straubhaar, Julien
Multiple-point statistics algorithms allow modeling spatial variability from training images. Among these techniques, the Direct Sampling (DS) algorithm has advanced capabilities, such as multivariate simulations, treatment of non-stationarity, multi-resolution capabilities, and conditioning by inequality or connectivity data. However, finding the right trade-off between computing time and simulation quality requires tuning three main parameters, which can be complicated since simulation time and quality are affected by these parameters in a complex manner. To facilitate the parameter selection, we propose the Direct Sampling Best Candidate (DSBC) parametrization approach. It consists of setting the distance threshold to 0. The two other parameters (the number of neighbors and the scan fraction) are kept, as are all the advantages of DS. We present three test cases demonstrating that the DSBC approach efficiently identifies parameters leading to comparable or better quality and computational time than the standard DS parametrization. We conclude that the DSBC approach could be used as a default mode when using DS, and that the standard parametrization should only be used when the DSBC approach is not sufficient.
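To make the role of the two remaining parameters concrete, here is a minimal one-dimensional sketch of a best-candidate search with the distance threshold fixed at 0: the scan stops early only on an exact pattern match, and otherwise returns the best candidate found within the scanned fraction of the training image. The function name, the mismatch-fraction distance, and the 1D setting are assumptions for illustration and do not reproduce the authors' implementation.

```python
import random


def dsbc_pick(training_image, lags, values, scan_fraction, rng):
    """Pick a value for one simulation node from a 1D categorical training image.

    lags          -- offsets of the informed neighbours relative to the node
                     (their count plays the role of the number of neighbours)
    values        -- facies values observed at those neighbours
    scan_fraction -- fraction of valid training-image positions to scan
    Returns (value, distance); value is None if no position can be scanned.
    """
    n = len(training_image)
    # Positions where every lag stays inside the training image.
    lo = max(0, -min(lags, default=0))
    hi = n - max(0, max(lags, default=0))
    candidates = list(range(lo, hi))
    rng.shuffle(candidates)  # random scan order
    n_scan = max(1, int(scan_fraction * len(candidates)))

    best_value, best_distance = None, float("inf")
    for pos in candidates[:n_scan]:
        # Distance: fraction of neighbours whose facies do not match.
        mismatches = sum(training_image[pos + lag] != v
                         for lag, v in zip(lags, values))
        distance = mismatches / len(lags) if lags else 0.0
        if distance < best_distance:
            best_value, best_distance = training_image[pos], distance
        if best_distance == 0.0:
            break  # threshold 0: only an exact match ends the scan early
    return best_value, best_distance


# Hypothetical alternating two-facies training image.
ti = [0, 1, 0, 1, 0, 1]
value, distance = dsbc_pick(ti, lags=[-1, 1], values=[1, 1],
                            scan_fraction=1.0, rng=random.Random(0))
```

With the full training image scanned, the only positions whose two neighbours are both 1 hold facies 0, so the call returns that value with distance 0. Lowering `scan_fraction` trades pattern quality for speed.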
A Framework for the Cross‐Validation of Categorical Geostatistical Simulations
2020-6, Juda, Przemyslaw, Renard, Philippe, Straubhaar, Julien
The mapping of subsurface parameters and the quantification of spatial uncertainty require selecting adequate models and their parameters. Cross‐validation techniques have been widely used for geostatistical model selection for continuous variables, but the situation is different for categorical variables. In these cases, cross‐validation is seldom applied, and there is no clear consensus on which method to employ. Therefore, this paper proposes a systematic framework for the cross‐validation of geostatistical simulations of categorical variables such as geological facies. The method is based on K‐fold cross‐validation combined with a proper scoring rule. It can be applied whenever an observation data set is available. At each cross‐validation iteration, the training set becomes conditioning data for the tested geostatistical model, and the ensemble of simulations is compared to the true values. The proposed framework is generic. Its application is illustrated with two examples using multiple‐point statistics simulations. In the first test case, the aim is to identify a training image from a given data set. In the second test case, the aim is to identify the parameters in a situation including nonstationarity for a coastal alluvial aquifer in the south of France. Cross‐validation scores are used as metrics of model performance, and the quadratic scoring rule, the zero‐one score, and the balanced linear score are compared. The study shows that the proposed fivefold stratified cross‐validation with the quadratic scoring rule allows ranking the geostatistical models and helps to identify the proper parameters.
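The comparison between an ensemble of simulations and a true facies value at a held-out location can be sketched with a quadratic score. The version below uses the Brier form (sum of squared differences between predicted probabilities and the one-hot truth, lower is better); the paper may use a different sign or scaling convention, and the function names and sample data are hypothetical.

```python
from collections import Counter


def facies_probabilities(simulated, categories):
    """Facies frequencies at one location across an ensemble of simulations."""
    counts = Counter(simulated)
    n = len(simulated)
    return {c: counts.get(c, 0) / n for c in categories}


def quadratic_score(probabilities, truth):
    """Brier-form quadratic score at one location (0 is a perfect forecast)."""
    return sum((p - (1.0 if c == truth else 0.0)) ** 2
               for c, p in probabilities.items())


# Hypothetical held-out location: four simulations, two facies (0 and 1).
probs = facies_probabilities([0, 0, 1, 0], categories=[0, 1])
score_if_truth_is_0 = quadratic_score(probs, truth=0)
score_if_truth_is_1 = quadratic_score(probs, truth=1)
```

In a K-fold setting, such scores would be averaged over all held-out locations of every fold, giving one number per candidate model or training image to rank them.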