To reduce simulation uncertainty and improve process understanding of hydrological systems, it is essential to integrate models with observations through data assimilation. Among data assimilation techniques, the ensemble smoother (ES) stands out for its ease of implementation and high computational efficiency. For problems involving non-linear processes and non-Gaussian distributions, embedding deep learning (DL) within ES, termed the ES(DL) method, outperforms its Kalman-based counterpart. However, the original ES(DL) method is constrained by the traditional paradigm of learning a mapping from the innovation vector (i.e., the difference between observations and model predictions) to the update vector (i.e., the difference between the posterior and prior states/parameters). In this study, we introduce a generalized form of ES(DL), of which the traditional ES(DL) approach is a special case. We explore the optimal implementation of ES(DL) through surface and subsurface scenarios spanning low- to high-dimensional problems and Gaussian to non-Gaussian parameter distributions. Notably, certain implementations of ES(DL) that depart substantially from the traditional approach yield comparable or even better results, especially under non-Gaussian conditions. Although this study focuses on the smoothing approach for parameter estimation, the proposed formulation can be extended to filtering problems to update model states.
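To make the innovation and update vectors concrete, the following is a minimal sketch of the standard Kalman-based ES update with perturbed observations, written in Python with NumPy. It is an illustration under stated assumptions, not the authors' implementation: the function name `es_update`, the diagonal observation-error covariance, and the toy linear forward model are all hypothetical choices made here. In this linear form, the Kalman gain maps each innovation vector to an update vector; ES(DL), as described above, replaces that linear mapping with a learned one, which this sketch does not attempt to reproduce.

```python
import numpy as np


def es_update(prior, forward, d_obs, obs_err_std, rng):
    """Kalman-based ensemble smoother (ES) update with perturbed observations.

    Hypothetical helper for illustration only.
    prior:       (n_param, n_ens) prior parameter ensemble
    forward:     callable mapping one parameter vector to predicted observations
    d_obs:       (n_obs,) observed data
    obs_err_std: (n_obs,) observation-error standard deviations (diagonal R assumed)
    """
    n_ens = prior.shape[1]
    # Model predictions for every ensemble member: (n_obs, n_ens)
    pred = np.column_stack([forward(prior[:, j]) for j in range(n_ens)])
    # Perturbed observations: one noisy realization of d_obs per member
    noise = rng.normal(0.0, 1.0, size=pred.shape) * obs_err_std[:, None]
    d_pert = d_obs[:, None] + noise
    # Innovation vectors: observations minus model predictions
    innovation = d_pert - pred
    # Ensemble anomalies and sample (cross-)covariances
    dm = prior - prior.mean(axis=1, keepdims=True)
    dd = pred - pred.mean(axis=1, keepdims=True)
    c_md = dm @ dd.T / (n_ens - 1)
    c_dd = dd @ dd.T / (n_ens - 1)
    r = np.diag(obs_err_std ** 2)
    # Kalman gain K = C_md (C_dd + R)^{-1}, via a linear solve (C_dd + R is symmetric)
    gain = np.linalg.solve(c_dd + r, c_md.T).T
    # Update vectors: posterior minus prior, a linear function of the innovations
    update = gain @ innovation
    return prior + update, innovation, update


# Toy check (assumed setup): scalar parameter, prior N(0, 1), identity forward
# model, observation d_obs = 1.0 with error variance 0.1.
rng = np.random.default_rng(0)
prior = rng.normal(0.0, 1.0, size=(1, 5000))
post, innov, upd = es_update(
    prior, lambda m: m.copy(), np.array([1.0]), np.array([np.sqrt(0.1)]), rng
)
print(post.mean(), post.var())  # close to the analytic 1/1.1 and 0.1/1.1
```

In this linear-Gaussian toy case the analytic posterior is known (mean 1/1.1, variance 0.1/1.1), so the ensemble statistics offer a quick sanity check; under non-linear forward models or non-Gaussian priors, the fixed linear gain is the limitation that motivates learned mappings such as ES(DL).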