Magnetotelluric (MT) and wide-angle seismic reflection/refraction surveys play a fundamental role in understanding the crustal rheology and lithospheric structure of the Earth. In recent years, integrating the two methods to improve the robustness of the inversion has started to gain attention. We present a new approach for joint 3-D inversion of MT and wide-angle seismic reflection/refraction data to accurately determine crustal structure and Moho depth. We first estimate an initial reference Moho from H-κ stacking of teleseismic receiver functions (RFs). This reference Moho serves as input for the subsequent joint MT/seismic inversion, in which the Moho interface is updated and crustal structures are added to the model. During the joint inversion, structural similarity between the resistivity and velocity models is promoted through a cross-gradient constraint. Synthetic model tests show that the joint inversion improves on separate inversions of the two datasets. In particular, tests based on two geologically realistic models demonstrate that the crustal structure, and even the trade-off between velocity and Moho depth, can be resolved by the combined MT and seismic datasets when the RF-derived Moho estimates are used. These results show that the new method can provide useful constraints on crustal structures, including their geophysical properties and discontinuities.
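To illustrate the cross-gradient constraint named above, the following is a minimal sketch of the standard cross-gradient function, t = ∇m_r × ∇m_v, evaluated at one interior node of a regular 3-D grid. It is not the implementation used in this work: the function names, the nested-list model layout, the unit grid spacing, and the toy resistivity and velocity values are all assumptions made for illustration. The vector t vanishes wherever the two model gradients are parallel, or where either model is locally constant, which is the sense in which the constraint favours structural similarity.

```python
# Illustrative sketch only: names, grid layout and values are hypothetical.

def gradient(model, i, j, k, h=1.0):
    """Central-difference gradient of a nested-list 3-D model at an interior node."""
    return (
        (model[i + 1][j][k] - model[i - 1][j][k]) / (2.0 * h),
        (model[i][j + 1][k] - model[i][j - 1][k]) / (2.0 * h),
        (model[i][j][k + 1] - model[i][j][k - 1]) / (2.0 * h),
    )


def cross_gradient(m_res, m_vel, i, j, k, h=1.0):
    """Cross product of the two model gradients at node (i, j, k)."""
    a = gradient(m_res, i, j, k, h)
    b = gradient(m_vel, i, j, k, h)
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def make_model(f, n=3):
    """Build an n x n x n nested-list model from a function of the node indices."""
    return [[[f(i, j, k) for k in range(n)] for j in range(n)] for i in range(n)]


# Toy models: resistivity varies with the third index (depth) only.
res = make_model(lambda i, j, k: 2.0 * k)
# Velocity that also varies with depth only: gradients are parallel.
vel_aligned = make_model(lambda i, j, k: 6.0 + 0.5 * k)
# Velocity that varies laterally: gradients are perpendicular.
vel_oblique = make_model(lambda i, j, k: 6.0 + 0.5 * i)

t_aligned = cross_gradient(res, vel_aligned, 1, 1, 1)
t_oblique = cross_gradient(res, vel_oblique, 1, 1, 1)
```

In this toy case `t_aligned` is the zero vector, whereas `t_oblique` is non-zero, so a penalty on the norm of t would act only on the structurally dissimilar pair.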