Developing multimodal large language models (MLLMs) requires efficient semantic communication of diverse data modalities over constrained networks, motivating lightweight semantic communication models suited to resource-constrained environments. To address this problem, a Mamba-based multiuser multimodal deep-learning semantic communication (3M-DeepSC) system is developed to support MLLM communication. The proposed framework applies the efficient Mamba architecture in place of traditional Transformer-based designs, improving performance and lowering latency under diverse channel conditions. Moreover, a new semantic similarity metric is introduced to evaluate system performance from a semantic perspective. In addition, a two-stage training algorithm is developed that jointly optimizes bit-based metrics and semantic similarity. Extensive results show that the proposed 3M-DeepSC holds promise as a robust, scalable solution supporting the growing communication demands of MLLMs in diverse network environments.
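
The abstract does not detail the encoder design, so the sketch below is only a rough illustration of the architectural idea: a stack of Mamba blocks occupying the position a Transformer encoder would normally fill. The `MambaEncoder` class, its depth, and all hyperparameters are illustrative assumptions rather than the paper's configuration; `mamba_ssm` is one publicly available implementation of the Mamba block.

```python
# A minimal sketch, assuming the open-source `mamba_ssm` package; the
# paper's actual block configuration is not given in the abstract.
import torch.nn as nn
from mamba_ssm import Mamba

class MambaEncoder(nn.Module):
    """Stack of Mamba blocks placed where a Transformer encoder would sit."""
    def __init__(self, d_model=256, depth=4):
        super().__init__()
        # Hyperparameters here are illustrative defaults, not the paper's.
        self.blocks = nn.ModuleList(
            [Mamba(d_model=d_model, d_state=16, d_conv=4, expand=2)
             for _ in range(depth)]
        )
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):              # x: (batch, seq_len, d_model)
        for blk in self.blocks:
            x = x + blk(x)             # residual connection around each block
        return self.norm(x)
```

Because Mamba's selective state-space layers scale linearly with sequence length rather than quadratically like self-attention, such a drop-in replacement is a plausible route to the latency reduction the abstract claims.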
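Similarly, the two-stage training algorithm is only named here, not specified. The following is a minimal sketch of the general idea under stated assumptions: a toy codec, an AWGN channel, token cross-entropy standing in for the bit-based objective, and embedding cosine similarity standing in for the semantic similarity metric. Every name and hyperparameter is a hypothetical stand-in.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToySemanticCodec(nn.Module):
    """Hypothetical stand-in for the paper's encoder/channel/decoder chain."""
    def __init__(self, vocab=1000, d=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        self.enc = nn.GRU(d, d, batch_first=True)  # placeholder for the Mamba encoder
        self.dec = nn.Linear(d, vocab)

    def forward(self, tokens, snr_db=10.0):
        x, _ = self.enc(self.embed(tokens))
        # AWGN channel: noise power set by the target SNR
        noise_pow = x.pow(2).mean() / (10 ** (snr_db / 10))
        return self.dec(x + noise_pow.sqrt() * torch.randn_like(x))

def semantic_similarity(logits, tokens, embed):
    """Cosine similarity between mean-pooled embeddings of the soft
    reconstruction and the reference sequence (an illustrative proxy)."""
    recon = (logits.softmax(-1) @ embed.weight).mean(1)  # differentiable
    ref = embed(tokens).mean(1)
    return F.cosine_similarity(recon, ref).mean()

model = ToySemanticCodec()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
tokens = torch.randint(0, 1000, (8, 16))  # dummy batch of token sequences

# Stage 1: bit-/symbol-level objective alone (token cross-entropy here).
logits = model(tokens)
F.cross_entropy(logits.flatten(0, 1), tokens.flatten()).backward()
opt.step(); opt.zero_grad()

# Stage 2: joint objective adding the semantic-similarity term.
logits = model(tokens)
loss = F.cross_entropy(logits.flatten(0, 1), tokens.flatten()) \
       - 0.5 * semantic_similarity(logits, tokens, model.embed)
loss.backward()
opt.step(); opt.zero_grad()
```

Warming up on the bit-level objective before adding the semantic term is a common way to stabilize training when the semantic loss alone gives a weak gradient signal; whether the paper stages its objectives this way is an assumption here.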