The manufacturing of Chinese medicines often faces challenges such as poor product consistency, high solvent consumption, and long processing times. Percolation is a commonly used technique for extracting medicinal herbs, but the percolate concentration varies widely during the process and is low near the endpoint, which makes it difficult for existing online detection technologies to determine target component concentrations accurately. To address this, this study developed an online monitoring system that integrates multimodal sensors for physical quantity, image, and spectral data. Using Xiaochaihu capsules as a case study, the study collected real-time multimodal data comprising over 20,000 physical quantity points, 14,000 spectra, and 14,000 images. A Transformer-based framework, PMFormer, was proposed, with interpolation-based data augmentation to alleviate the "data-rich but label-scarce" problem. PMFormer achieved R² values of 0.96, 0.94, and 0.91 for 6-gingerol, 8-gingerol, and adenine, with RMSEs below 2.4, 0.4, and 1.8 μg/mL, respectively. A quantitative extraction control strategy was then developed, in which the percolation endpoint is reached when the accumulated total mass of collection (ATMC) meets the quality control limits. Validation showed improved product consistency, reduced solvent use, and higher efficiency, in line with Lean Six Sigma concepts. This study provides a reference for online monitoring of traditional Chinese medicine (TCM) percolation processes and demonstrates the potential of multimodal data fusion in pharmaceutical manufacturing.
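The ATMC-based endpoint rule can be illustrated with a minimal sketch; it is not the study's implementation. It assumes that ATMC is the running sum of predicted concentration times collected volume for each fraction. The function name `find_endpoint`, the units, and the limit values are hypothetical and chosen only for illustration.

```python
from itertools import accumulate


def find_endpoint(conc_ug_per_ml, volume_ml, limit_ug):
    """Return (index, atmc) for the first fraction at which the accumulated
    mass reaches limit_ug, or (None, final_atmc) if the limit is never met.

    Assumption: ATMC = cumulative sum of concentration * collected volume.
    """
    if len(conc_ug_per_ml) != len(volume_ml):
        raise ValueError("concentration and volume series must have equal length")

    # Mass collected in each fraction (ug) from predicted concentration and volume.
    masses = (c * v for c, v in zip(conc_ug_per_ml, volume_ml))

    atmc = 0.0
    for i, total in enumerate(accumulate(masses)):
        atmc = total
        if total >= limit_ug:
            return i, total  # endpoint reached: stop percolation here
    return None, atmc


# Hypothetical predicted concentrations (ug/mL) and fraction volumes (mL).
conc = [10.0, 8.0, 4.0, 2.0]
vol = [100.0, 100.0, 100.0, 100.0]
endpoint_index, atmc_at_endpoint = find_endpoint(conc, vol, limit_ug=2000.0)
```

In this hypothetical series the concentration falls over time, so each later fraction adds less mass. Stopping as soon as the limit is met is what would save solvent and time compared with a fixed-duration percolation.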