BACKGROUND: Colorectal cancer (CRC) is the third most prevalent cancer worldwide and poses a significant public health challenge. Late-stage detection often leads to poor treatment outcomes and higher mortality. The economic and psychological burdens of CRC treatment underscore the need for early detection.

OBJECTIVE: This study aims to improve early detection of colorectal cancer by applying machine learning (ML) algorithms to non-invasive features. The focus is on constructing a comprehensive dataset, analyzing non-invasive features, and developing predictive models that reduce the need for invasive procedures such as colonoscopy. Because it relies on non-invasive, easily accessible data, the resulting model is intended to be widely applicable without the risks associated with invasive procedures.

METHODS: A retrospective dataset of 400 patients was sourced from the colorectal cancer unit of Royal Medical Services (2021-2022). The dataset included demographic data, imaging reports, laboratory results, and clinical evaluations. The study comprised three experiments in which five ML models, K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), and Naïve Bayes (NB), were trained on the collected dataset and on a public dataset to assess generalizability. The first experiment used 35 features across all ML algorithms. The second experiment used only the most informative features. The third experiment validated the models on a public dataset in two phases: Phase I included all data, and Phase II excluded records with missing values.

RESULTS: The Random Forest (RF) algorithm consistently outperformed the other models, achieving an accuracy of 95.8% in the first experiment and 96.5% in the second. On the public dataset, RF accuracy was 66.0% in Phase I and 68.9% in Phase II. Conversely, the KNN algorithm had the lowest accuracy in all experiments.

CONCLUSION: This study highlights the effectiveness of ML for early CRC detection using non-invasive techniques. The RF model demonstrated superior accuracy, suggesting its potential for application in clinical settings. The research contributes valuable insights into CRC detection within the local context and emphasizes the broader applicability of ML in improving cancer diagnosis and personalized treatment.
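
The model comparison described in METHODS can be sketched roughly as follows. This is a minimal illustration and not the study's pipeline: the abstract does not state the evaluation protocol, hyperparameters, or preprocessing, so the 5-fold cross-validation, the scikit-learn default settings, and the feature scaling for KNN and SVM are all assumptions. The data are synthetic (400 samples, 35 features, chosen only to mirror the reported dataset shape), and `compare_models` is a hypothetical helper name.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_models(X, y, folds=5, seed=0):
    """Return mean cross-validated accuracy for the five model families.

    Assumption: default hyperparameters and k-fold cross-validation;
    the source abstract specifies neither.
    """
    models = {
        # KNN and SVM are distance/margin based, so features are scaled first.
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
        "SVM": make_pipeline(StandardScaler(), SVC()),
        "RF": RandomForestClassifier(random_state=seed),
        "DT": DecisionTreeClassifier(random_state=seed),
        "NB": GaussianNB(),
    }
    return {
        name: cross_val_score(model, X, y, cv=folds, scoring="accuracy").mean()
        for name, model in models.items()
    }


# Synthetic stand-in for the clinical data (not real patient records).
X, y = make_classification(
    n_samples=400, n_features=35, n_informative=8, random_state=0
)

scores = compare_models(X, y)
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

Accuracies from this sketch depend entirely on the synthetic data and say nothing about the reported results; the point is only the structure of training several classifiers on the same features and ranking them by accuracy.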