A Scaffold-Aware Machine Learning Docking Pipeline for TYK2 Inhibitor Discovery with Calibrated Prioritization of 32 Active Compounds Including Deucravacitinib

Authors

  • Saima Akram, Electrical Engineering Department, National Fertilizer Corporation Institute of Engineering and Technology (NFC IET), Multan, Pakistan
  • Muhammad Shahid, Department of Computer Science & Information Technology, University of South Punjab (USP), Multan, Punjab, Pakistan
  • Ghulam Muhy Ud Deen Raee, Department of Computer Science & Information Technology, University of South Punjab (USP), Multan, Punjab, Pakistan
  • Muhammad Allah Razi, Department of Computer and Software Engineering, The Khwaja Fareed University of Engineering and Information Technology (KFUEIT)

Keywords:

TYK2 inhibitors, Deucravacitinib, Scaffold-aware machine learning, Molecular docking, Bioactivity prediction, Cheminformatics

Abstract

Tyrosine kinase 2 (TYK2) is a clinically validated target in immunology, as shown by the FDA approval of the TYK2 inhibitor deucravacitinib. Nonetheless, discovering new TYK2 modulators remains difficult because bioactivity data are noisy, known compounds are biased toward particular core chemical structures (scaffolds), and experimental screening and lead discovery are expensive. In this study, we present a scaffold-aware machine learning pipeline that combines rigorous data curation, molecular fingerprint generation, and calibrated classification with structure-based molecular docking. Compounds from the curated TYK2 bioactivity set (pIC50) were encoded as ECFP4 and MACCS fingerprints together with a set of physicochemical descriptors, and the combined feature set was reduced by variance and correlation pruning. Support Vector Machine, Random Forest, and XGBoost classifiers were evaluated in a two-step process based on scaffold-split validation, which tests whether a model generalizes reliably to regions of chemical space not seen during training. XGBoost performed best among the tested algorithms, achieving an accuracy of 87.5%, an F1-score of 91.3%, and an area under the curve (AUC) of 0.951 for TYK2 activity classification. The optimized model was then applied to a screening library of more than 10,000 compounds; a predicted-probability threshold of 0.95 selected the top 32 predicted actives, which showed stable binding poses in subsequent docking with Surflex-TBS.
Importantly, the model also recovered the approved TYK2 inhibitor deucravacitinib, ranking it among the predicted actives, which supports the model's ability to generalize to regions of chemical space not represented in the training data.
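The pIC50-based labeling and scaffold-split validation described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes IC50 values in nM, a hypothetical activity cut-off of pIC50 ≥ 6 (the abstract does not state the one used), and Bemis-Murcko scaffold SMILES already computed for each compound (for example with RDKit's `MurckoScaffold` module). Whole scaffold groups are assigned greedily, largest first, to the training set, so no scaffold appears in both training and test data. The function names are hypothetical.

```python
import math
from collections import defaultdict


def ic50_nm_to_pic50(ic50_nm):
    """Convert IC50 in nM to pIC50 = -log10(IC50 in mol/L) = 9 - log10(IC50 in nM)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nm)


def label_active(pic50, threshold=6.0):
    """Binary label; the 6.0 cut-off is a hypothetical choice, not the paper's."""
    return 1 if pic50 >= threshold else 0


def scaffold_split(scaffolds, train_frac=0.8):
    """Split compound indices so that no scaffold occurs in both sets.

    scaffolds: one precomputed scaffold SMILES per compound (assumed input).
    Scaffold groups are visited largest first (ties broken alphabetically)
    and added to the training set while it stays within train_frac of all
    compounds; any group that would overflow it goes to the test set.
    """
    groups = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)

    ordered = sorted(groups.items(), key=lambda item: (-len(item[1]), item[0]))
    train_cap = train_frac * len(scaffolds)
    train, test = [], []
    for _, members in ordered:
        if len(train) + len(members) <= train_cap:
            train.extend(members)
        else:
            test.extend(members)
    return sorted(train), sorted(test)


if __name__ == "__main__":
    # Toy data: four scaffold groups of sizes 4, 3, 2 and 1.
    scaffolds = (["c1ccccc1"] * 4 + ["c1ccncc1"] * 3
                 + ["c1ccc2ccccc2c1"] * 2 + ["C1CCNCC1"])
    train, test = scaffold_split(scaffolds, train_frac=0.8)
    print(train, test)  # [0, 1, 2, 3, 4, 5, 6, 9] [7, 8]
    print(label_active(ic50_nm_to_pic50(100.0)))  # 1 (pIC50 = 7)
```

A grouped split like this is stricter than a random split because close analogues sharing a core cannot leak from training into test; the trade-off is that the realized test fraction depends on the scaffold-size distribution and is rarely exactly 1 − train_frac.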
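The probability-threshold step and the check on where a known inhibitor such as deucravacitinib lands can be illustrated with a short sketch. It assumes the classifier has already produced calibrated probabilities of activity for each library compound (the abstract does not specify the calibration method). The compound IDs and scores in the usage example are invented and are not results from this study; `prioritize` and `rank_of` are hypothetical helper names.

```python
def prioritize(probabilities, threshold=0.95):
    """Return (compound_id, probability) pairs with probability >= threshold,
    sorted from most to least confident (ties broken by ID for determinism)."""
    hits = [(cid, p) for cid, p in probabilities.items() if p >= threshold]
    hits.sort(key=lambda pair: (-pair[1], pair[0]))
    return hits


def rank_of(hits, compound_id):
    """1-based rank of compound_id among the hits, or None if it was not selected."""
    for rank, (cid, _) in enumerate(hits, start=1):
        if cid == compound_id:
            return rank
    return None


if __name__ == "__main__":
    # Illustrative, invented scores for five library compounds.
    scores = {
        "cmpd_001": 0.99,
        "cmpd_002": 0.40,
        "cmpd_003": 0.95,
        "cmpd_004": 0.96,
        "deucravacitinib": 0.97,
    }
    hits = prioritize(scores, threshold=0.95)
    print(len(hits))                         # 4
    print(rank_of(hits, "deucravacitinib"))  # 2
```

A fixed cut-off such as 0.95 is only meaningful if the scores behave like probabilities, so calibration (for example Platt scaling or isotonic regression) normally precedes this step; with uncalibrated boosted-tree scores, the number of compounds passing the threshold can shift substantially.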

Published

2025-12-31

How to Cite

A Scaffold-Aware Machine Learning Docking Pipeline for TYK2 Inhibitor Discovery with Calibrated Prioritization of 32 Active Compounds Including Deucravacitinib. (2025). International Research Journal of Management and Social Sciences, 6(4), 42-55. https://irjmss.com/index.php/irjmss/article/view/484
