Conditional Tabular GAN-Augmented Ensemble Learning for Off-Design Performance Prediction of a Gas Turbine-ORC Combined Power Cycle Under Limited Operational Data

Authors

  • Mujahed Kareem, Al-Samawa Technical Institute / Al-Furat Al-Awsat Technical University (ATU)

DOI:

https://doi.org/10.52262/jmjke858

Keywords:

gas turbine–ORC combined cycle; conditional tabular GAN; data augmentation; ensemble machine learning; off-design performance prediction

Abstract

Gas turbine–organic Rankine cycle (GT-ORC) combined power systems operate over a broad off-design range determined by the ambient state and the load demand, but in recently commissioned or prototype plants the experimental operating data available for model calibration and validation may be limited. Here, a framework is proposed that combines data augmentation by a conditional tabular generative adversarial network (CTGAN) and by the synthetic minority oversampling technique (SMOTE) with an ensemble of five models (Random Forest, XGBoost, LightGBM, CatBoost, and a stacking regressor with a Ridge meta-learner) to predict combined cycle power output, thermal efficiency, and ORC net power under scarce and noisy data conditions. A physics-based thermodynamic model generates the benchmark dataset, from which a limited subset (N = 200–300) is sampled to represent the amount of data typically available. SMOTE 3× augmentation yields consistently good results across all models relative to CTGAN, with the best combinations reaching R² = 0.9998, 0.9997, and 0.9906 for CC_power_kW, eta_CC, and ORC_power_kW, respectively. SHAP analysis shows that the ensemble models learn physically meaningful feature–target relationships. To our knowledge, this is the first demonstration of CTGAN-augmented ensemble learning for off-design performance prediction of GT-ORC systems, and it represents a transferable, data-efficient machine learning approach to surrogate modeling of energy systems.
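The interpolation idea behind SMOTE-style augmentation of a small regression dataset can be illustrated with a minimal sketch. This is not the authors' implementation: the function name smote_like_augment, the column layout, the toy data, the choice of k = 5 neighbours, and the reading of "3×" as "final size equals three times the original" are all assumptions made for illustration. Each synthetic row is a random convex combination of a real row and one of its nearest neighbours, with features and targets interpolated together.

```python
import math
import random


def smote_like_augment(rows, factor=3, k=5, seed=0):
    """Return the original rows plus interpolated synthetic rows.

    rows   : list of equal-length numeric lists (features and targets together)
    factor : final size as a multiple of the original size (assumed meaning of "3x")
    k      : number of nearest neighbours considered for interpolation (assumed)
    """
    if factor < 1 or len(rows) < 2:
        raise ValueError("need factor >= 1 and at least two rows")
    rng = random.Random(seed)
    n_cols = len(rows[0])
    k = min(k, len(rows) - 1)

    # Min-max scale each column so no single variable dominates the distance.
    lo = [min(r[j] for r in rows) for j in range(n_cols)]
    hi = [max(r[j] for r in rows) for j in range(n_cols)]
    span = [(h - l) or 1.0 for l, h in zip(lo, hi)]
    scaled = [[(r[j] - lo[j]) / span[j] for j in range(n_cols)] for r in rows]

    synthetic = []
    for _ in range((factor - 1) * len(rows)):
        i = rng.randrange(len(rows))
        # Rank all other rows by scaled Euclidean distance to row i.
        order = sorted(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: math.dist(scaled[i], scaled[j]),
        )
        j = rng.choice(order[:k])  # pick one of the k nearest neighbours
        t = rng.random()           # interpolation weight in [0, 1)
        synthetic.append([a + t * (b - a) for a, b in zip(rows[i], rows[j])])
    return rows + synthetic


# Hypothetical toy data, not from the paper:
# columns are [ambient_temp_C, load_fraction, power_kW].
toy_rng = random.Random(42)
real = []
for _ in range(20):
    temp = toy_rng.uniform(0.0, 45.0)
    load = toy_rng.uniform(0.4, 1.0)
    real.append([temp, load, 1000.0 * load - 5.0 * temp])

augmented = smote_like_augment(real, factor=3)
print(len(real), len(augmented))  # prints: 20 60
```

Because every synthetic row lies on a line segment between two real rows, the augmented set stays inside the range spanned by the original data; it densifies the sampled operating region rather than extrapolating beyond it. In a workflow like the one described, only the training split would be augmented, so that test metrics are computed on real samples alone.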

Published

2026-06-02