How to cite this paper
Seifi, F., & Niaki, S. T. A. (2024). Optimizing contextual bandit hyperparameters: A dynamic transfer learning-based framework. International Journal of Industrial Engineering Computations, 15(4), 951-964.
References
Abreu, S. (2019). Automated architecture design for deep neural networks. arXiv preprint arXiv:1908.10714.
Agarwal, A., Dudík, M., Kale, S., Langford, J., & Schapire, R. (2012). Contextual bandit learning with predictable rewards. Paper presented at the Artificial Intelligence and Statistics.
Agarwal, A., Hsu, D., Kale, S., Langford, J., Li, L., & Schapire, R. (2014). Taming the monster: A fast and simple algorithm for contextual bandits. Paper presented at the International Conference on Machine Learning.
Agrawal, S., & Goyal, N. (2013). Thompson sampling for contextual bandits with linear payoffs. Paper presented at the International Conference on Machine Learning.
Allesiardo, R., Féraud, R., & Bouneffouf, D. (2014). A neural networks committee for the contextual bandit problem. Paper presented at the Neural Information Processing: 21st International Conference, ICONIP 2014, Kuching, Malaysia, November 3-6, 2014. Proceedings, Part I 21.
Auer, P., Cesa-Bianchi, N., & Fischer, P. (2002). Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47, 235-256.
Auer, P., Cesa-Bianchi, N., Freund, Y., & Schapire, R. E. (2002). The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1), 48-77.
Balakrishnan, A., Bouneffouf, D., Mattei, N., & Rossi, F. (2018). Using Contextual Bandits with Behavioral Constraints for Constrained Online Movie Recommendation. Paper presented at the IJCAI.
Balakrishnan, A., Bouneffouf, D., Mattei, N., & Rossi, F. (2019a). Incorporating behavioral constraints in online AI systems. Paper presented at the Proceedings of the AAAI Conference on Artificial Intelligence.
Balakrishnan, A., Bouneffouf, D., Mattei, N., & Rossi, F. (2019b). Using multi-armed bandits to learn ethical priorities for online AI systems. IBM Journal of Research and Development, 63(4/5), 1:1-1:13.
Bastani, H., & Bayati, M. (2020). Online decision making with high-dimensional covariates. Operations Research, 68(1), 276-294.
Bastani, H., Bayati, M., & Khosravi, K. (2021). Mostly exploration-free algorithms for contextual bandits. Management Science, 67(3), 1329-1349.
Bergstra, J., Bardenet, R., Bengio, Y., & Kégl, B. (2011). Algorithms for hyper-parameter optimization. Advances in Neural Information Processing Systems, 24.
Bergstra, J., & Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(2).
Bietti, A., Agarwal, A., & Langford, J. (2018). Practical evaluation and optimization of contextual bandit algorithms. Statistics, 1050, 12.
Bouneffouf, D. (2016). Exponentiated gradient exploration for active learning. Computers, 5(1), 1.
Bouneffouf, D., & Claeys, E. (2020). Hyper-parameter tuning for the contextual bandit. arXiv preprint arXiv:2005.02209.
Bouneffouf, D., & Féraud, R. (2016). Multi-armed bandit problem with known trend. Neurocomputing, 205, 16-21.
Bouneffouf, D., Laroche, R., Urvoy, T., Féraud, R., & Allesiardo, R. (2014). Contextual bandit for active learning: Active thompson sampling. Paper presented at the Neural Information Processing: 21st International Conference, ICONIP 2014, Kuching, Malaysia, November 3-6, 2014. Proceedings, Part I 21.
Bouneffouf, D., Parthasarathy, S., Samulowitz, H., & Wistub, M. (2019). Optimal exploitation of clustering and history information in multi-armed bandit. arXiv preprint arXiv:1906.03979.
Bouneffouf, D., & Rish, I. (2019). A survey on practical applications of multi-armed and contextual bandits. arXiv preprint arXiv:1904.10040.
Bouneffouf, D., Rish, I., Cecchi, G. A., & Féraud, R. (2017). Context attentive bandits: Contextual bandit with restricted context. arXiv preprint arXiv:1705.03821.
Chu, J.-C. (2024). Hyperparameter optimization strategy for Multi-Armed Bandits: Genre recommendation in MovieLens dataset. Paper presented at the Proceedings of the 2023 International Conference on Machine Learning and Automation.
Di Francescomarino, C., Dumas, M., Federici, M., Ghidini, C., Maggi, F. M., Rizzi, W., & Simonetto, L. (2018). Genetic algorithms for hyperparameter optimization in predictive business process monitoring. Information Systems, 74, 67-83.
Ding, Q., Hsieh, C.-J., & Sharpnack, J. (2021). An efficient algorithm for generalized linear bandit: Online stochastic gradient descent and thompson sampling. Paper presented at the International Conference on Artificial Intelligence and Statistics.
Ding, Q., Kang, Y., Liu, Y.-W., Lee, T. C. M., Hsieh, C.-J., & Sharpnack, J. (2022). Syndicated bandits: A framework for auto tuning hyper-parameters in contextual bandit algorithms. Advances in Neural Information Processing Systems, 35, 1170-1181.
Dudik, M., Hsu, D., Kale, S., Karampatziakis, N., Langford, J., Reyzin, L., & Zhang, T. (2011). Efficient optimal learning for contextual bandits. arXiv preprint arXiv:1106.2369.
Falkner, S., Klein, A., & Hutter, F. (2018). BOHB: Robust and efficient hyperparameter optimization at scale. Paper presented at the International Conference on Machine Learning.
Guo, B., Hu, J., Wu, W., Peng, Q., & Wu, F. (2019). The Tabu_genetic algorithm: a novel method for hyper-parameter optimization of learning algorithms. Electronics, 8(5), 579.
Hutter, F., Hoos, H. H., & Leyton-Brown, K. (2011). Sequential model-based optimization for general algorithm configuration. Paper presented at the International Conference on Learning and Intelligent Optimization.
Hutter, F., Kotthoff, L., & Vanschoren, J. (2019). Automated machine learning: Methods, systems, challenges. Springer Nature.
Injadat, M., Salo, F., Nassif, A. B., Essex, A., & Shami, A. (2018). Bayesian optimization with machine learning algorithms towards anomaly detection. Paper presented at the 2018 IEEE Global Communications Conference (GLOBECOM).
Jaderberg, M., Dalibard, V., Osindero, S., Czarnecki, W. M., Donahue, J., Razavi, A., . . . Simonyan, K. (2017). Population based training of neural networks. arXiv preprint arXiv:1711.09846.
Jomaa, H. S., Grabocka, J., & Schmidt-Thieme, L. (2019). Hyp-rl: Hyperparameter optimization by reinforcement learning. arXiv preprint arXiv:1906.11527.
Jun, K.-S., Bhargava, A., Nowak, R., & Willett, R. (2017). Scalable generalized linear bandits: Online computation and hashing. Advances in Neural Information Processing Systems, 30.
Kang, Y., Hsieh, C.-J., & Lee, T. (2023). Online continuous hyperparameter optimization for contextual bandits. arXiv preprint arXiv:2302.09440.
Karnin, Z., Koren, T., & Somekh, O. (2013). Almost optimal exploration in multi-armed bandits. Paper presented at the International Conference on Machine Learning.
Lai, T. L., & Robbins, H. (1985). Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6(1), 4-22.
Langford, J., & Zhang, T. (2007). The epoch-greedy algorithm for multi-armed bandits with side information. Advances in Neural Information Processing Systems, 20.
Li, L., Chu, W., Langford, J., & Schapire, R. E. (2010). A contextual-bandit approach to personalized news article recommendation. Paper presented at the Proceedings of the 19th International Conference on World Wide Web.
Li, L., Jamieson, K., DeSalvo, G., Rostamizadeh, A., & Talwalkar, A. (2017). Hyperband: A novel bandit-based approach to hyperparameter optimization. The Journal of Machine Learning Research, 18(1), 6765-6816.
Lin, B., Bouneffouf, D., Cecchi, G. A., & Rish, I. (2018). Contextual bandit with adaptive feature extraction. Paper presented at the 2018 IEEE International Conference on Data Mining Workshops (ICDMW).
Lorenzo, P. R., Nalepa, J., Ramos, L. S., & Pastor, J. R. (2017). Hyper-parameter selection in deep neural networks using parallel particle swarm optimization. Paper presented at the Proceedings of the Genetic and Evolutionary Computation Conference Companion.
Noothigattu, R., Bouneffouf, D., Mattei, N., Chandra, R., Madan, P., Varshney, K., . . . Rossi, F. (2018). Interpretable multi-objective reinforcement learning through policy orchestration. arXiv preprint arXiv:1809.08343.
Parker-Holder, J., Nguyen, V., & Roberts, S. (2020). Provably efficient online hyperparameter optimization with population-based bandits. arXiv preprint arXiv:2002.02518.
Russo, D. J., Van Roy, B., Kazerouni, A., Osband, I., & Wen, Z. (2018). A tutorial on thompson sampling. Foundations and Trends® in Machine Learning, 11(1), 1-96.
Schwartz, E. M., Bradlow, E. T., & Fader, P. S. (2017). Customer acquisition via display advertising using multi-armed bandit experiments. Marketing Science, 36(4), 500-522.
Seifi, F., & Niaki, S. T. A. (n.d.). Dynamic meta-learning acquisition function method for Bayesian optimization with early stopping criteria for hyperparameter optimization. Available at SSRN 4205030.
Seifi, F., & Niaki, S. T. A. (2023). Extending the hypergradient descent technique to reduce the time of optimal solution achieved in hyperparameter optimization algorithms. International Journal of Industrial Engineering Computations, 14(3), 501-510.
Sharaf, A., & Daumé III, H. (2019). Meta-learning for contextual bandit exploration. arXiv preprint arXiv:1901.08159.
Swersky, K., Snoek, J., & Adams, R. P. (2014). Freeze-thaw Bayesian optimization. arXiv preprint arXiv:1406.3896.
Woodroofe, M. (1979). A one-armed bandit problem with a concomitant variable. Journal of the American Statistical Association, 74(368), 799-806.
Zhou, D., Li, L., & Gu, Q. (2020). Neural contextual bandits with UCB-based exploration. Paper presented at the International Conference on Machine Learning.
Zöller, M.-A., & Huber, M. F. (2021). Benchmark and survey of automated machine learning frameworks. Journal of Artificial Intelligence Research, 70, 409-472.