Document Type: Original Article


1 Department of Financial Engineering, Faculty of Economics, Management and Accounting, Yazd University, Yazd, Iran

2 Department of Finance and Accounting, School of Management and Economics, Yazd University, Yazd, Iran.


Purpose: Investors' behavioural characteristics significantly affect how they invest. Investment managers and advisors should therefore be able to group investors into classes and recommend, for each class, investments suited to its personality type. Clustering is one way to do this. It is an unsupervised, descriptive learning method that assigns data to clusters according to a similarity criterion, so that the data within a cluster are as similar as possible to each other and as dissimilar as possible to the data in other clusters.
Methodology: This study uses K-means clustering and affinity propagation clustering to identify groups of investors with similar ability and willingness to accept risk. We also show how investor characteristics and clustering techniques can be used to allocate assets effectively.
Findings: We used the silhouette coefficient to evaluate the two clustering methods and select the better one for these data. The silhouette coefficient was 0.17 for K-means and 0.097 for affinity propagation, so we chose K-means as the clustering method. Using K-means, we clustered investors on financial, behavioural, and demographic characteristics and, based on the results, divided them into seven categories ranging from low to high risk acceptance.
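For reference, the silhouette coefficient is conventionally defined as below. This is the standard textbook definition; the source does not state which variant or distance measure was used, so the symbols here are assumptions for illustration. For a point i, a(i) is the mean distance to the other points in its own cluster and b(i) is the smallest mean distance to the points of any other cluster. The reported coefficient is the mean of s(i) over all N points, and values closer to 1 indicate tighter, better separated clusters.

```latex
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad
-1 \le s(i) \le 1, \qquad
\bar{s} = \frac{1}{N} \sum_{i=1}^{N} s(i)
```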
Originality/Value: All calculations in this study were performed in Python 3.8. Investment managers and stock advisors can use the results of this study.
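The comparison described in the abstract can be sketched in Python roughly as follows. This is a minimal illustration and not the authors' code: it assumes scikit-learn is available, uses synthetic data from `make_blobs` as a stand-in for the investor features (the study's data are not reproduced here), and the function name `compare_clusterers` and all parameter values are hypothetical.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for investor features (assumption: NOT the study's data).
X, _ = make_blobs(n_samples=300, n_features=6, centers=7,
                  cluster_std=2.0, random_state=0)
# Standardize so that no single feature dominates the distance computation.
X_scaled = StandardScaler().fit_transform(X)


def compare_clusterers(features, n_clusters=7, seed=0):
    """Return the silhouette score of K-means and of affinity propagation."""
    kmeans_labels = KMeans(n_clusters=n_clusters, n_init=10,
                           random_state=seed).fit_predict(features)
    # Affinity propagation chooses the number of clusters itself.
    ap_labels = AffinityPropagation(random_state=seed).fit_predict(features)

    scores = {"kmeans": silhouette_score(features, kmeans_labels)}
    # The silhouette score is only defined for 2 <= n_clusters <= n_samples - 1;
    # record NaN if affinity propagation did not produce such a partition.
    n_ap_clusters = len(set(ap_labels))
    if 2 <= n_ap_clusters < len(features):
        scores["affinity_propagation"] = silhouette_score(features, ap_labels)
    else:
        scores["affinity_propagation"] = float("nan")
    return scores


scores = compare_clusterers(X_scaled)
# Pick the method with the highest defined silhouette score.
valid = {name: s for name, s in scores.items() if not np.isnan(s)}
best_method = max(valid, key=valid.get)
print(scores, "->", best_method)
```

The number of clusters is fixed at seven for K-means only to mirror the seven categories reported in the findings; in practice it would be chosen by comparing scores across several candidate values.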

