Twenty Years of Research in Stock Market Prediction from Text Mining

Text mining for stock market prediction is far from a new domain.

As early as 1998, Wuthrich et al. attempted to predict stock markets from online news articles published by sources such as The Wall Street Journal. The idea was straightforward: count occurrences of manually defined keywords in articles and correlate their presence with stock values using machine learning techniques. Despite its low accuracy, this idea sparked interest, and many approaches were tried on the problem: genetic algorithms (Thomas and Sycara, 2000), naive Bayes (Lavrenko et al., 2000), support vector machines (Mittermayer, 2004), etc. The core idea is always the same: first retrieve relevant documents, then correlate the documents' content with stock prices. While the first approaches focused on financial news (Schumaker and Chen, 2009), some authors also investigated financial reports (Loughran and McDonald, 2011). An important trend started when researchers considered documents produced directly by users on financial forums rather than by expert journalists (Antweiler and Frank, 2004).
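To make the keyword-counting idea concrete, here is a minimal sketch in Python. It is not the method of Wuthrich et al. or of any other cited paper: the keyword list, the articles and the returns are all invented for illustration, and a plain Pearson correlation stands in for the machine learning techniques those authors used.

```python
from collections import Counter
import math
import re

# Hypothetical, manually defined keyword list (an assumption for illustration).
KEYWORDS = ["bond", "rally", "loss"]


def keyword_counts(text, keywords=KEYWORDS):
    """Count whole-word, case-insensitive occurrences of each keyword."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    return [tokens[k] for k in keywords]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for a constant series; report 0 by convention
    return cov / math.sqrt(var_x * var_y)


# Toy data invented for this example: one article per day and
# a made-up market return (in percent) for the same day.
articles = [
    "Stocks rally as bond yields fall",
    "Heavy loss reported; investors fear further loss",
    "Rally continues, rally broadens",
    "Bond market quiet",
]
returns = [0.8, -1.2, 1.5, 0.1]

# One row of keyword counts per article, then one correlation per keyword.
counts = [keyword_counts(a) for a in articles]
correlations = {
    k: pearson([row[i] for row in counts], returns)
    for i, k in enumerate(KEYWORDS)
}

for keyword, r in correlations.items():
    print(f"{keyword}: {r:+.2f}")
```

On this toy data, "rally" correlates positively and "loss" negatively with the returns. With real data one would need far more days, out-of-sample evaluation, and a predictive setup (for example, correlating today's text with tomorrow's return) before drawing any conclusion.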

In parallel with this work, the natural language processing community became interested in extracting sentiment from text, starting with the seminal papers of Pang et al. (2002) and Turney (2002). Instead of directly correlating text content with stock prices, researchers then successfully correlated the sentiment (positive or negative) of financial forum posts with stock prices (Das and Chen, 2007).

The work of Bollen et al. (2011) on Twitter mood then generated a vast amount of research on text mining from social media in many directions: researchers looked at the targets of sentiments (Jiang et al., 2011), basic emotions (Zhang et al., 2011), interactional structure (Ruiz et al., 2012), etc. They looked at other social media such as Facebook (Karabulut, 2013), and at more specialized financial social media such as StockTwits or SeekingAlpha (Liao et al., 2014; Wang et al., 2015). They even investigated more exotic sources such as Wikipedia (Moat et al., 2013) or Google Trends (Preis et al., 2013). Recent approaches involve a fair amount of deep learning (Alsing and Bahceci, 2015; Ding et al., 2015). For instance, Dickinson and Hu (2015) use the word embedding technique proposed by Mikolov et al. (2013) to learn word vectors directly linked to companies.

It is, however, impossible to give a full account of the massive body of research in the domain, so we refer the interested reader to the survey by Nassirtoussi et al. (2014) and the recent Handbook of Sentiment Analysis in Finance (Mitra and Yu, 2016) for further information. Here at SESAMm, we are the heirs of this rich compendium of 20 years of research. We focus on social media by investigating the role of sentiments and emotions in predicting the stock market. By relying on a multidisciplinary team of quantitative analysts, machine learning experts and natural language processing experts, we are able not only to apply proven techniques but also to experiment with cutting-edge technologies such as deep learning. In all cases, we make sure to address the inherent difficulties of market prediction through thorough evaluation, in order to provide the best possible trading indicators.

References linked to the article “Twenty Years of Research in Stock Market Prediction from Text Mining” by Alexandre Denis, SESAMm

Alsing, O. & Bahceci, O. Stock Market Prediction using Social Media Analysis. KTH Royal Institute of Technology, Stockholm, Sweden, 2015.

Antweiler, W. & Frank, M. Z. Is all that talk just noise? The information content of internet stock message boards. In The Journal of Finance, Wiley Online Library, 2004, 59, 1259-1294.

Asur, S. & Huberman, B. Predicting the future with social media. In Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on, 2010, 1, 492-499.

Bollen, J.; Mao, H. & Zeng, X. Twitter mood predicts the stock market. In Journal of Computational Science, Elsevier, 2011, 2, 1-8.

Das, S. R. & Chen, M. Y. Yahoo! for Amazon: Sentiment extraction from small talk on the web. Management Science, INFORMS, 2007, 53, 1375-1388.

Dickinson, B. & Hu, W. Sentiment analysis of investor opinions on twitter. In Social Networking, Scientific Research Publishing, 2015, 4, 62.

Ding, X.; Zhang, Y.; Liu, T. & Duan, J. Deep learning for event-driven stock prediction. In Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI'15), 2015, pp. 2327-2333.

Ferrara, E. & Yang, Z. Measuring emotional contagion in social media. In PLoS ONE, 2015, 10(11).

Jiang, L.; Yu, M.; Zhou, M.; Liu, X. & Zhao, T. Target-dependent twitter sentiment classification. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1, 2011, 151-160.

Karabulut, Y. Can Facebook predict stock market activity? In AFA 2013 San Diego Meetings Paper, 2013.

Lavrenko, V.; Schmill, M.; Lawrie, D.; Ogilvie, P.; Jensen, D. & Allan, J. Mining of Concurrent Text and Time Series. In Proceedings of the 6th ACM SIGKDD INT'L Conference on Knowledge Discovery and Data Mining Workshop on Text Mining, 2000, 37-44.

Liao, W.; Shah, S. & Makrehchi, M. Winning by Following the Winners: Mining the Behaviour of Stock Market Experts in Social Media. In Kennedy, W.; Agarwal, N. & Yang, S. (Eds.), Social Computing, Behavioral-Cultural Modeling and Prediction, Springer International Publishing, 2014, 8393, 103-110.

Loughran, T. & McDonald, B. When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. In The Journal of Finance, Wiley Online Library, 2011, 66, 35-65.

Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G. S. & Dean, J. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, 2013, 3111-3119.

Mitra, G. & Yu, X. The Handbook of Sentiment Analysis in Finance. 2016.

Mittal, A. & Goel, A. Stock prediction using twitter sentiment analysis. Tech report, Stanford University, CS229, 2012.

Mittermayer, M.-A. Forecasting intraday stock price trends with text mining techniques. In System Sciences, 2004. Proceedings of the 37th Annual Hawaii International Conference on, 2004, 10-pp.

Moat, H. S.; Curme, C.; Avakian, A.; Kenett, D. Y.; Stanley, H. E. & Preis, T. Quantifying Wikipedia usage patterns before stock market moves. In Scientific reports, Nature Publishing Group, 2013, 3.

Nassirtoussi, A. K.; Aghabozorgi, S.; Wah, T. Y. & Ngo, D. C. L. Text mining for market prediction: A systematic review. In Expert Systems with Applications, Elsevier, 2014, 41, 7653-7670.

Pang, B.; Lee, L. & Vaithyanathan, S. Thumbs up?: sentiment classification using machine learning techniques. In Proceedings of the ACL-02 conference on Empirical methods in natural language processing-Volume 10, 2002, 79-86.

Preis, T.; Moat, H. S. & Stanley, H. E. Quantifying trading behavior in financial markets using Google Trends. In Scientific reports, Nature Publishing Group, 2013, 3.

Ruiz, E. J.; Hristidis, V.; Castillo, C.; Gionis, A. & Jaimes, A. Correlating financial time series with micro-blogging activity. In Proceedings of the fifth ACM international conference on Web search and data mining, 2012, 513-522.

Schumaker, R. P. & Chen, H. Textual analysis of stock market prediction using breaking financial news: The AZFin text system. In ACM Transactions on Information Systems (TOIS), ACM, 2009, 27, 12.

Thomas, J. D. & Sycara, K. Integrating genetic algorithms and text learning for financial prediction. In Data Mining with Evolutionary Algorithms, 2000, 72-75.

Turney, P. D. Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th annual meeting on association for computational linguistics, 2002, 417-424.

Wang, G.; Wang, T.; Wang, B.; Sambasivan, D.; Zhang, Z.; Zheng, H. & Zhao, B. Y. Crowds on Wall Street: Extracting Value from Collaborative Investing Platforms. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing, 2015, 17-30.

Wuthrich, B.; Cho, V.; Leung, S.-W.; Sankaran, K. & Zhang, J. Daily stock market forecast from textual web data. In Systems, Man, and Cybernetics, 1998. 1998 IEEE International Conference on, 1998, 3, 2720-2725.

Zhang, X.; Fuehres, H. & Gloor, P. A. Predicting stock market indicators through twitter "I hope it is not as bad as I fear". In Procedia-Social and Behavioral Sciences, Elsevier, 2011, 26, 55-62.


Alexandre Denis SESAMm