Integrating Economic Theory, Domain Knowledge, and Social Knowledge into Hybrid Sentiment Models for Predicting Crude Oil Markets

Articles on stock price prediction use techniques as varied as univariate or multivariate forecasting [3], sentiment analysis [4], random walks [5], fractals and chaos theory [6], fundamental and technical analysis [7], and even news and social media analysis [8]. Besides what is published, there is also a wide array of proprietary systems that support retail traders or institutions, and which may never be described in the literature. Stock market prediction is one of the most competitive research areas, and obtaining a comprehensive snapshot of its current state of the art is almost impossible. Consequently, this section focuses mostly on research that includes sentiment and other natural language processing (NLP) features in the context of FSA.

A survey from Gu et al. [9] suggests that trading and price-based methods together with deep learning yield good results, surpassing traditional features such as text and sentiment. Hu et al. [10] review neural networks used for Forex and stock market prediction and discover that sentiment is rarely used in the papers submitted to top journals. Although this may have been the case for articles considered in that particular survey, the following discussion of the state of the art demonstrates that sentiment has been used extensively in recent literature.

Mahata et al. [11] suspect that due to the disruptions caused by the early 2020 coronavirus-induced market crash, some models described in the literature may no longer perform well in today’s markets. Of particular concern is the lack of representative data for the pandemic period, as data source variability can be a serious source of confusion for machine learning (ML) algorithms [12]. For instance, in March 2020, the spread of COVID-19 led to a negative shift in consumer sentiment, resulting in decreasing stock prices in a volatile market environment, particularly in the tourism sector and in energy commodity futures. Nine months later, the approval of multiple COVID-19 vaccines led to a positive shift in consumer sentiment, resulting in an increase in stock prices, especially for companies in the healthcare sector.

The importance awarded to news and sentiment has also increased considerably during the pandemic, as showcased in a recent study that analyzed social media to understand the changes in the public’s behavior [13].

Affective Models

While some models presented in this section have not been used for market prediction, it is necessary to briefly review them to fully understand the discussion on the affective models behind sentiment indicators. This section focuses primarily on Transformer architectures that were used for such tasks. Many affective classification models have been introduced in the past three decades: Ekman’s model [14], Plutchik’s Wheel of Emotions [15], the Circumplex Model of Affect [16], the Hourglass of Emotions [17], and its revised version [18]. These models generally distinguish between basic and derived emotions. Plutchik’s popular model [15] was the gold standard for decades because it combines a workable structure (derived from the eight basic emotions: joy, trust, fear, surprise, sadness, disgust, anger, and anticipation) with different degrees of expression along the covered affective categories. The Hourglass of Emotions [17] identifies four categories (pleasantness, attention, sensitivity, and aptitude) and their activation scales (e.g., pleasantness can have various activation levels between joy and sadness, such as ecstasy and grief). Its revised version [18] improves consistency, removes neutral emotions (e.g., surprise was eliminated as annotators found it difficult to decide its valence), and adds polar and self-conscious emotions.

A recent survey [19] examines most of the sentiment analysis surveys published during the last decade and compiles a list of the most promising research directions. The number of papers has increased massively during the past 20 years (from two papers in 2002 to 1466 in 2021). The survey identifies six large communities (social media, ML, NLP, opinion mining, Arabic, semi-supervised learning) and their key research topics.

Domain-specific affective models [20] go beyond classic models by incorporating application-specific affective categories and interpreting them based on the situational context. An affective model for benchmarking TV shows, for example, might consider fear and sadness to be desirable associations rather than undesirable ones.
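
As a minimal sketch of this idea, the following Python snippet scores the same detected emotions under a generic context and under a domain-specific one. The valence values, the `DOMAIN_OVERRIDES` table, the `tv_drama` label, and the function name `contextual_valence` are illustrative assumptions and are not taken from [20].

```python
# Default valence for a few basic emotions (illustrative values, not from [20]).
DEFAULT_VALENCE = {
    "joy": 1.0, "trust": 1.0, "fear": -1.0, "sadness": -1.0, "anger": -1.0,
}

# Hypothetical domain overrides: when benchmarking a TV drama, fear and
# sadness are treated as desirable associations.
DOMAIN_OVERRIDES = {
    "tv_drama": {"fear": 1.0, "sadness": 1.0},
}


def contextual_valence(emotion_scores, domain=None):
    """Weighted average valence of detected emotions, reinterpreted per domain."""
    valence = dict(DEFAULT_VALENCE)
    valence.update(DOMAIN_OVERRIDES.get(domain, {}))
    total = sum(emotion_scores.values())
    if total == 0:
        return 0.0
    weighted = sum(valence.get(e, 0.0) * s for e, s in emotion_scores.items())
    return weighted / total


# Made-up emotion intensities detected in a document.
detected = {"fear": 0.6, "sadness": 0.2, "joy": 0.2}
generic = contextual_valence(detected)                   # negative overall
drama = contextual_valence(detected, domain="tv_drama")  # positive overall
```

The same emotion profile yields a negative score in the generic context and a positive one in the hypothetical TV drama context, which is the kind of reinterpretation a domain-specific affective model is meant to capture.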

Lexicon-Based and Bag-of-Words Models for Financial Sentiment Analysis

The use of sentiment as an indicator to support decision-making in finance has a long history; however, early proprietary systems have been rarely documented in academic literature.

Sezer’s work [21] provides a systematic review of deep learning models for financial time series forecasting used for stock and commodity prices during the late pre-COVID era (2005–2019). The survey reviews the mathematics behind each model and its parameters, as well as the markets to which it was applied. Recurrent neural network models (RNN) like long short-term memory networks (LSTM) or gated recurrent units (GRU) are most common, followed by convolutional neural networks (CNN), regardless of the market.

Due to their generalization capabilities, LSTM models are also well-suited for commodity prices and can be used for predicting prices of WTI or Brent crude oil daily closings [22]. An LSTM-DNN hybrid model trained on multiple markets, including oil, yielded the best results for predicting coal prices [23]. When considering these results, it is important to note that most of the presented methods have been evaluated on different datasets and time intervals, which seriously impacts the comparability of results.

One of the most cited studies addressing the use of NLP and lexicons for FSA is Loughran and McDonald’s work on interpreting liabilities associated with 10-K filing returns [24]. They created a large corpus of 10-K samples from the Electronic Data Gathering, Analysis, and Retrieval (EDGAR) website (footnote 2). Their analysis focuses on the management discussion & analysis (MD&A) sections to examine the most frequent words and negative terms included in financial dictionaries that are found in the filings. The study concludes that some words with negative connotations may be industry-specific and not indicative of any liabilities; therefore, general categorization schemes need to be used with caution. The same authors have also published a longer survey on the use of text analysis in accounting and finance [25]. This survey covers mostly the analysis of public financial documents like US Securities and Exchange Commission (SEC) filings over three decades of text analytics (1984–2014) and reviews financial lexicons, bag-of-words methods, and document similarity metrics.
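
The caution about general categorization schemes can be illustrated with a small bag-of-words sketch. The two word lists below are toy examples chosen for illustration and are not the dictionaries used in [24]; the function name `negative_share` and the sample sentence are likewise hypothetical.

```python
import re

# Toy general-purpose negative word list (illustrative only).
GENERAL_NEGATIVE = {"liability", "cost", "tax", "loss", "impairment", "litigation"}

# Hypothetical list of words that are routine in financial filings and
# therefore should not be counted as negative in that domain.
FINANCE_NEUTRAL = {"liability", "cost", "tax"}


def negative_share(text, negative_words):
    """Fraction of tokens in `text` that appear in `negative_words`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in negative_words)
    return hits / len(tokens)


# Made-up sentence in the style of an MD&A section.
mdna = "Tax cost increased while litigation loss remained low"
general_score = negative_share(mdna, GENERAL_NEGATIVE)
finance_score = negative_share(mdna, GENERAL_NEGATIVE - FINANCE_NEUTRAL)
```

With the general list, half of the tokens count as negative; once the domain-neutral terms are excluded, the share drops to a quarter. A domain-adapted lexicon can change the measured tone of a filing in this way even though the text itself is unchanged.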

Xing et al. [26] present a review of surveys and a taxonomy of early NLP models for natural language-based financial forecasting (NLFF) covering the pre-Transformer era. They focus mainly on literature that aims at predicting the stock market using bag-of-words approaches, as well as early ML methods (e.g., k-NN, SVM, least squares regression, and decision trees). The article considers NLFF a nascent field and defines its areas of interest (e.g., sentiment, volatility, technical and fundamental analysis, portfolio management). It also goes on to describe the most important models (e.g., GARCH, ARIMA, neural networks), and methods (e.g., simulation, credit scoring, exchange rates, backtesting) used in this field.

Sentiment-based forecasting methods for crude oil prices are discussed in [27]. The article presents a method that uses a CNN to extract sentiment features. Its authors suggest that text features and financial (e.g., numerical) features are somewhat complementary.

Deep Learning Models for Financial Sentiment Analysis

A recent survey on deep learning for text classification [28] considers sentiment analysis a crucial text classification task. The authors group sentiment analysis methods into two classes (rule-based and ML-based) and define a set of classification tasks (e.g., sentiment analysis, news classification, topic analysis, question answering, textual entailment). The survey is particularly interesting since it provides guidelines and steps (model selection, domain adaptation, task-specific model design, task-specific fine-tuning, and model compression) for selecting and adapting neural networks to a particular task. In addition, it also includes information on resources and evaluation metrics.

FSA is not a new field: sentiment indicators were used as early as the 1980s, although most early systems were proprietary and, therefore, not properly described in the literature. A recent analysis of the common errors and successful approaches in FSA was provided by Xing et al. [29]. The article compares eight models for the FSA task on the Yelp and StockSen datasets and provides a list of six common errors. These include rhetoric issues, dependent opinions, counterfactual moods (irrealis), unspecified aspects (e.g., cases in which humans discover the correct aspect more easily than the algorithms), unrecognized words (e.g., acronyms), and external references (e.g., references that need additional information).

The public mood has been a reliable indicator of market direction, but until recently, it has rarely been used for intelligent asset allocation [30]. Work by Xing et al. shows that market views, a formalization of the public mood, can help increase returns when combined with Bayesian allocation models. Their experiments indicate that such an approach increases profitability by 5 to 10% annually. In subsequent work [31], they draw upon LSTMs and adapt their approach to automatic portfolio management, raising annual profitability by 19%.
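
To give an intuition for how a sentiment-derived view can enter a Bayesian allocation step, the sketch below applies a textbook normal-normal update to a single asset's expected return. It is not the model of [30] or [31]; the function name `blend_view`, the single-asset setting, the independence assumption, and all numbers are assumptions made for illustration.

```python
def blend_view(prior_mean, prior_var, view_mean, view_var):
    """Posterior mean and variance of an expected return, given a normal prior
    and an independent, normally distributed sentiment-derived view."""
    if prior_var <= 0 or view_var <= 0:
        raise ValueError("variances must be positive")
    prior_precision = 1.0 / prior_var
    view_precision = 1.0 / view_var
    post_var = 1.0 / (prior_precision + view_precision)
    post_mean = post_var * (prior_precision * prior_mean + view_precision * view_mean)
    return post_mean, post_var


# Made-up numbers: a 4% prior expected return and a 10% sentiment-implied
# view, with the view assumed to be exactly as uncertain as the prior.
post_mean, post_var = blend_view(0.04, 0.02, 0.10, 0.02)
```

Because both sources are given equal uncertainty here, the posterior expected return lands halfway between the prior and the view; a less certain view (larger `view_var`) would pull the estimate back toward the prior.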

Sentiment itself can be used for estimating various target metrics, which makes it difficult to assess its predictive power. A survey on sentiment analysis based on deep learning [32] enumerates several widely used models from CNNs and RNNs to LSTMs, GRUs, and hybrid networks, and provides a taxonomy of sentiment analysis techniques. The surveyed methods include ML approaches (e.g., semi-supervised, unsupervised, and supervised learning), lexicon-based techniques (e.g., dictionary-based, corpus-based), and hybrid methods (e.g., combinations of neural networks and lexicons or corpora). Li et al. [33] draw upon sentiment for predicting prices and present an LSTM model that considers technical indicators and sentiment analysis. Xing et al. [34] use sentiment for predicting volatility and implement a system called SAVING (sentiment-aware volatility forecasting), which is built upon a variational RNN (VRNN).
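
A sequence model that considers both technical indicators and sentiment needs its inputs arranged as fixed-length windows of feature rows. The sketch below shows one plausible way to do this with plain Python lists; the choice of a simple moving average as the technical indicator, the window lengths, the function names, and the data are assumptions and do not reproduce the feature set of [33].

```python
def moving_average(values, window):
    """Simple moving average; early entries average over the available prefix."""
    averages = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages


def build_windows(closes, sentiment, lookback=3, ma_window=2):
    """Return (X, y): X[k] holds `lookback` rows of [close, moving average,
    sentiment], and y[k] is the closing price on the day after the window."""
    if len(closes) != len(sentiment):
        raise ValueError("closes and sentiment must have the same length")
    ma = moving_average(closes, ma_window)
    rows = [[c, m, s] for c, m, s in zip(closes, ma, sentiment)]
    X, y = [], []
    for end in range(lookback, len(rows)):
        X.append(rows[end - lookback:end])
        y.append(closes[end])
    return X, y


closes = [70.0, 71.0, 69.0, 72.0, 74.0]   # made-up daily closing prices
sentiment = [0.1, 0.3, -0.2, 0.4, 0.5]    # made-up daily sentiment scores
X, y = build_windows(closes, sentiment)
```

Each element of `X` has shape (lookback, number of features), which is the layout recurrent layers in common deep learning libraries expect once the list is converted to a tensor; the target `y` only uses information from after the window, which avoids look-ahead leakage in this toy setup.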

Foundation Models for Financial Sentiment Analysis

The first Transformer model was published in 2017 [35], and due to its flexibility, this class of models has since been used for solving many problems in areas such as NLP and computer vision.

Financial Transformer models typically use encoder pre-training on broad collections of data. These models are suitable for a wide range of downstream tasks and can also infer logic statements. Due to their central role in the NLP ecosystem, and their incomplete nature (i.e., they require fine-tuning since logical statements are inferred based on the training data only), these Transformer models are called foundation models [36]. Such models are enabled by transfer learning and can scale to large volumes of data.

Most current work on FSA is based on the Bidirectional Encoder Representations from Transformers (BERT) architecture [37], specifically versions of the FinBERT model [38]. FinBERT was one of the first language models applied to the finance domain. When trained on the FiQA dataset (footnote 3) and compared with ELMo and ULMFit, it achieved an improvement of 15% in accuracy. Yi et al. [39] further improve FinBERT’s performance by training on multiple datasets (e.g., Financial Phrase Bank, AnalysTone, and FiQA) and optimizing fine-tuning strategies. Liu et al. [40] train FinBERT on five large general-purpose datasets and on six self-supervised tasks (dialogue relation, sentence distance, reshuffling, token-passage, capitalization, and span prediction) to create a more robust FinBERT version. Their model outperforms other approaches for all evaluation tasks described in their paper (e.g., Financial Sentence Boundary Detection, FSA, and Financial Question Answering). FinBERT can be used in a multitude of applications (e.g., FinTextSen, FinNum tasks, sentiment analysis, numeral understanding), and domain adaptation seems to help improve results, although various errors suggest that the fine-tuning process on small datasets is not always stable [41]. Researchers also use FinBERT as a feature extractor for financial text classification [42] and prediction.

A recent article analyzes FinBERT’s sentiment performance, as well as its advantages over classic BERT large language models (LLMs) [43]. FLANG models [44] expand upon FinBERT and benefit from training on a benchmark known as the Financial Language Understanding Evaluation (FLUE) benchmark.

Zou and Herremans [45] combine FinBERT embeddings with a multimodal model to predict extreme price fluctuations of cryptocurrencies from Twitter feeds. Each year, the best Transformer-based financial models compete in various workshops on classification, phrase similarity, or sentiment-related tasks. During the last few years, the top systems were based on FinBERT [46].

Compared to the number of papers that use FinBERT as a backend for classification, the amount of literature that builds upon its capabilities as a feature extractor is relatively small. Farimani et al. [47] use a FinBERT model fine-tuned for FSA as a backend for an RNN feature extractor and implement an API for cryptocurrency markets. The API as well as their BERT model (FinBERT-SIMF) are publicly available. Ider [48] trains a BERT-based sentiment model on Reddit and Twitter posts for predicting market movement. The model cannot correctly assign sentiment if multiple targets are involved (e.g., if one target is negative, both targets may end up labeled negative). Chuang and Yang [49] analyze the output of BERT and FinBERT models and discover that they have some positive implicit preferences towards the stock market and some serious differences in the treatment of various industries.

Closer to the domain of our article, Fang et al. [50] showcase a hybrid model for crude oil price forecasting that integrates FinBERT, variational mode decomposition (VMD), attention mechanisms, and a BiGRU DL model. VMD is a modern signal decomposition technique that was recently used by Huang and Deng [51] for modeling crude oil prices. Since VMD is typically used in hybrid models, its role is that of a time series cleaning method, and it is generally coupled with LSTM and GRU models. A language model focused on sentiment for the oil and gas domain called PetroBERT [52] was also identified, but it is only available in Portuguese, which makes comparative evaluations with English models difficult.

The massive amount of recent literature on FSA demonstrates its importance as a field. Nevertheless, much research focuses on incremental improvements of existing models, and rarely considers economic theory, although market behavior should ultimately adhere to fundamental economic laws. The work presented in this article addresses this issue by blending existing affective models with economic theory.
