COMPARISON OF RNN AND LSTM NEURAL NETWORKS
DOI:
https://doi.org/10.32782/IT/2024-3-10
Keywords:
recurrent neural network, LSTM, RNN, sentiment classification, long-term dependencies
Abstract
The research aims to identify the advantages and disadvantages of different approaches to sequential data processing, an important aspect of natural language processing tasks such as sentiment analysis, machine translation, and text generation.
Purpose of the work. To investigate the effectiveness of different neural network architectures for sentiment classification, with an emphasis on comparing RNN and LSTM models.
Methodology. The paper examines the theoretical aspects of how recurrent neural networks (RNNs) and long short-term memory (LSTM) networks, a specialized variant of RNNs, function. Four neural network models, including simple recurrent networks (RNNs), LSTM networks, and convolutional neural networks (CNNs), were compared experimentally on the sentiment classification task. The experiment used the imdb_reviews dataset, which contains movie reviews labeled for binary sentiment classification (positive or negative). The models were implemented and trained with the TensorFlow and Keras libraries, which provide tools for efficient machine learning. Training and testing used standard text preprocessing steps such as tokenization and sequence preparation.
Scientific novelty. It is shown that the main advantage of LSTM networks is their ability to handle long-term dependencies, which makes them more effective for tasks where the context of long data sequences matters. It has been experimentally confirmed that recurrent neural networks take significantly longer to train than non-recurrent models but achieve slightly better accuracy.
Conclusions. The results of the study indicate that LSTM networks are a more effective approach for complex problems that require context over sequences longer than typical text fragments. LSTMs outperform simple RNNs because they preserve long-term dependencies, which is especially important in tasks that depend on relationships between distant data elements.
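The long-term dependency claim can be illustrated with a minimal scalar sketch. It is not the models from the experiment. It assumes a one-unit RNN with update h_t = tanh(w * h_{t-1} + x_t) and a simplified LSTM cell-state path c_t = f * c_{t-1} + i_t * g_t in which the forget gate f is held constant and the gates' own dependence on earlier states is ignored. The function names and the numeric values (w = 0.9, f = 0.99, 100 steps) are hypothetical, chosen only for illustration.

```python
import math
from typing import Sequence


def rnn_gradient(w: float, inputs: Sequence[float]) -> float:
    """Return d h_T / d h_0 for the scalar RNN h_t = tanh(w * h_{t-1} + x_t)."""
    h, grad = 0.0, 1.0
    for x in inputs:
        h = math.tanh(w * h + x)
        # Chain rule: d h_t / d h_{t-1} = w * (1 - h_t^2), always below |w| here.
        grad *= w * (1.0 - h * h)
    return grad


def lstm_cell_gradient(forget_gate: float, steps: int) -> float:
    """Return d c_T / d c_0 along the cell-state path with a constant forget gate.

    Simplifying assumption: gates are treated as constants, so each step
    contributes exactly one factor equal to the forget gate value.
    """
    return forget_gate ** steps


if __name__ == "__main__":
    steps = 100
    print("RNN gradient over", steps, "steps:", rnn_gradient(0.9, [0.5] * steps))
    print("LSTM cell gradient over", steps, "steps:", lstm_cell_gradient(0.99, steps))
```

Under these assumptions the RNN gradient is a product of factors that each shrink the signal, so the influence of an early input fades quickly, while a forget gate close to 1 keeps the cell-state gradient at a usable magnitude over the same number of steps.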