Publications
Multilingual sentiment analysis using Latent Semantic Indexing and machine learning
Bader, Brett W.; Kegelmeyer, William P.; Chew, Peter A.
We present a novel approach to predicting the sentiment of documents in multiple languages, without translation. The only prerequisite is a multilingual parallel corpus wherein a training sample of the documents, in a single language only, have been tagged with their overall sentiment. Latent Semantic Indexing (LSI) converts that multilingual corpus into a multilingual "concept space". Both training and test documents can be projected into that space, allowing crosslingual semantic comparisons between the documents without the need for translation. Accordingly, the training documents with known sentiment are used to build a machine learning model which can, because of the multilingual nature of the document projections, be used to predict sentiment in the other languages. We explain and evaluate the accuracy of this approach. We also design and conduct experiments to investigate the extent to which topic and sentiment separately contribute to that classification accuracy, and thereby shed some initial light on the question of whether topic and sentiment can be sensibly teased apart. © 2011 IEEE.
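The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The toy English/Spanish vocabulary, the five-document parallel corpus, the rank `K = 2`, and the nearest-centroid classifier are all assumptions made for illustration, since the abstract names no specific learning model or data. The sketch builds a term-by-document matrix in which each column combines both language versions of a parallel document, takes a truncated SVD to obtain a shared concept space, trains on English-only labeled documents, and then classifies Spanish documents without translating them.

```python
import numpy as np

# Hypothetical toy vocabulary: four English terms and their Spanish counterparts.
VOCAB = ["good", "great", "bad", "awful", "bueno", "genial", "malo", "horrible"]
INDEX = {term: i for i, term in enumerate(VOCAB)}


def term_vector(text):
    """Bag-of-words count vector over VOCAB; unknown words are ignored."""
    vec = np.zeros(len(VOCAB))
    for word in text.lower().split():
        if word in INDEX:
            vec[INDEX[word]] += 1.0
    return vec


# Hypothetical parallel corpus: (English text, Spanish translation) pairs.
PARALLEL = [
    ("good great", "bueno genial"),
    ("good good", "bueno bueno"),
    ("good", "bueno"),
    ("bad awful", "malo horrible"),
    ("bad bad", "malo malo"),
]

# Each column merges both language versions of one parallel document, so
# terms from different languages co-occur and can share latent concepts.
X = np.column_stack([term_vector(en + " " + es) for en, es in PARALLEL])

# LSI step: keep the K leading left singular vectors as the concept basis.
K = 2
U, S, Vt = np.linalg.svd(X, full_matrices=False)
U_k = U[:, :K]


def project(text):
    """Fold a monolingual document into the concept space.

    Assumption: no singular-value scaling is applied; conventions vary.
    """
    return U_k.T @ term_vector(text)


def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


# Sentiment tags exist for the English side only.
TRAIN = [
    ("good great", "pos"),
    ("good good", "pos"),
    ("good", "pos"),
    ("bad awful", "neg"),
    ("bad bad", "neg"),
]

# Stand-in classifier (an assumption): one centroid per class in concept space.
centroids = {}
for label in ("pos", "neg"):
    members = [project(text) for text, tag in TRAIN if tag == label]
    centroids[label] = np.mean(members, axis=0)


def predict(text):
    """Return the label whose centroid is most cosine-similar to the document."""
    z = project(text)
    return max(centroids, key=lambda label: cosine(z, centroids[label]))


if __name__ == "__main__":
    print(predict("bueno genial"))
    print(predict("malo horrible"))
```

In this toy setting the Spanish inputs are classified using a model that only ever saw English labels, because the parallel columns place `bueno` and `good` on the same latent dimension. A real system would need a much larger parallel corpus, term weighting, and a higher rank. The sketch also does not address whether the learned dimensions capture topic or sentiment, which is the question the paper sets out to study.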