Social Media Preprocessing Report


Today's generation is heavily drawn to social media. Social media is a term describing user-generated content that can be shared with others online. As it grows in popularity, everyone's opinion is being digitized. The use of social media has grown enormously in recent years owing to wider broadband access, the high-street availability of powerful computers, and new website technologies that make content sharing easier. The most popular social media platforms attract many millions of visitors. The number of people around the world who use the Internet has increased by approximately 40% since 1995 and has reached 3.2 billion. With the advent of the 21st century and the huge boom in the Internet, social media platforms like blogs, Facebook …

We use Twitter as the source of the required data, as it has become the favorite platform for people to express their views. The main objective of this paper is to implement opinion mining using enhanced preprocessing steps, which form the proposed solution. Preprocessing is vital because it has to handle hashtag priority and multilingual data. Hence it is divided into two phases. In the first phase, all slang and emoticons are removed and the tweet is converted into plain, simple text. In the second phase, stop words are removed and a feature vector is extracted.

Another objective is that data collection should be keyword based, so that it effectively yields a feature vector that can be used directly in classification techniques. It should also maintain the integrity and authenticity of the data gathered, retaining the important words and removing words that have no effect. With a limit of 140 characters, users on Twitter rely on all kinds of emoticons and slang to express themselves better within the available space. This is the major problem of opinion mining, but the results are more accurate because the data is more authentic.

Three classification techniques are used for this purpose: Naïve Bayes classification, Support Vector Machine, and Maximum Entropy. Naïve Bayes classifiers are models that assign class labels to problem instances, represented as vectors of feature values, where the class labels are drawn from some finite set. A Support Vector Machine (SVM) is a machine learning tool based on the idea of large-margin data classification. Maximum Entropy is rooted in information theory: the maximum entropy method (MEM) seeks to extract as much information from a measurement as is justified by the data's signal-to-noise
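The two preprocessing phases described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the slang table, the stop-word list, the emoticon pattern, and the function names `phase_one` and `phase_two` are all hypothetical and chosen only for illustration. The sketch also assumes that a hashtag's word is kept while the `#` symbol is dropped, and that the feature vector is a simple count of the remaining terms.

```python
import re
from collections import Counter

# Hypothetical lookup tables, chosen only for illustration.
SLANG = {"u": "you", "gr8": "great", "luv": "love"}
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "you", "i", "this", "it", "and"}

# Simple patterns (assumptions): common western emoticons, URLs, @mentions.
EMOTICON = re.compile(r"[:;=][-']?[()DPp]")
URL_OR_MENTION = re.compile(r"https?://\S+|@\w+")


def phase_one(tweet):
    """Phase 1: remove URLs, mentions, and emoticons; expand slang; return plain text."""
    text = URL_OR_MENTION.sub(" ", tweet)
    text = EMOTICON.sub(" ", text)
    text = text.replace("#", " ")  # assumption: keep the hashtag word, drop the symbol
    words = re.findall(r"[a-z0-9']+", text.lower())
    return " ".join(SLANG.get(word, word) for word in words)


def phase_two(plain_text):
    """Phase 2: remove stop words and count the remaining terms as a feature vector."""
    return Counter(word for word in plain_text.split() if word not in STOP_WORDS)


tweet = "I luv this phone :) u should try it #awesome http://t.co/xyz"
plain = phase_one(tweet)
features = phase_two(plain)
print(plain)
print(dict(features))
```

A term-count dictionary of this kind could then be passed to a classifier such as Naïve Bayes. A real pipeline would need much larger slang and stop-word resources, plus handling for multilingual tweets.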