Difference between multi-class and multi-label classification

Multi-class classification assumes that each sample is assigned to one and only one label: a fruit can be either an apple or a pear, but not both at the same time. In multi-label classification, by contrast, a text might be about any of religion, politics, finance or education at the same time, or about none of these.

We first convert the comments to lower-case and then use custom-made functions to remove HTML tags, punctuation and non-alphabetic characters from the comments.

    import re
    import sys
    import warnings

    import nltk
    from nltk.corpus import stopwords
    from nltk.stem.snowball import SnowballStemmer

    # Assumption: data_raw is a pandas Series of comment strings
    # (select the text column first if it is a DataFrame).
    data = data_raw

    if not sys.warnoptions:
        warnings.simplefilter("ignore")

    def cleanHtml(sentence):
        # Replace HTML tags with a space (assumed, commonly used tag pattern).
        cleanr = re.compile(r'<.*?>')
        cleantext = re.sub(cleanr, ' ', str(sentence))
        return cleantext

    def cleanPunc(sentence):
        # Clean the sentence of punctuation and special characters.
        # Both character classes below are assumptions.
        cleaned = re.sub(r'[?!\'"#]', r'', sentence)
        cleaned = re.sub(r'[.,()\\/]', r' ', cleaned)
        cleaned = cleaned.strip()
        cleaned = cleaned.replace("\n", " ")
        return cleaned

    def keepAlpha(sentence):
        # Keep only alphabetic characters in each word (assumed pattern).
        alpha_sent = ""
        for word in sentence.split():
            alpha_word = re.sub(r'[^a-zA-Z]+', ' ', word)
            alpha_sent += alpha_word
            alpha_sent += " "
        alpha_sent = alpha_sent.strip()
        return alpha_sent

    data = data.str.lower()
    data = data.apply(cleanHtml)
    data = data.apply(cleanPunc)
    data = data.apply(keepAlpha)

Next, we remove all the stop-words present in the comments, using the default set of stop-words that can be downloaded from the NLTK library. We also add a few stop-words to the standard list. Stop words are a set of commonly used words in a language; every language has them, not just English. They matter in many applications because, once the most common words of a language are removed, we can focus on the important words instead.
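The stop-word removal step is described here only in prose, so the following is a minimal sketch of how it could look, not the exact code used for this post. To stay runnable without a download, it uses a tiny hand-written stop-word set; the names `base_stop_words`, `extra_stop_words` and `remove_stop_words`, and the extra words themselves, are assumptions made for illustration. With NLTK installed, the standard English list can be loaded as shown in the comment.

```python
import re

# Tiny illustrative stop-word set (an assumption for this sketch).
# With NLTK installed, the standard list can be loaded instead:
#   import nltk
#   nltk.download("stopwords")
#   from nltk.corpus import stopwords
#   base_stop_words = set(stopwords.words("english"))
base_stop_words = {"the", "is", "a", "an", "of", "and", "to", "in", "it"}

# Hypothetical extra stop-words added to the standard list.
extra_stop_words = {"also", "however", "may"}
stop_words = base_stop_words | extra_stop_words

# One compiled pattern that matches any stop-word as a whole word.
stop_word_pattern = re.compile(
    r"\b(" + "|".join(re.escape(w) for w in sorted(stop_words)) + r")\b"
)

def remove_stop_words(sentence):
    """Drop stop-words from a lower-cased sentence and collapse extra spaces."""
    without_stops = stop_word_pattern.sub(" ", sentence)
    return " ".join(without_stops.split())

cleaned = remove_stop_words("the movie is also a story of hope and loss")
print(cleaned)
```

If the comments are held in a pandas Series, as in the cleaning code, the function could be applied the same way as the other cleaning steps, for example `data = data.apply(remove_stop_words)`. The word boundaries (`\b`) keep the pattern from cutting stop-words out of the middle of longer words such as "story".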