texthero.preprocessing.remove_stopwords
remove_stopwords(input: pandas.core.series.Series, stopwords: Union[Set[str], NoneType] = None, remove_str_numbers=False) → pandas.core.series.Series
Remove all instances of the given stopwords from each string in the Series.
By default, uses NLTK’s English stopword list of 179 words.
- Parameters
- input : Pandas Series
- stopwords : Set[str], Optional
  Set of stopword strings to remove. If not passed, the NLTK English stopwords are used by default.
Examples
Using the default NLTK list of stopwords:
>>> import texthero as hero
>>> import pandas as pd
>>> s = pd.Series("Texthero is not only for the heroes")
>>> hero.remove_stopwords(s)
0    Texthero      heroes
dtype: object
Add custom words to the default list of stopwords:
>>> import texthero as hero
>>> from texthero import stopwords
>>> import pandas as pd
>>> default_stopwords = stopwords.DEFAULT
>>> custom_stopwords = default_stopwords.union(set(["heroes"]))
>>> s = pd.Series("Texthero is not only for the heroes")
>>> hero.remove_stopwords(s, custom_stopwords)
0    Texthero
dtype: object
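
For intuition about what stopword removal does to a single string, here is a minimal pure-Python sketch. It is not texthero's implementation: the function name `remove_stopwords_sketch` and the five-word `SAMPLE_STOPWORDS` set are hypothetical stand-ins chosen for illustration, and the sketch assumes simple whitespace tokenization with exact, case-sensitive matching.

```python
# Illustrative sketch only; NOT texthero's actual implementation.
# SAMPLE_STOPWORDS is a hypothetical stand-in for the much larger NLTK list.
SAMPLE_STOPWORDS = {"is", "not", "only", "for", "the"}


def remove_stopwords_sketch(text, stopwords):
    """Return `text` with every whitespace-separated token in `stopwords` dropped."""
    # Assumption: tokens are split on whitespace, so punctuation stays
    # attached to its word and matching is exact and case-sensitive.
    tokens = text.split()
    kept = [token for token in tokens if token not in stopwords]
    # The kept words are joined with single spaces in this sketch.
    return " ".join(kept)


sentence = "Texthero is not only for the heroes"

# With the sample set alone.
print(remove_stopwords_sketch(sentence, SAMPLE_STOPWORDS))

# Extending the set with a custom word, analogous to a set union.
print(remove_stopwords_sketch(sentence, SAMPLE_STOPWORDS | {"heroes"}))
```

Because matching in this sketch is case-sensitive, a capitalized token such as "The" would be kept. The sketch also normalizes spacing between the remaining words, which may differ from the spacing in texthero's own output.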