texthero.preprocessing.tokenize
tokenize(s: pandas.core.series.Series) → pandas.core.series.Series
Tokenize each row of the given Series.
Tokenize each row of the given Pandas Series and return a Pandas Series where each row contains a list of tokens.
Algorithm: add a space before and after every punctuation symbol, except when the symbol sits between two alphanumeric characters, then split on whitespace.
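As an illustration only, the rule above could be approximated with a regular expression as in the minimal sketch below. This is not the library's actual implementation; the function name tokenize_sketch and the exact pattern are assumptions.

import re
import string

import pandas as pd

# Punctuation characters, escaped for safe use inside a regex character class.
PUNCT = re.escape(string.punctuation)

def tokenize_sketch(s: pd.Series) -> pd.Series:
    # Hypothetical re-implementation of the rule described above: pad a
    # punctuation symbol with spaces only when it is NOT surrounded by
    # alphanumeric characters on both sides (so "you're" stays intact while
    # the "!" in "great!" is split off), then split each row on whitespace.
    pattern = rf"(?<![a-zA-Z0-9])([{PUNCT}])|([{PUNCT}])(?![a-zA-Z0-9])"
    padded = s.str.replace(
        pattern, lambda m: f" {m.group(1) or m.group(2)} ", regex=True
    )
    return padded.str.split()

On the Series used in the Examples section, this sketch yields the same token list as shown there.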
Examples
>>> import texthero as hero
>>> import pandas as pd
>>> s = pd.Series(["Today you're looking great!"])
>>> hero.tokenize(s)
0    [Today, you're, looking, great, !]
dtype: object