texthero.preprocessing.tokenize
tokenize(s: pandas.core.series.Series) → pandas.core.series.Series
Tokenize each row of the given Series.
Tokenize each row of the given Pandas Series and return a Pandas Series where each row contains a list of tokens.
Algorithm: add a space before and after every punctuation symbol, except when the symbol sits between two alphanumeric characters, then split on whitespace.
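As an illustration only, the rule above could be approximated with a regular expression as in the minimal sketch below. This is not the library's actual implementation; the function name tokenize_sketch and the exact pattern are assumptions.

import re
import string

import pandas as pd

# Punctuation characters, escaped for safe use inside a regex character class.
PUNCT = re.escape(string.punctuation)

def tokenize_sketch(s: pd.Series) -> pd.Series:
    # Hypothetical re-implementation of the rule described above: pad a
    # punctuation symbol with spaces only when it is NOT surrounded by
    # alphanumeric characters on both sides (so "you're" stays intact while
    # the "!" in "great!" is split off), then split each row on whitespace.
    pattern = rf"(?<![a-zA-Z0-9])([{PUNCT}])|([{PUNCT}])(?![a-zA-Z0-9])"
    padded = s.str.replace(
        pattern, lambda m: f" {m.group(1) or m.group(2)} ", regex=True
    )
    return padded.str.split()

On the Series used in the Examples section, this sketch yields the same token list as shown there.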
Examples
>>> import texthero as hero
>>> import pandas as pd
>>> s = pd.Series(["Today you're looking great!"])
>>> hero.tokenize(s)
0    [Today, you're, looking, great, !]
dtype: object