Preprocessing
The texthero.preprocess module allows for efficient preprocessing of text-based Pandas Series and DataFrames.
Pre-process a text-based Pandas Series.
Drop all rows without content.
Return a list containing all the methods used in the default cleaning pipeline.
Return a Boolean Pandas Series indicating whether each row has content.
Remove content within angle brackets <> and the angle brackets themselves.
Remove content within brackets and the brackets themselves.
Remove content within curly brackets {} and the curly brackets themselves.
Remove all diacritics and accents.
Remove all digits and replace them with a single space.
Remove HTML tags from the given Pandas Series.
Replace all punctuation with a single space (" ").
Remove content within parentheses () and the parentheses themselves.
Remove content within square brackets [] and the square brackets themselves.
Remove all instances of words.
Remove all URLs from a given Pandas Series.
Replace all URLs with the given symbol.
Remove any extra white spaces.
Replace all punctuation with a given symbol.
Replace all instances of words with symbol.
Tokenize each row of the given Series.
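
The kind of row-level cleaning summarized above can be sketched in plain Python. This is an illustration only, not texthero's implementation: the helper names (`strip_diacritics`, `replace_digits`, `replace_punct`, `collapse_whitespace`, `clean_text`) are hypothetical, the step order is an assumption, and replacing each run of digits (rather than each digit) with one symbol is also an assumption.

```python
import re
import string
import unicodedata


def strip_diacritics(text: str) -> str:
    # Decompose accented characters (NFKD), then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def replace_digits(text: str, symbol: str = " ") -> str:
    # Assumption: each run of digits becomes a single symbol.
    return re.sub(r"\d+", symbol, text)


def replace_punct(text: str, symbol: str = " ") -> str:
    # Map every ASCII punctuation character to the given symbol.
    table = str.maketrans({ch: symbol for ch in string.punctuation})
    return text.translate(table)


def collapse_whitespace(text: str) -> str:
    # Split on any whitespace and rejoin with single spaces.
    return " ".join(text.split())


# Hypothetical pipeline: an ordered list of string -> string steps.
PIPELINE = [replace_digits, replace_punct, strip_diacritics, collapse_whitespace]


def clean_text(text: str) -> str:
    # Apply each step in order to a single string.
    for step in PIPELINE:
        text = step(text)
    return text


if __name__ == "__main__":
    print(clean_text("Café  #42, déjà-vu!"))  # Cafe deja vu
```

With a Pandas Series `s` of strings, a function like this could be applied row by row with `s.apply(clean_text)`.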