texthero.representation.tfidf
tfidf(s: pandas.core.series.Series, max_features=None, min_df=1, return_feature_names=False)
Represent a text-based Pandas Series using TF-IDF.
- Parameters
- s: Pandas Series
- max_features: int, optional
Maximum number of features to keep.
- min_df: int, optional. Defaults to 1.
When building the vocabulary, ignore terms that have a document frequency strictly lower than the given threshold.
- return_feature_names: bool. Defaults to False.
If True, return a tuple (tfidf_series, feature_names).
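The effect of min_df and max_features on the vocabulary can be sketched in plain Python. This is an illustration only, not texthero's implementation: the helper name `build_vocabulary` is hypothetical, tokens are assumed to be whitespace-separated, and max_features is assumed to keep the terms with the highest total count across the corpus (the behaviour documented for scikit-learn's vectorizers), with ties broken alphabetically as an arbitrary choice.

```python
from collections import Counter


def build_vocabulary(docs, min_df=1, max_features=None):
    """Hypothetical helper: pick vocabulary terms the way min_df/max_features describe."""
    tokenized = [doc.split() for doc in docs]  # assumption: whitespace tokenization

    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Term frequency: total occurrences of each term across the corpus.
    term_freq = Counter(term for tokens in tokenized for term in tokens)

    # min_df: drop terms whose document frequency is strictly below the threshold.
    kept = [term for term in doc_freq if doc_freq[term] >= min_df]

    # max_features (assumption): keep the most frequent terms, ties broken alphabetically.
    if max_features is not None:
        kept = sorted(kept, key=lambda term: (-term_freq[term], term))[:max_features]

    return sorted(kept)


docs = ["red apple", "red car", "green apple", "red"]
vocab_all = build_vocabulary(docs)                   # every term survives min_df=1
vocab_min2 = build_vocabulary(docs, min_df=2)        # "car" and "green" occur in one document only
vocab_top1 = build_vocabulary(docs, max_features=1)  # only the most frequent term is kept
```

With these four documents, min_df=2 removes the terms that appear in a single document, and max_features=1 keeps only "red".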
Examples
>>> import texthero as hero
>>> import pandas as pd
>>> s = pd.Series(["Sentence one", "Sentence two"])
>>> hero.tfidf(s)
0    [0.5797386715376657, 0.8148024746671689, 0.0]
1    [0.5797386715376657, 0.0, 0.8148024746671689]
dtype: object
To also return the feature names:
>>> import texthero as hero
>>> import pandas as pd
>>> s = pd.Series(["Sentence one", "Sentence two"])
>>> hero.tfidf(s, return_feature_names=True)
(0    [0.5797386715376657, 0.8148024746671689, 0.0]
1    [0.5797386715376657, 0.0, 0.8148024746671689]
dtype: object, ['Sentence', 'one', 'two'])
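The values in the examples above can be reproduced by hand, which may help when interpreting the output. The sketch below is not texthero's code. It assumes whitespace tokenization without lowercasing, a smoothed inverse document frequency idf(t) = ln((1 + n) / (1 + df(t))) + 1, and L2 normalisation of each row, which are the defaults of scikit-learn's TF-IDF weighting. Under those assumptions the result matches the numbers shown.

```python
import math

docs = ["Sentence one", "Sentence two"]
tokenized = [doc.split() for doc in docs]  # assumption: whitespace tokens, case preserved

# Vocabulary in sorted order; uppercase letters sort before lowercase ones.
vocab = sorted({term for tokens in tokenized for term in tokens})

n_docs = len(tokenized)
# Document frequency of each term.
df = {term: sum(term in tokens for tokens in tokenized) for term in vocab}
# Smoothed IDF (assumption, see lead-in).
idf = {term: math.log((1 + n_docs) / (1 + df[term])) + 1 for term in vocab}


def tfidf_row(tokens):
    """Raw term count times IDF, then scaled to unit L2 norm."""
    raw = [tokens.count(term) * idf[term] for term in vocab]
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm if norm else 0.0 for x in raw]


vectors = [tfidf_row(tokens) for tokens in tokenized]
for row in vectors:
    print(row)
```

"Sentence" occurs in both documents, so its IDF is 1 and it receives the smaller weight in each row. "one" and "two" each occur in a single document and are weighted more heavily.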