snorkelflow_extensions.taxonomy_distillation.models.huggingface.HuggingfaceTextEncoder
- class snorkelflow_extensions.taxonomy_distillation.models.huggingface.HuggingfaceTextEncoder(model_name: str, batch_size: int = 32)
Bases: object
Text encoder using Hugging Face sentence-transformers models.
This encoder wraps sentence-transformers models from the Hugging Face model hub to generate dense vector representations of text. The encoded vectors serve as input features for downstream classification tasks in the hierarchical text classification pipeline.
Supports batch processing for efficient encoding of multiple texts and automatic device management for GPU acceleration when available.
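The batching behavior described above can be illustrated with a minimal, self-contained sketch. This is not the library's implementation: `ToyTextEncoder` and `batched` are hypothetical names, and a SHA-256 hash stands in for the sentence-transformers model so the example runs without downloads or a GPU. The stand-in returns plain lists of floats, whereas the documented `encode_text` returns a Tensor. Only the method names and the `model_name` and `batch_size` parameters are taken from this page.

```python
import hashlib
from typing import Iterator, List


def batched(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


class ToyTextEncoder:
    """Hypothetical stand-in that mirrors the documented method names.

    It hashes each text into a fixed-length vector instead of running a model.
    """

    def __init__(self, model_name: str, batch_size: int = 32) -> None:
        self.model_name = model_name
        self.batch_size = batch_size
        self._dim = 8  # arbitrary size chosen for this illustration

    def get_embedding_dim(self) -> int:
        return self._dim

    def _encode_batch(self, batch: List[str]) -> List[List[float]]:
        # A real encoder would run one model forward pass per batch here.
        vectors = []
        for text in batch:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            vectors.append([byte / 255.0 for byte in digest[: self._dim]])
        return vectors

    def encode_texts(self, texts: List[str]) -> List[List[float]]:
        # Process the input in chunks of batch_size and concatenate the results.
        vectors: List[List[float]] = []
        for batch in batched(texts, self.batch_size):
            vectors.extend(self._encode_batch(batch))
        return vectors

    def encode_text(self, text: str) -> List[float]:
        # A single text is treated as a batch of one.
        return self.encode_texts([text])[0]


colors = ["red", "green", "blue", "cyan", "pink"]
encoder = ToyTextEncoder("toy-model", batch_size=2)
vectors = encoder.encode_texts(colors)
batch_sizes = [len(batch) for batch in batched(colors, 2)]
```

With five inputs and a batch size of 2, the helper produces two full batches and one partial batch, and every output vector has the length reported by `get_embedding_dim()`. A larger batch size generally means fewer model calls at the cost of more memory per call.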
- __init__(model_name: str, batch_size: int = 32) → None
Initialize the Hugging Face text encoder.
Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| model_name | str | | The model name. Must be a sentence-transformers model from the Hugging Face model hub: https://huggingface.co/models?library=sentence-transformers |
| batch_size | int | 32 | Optional. The batch size to use for encoding. Default is 32. |
| verbose | | 0 | The verbosity level of the encoder. Default is 0. |

Returns: None
Methods
| Method | Description |
| --- | --- |
| __init__(model_name[, batch_size]) | Initialize the huggingface text encoder. |
| encode_text(text) | Transform a single text. |
| encode_texts(texts) | Transform a list of texts. |
| get_embedding_dim() | Get the embedding dimension. |

- encode_text(text: str) → Tensor
Transform a single text. The fit method must be called before calling this method.
Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| text | str | | The text to transform. |

Returns: The transformed text.