snorkelflow_extensions.taxonomy_distillation.entities.Taxonomy
- class snorkelflow_extensions.taxonomy_distillation.entities.Taxonomy(name: str, definition: str, context: str | None = None, examples: List[str] | None = None, properties: List[Property] | None = None)
Bases: object
A hierarchical classification system for text categorization.
Taxonomies represent tree-like classification structures where text is categorized through multiple levels of increasing specificity. Each taxonomy node contains a name, definition, contextual information, examples, and optional properties that define child classification dimensions.
The hierarchical structure enables top-down classification workflows where text is first assigned to broad categories, then progressively classified into more specific subcategories based on the properties defined at each level. This approach is particularly effective for complex classification tasks requiring multiple levels of granularity.
Taxonomies can be serialized to and from dictionary representations for persistence and interoperability with language model workflows.
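The top-down workflow described above can be sketched without the library itself. The example below is a minimal illustration that works on a plain nested dictionary in the shape documented for to_dict(). The taxonomy content, `keyword_chooser`, and `classify_top_down` are hypothetical names introduced here, and the keyword-overlap chooser is only a toy stand-in for whatever model performs the classification in practice.

```python
import re
from typing import Any, Callable, Dict, Set

Node = Dict[str, Any]

# Hypothetical taxonomy, written in the nested-dictionary shape documented for to_dict().
TAXONOMY: Node = {
    "name": "support ticket",
    "definition": "A message sent by a customer to the support team.",
    "properties": {
        "topic": {
            "name": "topic",
            "definition": "What the ticket is about.",
            "categories": {
                "billing": {
                    "name": "billing",
                    "definition": "Questions about invoices, refunds, or charges.",
                    "examples": ["I was charged twice this month"],
                    "properties": {
                        "billing_issue": {
                            "name": "billing_issue",
                            "definition": "The kind of billing problem.",
                            "categories": {
                                "refund": {
                                    "name": "refund",
                                    "definition": "The customer asks for money back.",
                                    "examples": ["I want a refund"],
                                },
                                "invoice": {
                                    "name": "invoice",
                                    "definition": "The customer needs a copy of an invoice.",
                                    "examples": ["Can you resend the invoice?"],
                                },
                            },
                        }
                    },
                },
                "technical": {
                    "name": "technical",
                    "definition": "Problems using the product, such as errors or crashes.",
                    "examples": ["The app crashes on startup"],
                },
            },
        }
    },
}


def tokens(text: str) -> Set[str]:
    """Lowercase alphabetic tokens of a string."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_chooser(text: str, categories: Dict[str, Node]) -> str:
    """Toy stand-in for a model call: pick the category whose definition
    and examples share the most words with the text."""
    def overlap(cat: Node) -> int:
        described = " ".join([cat["definition"], *cat.get("examples", [])])
        return len(tokens(text) & tokens(described))

    return max(categories, key=lambda name: overlap(categories[name]))


def classify_top_down(
    text: str, node: Node, choose: Callable[[str, Dict[str, Node]], str]
) -> Dict[str, str]:
    """Choose one category per property, then descend into the chosen category."""
    labels: Dict[str, str] = {}
    for prop_name, prop in node.get("properties", {}).items():
        chosen = choose(text, prop["categories"])
        labels[prop_name] = chosen
        labels.update(classify_top_down(text, prop["categories"][chosen], choose))
    return labels


labels = classify_top_down("I was charged twice and want a refund", TAXONOMY, keyword_chooser)
print(labels)
```

Only the properties of the chosen category are visited at each level, which is why the text is never compared against the subcategories of "technical".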
- __init__(name: str, definition: str, context: str | None = None, examples: List[str] | None = None, properties: List[Property] | None = None) -> None
Initialize a Taxonomy instance.
Creates a new taxonomy with a root category and optional subcategories. The taxonomy can be used for hierarchical text classification where text is progressively classified into more specific categories.
Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| name | str | | Clear name for the root category of the taxonomy. |
| definition | str | | Detailed definition of the root category. More comprehensive definitions improve classification accuracy and taxonomy refinement. |
| context | str, optional | None | Short description of the context in which classified text appears. |
| examples | list of str, optional | None | List of text examples that fall under the root category. |
| properties | list of Property, optional | None | List of Property objects defining subcategories under the root. |
Methods

| Method | Description |
| --- | --- |
| __init__(name, definition[, context, ...]) | Initialize a Taxonomy instance. |
| add_example(example) | Add a text example to the taxonomy. |
| add_property(prop) | Add a classification property to the taxonomy. |
| from_dict(taxonomy_dict) | Create a Taxonomy instance from a dictionary. |
| parse_multiple_from_category_str(categories_str) | Parse the accepted refinement string into a list of Taxonomy instances. |
| to_dict() | Serialize the taxonomy to a dictionary representation. |

- add_example(example: str) -> None
Add a text example to the taxonomy.
Examples help define the boundaries of the taxonomy category and improve classification performance by providing concrete instances of text that belong to this category.
Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| example | str | | Text example that belongs to this taxonomy category. |
- add_property(prop: Property) -> None
Add a classification property to the taxonomy.
Properties define categorical dimensions for hierarchical classification. If a property with the same name already exists, it will be replaced with the new property definition.
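The replace-on-same-name behavior can be illustrated with a small stand-in. The sketch below does not use the library: `PropertySketch` and `TaxonomySketch` are hypothetical classes, and storing properties in a dictionary keyed by name is an assumption suggested by the dictionary format this class serializes to, not a statement about the actual implementation.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PropertySketch:
    """Hypothetical stand-in for Property: just a name and a definition."""
    name: str
    definition: str


@dataclass
class TaxonomySketch:
    """Hypothetical stand-in for Taxonomy, reduced to what add_property needs."""
    name: str
    definition: str
    properties: Dict[str, PropertySketch] = field(default_factory=dict)

    def add_property(self, prop: PropertySketch) -> None:
        # Keying by name means a later property with the same name
        # overwrites the earlier one instead of creating a duplicate.
        self.properties[prop.name] = prop


tax = TaxonomySketch("support ticket", "A message sent by a customer.")
tax.add_property(PropertySketch("topic", "Initial, vague definition."))
tax.add_property(PropertySketch("urgency", "How quickly a reply is needed."))
tax.add_property(PropertySketch("topic", "What the ticket is about."))

print(list(tax.properties))
print(tax.properties["topic"].definition)
```

After the third call there are still two properties, and "topic" carries the newer definition.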
- classmethod from_dict(taxonomy_dict: Dict[str, Any]) -> Taxonomy
Create a Taxonomy instance from a dictionary.
The dictionary should have the following format:
{
    "name": <name>,
    "definition": <definition>,
    "context": <context>,
    "examples": [<example1>, <example2>, ...],
    "properties": {
        <property_name_1>: {
            "name": <property_name_1>,
            "definition": <definition>,
            "categories": {
                <class_name_1>: {
                    "name": <class_name>,
                    "context": <context>,
                    "definition": <definition>,
                    "examples": [<example1>, <example2>, ...],
                    "properties": {...}
                },
                <class_name_2>: {...},
                ...
            }
        },
        <property_name_2>: {...},
        ...
    }
}
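Before passing a hand-written dictionary to from_dict, it can help to check that it follows the format above. The sketch below is a hypothetical pre-flight check, not part of the library. `find_shape_problems` is a name introduced here, and its rules are assumptions drawn from the documented layout: "name" and "definition" are treated as required strings, and each property key is expected to match that property's "name".

```python
from typing import Any, Dict, List


def find_shape_problems(node: Dict[str, Any], path: str = "root") -> List[str]:
    """Return human-readable problems found in a taxonomy dictionary."""
    problems: List[str] = []

    # Assumption: name and definition are required, as in the constructor.
    for key in ("name", "definition"):
        if not isinstance(node.get(key), str):
            problems.append(f"{path}: missing or non-string '{key}'")

    examples = node.get("examples") or []
    if not isinstance(examples, list) or not all(isinstance(e, str) for e in examples):
        problems.append(f"{path}: 'examples' must be a list of strings")

    properties = node.get("properties") or {}
    if not isinstance(properties, dict):
        problems.append(f"{path}: 'properties' must be a dictionary")
        return problems

    for prop_name, prop in properties.items():
        prop_path = f"{path}.{prop_name}"
        if prop.get("name") != prop_name:
            problems.append(f"{prop_path}: key does not match 'name'")
        # Every category is itself a taxonomy-shaped node, so recurse.
        for cat_name, cat in (prop.get("categories") or {}).items():
            problems.extend(find_shape_problems(cat, f"{prop_path}.{cat_name}"))
    return problems


good = {
    "name": "review",
    "definition": "A product review written by a customer.",
    "context": None,
    "examples": ["Great phone, battery lasts two days"],
    "properties": {
        "sentiment": {
            "name": "sentiment",
            "definition": "Overall tone of the review.",
            "categories": {
                "positive": {"name": "positive", "definition": "The reviewer is satisfied."},
                "negative": {"name": "negative", "definition": "The reviewer is dissatisfied."},
            },
        }
    },
}

bad = {
    "name": "review",
    "definition": "A product review written by a customer.",
    "properties": {
        "sentiment": {
            "name": "tone",  # does not match the key "sentiment"
            "definition": "Overall tone of the review.",
            "categories": {"positive": {"name": "positive"}},  # no definition
        }
    },
}

good_problems = find_shape_problems(good)
bad_problems = find_shape_problems(bad)
print(good_problems)
print(bad_problems)
```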
- classmethod parse_multiple_from_category_str(categories_str: str) -> List[Taxonomy]
Parse the accepted refinement string into a list of Taxonomy instances, one per category block.
The expected format of the accepted refinement string is:
---
** CATEGORY NAME **
Definition: definition that will help someone understand the category and how to determine if text belongs to it.
Example 1: "relevant example text"
Example 2: "relevant example text"
Example 3: "relevant example text"
Example 4: "relevant example text"
Example 5: "relevant example text"
---
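To make the block format above concrete, here is a minimal parsing sketch. It is not the library's implementation: `parse_categories` is a hypothetical helper that returns plain dictionaries rather than Taxonomy instances, and the rules it applies (blocks delimited by `---` lines, a `** NAME **` header, one `Definition:` line, numbered `Example N:` lines with optional surrounding quotes) are read directly off the documented format.

```python
import re
from typing import Any, Dict, List, Optional

NAME_RE = re.compile(r"^\*\*\s*(.+?)\s*\*\*$")
EXAMPLE_RE = re.compile(r"^Example\s+\d+:\s*(.*)$")
# A separator is a line containing only three dashes (optionally padded).
SEPARATOR_RE = re.compile(r"^[ \t]*---[ \t]*$", flags=re.MULTILINE)


def parse_categories(categories_str: str) -> List[Dict[str, Any]]:
    """Split the string on '---' lines and read one category from each block."""
    categories: List[Dict[str, Any]] = []
    for block in SEPARATOR_RE.split(categories_str):
        name: Optional[str] = None
        definition: Optional[str] = None
        examples: List[str] = []
        for raw_line in block.splitlines():
            line = raw_line.strip()
            name_match = NAME_RE.match(line)
            example_match = EXAMPLE_RE.match(line)
            if name_match:
                name = name_match.group(1)
            elif line.startswith("Definition:"):
                definition = line[len("Definition:"):].strip()
            elif example_match:
                # Drop the surrounding double quotes shown in the format.
                examples.append(example_match.group(1).strip().strip('"'))
        # Skip empty blocks, such as the text before the first separator.
        if name and definition:
            categories.append({"name": name, "definition": definition, "examples": examples})
    return categories


SAMPLE = """---
** BILLING **
Definition: Questions about invoices, refunds, or charges.
Example 1: "I was charged twice this month"
Example 2: "Please refund my last payment"
---
** TECHNICAL **
Definition: Problems using the product, such as errors or crashes.
Example 1: "The app crashes on startup"
---"""

parsed = parse_categories(SAMPLE)
for category in parsed:
    print(category["name"], len(category["examples"]))
```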
- to_dict() -> Dict[str, Any]
Serialize the taxonomy to a dictionary representation.
Creates a nested dictionary structure containing all taxonomy data including properties and their associated categories. This format is suitable for JSON serialization and interoperability with external systems.
Dictionary structure:
{
    "name": <name>,
    "definition": <definition>,
    "context": <context>,
    "examples": [<example1>, <example2>, ...],
    "properties": {
        <property_name_1>: {
            "name": <property_name_1>,
            "definition": <definition>,
            "categories": {
                <category_name_1>: {
                    "name": <category_name>,
                    "definition": <definition>,
                    "context": <context>,
                    "examples": [<example1>, <example2>, ...],
                    "properties": {...}
                },
                <category_name_2>: {...},
                ...
            }
        },
        <property_name_2>: {...},
        ...
    }
}
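Because the structure above uses only strings, lists, nested dictionaries, and possibly null values, it should survive a JSON round trip unchanged. The sketch below illustrates this with a hand-written dictionary in the documented shape instead of calling to_dict() itself. The sample content and the `iter_category_paths` helper are hypothetical and are included only to show how the nesting can be traversed after loading.

```python
import json
from typing import Any, Dict, Iterator, Tuple

# Hand-written stand-in for the output of to_dict(); the content is illustrative.
taxonomy_dict: Dict[str, Any] = {
    "name": "support ticket",
    "definition": "A message sent by a customer to the support team.",
    "context": None,
    "examples": ["My invoice is wrong"],
    "properties": {
        "topic": {
            "name": "topic",
            "definition": "What the ticket is about.",
            "categories": {
                "billing": {
                    "name": "billing",
                    "definition": "Questions about invoices, refunds, or charges.",
                    "context": None,
                    "examples": [],
                    "properties": {
                        "billing_issue": {
                            "name": "billing_issue",
                            "definition": "The kind of billing problem.",
                            "categories": {
                                "refund": {
                                    "name": "refund",
                                    "definition": "The customer asks for money back.",
                                    "context": None,
                                    "examples": [],
                                    "properties": {},
                                }
                            },
                        }
                    },
                },
                "technical": {
                    "name": "technical",
                    "definition": "Problems using the product.",
                    "context": None,
                    "examples": [],
                    "properties": {},
                },
            },
        }
    },
}


def iter_category_paths(
    node: Dict[str, Any], prefix: Tuple[str, ...] = ()
) -> Iterator[Tuple[str, ...]]:
    """Yield every property=category path in the tree, parents before children."""
    for prop_name, prop in (node.get("properties") or {}).items():
        for cat_name, cat in (prop.get("categories") or {}).items():
            path = prefix + (f"{prop_name}={cat_name}",)
            yield path
            yield from iter_category_paths(cat, path)


# Serialize to JSON text and load it back; None becomes null and back again.
restored = json.loads(json.dumps(taxonomy_dict, indent=2))
paths = [" / ".join(p) for p in iter_category_paths(restored)]
print(paths)
```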