
snorkelflow_extensions.taxonomy_distillation.entities.Taxonomy

class snorkelflow_extensions.taxonomy_distillation.entities.Taxonomy(name: str, definition: str, context: str | None = None, examples: List[str] | None = None, properties: List[Property] | None = None)

Bases: object

A hierarchical classification system for text categorization.

Taxonomies represent tree-like classification structures where text is categorized through multiple levels of increasing specificity. Each taxonomy node contains a name, definition, contextual information, examples, and optional properties that define child classification dimensions.

The hierarchical structure enables top-down classification workflows where text is first assigned to broad categories, then progressively classified into more specific subcategories based on the properties defined at each level. This approach is particularly effective for complex classification tasks requiring multiple levels of granularity.
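The top-down workflow can be sketched on the dictionary representation documented under to_dict(). This is an illustrative sketch, not the library's classification code. classify_top_down, keyword_chooser, and leaf are hypothetical names, and keyword_chooser is a toy stand-in for whatever model makes the choice at each level. At every node, one category is chosen per property, and classification then descends into that category's own properties.

```python
from typing import Any, Callable, Dict, List, Optional

# A node follows the dictionary layout documented for to_dict():
# {"name": ..., "definition": ..., "properties": {prop_name: {"name": ..., "categories": {cat_name: node}}}}
Node = Dict[str, Any]
# Hypothetical chooser: returns the chosen category name, or None if nothing fits.
Chooser = Callable[[str, Dict[str, Node]], Optional[str]]


def classify_top_down(text: str, node: Node, choose: Chooser) -> List[str]:
    """Pick one category per property at each level and descend into it.

    Returns one "/"-joined label path per branch that was followed.
    """
    paths: List[str] = []
    for prop_name, prop in (node.get("properties") or {}).items():
        chosen = choose(text, prop["categories"])
        if chosen is None:
            continue  # nothing fits at this level: stop descending along this property
        label = f"{prop_name}={chosen}"
        sub_paths = classify_top_down(text, prop["categories"][chosen], choose)
        # If nothing deeper was chosen (e.g. a leaf category), keep just this level's label.
        paths.extend([f"{label}/{sub}" for sub in sub_paths] or [label])
    return paths


def keyword_chooser(text: str, categories: Dict[str, Node]) -> Optional[str]:
    """Toy stand-in for a model: first category whose name occurs in the text."""
    lowered = text.lower()
    return next((name for name in categories if name.lower() in lowered), None)


def leaf(name: str, definition: str) -> Node:
    """Hypothetical helper: a category node with no further properties."""
    return {"name": name, "definition": definition, "properties": {}}


taxonomy: Node = {
    "name": "Support ticket",
    "definition": "A message sent to customer support.",
    "properties": {
        "topic": {
            "name": "topic",
            "definition": "What the ticket is about.",
            "categories": {
                "billing": {
                    "name": "billing",
                    "definition": "Questions about charges or invoices.",
                    "properties": {
                        "billing_issue": {
                            "name": "billing_issue",
                            "definition": "The specific billing problem.",
                            "categories": {
                                "refund": leaf("refund", "Requests for money back."),
                                "invoice": leaf("invoice", "Requests for an invoice copy."),
                            },
                        }
                    },
                },
                "login": leaf("login", "Problems signing in."),
            },
        }
    },
}

print(classify_top_down("I want a refund for this billing error", taxonomy, keyword_chooser))
# → ['topic=billing/billing_issue=refund']
```

Keeping the chooser pluggable separates the tree walk from the decision made at each level, so a language model call could replace the keyword matcher without changing the traversal.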

Taxonomies can be serialized to and from dictionary representations for persistence and interoperability with language model workflows.

__init__

__init__(name: str, definition: str, context: str | None = None, examples: List[str] | None = None, properties: List[Property] | None = None) → None

Initialize a Taxonomy instance.

Creates a new taxonomy with a root category and optional subcategories. The taxonomy can be used for hierarchical text classification where text is progressively classified into more specific categories.

Parameters

name (str) – Clear name for the root category of the taxonomy.
definition (str) – Detailed definition of the root category. More comprehensive definitions improve classification accuracy and taxonomy refinement.
context (str, optional) – Short description of the context in which the classified text appears. Defaults to None.
examples (list of str, optional) – Text examples that fall under the root category. Defaults to None.
properties (list of Property, optional) – Property objects defining the subcategories under the root. Defaults to None.

Methods

__init__(name, definition[, context, ...]) – Initialize a Taxonomy instance.
add_example(example) – Add a text example to the taxonomy.
add_property(prop) – Add a classification property to the taxonomy.
from_dict(taxonomy_dict) – Create a Taxonomy instance from a dictionary.
parse_multiple_from_category_str(categories_str) – Parse an accepted refinement string into a list of Taxonomy objects.
to_dict() – Serialize the taxonomy to a dictionary representation.

add_example

add_example(example: str) → None

Add a text example to the taxonomy.

Examples help define the boundaries of the taxonomy category and improve classification performance by providing concrete instances of text that belong to this category.

Parameters

example (str) – Text example that belongs to this taxonomy category.

add_property

add_property(prop: Property) → None

Add a classification property to the taxonomy.

Properties define categorical dimensions for hierarchical classification. If a property with the same name already exists, it will be replaced with the new property definition.

Parameters

prop (Property) – Property object defining a classification dimension with its categories and definitions.
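The replace-by-name behavior mirrors how properties are keyed by name in the dictionary representation (see to_dict()). The sketch below imitates it on a plain dictionary so that it runs without the library; upsert_property is a hypothetical helper, not part of the Taxonomy API.

```python
from typing import Any, Dict


def upsert_property(taxonomy_dict: Dict[str, Any], prop: Dict[str, Any]) -> None:
    """Hypothetical helper: store `prop` under its name, replacing any same-named property."""
    properties = taxonomy_dict.get("properties") or {}
    properties[prop["name"]] = prop  # an existing entry with this name is overwritten
    taxonomy_dict["properties"] = properties


taxonomy = {"name": "Support ticket", "definition": "A message sent to customer support.", "properties": {}}
upsert_property(taxonomy, {"name": "urgency", "definition": "How soon a reply is needed.", "categories": {}})
upsert_property(taxonomy, {"name": "urgency", "definition": "Required response time.", "categories": {}})
print(len(taxonomy["properties"]), taxonomy["properties"]["urgency"]["definition"])
# → 1 Required response time.
```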

from_dict

classmethod from_dict(taxonomy_dict: Dict[str, Any]) → Taxonomy

Create a Taxonomy instance from a dictionary.

The dictionary should have the following format:

{
    "name": <name>,
    "definition": <definition>,
    "context": <context>,
    "examples": [<example1>, <example2>, ...],
    "properties": {
        <property_name_1>: {
            "name": <property_name_1>,
            "definition": <definition>,
            "categories": {
                <class_name_1>: {
                    "name": <class_name_1>,
                    "context": <context>,
                    "definition": <definition>,
                    "examples": [<example1>, <example2>, ...],
                    "properties": {...}
                },
                <class_name_2>: {...},
                ...
            }
        },
        <property_name_2>: {...},
        ...
    }
}

Parameters

taxonomy_dict (dict) – Dictionary representation of the taxonomy in the structure shown above, containing name, definition, context, examples, and properties.

Returns

New Taxonomy instance created from the dictionary data.

Return type

Taxonomy
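Before calling from_dict(), it can help to check that a hand-written dictionary has the expected shape. This is a minimal sketch, assuming every taxonomy or category node needs "name" and "definition" (the two required constructor arguments) and every property needs "name" and "categories". find_format_problems is a hypothetical helper, and from_dict() itself may be stricter or more lenient.

```python
from typing import Any, Dict, List


def find_format_problems(node: Dict[str, Any], path: str = "$") -> List[str]:
    """Hypothetical check: report missing keys under the assumptions stated above."""
    # Assumed required keys on every taxonomy/category node.
    problems = [f"{path}: missing '{key}'" for key in ("name", "definition") if key not in node]
    for prop_name, prop in (node.get("properties") or {}).items():
        prop_path = f"{path}.properties.{prop_name}"
        # Assumed required keys on every property.
        problems += [f"{prop_path}: missing '{key}'" for key in ("name", "categories") if key not in prop]
        for cat_name, category in (prop.get("categories") or {}).items():
            # Categories are nested nodes, so the same check applies recursively.
            problems += find_format_problems(category, f"{prop_path}.categories.{cat_name}")
    return problems


candidate = {
    "name": "Support ticket",
    "definition": "A message sent to customer support.",
    "properties": {
        "topic": {
            "name": "topic",
            "definition": "What the ticket is about.",
            "categories": {"billing": {"name": "billing"}},  # definition left out on purpose
        }
    },
}
print(find_format_problems(candidate))
# → ["$.properties.topic.categories.billing: missing 'definition'"]
```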

parse_multiple_from_category_str

classmethod parse_multiple_from_category_str(categories_str: str) → List[Taxonomy]

Parse an accepted refinement string into a list of Taxonomy objects.

Each category in the accepted refinement string is expected to follow this format, with categories separated by lines of three dashes (---):

---

** CATEGORY NAME **
Definition: definition that will help someone understand the category and how to determine if text belongs to it.
Example 1: "relevant example text"
Example 2: "relevant example text"
Example 3: "relevant example text"
Example 4: "relevant example text"
Example 5: "relevant example text"

---

Parameters

categories_str (str) – Formatted string containing category definitions in the structure shown above, separated by triple-dash (---) lines.

Returns

List of Taxonomy objects parsed from the categories string.

Return type

list of Taxonomy
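A parser for this format might look roughly like the following. This is a hypothetical re-implementation for illustration only, not the library's parser. parse_categories returns plain dictionaries rather than Taxonomy objects, and it assumes that the category name is wrapped in ** markers, the definition line starts with "Definition:", and each example is a double-quoted string after "Example N:".

```python
import re
from typing import Any, Dict, List

# Assumed line patterns, taken from the format shown above.
SEPARATOR_RE = re.compile(r"^[ \t]*---[ \t]*$", re.MULTILINE)
NAME_RE = re.compile(r"^\*\*\s*(.+?)\s*\*\*$")
DEFINITION_RE = re.compile(r"^Definition:\s*(.+)$")
EXAMPLE_RE = re.compile(r'^Example\s+\d+:\s*"(.*)"$')


def parse_categories(categories_str: str) -> List[Dict[str, Any]]:
    """Hypothetical parser: one dict per '---'-delimited block that has a ** NAME ** line."""
    parsed: List[Dict[str, Any]] = []
    for block in SEPARATOR_RE.split(categories_str):
        name, definition, examples = None, None, []
        for line in (raw.strip() for raw in block.splitlines()):
            if m := NAME_RE.match(line):
                name = m.group(1)
            elif m := DEFINITION_RE.match(line):
                definition = m.group(1)
            elif m := EXAMPLE_RE.match(line):
                examples.append(m.group(1))
        if name is not None:  # skips the empty chunks around the separators
            parsed.append({"name": name, "definition": definition, "examples": examples})
    return parsed


refinement = """---

** Refund request **
Definition: The customer asks for money back for a charge.
Example 1: "Please refund my last payment"
Example 2: "I was charged twice and want my money back"

---

** Invoice request **
Definition: The customer asks for a copy of an invoice.
Example 1: "Can you send me the March invoice?"

---"""

for category in parse_categories(refinement):
    print(category["name"], len(category["examples"]))
# → Refund request 2
#   Invoice request 1
```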

to_dict

to_dict() → Dict[str, Any]

Serialize the taxonomy to a dictionary representation.

Creates a nested dictionary structure containing all taxonomy data including properties and their associated categories. This format is suitable for JSON serialization and interoperability with external systems.

Dictionary structure:

{
    "name": <name>,
    "definition": <definition>,
    "context": <context>,
    "examples": [<example1>, <example2>, ...],
    "properties": {
        <property_name_1>: {
            "name": <property_name_1>,
            "definition": <definition>,
            "categories": {
                <category_name_1>: {
                    "name": <category_name_1>,
                    "definition": <definition>,
                    "context": <context>,
                    "examples": [<example1>, <example2>, ...],
                    "properties": {...}
                },
                <category_name_2>: {...},
                ...
            }
        },
        <property_name_2>: {...},
        ...
    }
}

Returns

The taxonomy formatted as a dictionary with nested structure containing name, definition, context, examples, and properties.

Return type

dict
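Assuming the dictionary holds only strings, lists, dictionaries, and None values, as the structure above suggests, it can be round-tripped through JSON with the standard json module. In practice the dictionary would come from taxonomy.to_dict() and the restored value would be passed to Taxonomy.from_dict(); here a hand-written dictionary stands in so the sketch runs without the library.

```python
import json
from typing import Any, Dict

# Hand-written stand-in for taxonomy.to_dict() output (values are illustrative).
taxonomy_dict: Dict[str, Any] = {
    "name": "Support ticket",
    "definition": "A message sent to customer support.",
    "context": "Emails received by a software company's help desk.",
    "examples": ["My invoice shows the wrong amount."],
    "properties": {
        "topic": {
            "name": "topic",
            "definition": "What the ticket is about.",
            "categories": {
                "billing": {
                    "name": "billing",
                    "definition": "Questions about charges or invoices.",
                    "context": None,
                    "examples": [],
                    "properties": {},
                }
            },
        }
    },
}

serialized = json.dumps(taxonomy_dict, indent=2)  # None is written as null
restored = json.loads(serialized)  # null is read back as None
print(restored == taxonomy_dict)
# → True
```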