Vectorizers
Vectorizers turn columns and column types into fulltext search vectors. PostgreSQL natively knows how to vectorize string columns, but some situations call for additional vectorization rules. This section describes how to create and use vectorization rules for both specific columns and column types.
Type vectorizers
By default, PostgreSQL can only directly vectorize string columns. However, scenarios
may arise where vectorizing non-string columns becomes essential. For instance, when
dealing with an HSTORE column within your model
that requires fulltext indexing, a dedicated vectorization rule must be defined.
To establish a vectorization rule, use the vectorizer
decorator. The subsequent example demonstrates how to apply a vectorization rule to the
values within all HSTORE-typed columns present
in your models:
from typing import Any

from sqlalchemy import cast, ColumnClause, ColumnElement, func, Text
from sqlalchemy.dialects.postgresql import HSTORE
from sqlalchemy_searchable import vectorizer


@vectorizer(HSTORE)
def hstore_vectorizer(column: ColumnClause[Any]) -> ColumnElement[str]:
    return cast(func.avals(column), Text)
The expression returned by the vectorizer is then employed for all fulltext indexed
columns of type HSTORE. Consider the following
model as an illustration:
from sqlalchemy.dialects.postgresql import HSTORE
from sqlalchemy.orm import Mapped, mapped_column
from sqlalchemy_utils import TSVectorType


class Article(Base):
    __tablename__ = 'article'

    id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)
    name_translations: Mapped[dict[str, str]] = mapped_column(HSTORE)
    content_translations: Mapped[dict[str, str]] = mapped_column(HSTORE)
    search_vector: Mapped[TSVectorType] = mapped_column(
        TSVectorType(
            "name_translations",
            "content_translations",
        )
    )
In this scenario, SQLAlchemy-Searchable would create the following search trigger for the model using the default configuration:
CREATE FUNCTION
    article_search_vector_update() RETURNS TRIGGER AS $$
BEGIN
    NEW.search_vector = to_tsvector(
        'pg_catalog.english',
        coalesce(CAST(avals(NEW.name_translations) AS TEXT), '')
    ) || to_tsvector(
        'pg_catalog.english',
        coalesce(CAST(avals(NEW.content_translations) AS TEXT), '')
    );
    RETURN NEW;
END
$$ LANGUAGE 'plpgsql';
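To check what a vectorizer expression renders to before wiring it into a model, the expression can be compiled against the PostgreSQL dialect without a database connection. The following is a minimal sketch, assuming only that SQLAlchemy is installed; the standalone `name_translations` column is a stand-in created for illustration and is not part of any model.

```python
from sqlalchemy import Text, cast, column, func
from sqlalchemy.dialects import postgresql
from sqlalchemy.dialects.postgresql import HSTORE

# Stand-in column for illustration; no table or connection is required.
name_translations = column("name_translations", HSTORE)

# The same expression that hstore_vectorizer returns for a column.
expression = cast(func.avals(name_translations), Text)

# Render the expression as PostgreSQL-flavoured SQL text.
sql = str(expression.compile(dialect=postgresql.dialect()))
print(sql)
```

The rendered text should correspond to the `CAST(avals(...) AS TEXT)` fragment in the trigger above, except that a standalone column renders without the `NEW.` prefix used inside the trigger function.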
Column vectorizers
Sometimes you may want to apply a special vectorizer to one specific column only. This can be achieved as follows:
from typing import Any

from sqlalchemy import cast, ColumnClause, ColumnElement, func, Text
from sqlalchemy.dialects.postgresql import HSTORE
from sqlalchemy.orm import Mapped, mapped_column
from sqlalchemy_searchable import vectorizer
from sqlalchemy_utils import TSVectorType


class Article(Base):
    __tablename__ = "article"

    id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)
    name_translations: Mapped[dict[str, str]] = mapped_column(HSTORE)
    search_vector: Mapped[TSVectorType] = mapped_column(
        TSVectorType("name_translations")
    )


@vectorizer(Article.name_translations)
def name_vectorizer(column: ColumnClause[Any]) -> ColumnElement[str]:
    return cast(func.avals(column), Text)
Note
Column vectorizers always take precedence over type vectorizers.
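The precedence rule can be pictured as a two-level lookup: a rule registered for the column is tried first, then a rule registered for the column's type, and otherwise the column is used unchanged. The sketch below is a simplified, dependency-free illustration of that idea and is not the library's implementation. `SketchRegistry`, `register_type`, `register_column`, and `resolve` are hypothetical names, and columns and types are modelled as plain strings.

```python
from typing import Callable

VectorizerFn = Callable[[str], str]


class SketchRegistry:
    """Hypothetical registry illustrating column-over-type precedence."""

    def __init__(self) -> None:
        self.type_rules = {}    # type name -> vectorizer function
        self.column_rules = {}  # column name -> vectorizer function

    def register_type(self, type_name: str) -> Callable[[VectorizerFn], VectorizerFn]:
        def decorator(fn: VectorizerFn) -> VectorizerFn:
            self.type_rules[type_name] = fn
            return fn
        return decorator

    def register_column(self, column_name: str) -> Callable[[VectorizerFn], VectorizerFn]:
        def decorator(fn: VectorizerFn) -> VectorizerFn:
            self.column_rules[column_name] = fn
            return fn
        return decorator

    def resolve(self, column_name: str, type_name: str) -> str:
        # A column-specific rule wins over a type-wide rule.
        if column_name in self.column_rules:
            return self.column_rules[column_name](column_name)
        if type_name in self.type_rules:
            return self.type_rules[type_name](column_name)
        # No rule registered: use the column as-is.
        return column_name


registry = SketchRegistry()


@registry.register_type("HSTORE")
def hstore_rule(col: str) -> str:
    return f"CAST(avals({col}) AS TEXT)"


@registry.register_column("name_translations")
def name_rule(col: str) -> str:
    return f"CAST(akeys({col}) AS TEXT)"


# Column rule applies here even though the type also has a rule.
by_column = registry.resolve("name_translations", "HSTORE")
# Only the type rule matches this column.
by_type = registry.resolve("content_translations", "HSTORE")
# Neither rule matches, so the column passes through unchanged.
unchanged = registry.resolve("title", "TEXT")
```

In this sketch `name_translations` resolves through its own rule, while `content_translations` falls back to the HSTORE-wide rule.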
API
- sqlalchemy_searchable.vectorizer = <sqlalchemy_searchable.vectorizers.Vectorizer object>
An instance of Vectorizer that keeps track of the registered vectorizers. Use this as a decorator to register a function as a vectorizer.
- class sqlalchemy_searchable.Vectorizer(type_vectorizers: dict[type[TypeEngine], Callable[[ColumnClause[Any]], ColumnElement[str]]] | None = None, column_vectorizers: dict[Column[Any], Callable[[ColumnClause[Any]], ColumnElement[str]]] | None = None)
- __call__(type_or_column: type[TypeEngine] | Column[Any] | InstrumentedAttribute[Any]) → Callable[[Callable[[ColumnClause[Any]], ColumnElement[str]]], Callable[[ColumnClause[Any]], ColumnElement[str]]]
Decorator to register a function as a vectorizer.
- Parameters:
type_or_column – the SQLAlchemy database data type or the column to register a vectorizer for