Sentence Transformers 🤗 is a Python framework for state-of-the-art sentence, text and image embeddings.
Install the Sentence Transformers library:

```
pip install -U sentence-transformers
```
The usage is as simple as:
```python
from sentence_transformers import SparseEncoder

# 1. Load a pretrained SparseEncoder model
model = SparseEncoder("naver/splade-cocondenser-ensembledistil")

# The sentences to encode
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]

# 2. Calculate sparse embeddings by calling model.encode()
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 30522] - sparse representation with vocabulary size dimensions

# 3. Calculate the embedding similarities
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[35.629,  9.154,  0.098],
#         [ 9.154, 27.478,  0.019],
#         [ 0.098,  0.019, 29.553]])

# 4. Check sparsity stats
stats = SparseEncoder.sparsity(embeddings)
print(f"Sparsity: {stats['sparsity_ratio']:.2%}")
# Sparsity: 99.84%
```
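As a rough illustration of what these numbers mean (toy vectors, not actual model output): the similarity scores above are plain dot products between the sparse embeddings, so only dimensions that are non-zero in both vectors contribute, and the sparsity ratio is simply the fraction of zero entries. A minimal sketch with a hypothetical 10-dimensional "vocabulary":

```python
# Illustrative helpers, not part of the library.

def dot(a, b):
    """Dot product: only dimensions active in both vectors contribute."""
    return sum(x * y for x, y in zip(a, b))

def sparsity_ratio(vectors):
    """Fraction of entries that are exactly zero across all vectors."""
    total = sum(len(v) for v in vectors)
    zeros = sum(1 for v in vectors for x in v if x == 0.0)
    return zeros / total

# Two toy sparse embeddings; each index stands for one vocabulary token.
emb_a = [0.0, 2.1, 0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.0, 0.0]
emb_b = [0.0, 1.8, 0.0, 0.0, 0.0, 0.0, 0.9, 0.0, 0.0, 0.0]

print(round(dot(emb_a, emb_b), 2))              # 3.78 - only index 1 overlaps
print(f"{sparsity_ratio([emb_a, emb_b]):.0%}")  # 80%
```

With real SPLADE-style models the vectors have ~30k dimensions and well over 99% zeros, which is what makes them usable with inverted-index search engines.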
Hugging Face makes it easy to collaboratively build and showcase your Sentence Transformers models! You can collaborate with your organization, and upload and showcase your own models on your profile ❤️
To upload your SparseEncoder models to the Hugging Face Hub, log in with `huggingface-cli login` and use the `push_to_hub` method within the Sentence Transformers library:
```python
from sentence_transformers import SparseEncoder

# Load or train a model
model = SparseEncoder(...)

# Push to Hub
model.push_to_hub("my_new_model")
```
Note that, for now, this repository only hosts examples of sparse encoder models from the Sentence Transformers package that can be easily reproduced with the various training script examples.
See Sparse Encoder > Training Examples for the example scripts and Sparse Encoder > Pretrained Models for the community pre-trained models; some of these models can also be found in the following collections.