Vector Similarity Functions
When creating a vector index in Upstash Vector, you can choose one of three supported vector similarity functions. Each function yields different query results and suits different use cases.
Regardless of the similarity function used, the score returned from query requests is normalized to a value between 0 and 1, where 1 indicates the highest similarity and 0 the lowest.
Cosine Similarity
Cosine similarity measures the cosine of the angle between two vectors. It is particularly useful when the magnitude of the vectors is not essential, and the focus is on the orientation.
Use Cases:
- Natural Language Processing (NLP): Ideal for comparing document embeddings or word vectors, as it captures semantic similarity irrespective of vector magnitude.
- Recommendation Systems: Effective in recommending items based on user preferences or content similarities.
Score calculation:
(1 + cosine_similarity(v1, v2)) / 2
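The cosine score formula can be reproduced locally to get a feel for the values a query might return. The following is a minimal sketch in plain Python, not Upstash's server-side implementation; the helper names `cosine_similarity` and `cosine_score` are illustrative and simply mirror the formula above.

```python
import math


def cosine_similarity(v1, v2):
    """Cosine of the angle between v1 and v2 (both must be non-zero vectors)."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2)


def cosine_score(v1, v2):
    """Map cosine similarity from [-1, 1] to a score in [0, 1]."""
    return (1 + cosine_similarity(v1, v2)) / 2


# Illustrative inputs: same direction, orthogonal, and opposite direction.
same = cosine_score([1.0, 0.0], [1.0, 0.0])
orthogonal = cosine_score([1.0, 0.0], [0.0, 1.0])
opposite = cosine_score([1.0, 0.0], [-1.0, 0.0])
```

Because only the angle matters, scaling either vector by a positive constant leaves the score unchanged.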
Euclidean Distance
Euclidean distance calculates the straight-line distance between two vectors in a multi-dimensional space. It is well-suited for scenarios where the magnitude of vectors is crucial, providing a measure of their spatial separation.
Use Cases:
- Computer Vision: Useful in image processing tasks, such as image recognition or object detection, where the spatial arrangement of features is significant.
- Anomaly Detection: Valuable for detecting anomalies in datasets, as it considers both the direction and magnitude of differences between vectors.
Score calculation:
1 / (1 + squared_distance(v1, v2))
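As a rough illustration of the Euclidean score, the sketch below evaluates the formula in plain Python. It is a local approximation for experimentation, not Upstash's actual implementation; `squared_distance` and `euclidean_score` are illustrative names taken from the formula above.

```python
def squared_distance(v1, v2):
    """Sum of squared component-wise differences (squared Euclidean distance)."""
    return sum((a - b) ** 2 for a, b in zip(v1, v2))


def euclidean_score(v1, v2):
    """Map a squared distance in [0, inf) to a score in (0, 1]."""
    return 1 / (1 + squared_distance(v1, v2))


# Illustrative inputs: identical vectors, and two points a distance of 5 apart.
identical = euclidean_score([0.0, 0.0], [0.0, 0.0])
far_apart = euclidean_score([0.0, 0.0], [3.0, 4.0])
```

Identical vectors have a squared distance of 0 and therefore score 1, while the score approaches 0 as the vectors move farther apart. Unlike cosine similarity, scaling a vector changes the result.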
Dot Product
The dot product measures similarity by multiplying the corresponding components of two vectors and summing the results. It provides a measure of alignment between vectors. Note that to use the dot product, the vectors need to be normalized to unit length.
Use Cases:
- Machine Learning Models: Commonly used in machine learning for tasks like sentiment analysis or classification, where feature alignment is critical.
- Collaborative Filtering: Effective in collaborative filtering scenarios, such as recommending items based on user behavior or preferences.
Score calculation:
(1 + dot_product(v1, v2)) / 2
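The dot product score and the unit-length requirement can be sketched as follows. This is a hedged illustration in plain Python rather than Upstash's implementation; `normalize`, `dot_product`, and `dot_product_score` are hypothetical helper names, and the normalization step is something you would apply to your own vectors before upserting or querying.

```python
import math


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def dot_product(v1, v2):
    """Sum of the products of corresponding components."""
    return sum(a * b for a, b in zip(v1, v2))


def dot_product_score(v1, v2):
    """Map the dot product of unit vectors from [-1, 1] to a score in [0, 1]."""
    return (1 + dot_product(v1, v2)) / 2


# Illustrative inputs, normalized first so the dot product stays within [-1, 1].
u = normalize([3.0, 4.0])
w = normalize([-3.0, -4.0])
aligned = dot_product_score(u, u)
opposed = dot_product_score(u, w)
```

For unit-length vectors the dot product equals the cosine similarity, which is why the score formula has the same shape. Without normalization, the dot product can fall outside [-1, 1] and the formula would no longer yield a value between 0 and 1.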