Improved Search Accuracy with New Embedding Models

Embedding Awesomeness: How We're Powering Up Our AI at Visma Spcs

I have been fortunate enough to land a spot as a Data Science Intern at Visma Spcs in Växjö, Sweden. Through our recent AI work at Visma Spcs, I've had the opportunity to experiment with the newly released embedding models from OpenAI. The results were great, and I'm writing here to share our thoughts with you.

What are Embeddings and Why Do They Matter?

Embeddings are a powerful technique in natural language processing (NLP) that transform words, phrases, or even entire documents into numerical vectors. These vectors live in a high-dimensional space where the distances and relationships between them capture the semantic meaning of the original text.

By representing text as numbers, embeddings enable computers to understand language more like humans do, leading to more accurate and sophisticated tasks like search, translation, and question answering. The closer embeddings are to each other, the more semantically similar the text is considered.
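The "closer means more similar" idea is usually measured with cosine similarity. A minimal sketch, using tiny made-up 3-dimensional vectors (real embedding models produce hundreds or thousands of dimensions):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity: close to 1.0 for similar directions, near 0 for unrelated."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings" -- illustrative only, not output from a real model
cat = [0.9, 0.1, 0.0]
dog = [0.8, 0.2, 0.1]
plane = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, dog))    # high: semantically close
print(cosine_similarity(cat, plane))  # low: semantically distant
```

The relevance scores discussed later in this post are similarity scores of exactly this kind, computed between a question's embedding and each article's embedding.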

OpenAI announced the upcoming release of new embedding models in January, and we in the Innovation and Technology team have been eagerly awaiting their availability through Azure OpenAI.

Recently, text-embedding-3-large and text-embedding-3-small were finally made available, and we jumped straight into testing them.

Initial Testing and Impressions

We had previously been using the ada-002 embedding model. While it has served us well, it has had trouble finding the relevant content in certain cases, for example when retrieving documents containing terms like "client," "customer," or "member," which often have overlapping meanings.

These newly released models are a significant step forward. In our testing, they've consistently outperformed ada-002, resulting in a noticeable accuracy increase in the article suggestions they produce.



These boxplots show how text-3-large, text-3-small, and ada-002 (our current model) distribute their relevance scores across the top 15 articles each model suggests as answers to questions that are difficult for ada-002.

We can observe that both text-3-large and text-3-small have a far wider spread in their scores. If we focus on the top 3 choices of each model, we can see that ada-002 has a very limited span in its relevance scoring: it never drops below 0.8, even at the bottom of its lower quartile, and its top-quartile scores are almost identical across rankings.


Meanwhile, if we look at our new models, we notice that their boxes have significant height, meaning their scores are more widely spread. At rank 1, text-3-large's scores range between roughly 0.3 and 0.75. There is also a more distinct step down from rank 1 for both text-3-large and text-3-small, while for ada-002 the step is almost indistinguishable.

It is our overall impression that text-3-large in particular is a big leap forward in the capability of text embeddings. We served the models a whole battery of ambiguous questions, and text-3-large consistently found the correct answer within its top 2 suggestions. We believe this will have a positive impact on our AI projects' performance, while also allowing us to send back less context, since the model reliably surfaces the correct answer at the top.
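The "send back less context" point boils down to top-k retrieval: if the correct article reliably appears at rank 1 or 2, a smaller k suffices. A minimal sketch with made-up vectors (the data is illustrative, not from our pipeline):

```python
import numpy as np

def top_k(query_vec, doc_vecs, k=2):
    """Rank documents by cosine similarity to the query; return the top-k indices."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                     # cosine similarity per document
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(0)
docs = rng.normal(size=(10, 8))        # 10 hypothetical article embeddings
query = docs[3].copy()                 # a query whose embedding matches article 3

print(top_k(query, docs))              # article 3 ranks first
```

With an embedding model that scores like text-3-large, passing only the top 2 articles to the downstream model keeps prompts short without losing the correct answer.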

OpenAI themselves report big improvements on well-known embedding benchmarks.

Both new models have been trained with a technique called Matryoshka Representation Learning, which gives developers the ability to trade off performance against the cost of using embeddings. According to the MTEB benchmark, the default 3072-dimension text-3-large embedding vector can be shortened to 256 dimensions while still beating the full-size 1536-dimension ada-002 embedding in performance.
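In practice, shortening a Matryoshka-style embedding means truncating the vector and re-normalizing it to unit length (the embeddings API for the text-embedding-3 models also accepts a `dimensions` parameter that does this server-side). A minimal sketch, using a random vector in place of a real model output:

```python
import numpy as np

def shorten_embedding(vec, dims):
    """Truncate a Matryoshka-style embedding to `dims` and re-normalize to unit length."""
    v = np.asarray(vec, dtype=float)[:dims]
    return v / np.linalg.norm(v)

# Stand-in for a full-size text-3-large embedding (3072 dimensions)
full = np.random.default_rng(42).normal(size=3072)
full /= np.linalg.norm(full)

short = shorten_embedding(full, 256)
print(short.shape)             # (256,)
print(np.linalg.norm(short))   # ~1.0
```

Shorter vectors mean smaller indexes and faster similarity search, which is where the performance-versus-cost trade-off comes from.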

Next Steps

We are very much looking forward to further testing these new models and bringing them into production. Dimension reduction also opens up possibilities for tuning performance versus cost, which is of high interest to many developers.

Best regards,

Anna at Visma Spcs Innovation & Technology Team
