Arize AI announced today that it is making embedding drift monitoring available to all customers and free users of the company’s leading machine learning (ML) observability platform.
The debut follows a beta in which over 20 enterprises and startups tested Arize’s embedding drift monitoring across billions of model predictions, resulting in over $10 million in savings from improved model performance and reductions in labeling costs.
Arize’s full rollout of embedding drift monitoring comes at a time of great need in the industry. Despite investing billions in computer vision and natural language processing models to do everything from detecting cancer to improving crop yields, most organizations still lack visibility into what happens once unstructured models are put into production. Because the metrics typically used to measure changes in data distributions (drift) in structured data do not extend to unstructured data, ML teams often miss upstream data quality issues and new patterns in the data until they affect model performance and business results.
Arize’s tool addresses this problem by enabling ML teams to easily compare embeddings (vector representations of data) across different periods of time using sensitive and scalable metrics like Euclidean distance and cosine distance. Leveraging this unique approach to embedding drift monitoring, ML practitioners can now better identify new patterns in the data, prioritize what to label next, and focus retraining efforts to proactively improve model performance.
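For readers who want a concrete picture of comparing embeddings across two time periods, the following is a minimal sketch and not Arize's implementation. It assumes that each period is summarized by the centroid (mean vector) of its embeddings and that drift is reported as the Euclidean and cosine distance between the two centroids; the function names and the toy data are hypothetical.

```python
import math


def centroid(embeddings):
    """Average a list of equal-length embedding vectors (assumed summary)."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def euclidean_distance(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 minus cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def embedding_drift(baseline, production):
    """Compare two time windows of embeddings via their centroids."""
    base_c = centroid(baseline)
    prod_c = centroid(production)
    return {
        "euclidean": euclidean_distance(base_c, prod_c),
        "cosine": cosine_distance(base_c, prod_c),
    }


# Toy data: the production window points in a different direction
# from the baseline window, so both distances are large.
baseline = [[1.0, 0.0], [1.0, 0.0]]
production = [[0.0, 1.0], [0.0, 1.0]]
drift = embedding_drift(baseline, production)
```

In this sketch, identical windows yield a distance of zero on both metrics, while larger values suggest the production data has moved away from the baseline. Real embeddings typically have hundreds of dimensions, and a vectorized library would normally replace the pure-Python loops.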
“Gone are the days of shipping CV and NLP models blind,” notes Jason Lopatecki, CEO and Co-Founder of Arize. “We’re proud to make embedding drift measurement available to all after a year of extensive research and development on a large variety of scenarios and data.”