This project focuses on developing a semantic similarity model that both scores the similarity between textual inputs and estimates the confidence of those scores. Traditional models typically output a similarity score without any indication of how confident they are in it, an important limitation for high-stakes or mission-critical applications.
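For illustration, a conventional bi-encoder pipeline produces exactly one number per pair. A minimal sketch using the sentence-transformers library (the model name is just an example choice):

```python
from sentence_transformers import SentenceTransformer, util

# A conventional bi-encoder: one deterministic forward pass, one score.
model = SentenceTransformer("all-MiniLM-L6-v2")  # example model choice

emb = model.encode(["A man is playing guitar.",
                    "Someone performs music on a guitar."])
score = util.cos_sim(emb[0], emb[1]).item()
print(f"similarity = {score:.3f}")  # a point estimate, no confidence attached
```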
To address this, the project incorporates uncertainty quantification techniques to improve the reliability and transparency of semantic similarity models. It explores methods such as Monte Carlo Dropout, Bayesian neural networks, and information-theoretic metrics to quantify prediction uncertainty.
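As an example of the first technique, a minimal Monte Carlo Dropout sketch: dropout is left active at inference, and the spread of scores across stochastic forward passes serves as the uncertainty estimate. The toy encoder below is a hypothetical stand-in for whatever pretrained model the project adopts.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyEncoder(nn.Module):
    """Hypothetical stand-in for a sentence encoder with dropout layers."""
    def __init__(self, dim_in=300, dim_out=128, p=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_in, 256), nn.ReLU(), nn.Dropout(p),
            nn.Linear(256, dim_out),
        )

    def forward(self, x):
        return self.net(x)

def mc_dropout_similarity(encoder, x1, x2, n_samples=30):
    """Monte Carlo Dropout: keep dropout stochastic at inference and treat
    the spread of similarity scores across passes as predictive uncertainty."""
    encoder.train()  # train() keeps dropout active during the forward passes
    with torch.no_grad():
        scores = torch.stack([
            F.cosine_similarity(encoder(x1), encoder(x2), dim=-1)
            for _ in range(n_samples)
        ])
    return scores.mean(0), scores.std(0)  # similarity estimate, uncertainty

encoder = ToyEncoder()
x1, x2 = torch.randn(1, 300), torch.randn(1, 300)  # placeholder embeddings
mean, std = mc_dropout_similarity(encoder, x1, x2)
print(f"similarity = {mean.item():.3f} ± {std.item():.3f}")
```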
Models will be evaluated on widely used benchmark datasets, including the Semantic Textual Similarity Benchmark (STS-B) and Quora Question Pairs (QQP).
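A sketch of how STS-B evaluation could look, assuming the Hugging Face datasets and scipy packages; the predict function here is a deliberately naive placeholder for the model under test:

```python
from datasets import load_dataset
from scipy.stats import spearmanr

def predict(s1, s2):
    """Placeholder scorer (token Jaccard overlap); the project's model
    would be plugged in here instead."""
    a, b = set(s1.lower().split()), set(s2.lower().split())
    return len(a & b) / max(len(a | b), 1)

sts = load_dataset("glue", "stsb", split="validation")  # STS Benchmark via GLUE
preds = [predict(ex["sentence1"], ex["sentence2"]) for ex in sts]
gold = sts["label"]  # human similarity ratings on a 0-5 scale

rho, _ = spearmanr(preds, gold)
print(f"Spearman correlation on STS-B (placeholder baseline): {rho:.3f}")
```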
Objectives
- Analyze current methods for semantic similarity and uncertainty quantification
- Identify suitable architectures for combining similarity scoring with confidence estimation
- Investigate the use of knowledge graph-based techniques for deeper contextual understanding
- Develop a working prototype that outputs both similarity scores and associated uncertainty (see the sketch after this list)
- Compare the proposed approach against baseline models to assess effectiveness
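As a sketch of the kind of uncertainty the prototype could report alongside its scores, the standard information-theoretic decomposition of predictive entropy into aleatoric and epistemic components, framed here for a QQP-style duplicate/not-duplicate prediction; the function names and sample values are illustrative:

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Entropy of Bernoulli probabilities, in nats."""
    p = np.clip(p, eps, 1 - eps)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def information_theoretic_uncertainty(mc_probs):
    """Decompose uncertainty from MC samples of P(duplicate) for one pair.
    mc_probs: array of shape (n_samples,), e.g. from MC Dropout passes.
    Returns (total, aleatoric, epistemic) uncertainty in nats."""
    total = entropy(mc_probs.mean())       # predictive entropy H[E(p)]
    aleatoric = entropy(mc_probs).mean()   # expected entropy E[H(p)]
    epistemic = total - aleatoric          # mutual information (BALD)
    return total, aleatoric, epistemic

# Example: 30 stochastic passes that mostly agree -> low epistemic uncertainty
confident = np.random.default_rng(0).normal(0.9, 0.02, 30).clip(0, 1)
print(information_theoretic_uncertainty(confident))
```

A low epistemic term indicates the stochastic passes agree, so the model's disagreement with itself (rather than inherent ambiguity in the pair) contributes little to the total uncertainty.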
Impact
This project aims to advance the development of more transparent and trustworthy AI systems for applications where both accuracy and confidence are critical, such as legal analysis, healthcare, and decision-support systems.