Segment First, Retrieve Better: Realistic Legal Search via Rhetorical Role-Based Queries
Nigam, Shubham Kumar; Dubey, Tanmay; Shallum, Noel; Bhattacharya, Arnab
arXiv.org Artificial Intelligence
Legal precedent retrieval is a cornerstone of the common law system, governed by the principle of stare decisis, which demands consistency in judicial decisions. However, the growing complexity and volume of legal documents challenge traditional retrieval methods. TraceRetriever mirrors real-world legal search by operating with limited case information, extracting only rhetorically significant segments instead of requiring complete documents. Our pipeline integrates BM25, a vector database, and Cross-Encoder models, combining the initial results through Reciprocal Rank Fusion before final re-ranking. Rhetorical annotations are generated using a Hierarchical BiLSTM-CRF classifier trained on Indian judgments. Evaluated on the IL-PCR and COLIEE 2025 datasets, TraceRetriever addresses the challenge of growing document volumes while aligning with practical search constraints, providing a reliable and scalable foundation for precedent retrieval that enhances legal research when only partial case knowledge is available.
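The abstract names two mechanisms that are easy to illustrate: building a query from only the rhetorically significant sentences of a case, and merging the BM25 and vector-database result lists with Reciprocal Rank Fusion (RRF) before cross-encoder re-ranking. The Python below is a minimal sketch under stated assumptions, not TraceRetriever's implementation. The role labels in the hypothetical `QUERY_ROLES` set, the function names, and the constant `k = 60` (the default proposed in the original RRF paper by Cormack et al.) are illustrative choices, not details confirmed by the source.

```python
from collections import defaultdict

# Hypothetical role labels: the paper's actual label set, and which roles it
# treats as "rhetorically significant", are assumptions here.
QUERY_ROLES = {"Facts", "Issue"}


def build_query(labeled_sentences, roles=QUERY_ROLES):
    """Join only the sentences whose predicted rhetorical role is in `roles`.

    `labeled_sentences` is a list of (text, role) pairs, e.g. the output of a
    sentence-level rhetorical role classifier.
    """
    return " ".join(text for text, role in labeled_sentences if role in roles)


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids with RRF.

    score(d) = sum over lists containing d of 1 / (k + rank_of_d), where
    ranks start at 1. Documents missing from a list contribute nothing for it.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    labeled = [
        ("The appellant was convicted under Section 302.", "Facts"),
        ("Counsel cited several precedents.", "Argument"),
        ("Whether the confession was admissible.", "Issue"),
    ]
    print(build_query(labeled))

    bm25_hits = ["c3", "c1", "c7"]   # hypothetical BM25 ranking
    dense_hits = ["c1", "c5", "c3"]  # hypothetical vector-search ranking
    print(reciprocal_rank_fusion([bm25_hits, dense_hits]))
    # → ['c1', 'c3', 'c5', 'c7']
```

In this toy run, `c1` edges out `c3` because it ranks first in one list and second in the other, whereas `c3` ranks first and third. RRF uses only ranks, so BM25 and embedding similarity scores never need to be put on a common scale. In a full pipeline, the top fused candidates would then be passed to a cross-encoder for final re-ranking, which this sketch does not show.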
Aug-4-2025
- Country:
- Asia
- India
- Maharashtra > Pune (0.04)
- Uttar Pradesh > Kanpur (0.04)
- West Bengal > Kolkata (0.04)
- Middle East > UAE
- Abu Dhabi Emirate > Abu Dhabi (0.04)
- Europe
- Spain > Galicia
- Madrid (0.04)
- Switzerland (0.04)
- North America
- Canada > Ontario
- Toronto (0.04)
- United States > Florida
- Miami-Dade County > Miami (0.04)
- Genre:
- Research Report > New Finding (0.93)
- Industry:
- Law (1.00)
- Technology: