In case it's helpful, here's a really quick-and-dirty Python library for segmenting legal texts (judicial opinions, statutes, etc.) into sentences. It relies on hand-coded rules and checks against Bluebook acronyms/abbreviations.
github.com/neelguha/legal…
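For a rough sense of what rule-based splitting with an abbreviation check can look like, here's a minimal sketch. It is not the library's code: `ABBREVIATIONS` and `split_sentences` are hypothetical names, and the abbreviation list is a tiny illustrative sample, not the Bluebook tables.

```python
import re

# Hypothetical, tiny sample of legal abbreviations (illustration only).
ABBREVIATIONS = {"v.", "U.S.", "F.", "Cir.", "Inc.", "Co.", "Corp.", "No.", "Stat.", "Supp."}


def split_sentences(text):
    """Split after ., ? or ! when followed by whitespace and an uppercase
    letter, unless the token ending at that point is a known abbreviation."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.?!](?=\s+[A-Z])", text):
        end = match.end()
        # Last whitespace-delimited token before the candidate boundary.
        token = text[start:end].split()[-1]
        if token in ABBREVIATIONS:
            continue  # e.g. "v." in a case name is not a sentence end
        sentences.append(text[start:end].strip())
        start = end
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    for s in split_sentences("The court cited Roe v. Wade, 410 U.S. 113. It then affirmed."):
        print(s)
```

A known gap in this sketch: a sentence that genuinely ends on an abbreviation (e.g. "... Jones Inc. The claim failed.") won't be split, which is the kind of case that needs extra hand-coded rules.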
We’re excited to share Embroid: a method for “stitching” together an LLM with embedding information from multiple smaller models (e.g., BERT), allowing us to automatically correct LLM predictions without supervision.
✍️: hazyresearch.stanford.edu/blog/2023-08-1…
📜: arxiv.org/abs/2307.11031
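To give a feel for the general idea of using embedding neighborhoods to correct predictions without labels, here's a heavily simplified sketch. It is not the Embroid algorithm: it swaps in plain majority voting over nearest neighbors, and `nearest_neighbors`, `smooth_predictions`, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def nearest_neighbors(embeddings, i, k):
    """Indices of the k points closest to point i (Euclidean distance)."""
    dists = [(math.dist(embeddings[i], e), j)
             for j, e in enumerate(embeddings) if j != i]
    return [j for _, j in sorted(dists)[:k]]


def smooth_predictions(preds, embedding_spaces, k=3):
    """For each sample, collect one vote per embedding space (the majority
    label among its k nearest neighbors) plus the original prediction,
    then return the majority of those votes."""
    smoothed = []
    for i, pred in enumerate(preds):
        votes = [pred]
        for embeddings in embedding_spaces:
            labels = [preds[j] for j in nearest_neighbors(embeddings, i, k)]
            votes.append(Counter(labels).most_common(1)[0][0])
        smoothed.append(Counter(votes).most_common(1)[0][0])
    return smoothed


if __name__ == "__main__":
    # Toy data: two clusters of four points, seen through two embedding spaces.
    space_a = [(0.0,), (0.1,), (0.2,), (0.3,), (5.0,), (5.1,), (5.2,), (5.3,)]
    space_b = [(1.0, 0.0), (1.1, 0.0), (0.9, 0.0), (1.2, 0.0),
               (9.0, 9.0), (9.1, 9.0), (8.9, 9.0), (9.2, 9.0)]
    preds = [1, 1, 1, 0, 0, 0, 0, 0]  # sample 3 disagrees with its cluster
    print(smooth_predictions(preds, [space_a, space_b], k=3))
```

In the toy data, sample 3 sits in the first cluster under both embedding spaces, so the neighbor votes outweigh its original prediction.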
We’re beyond excited to share the first release of LegalBench, a collaboratively constructed open-source benchmark for evaluating legal reasoning in English-language large language models.
🔗hazyresearch.stanford.edu/legalbench/
📜arxiv.org/abs/2308.11462