LangExtract, an Information Extraction Library #
LangExtract was released at the end of July 2025, just days before this page first went live.
Setup: #
| LLM | Gemini v2.5 Flash |
|---|---|
| Text | REGULATION (EU) 2022/2554 OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL (DORA): - CHAPTER II - ICT risk management, Section I, Article 5, 2nd paragraph |
Objective: #
- Explore capabilities of LangExtract
- Assess accuracy
Test Cases #
Test Case 01: Derive any type of obligation from unstructured text #
To narrow the scope of the extraction, we focus on obligations that can be derived from unstructured text. We would like LangExtract to return triplets with the following structure:
- actor: Who needs to fulfill the obligation
- obligation: The actual obligation
- reason: What justifies this obligation
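The triplet structure above can be sketched as a small Python data class. This is only an illustration: the names `ObligationTriplet` and `build_triplet` are hypothetical and not part of LangExtract, and treating the reason as optional is an assumption, since not every sentence states one.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class ObligationTriplet:
    actor: str                    # who needs to fulfill the obligation
    obligation: str               # the actual obligation
    reason: Optional[str] = None  # what justifies it (assumed optional)


def build_triplet(pairs: Iterable[Tuple[str, str]]) -> ObligationTriplet:
    """Assemble one triplet from (class, text) pairs of a single sentence."""
    fields = {}
    for cls, text in pairs:
        if cls in ("actor", "obligation", "reason"):
            fields.setdefault(cls, text)  # keep the first hit per class
    # A missing actor or obligation raises KeyError on purpose.
    return ObligationTriplet(
        actor=fields["actor"],
        obligation=fields["obligation"],
        reason=fields.get("reason"),
    )
```

Keeping only the first hit per class is a simplification; a sentence with several obligations would need one triplet per obligation.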
I provide LangExtract with the following example:
import langextract as lx

# One few-shot example: a sentence from the regulation plus the three
# extractions (actor, obligation, reason) we expect LangExtract to find in it.
examples = [
    lx.data.ExampleData(
        text="1. Financial entities shall have a sound, comprehensive and well-documented ICT risk management framework as part of their overall risk management system, which enables them to address ICT risk quickly, efficiently and comprehensively and to ensure a high level of digital operational resilience.",
        extractions=[
            lx.data.Extraction(
                extraction_class="actor",
                extraction_text="Financial entities",
                attributes={"attributes": "None"}
            ),
            lx.data.Extraction(
                extraction_class="obligation",
                extraction_text="shall have a sound, comprehensive and well-documented ICT risk management framework as part of their overall risk management system",
                attributes={"key words": "shall"}
            ),
            lx.data.Extraction(
                extraction_class="reason",
                extraction_text="to ensure a high level of digital operational resilience",
                attributes={"attributes": "None"}
            )
        ]
    )
]
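For context, the few-shot example is passed to LangExtract together with a task description. The following is a minimal sketch of what that call might look like. The prompt wording is hypothetical (the prompt used in this test is not shown on this page), and `run_extraction` is a helper name chosen for illustration. The keyword arguments follow the usage shown in the LangExtract README.

```python
import textwrap

# Hypothetical prompt: the wording used in the actual test is not documented here.
PROMPT = textwrap.dedent("""\
    Extract obligations from the regulatory text.
    For each obligation, extract the actor who must fulfill it, the
    obligation itself, and the reason that justifies it.
    Use the exact wording of the text; do not paraphrase.""")

MODEL_ID = "gemini-2.5-flash"  # corresponds to the Gemini 2.5 Flash setup


def run_extraction(text, examples, model_id=MODEL_ID):
    """Run LangExtract on `text`, guided by PROMPT and the few-shot examples."""
    # Imported lazily so this sketch loads even without the library installed.
    import langextract as lx

    # A Gemini API key is expected in the LANGEXTRACT_API_KEY environment variable.
    return lx.extract(
        text_or_documents=text,
        prompt_description=PROMPT,
        examples=examples,
        model_id=model_id,
    )
```

Calling `run_extraction(article_text, examples)`, where `article_text` holds the DORA paragraph, should return an annotated document whose `extractions` list carries an `extraction_class` and `extraction_text` per item.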
LangExtract's results can be saved in JSONL format and visualized as an HTML page. For a first attempt, I find the results quite impressive. I estimate that the three classes (actor, obligation, reason) are identified correctly in more than 90% of cases.
Click the image below to open a larger, interactive version of the graph in a new tab.
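To put a number on the success rate instead of estimating it, the JSONL output can be read back and compared with a manual review. The sketch below uses only the standard library. The field names (`extractions`, `extraction_class`, `extraction_text`) are an assumption about the saved file and should be checked against a real output. The sample line and the reviewer judgement are made up for illustration and are not results from this test.

```python
import json
from collections import Counter

# Hypothetical one-document JSONL sample, built from the few-shot example text.
SAMPLE_JSONL = json.dumps({
    "extractions": [
        {"extraction_class": "actor",
         "extraction_text": "Financial entities"},
        {"extraction_class": "obligation",
         "extraction_text": "shall have a sound, comprehensive and "
                            "well-documented ICT risk management framework"},
        {"extraction_class": "reason",
         "extraction_text": "to ensure a high level of digital "
                            "operational resilience"},
    ]
})


def load_extractions(jsonl_text):
    """Return (class, text) pairs from every document in a JSONL string."""
    pairs = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        doc = json.loads(line)
        for ext in doc.get("extractions", []):
            pairs.append((ext["extraction_class"], ext["extraction_text"]))
    return pairs


def success_rate(pairs, judged_correct):
    """Share of extractions that a human reviewer marked as correct."""
    if not pairs:
        return 0.0
    hits = sum(1 for pair in pairs if pair in judged_correct)
    return hits / len(pairs)


pairs = load_extractions(SAMPLE_JSONL)
counts = Counter(cls for cls, _ in pairs)  # number of extractions per class

# Illustrative only: pretend the reviewer accepted the first two extractions.
reviewed_ok = set(pairs[:2])
rate = success_rate(pairs, reviewed_ok)
```

With a real output file, `jsonl_text` would be the file's contents and `reviewed_ok` the set of pairs confirmed by hand.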