Data extraction for DIGIS
The main project objective is to devise approaches for the automated extraction of geochemical data and metadata from research papers and to implement them in a prototype pipeline for DIGIS. The project is funded by the DFG.
Data ingestion into DIGIS relies heavily on the manual interpretation and categorization of papers for the GEOROC database. During this process, method descriptions and chemical and geographical information are extracted and mapped onto the GEOROC metadata format. Expert knowledge is needed to interpret the content of publications, categorize the data types according to the data systematics of the GEOROC database, and identify critical metadata. This project aims to support that process with an infrastructure prototype for information extraction (IE).
- DFG project number: 437919684
- Start: approx. 07/2023
- End: approx. 06/2026
- Results: TBD
- Code: TBD