Meritis is looking for a Data Engineer to join its Data team. The objective of the mission is to design an automated pipeline that extracts, cleans, structures, and enriches textual knowledge from heterogeneous documents (PDF, Word, HTML) and delivers it in a usable pivot format (enriched Markdown or JSON). Responsibilities include extracting text from the various formats using appropriate libraries, normalizing content (OCR, noise removal, formatting), segmenting documents into structured knowledge units, enriching content with metadata (source, date, theme, typology), and developing a reusable, automated pipeline in collaboration with the Knowledge Management teams.
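To make the stages described above more concrete, here is a minimal sketch in Python using only the standard library. It is an illustration, not the team's actual design: it handles HTML input only (PDF, Word, and OCR would need dedicated libraries that the posting does not name), it segments naively on paragraph breaks, and every name in it (`extract_html`, `normalize`, `segment`, `KnowledgeUnit`, `run_pipeline`) is hypothetical.

```python
import json
import re
from dataclasses import dataclass, asdict
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and inserts a paragraph break at block-level tags."""

    BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "li", "br"}

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n\n")

    def handle_data(self, data):
        # Note: script/style contents are not filtered in this sketch.
        self.parts.append(data)


def extract_html(raw: str) -> str:
    """Extraction step: HTML markup in, raw text out."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    return "".join(parser.parts)


def normalize(text: str) -> str:
    """Normalization step: collapse runs of spaces and of blank lines."""
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r"\s*\n\s*\n\s*", "\n\n", text)
    return text.strip()


def segment(text: str) -> list[str]:
    """Segmentation step: one knowledge unit per paragraph (a naive assumption)."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


@dataclass
class KnowledgeUnit:
    """Enriched unit carrying the metadata fields listed in the posting."""
    text: str
    source: str
    date: str
    theme: str
    typology: str


def run_pipeline(raw_html: str, source: str, date: str, theme: str, typology: str) -> str:
    """Chains the steps and serializes the units to the JSON pivot format."""
    chunks = segment(normalize(extract_html(raw_html)))
    units = [KnowledgeUnit(c, source, date, theme, typology) for c in chunks]
    return json.dumps([asdict(u) for u in units], ensure_ascii=False, indent=2)
```

A call such as `run_pipeline("<h1>Title</h1><p>Body text.</p>", "doc.html", "2024-01-01", "demo", "note")` returns a JSON array with one object per paragraph. Keeping each stage as a separate function is one way to keep the pipeline reusable, since a PDF or Word extractor could later replace `extract_html` without touching the other steps.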