Extracting Word files to upload to my database

JCF · September 9, 2022, 5:23am

Hello everyone,
I am the academic lead for several degree programmes at my university.

Our administration has the habit of requesting a syllabus, as a Word file, from everyone in charge of a course.

This document contains headings such as “name of the course”, “course code”, “degrees that include this teaching”, “general objective”, “operational objective”, “course outline”, “bibliographical references”, “evaluation method”, etc.

It has the format of a kind of form, with headings followed by boxes for each text.

Obviously, this produces files that are about as interactive as clay tablets: if I want to give my students a summary of a course's name, code, and general objective, I have to open the files one by one and copy and paste.

My goal is to first extract the content of these Word files and load it into a small Corteza database that I could then query by the fields and criteria that interest me.

It seems to me that there is a way to name the fields in a Word form so that their content can be extracted later, the reverse of what one does in a mail merge.

I searched a lot on the net but I must not have used the right keywords because I couldn’t find anything.

Do any of you have any idea how to do this or where to find such information?

Thank you and have a nice day,
Jean-Christophe

tjerman · September 12, 2022, 1:47pm

What you could consider is using automation scripts and an npm package like word-extractor (I didn't check the package; it was the top Google result).

Automation scripts would give you enough flexibility to use custom node packages to work with word documents.

Alternatively, you could consider having a Python (or similar) script outside of Corteza to extract the required data and use the API to insert the data you need.
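
As a rough illustration of the external-script route, here is a minimal Python sketch using only the standard library. It relies on the fact that a .docx file is a ZIP archive whose main text lives in word/document.xml. Two things are assumptions about your form, not something I know about it: that the boxes are either Word content controls carrying a tag or title (which would be the "named fields" mentioned above), or two-cell table rows with the heading in the first cell and the text in the second. The function names and the syllabi/ folder are made up for the example. Legacy .doc files would need converting to .docx first.

```python
import glob
import json
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used inside word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def _text(element):
    """Return the text under an element, one line per paragraph."""
    # Inline content may hold runs without a paragraph; treat it as one block.
    paragraphs = list(element.iter(f"{{{W}}}p")) or [element]
    lines = ["".join(t.text or "" for t in p.iter(f"{{{W}}}t")) for p in paragraphs]
    return "\n".join(lines).strip()


def _load_body(docx_path):
    """Parse word/document.xml out of the .docx archive."""
    with zipfile.ZipFile(docx_path) as archive:
        return ET.fromstring(archive.read("word/document.xml"))


def extract_content_controls(docx_path):
    """Return {tag or title: text} for every named content control."""
    fields = {}
    for sdt in _load_body(docx_path).iter(f"{{{W}}}sdt"):
        props = sdt.find("w:sdtPr", NS)
        content = sdt.find("w:sdtContent", NS)
        if props is None or content is None:
            continue
        # Prefer the tag; fall back to the display title (alias).
        for key in ("tag", "alias"):
            node = props.find(f"w:{key}", NS)
            name = node.get(f"{{{W}}}val") if node is not None else None
            if name:
                fields[name] = _text(content)
                break
    return fields


def extract_table_pairs(docx_path):
    """Return {first cell: second cell} for each table row with 2+ cells.

    Assumes the heading sits in the first cell and the text in the second.
    """
    fields = {}
    for row in _load_body(docx_path).iter(f"{{{W}}}tr"):
        cells = row.findall("w:tc", NS)
        if len(cells) >= 2 and _text(cells[0]):
            fields[_text(cells[0])] = _text(cells[1])
    return fields


if __name__ == "__main__":
    # Hypothetical folder; prints one JSON object per syllabus file.
    for path in sorted(glob.glob("syllabi/*.docx")):
        record = extract_content_controls(path) or extract_table_pairs(path)
        print(json.dumps({"file": path, **record}, ensure_ascii=False))
```

The JSON lines it prints could then be imported into Corteza or sent through its API, one record per file. If your form is laid out differently (for example headings and text in separate paragraphs), the extraction functions would need adapting.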