Most existing systems employ a software-driven workflow to achieve their development goals and ignore data engineering entirely. This is a fatal mistake in knowledge-intensive development projects such as AI.
While a plethora of tools is available for managing software issues, roughly 80% of the effort in AI projects goes into data engineering, with hardly any tool support for acquisition, quality curation, validation, or integration.
In this presentation, we will elaborate on the agile data engineering methodology supported by the [DBpedia Databus](http://databus.dbpedia.org).
In the theoretical part, we will introduce the notion of abstract datasets that follow the Maven model and show how rich metadata can be exploited to structure files on the web in an interoperable way.
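The Maven analogy can be sketched in a few lines of code: just as Maven identifies software artifacts by group, artifact, and version coordinates, an abstract dataset can be identified independently of any concrete release, with each versioned release resolving to its own IRI. The class below is our own illustrative sketch, not an official Databus client; the base URL, field names, and example coordinates are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DatasetId:
    """Maven-style coordinates for a dataset (illustrative sketch)."""
    publisher: str   # analogous to the account owning a Maven groupId
    group: str       # groups related dataset artifacts together
    artifact: str    # the abstract dataset, independent of any version
    version: str     # a concrete, immutable release of that dataset

    def iri(self, base: str = "https://databus.dbpedia.org") -> str:
        # One stable IRI per versioned release; the path layout mirrors
        # Maven's group/artifact/version hierarchy (assumed layout).
        return f"{base}/{self.publisher}/{self.group}/{self.artifact}/{self.version}"

# Hypothetical example coordinates:
release = DatasetId("dbpedia", "mappings", "instance-types", "2024.01.01")
print(release.iri())
```

The key design point is that `artifact` names the dataset in the abstract, while `version` pins a reproducible release, so tools can depend on either the moving head or a fixed snapshot, exactly as build tools do with Maven dependencies.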
Towards the end, we will present practical use cases that have been realized with the DBpedia Databus and introduce tools for knowledge engineers that speed up data preparation for AI by a factor of 50.