I work in the Digital Humanities, and in my experience Computer Science, Information Science and Data Science are typically not well prepared to work with Humanities data. Some common challenges:
- the methodologies used in the humanities, such as semiotics and phenomenology, often do not allow for the level of formalisation that a computer science model requires
- (probably a consequence of the above) data in the humanities is rarely quantitative and much more often qualitative, i.e. nominal and categorical, if structured at all. That’s why, for example, language models have received a lot of attention recently, but we repeatedly find that they have undesirable (inadequate) biases
- a particularly big issue is that historical data is much scarcer than data scientists would like, and often it is either not digitised or digitised poorly. As a consequence, established machine learning approaches cannot be trained on it
There’s much more to it, but these are the most immediate challenges that come to mind.
https://github.com/pgp/XFiles is what I’ve been using, and I’m pretty happy with it.