The Computational Linguistics group (GroNLP) of the Center for Language and Cognition Groningen (CLCG) is looking for a PhD student for the position “Language technology for cultural heritage: New discoveries with little data” within the HAICu research project. This PhD position is about dealing effectively with missing and sparse labels in datasets from humanities fields such as literature, history, and philosophy. Cultural heritage institutions, and especially the National Library of the Netherlands, offer access to large amounts of digitized data that can be leveraged through computational approaches. However, this data is very often incomplete. This is a challenge for typical machine learning methods, which rely on being fed representative and complete data and otherwise yield systems that cannot handle distribution shifts or extrapolate beyond their training set.
The project will be coordinated by Andreas van Cranenburgh, Tommaso Caselli, and Malvina Nissim at the University of Groningen, in collaboration with the National Library of the Netherlands. It is an interdisciplinary project at the intersection of Computational Linguistics/Natural Language Processing (NLP) and the humanities.
Within the dynamic HAICu team, the PhD researcher will participate in Work Package 3 (WP3), titled “Learning from sparse examples”. In this work package, we will collaborate with AI and machine learning experts from Tilburg University, Fontys Hogeschool, and other partners, in addition to the aforementioned National Library of the Netherlands.