Dutch AI for the Dutch Language (NAIN)

Initiative by:

Nederlands Forensisch Instituut, Ministerie van Justitie en Veiligheid, Politie, Openbaar Ministerie, Gemeente Den Haag, Universiteit Leiden, TU Delft, TNO, KPN, and Security Delta (HSD)

Status:

In progress

The consortium Dutch AI for the Dutch Language (NAIN) was founded because several public and private parties in the Netherlands see opportunities for Dutch speech models and for AI applications built on them. As individual organizations, however, their capacity to improve these models is limited: they lack sufficient knowledge, training data, or specialized hardware in-house, or the algorithms still need further development. In addition, generic language models have proven insufficiently accurate for specific use cases and therefore require a lot of customization. For individual public parties, it is not feasible to develop this customization themselves, and for private parties, it is not profitable to improve both the generic and the specific language models for smaller parties.

Improvements in language and speech technology make it easier to apply in the security domain, among others. This encourages not only application but also further innovation within the field, which ultimately contributes to more efficient use of language and speech technology for security. One example is the automatic transcription of recorded police reports. This saves officers on the street the time they would otherwise spend writing or typing, and it gives colleagues in the control room complete information that they can refer back to later and base decisions on.

What does NAIN do?

NAIN takes Dutch language and speech technology to the next level, with models and technologies that are sovereign, inclusive, diverse, and transparent. To realize this, NAIN focuses on the following points: further developing privacy-enhancing technologies that enable the sharing of privacy-sensitive language and speech data, establishing and tightening legal and ethical frameworks for the use of natural language processing (NLP) and automatic speech recognition (ASR), and joining or developing infrastructures to unlock the next generation of Dutch language and speech technology.

Who is involved in NAIN and how do they contribute?

NAIN addresses the low quality of Dutch language and speech technology, which in its current state is poorly deployable or not deployable at all in many sectors. The consortium is unique in its breadth, both in the number of sectors involved and in the types of institutions taking part: public and private parties, startups, knowledge and educational institutions, and public and private organizations in Flanders. Sectors represented in NAIN include health, new media, education, commerce, and security. The consortium is also connected to the Netherlands AI Coalition (NL AIC). Joint efforts are being made to bring together current initiatives (to prevent duplication of work and make efficient use of project budgets), to build up knowledge of language and speech technology, to achieve sovereignty (greater control and independence from large foreign commercial parties), and to improve inclusion (better performance of language and speech technology where dialects, accents, and slang are concerned). Ultimately, this should lead to one generic and several specific high-quality Dutch language models, improved algorithms (incorporated in software), a licensing model or other form of disclosure for the developed language models, and legal and ethical frameworks for the use of language and speech technology in the Netherlands.

To what applications can the technologies developed within NAIN contribute?

Applications such as the automated processing of wiretaps, reports, interrogations, 112 calls, and much more are now possible. One prospect in the media sector, for example, is using speech technology to add automated subtitles to movies. In healthcare, the prospect is that healthcare providers will report verbally on what they do during their work, after which the spoken report is automatically processed and stored for administrative reporting and accountability. Within the NL AIC, cross-sectoral applications are handled in close cooperation with the other sectoral working groups.

The Ministry of Justice and Security also takes great interest in the use of speech recognition, but observes that it is difficult to gain a clear understanding of which preconditions should apply to the responsible use of such technologies. It therefore wants these conditions mapped out more clearly. To this end, Considerati will perform a legal analysis of how speech recognition can be applied in a legally responsible manner. This could cover subjects such as consent, and how it can be requested and given implicitly or explicitly, as well as how to handle collected data.

Want to know more about NAIN?

In the November 2021 landscape map, the NAIN consortium, with support from the Zuid-Holland AI hub, presents the current state of language and speech technology in the Netherlands and Flanders. With this landscape map as a starting point, work can continue over the next five years to develop state-of-the-art sovereign Dutch language and speech technology that is inclusive, diverse, transparent, and explainable, and to which domain-specific extensions can be linked. The final results of this project will be useful throughout Dutch society, including the Dutch security domain. During this period, information sessions will be organized where you can learn more about NAIN, ongoing projects, and developed applications. More information will be available on the HSD website and social media channels.

Do you have any other questions or comments about the NAIN consortium? Or would you like to receive the NAIN newsletter to learn more? Please contact Paul Coumans at paul.coumans@securitydelta.nl.