GPT-NL: Secure & Ethical AI to Power Dutch Society
The Netherlands will develop its own open language model: GPT-NL. The non-profit organisations TNO, NFI and SURF are jointly developing the model to take an important step towards transparent, fair and verifiable use of Artificial Intelligence (AI) in line with Dutch and European values and guidelines, and with respect for data ownership. The consortium, facilitated by the NL AIC and HSD via the Security, Peace and Justice working group, recently received 13.5 million euros in funding from the Ministry of Economic Affairs and Climate Policy/RVO to carry out this project.
With the launch of ChatGPT in 2022, the power of AI and Large Language Models (LLMs) was revealed to the general public for the first time. Many discovered the benefits of the technology, but several issues concerning companies like OpenAI and the technology behind their solutions call for serious caution. For example, these companies are not transparent about their algorithms and the datasets they use, which makes it impossible to audit them or hold them accountable for unethical or harmful output. It is also unclear what happens to the information we feed into these models and who has access to it, so we cannot assume that our privacy is respected.
In addition, the quality of a model's output depends not only on the quality of the data it is trained on, but also on the quantity. This is an issue for languages such as Dutch, which is spoken by about 22 million people worldwide. Most, if not all, LLMs are trained on datasets that contain very little Dutch, which limits the quality of their Dutch output. What the Netherlands does have is a strong AI research and knowledge base to build on, an excellent network of relevant public, private and academic partners, and a solid digital infrastructure. Combined with a growing need for a strong native Dutch LLM that complies with Dutch and European regulations on privacy and ethics, is transparent about its algorithms and datasets, and adheres to Dutch cultural norms, this led to the GPT-NL project.
Limitations of current language models
The Netherlands Forensic Institute (NFI), the initiator of the project, has a long history of using language models. It uses these models for a variety of purposes, such as analysing large amounts of data for evidence of criminal activity. “Language models have been indispensable for investigative work for many years,” says Erwin van Eijk, Head of the Digital and Biometric traces division at NFI. “It’s impossible for humans to analyse such huge amounts of data within the limited time frames our work requires. Additionally, AI is used to protect investigators from unnecessary exposure to traumatising content.” Erwin continues: “But our language models have limitations, because we don’t have sufficient resources to develop more expansive technology, which is especially needed as messaging in criminal circuits becomes increasingly cryptic. We do, however, have a solid foundation of available data, algorithms, expertise and experience to build on for the GPT-NL project.”
Connecting the AI ecosystem
Using language models such as ChatGPT is practically impossible for NFI, because the models' results are used in criminal cases and must therefore be transparent in how they are produced and compliant with legal regulations. But the concerns regarding existing LLMs apply to a much wider range of organisations and applications. Erwin therefore sees potential for many organisations in the Netherlands, from the public, private and academic sectors, to benefit from a more expansive native Dutch language model.
Erwin: “In order to get access to the resources required for this project, we needed to join forces with other organisations and define a collective purpose. Security Delta (HSD), the Dutch security cluster, as well as the Netherlands AI Coalition (NL AIC), saw the urgency and the potential of a Dutch AI language model from the start. They are extremely well-connected and helped get the relevant organisations on board to make this project happen.”
Snellius: The Dutch National Supercomputer
LLMs require very high computational power and an advanced hardware infrastructure. “As a security cluster, we knew the perfect partner to provide that infrastructure,” says Joris den Bruinen, head of Security Delta (HSD) and of the NL AIC's Security, Peace and Justice working group. “In SURF, educational institutions and research institutes join forces to develop and purchase digital services. It’s a public organisation built around the need for shared access to digital infrastructure and research data. It operates the Dutch National Supercomputer Snellius, the flagship of SURF's research services, and has the trust required for a wide range of partners to share their datasets on its platform,” Joris concludes.
Benefits for Dutch society
GPT-NL presents numerous potential benefits for Dutch society. “As Erwin already mentioned, there are many potential applications for GPT-NL. To be clear, the project does not include developing models for specific applications; it focuses on building the structural foundation on which countless tailored models can be built,” explains Saskia Lensink, NLP specialist at TNO. “Several governmental organisations can benefit from GPT-NL, if only to help tailor their communication to their citizens' vocabulary,” Joris den Bruinen adds. “The language model developed in the GPT-NL project will be offered under a licensing structure, with different rates for academic, non-commercial and commercial use, so that companies, including start-ups, can develop commercial applications on top of this foundation. This provides sovereignty in Dutch products and services, resulting in economic value,” he continues.
In healthcare, for example, such a model could support medical professionals by summarising transcripts of conversations with patients, which requires the data to be stored securely in accordance with European privacy law. In education, current AI models reflect an American context and American values in their output, which may not be what we want for our children. Although current models may suffice for the time being, GPT-NL could provide a valuable alternative in this segment once it becomes available. “We can’t really predict this, but with ChatGPT we have seen the power of AI and how it can give rise to a wide variety of commercial and public applications,” Joris concludes.
The importance of a collaborative approach
Early in the process, HSD invited TNO to join the project, which was critical to getting GPT-NL to where it is today. TNO has broad knowledge of AI from the wide variety of industries in which it is active. Saskia Lensink played an important role in involving leading experts from relevant disciplines and organisations in GPT-NL. Together with HSD and the NL AIC, Saskia attracted top AI experts from 20 partners to work towards a larger project named Nederlandse AI voor het Nederlands (Dutch AI for the Dutch language), or NAIN for short. NAIN, which preceded the GPT-NL project, targeted funding from the National Growth Fund. Although this funding opportunity fell through for reasons beyond the project partners' control, the experts have kept in touch.
“This is highly valuable for the GPT-NL project,” says Saskia. “Some people have expressed concerns that €13.5 million is not enough to build a Dutch LLM, and compared to the billions of dollars sometimes invested in commercial LLMs, that amount may seem modest. But several conditions add up to make this a realistic investment for the first basic LLM structure we have defined for this project. The network of AI experts from 20 partners is one; the availability of a wide variety of datasets, partly from these experts and their organisations, is another. Available computational power, experts to tailor it to our needs and the absence of commercial requirements also push costs down. And we are connecting with European partners, such as Sweden, which has built its GPT-SW3 model, to learn from their processes and experiences,” Saskia concludes.
GPT-NL project planning
The GPT-NL project will run in two phases. Phase 1, the first year, focuses on the concrete development of the Dutch language model, with active involvement from the academic sector. Phase 2 is the exploitation phase. For this purpose, the programme will connect to the National Supercomputer Snellius, which provides the computing power needed to run the model.
Also read the call in 'Trouw' by thought leader and tech expert Ilyaz Nasrullah: “Donate your data for the development of GPT.NL”.