
Josh McGiff. Image: University of Limerick
University of Limerick’s Josh McGiff wants to build generative AI tools like ChatGPT to protect the integrity of the Irish language.
Do you know what artificial intelligence is as Gaeilge?
There has been plenty of chatter over the last few years about AI and specifically generative AI (GenAI) tools such as ChatGPT, which brought it to the mainstream, competitors such as Anthropic’s Claude, and complete disrupters such as DeepSeek’s R1.
But as we approach Seachtain na Gaeilge, the biggest Irish language festival in the world, SiliconRepublic.com wanted to speak to a researcher who is working on ways of preserving the Irish language.
Josh McGiff is a PhD researcher in AI and a lecturer in immersive software engineering at University of Limerick (UL).
Making technology accessible is a key mission for McGiff. He told SiliconRepublic.com that the heavy technical information at the beginning of computer science courses can often crush the dreams of students early on.
“My lectures have involved turning an entire building into a murder mystery scene where students had to solve my murder using knowledge from the lecture, a database and coding-related clues hidden around the building; incorporating a treasure hunt into a web development lecture; doing a class in virtual reality; and getting students to build a video game in a week,” he said.
“I am now gearing up to do some exciting guest lectures for another course that will hopefully push the boat out in terms of AI education.”
Outside of his passion for teaching, McGiff grew up surrounded by the Irish language, having spent summers with his mother’s family in west Kerry and attending Gaelcholáiste Luimnigh, an all-Irish post-primary school.
“Not only am I part of a gaming community that plays games as Gaeilge, but I have also spent the last two years working on my own Irish-language indie game set in Ireland – I am super excited about this.”
During his undergraduate degree at UL, McGiff used machine learning to build a ‘homophobia detection system’, giving him a strong foundation to continue his research in the field of AI.
“ChatGPT had just come out around this time and I remember testing its ability to produce text in Gaeilge. I found that it was a poor representation of the language with many inconsistencies,” he said.
“This, combined with the invaluable backing and guidance from my supervisor Dr Nikola Nikolov, prompted my research application to the Research Ireland Centre for Research Training in Artificial Intelligence.”
Preserving the Irish language with AI
McGiff’s research is centred around building GenAI tools like ChatGPT for the Irish language.
“I felt that existing ‘one-size-fits-all’ approaches to modelling the Irish language disregarded the various dialects that are fundamental to its identity,” he said.
“I realised that building a chatbot for the Irish language could be a powerful form of preservation. All the intricacies and features of the Irish language could be encapsulated in a model (AI algorithm), effectively protecting it from disappearing.”
One of more than 60 official minority languages recognised by the European Union, Irish has been classified as “definitely endangered” by UNESCO. While most language technologies focus on widely used languages such as English and Chinese, Irish has limited digital support.
This is one of the key challenges McGiff faces in his research. With a lack of available data on the web, low-resourced languages such as Irish do not tend to be well captured. But McGiff said he is determined to help Gaeilgeoirí thrive in the digital age.
Along with a lack of data, computational power is another issue that comes with building AI tools, like the one McGiff wants to create. While major tech companies have access to machines that allow them to train state-of-the-art models, researchers like McGiff are limited by the equipment they have access to – usually within the university they work from.
“However, restrictions like these are enabling researchers to explore more environmentally friendly methods of creating these tools,” he said.
Tackling the data challenge
In terms of sourcing data in the Irish language, McGiff said there has been some incredible work done to create public datasets for the language. In 2021, two technology projects received more than €350,000 in Government funding to help prevent the “digital extinction” of the Irish language.
However, the combination of the existing sources still only amounts to a fraction of the overall data required to build an AI model that could be used to make a service similar to ChatGPT.
“I have reached out to many organisations and have had some success in developing a stronger dataset,” said McGiff. “Part of the challenge is tackling fears associated with AI tools themselves. With countries like the UK exploring the use of AI chatbots in public services, propelling Gaeilge firmly into the digital age is essential to prevent further language inequality.”
While he continues to explore every avenue to acquire authentic Irish data, McGiff is also developing a variety of algorithms for creating synthetic data. This involves taking sentences and applying a number of transformations to create new data.
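To make the idea of transformation-based synthetic data concrete, here is a minimal Python sketch. It is not McGiff's method: the two transformations (swapping neighbouring words and dropping a word) and the function names `swap_adjacent`, `drop_one` and `augment` are assumptions chosen purely for illustration. Transformations this simple can produce ungrammatical Irish, so a real pipeline would likely need language-aware rules and checks.

```python
import random


def swap_adjacent(tokens, rng):
    """Swap one randomly chosen pair of neighbouring tokens."""
    if len(tokens) < 2:
        return list(tokens)
    i = rng.randrange(len(tokens) - 1)
    out = list(tokens)
    out[i], out[i + 1] = out[i + 1], out[i]
    return out


def drop_one(tokens, rng):
    """Delete one randomly chosen token, unless only one token remains."""
    if len(tokens) < 2:
        return list(tokens)
    i = rng.randrange(len(tokens))
    return tokens[:i] + tokens[i + 1:]


def augment(sentence, n_variants=3, seed=0):
    """Return n_variants noisy copies of a sentence (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    tokens = sentence.split()
    transforms = [swap_adjacent, drop_one]
    variants = []
    for _ in range(n_variants):
        transform = rng.choice(transforms)
        variants.append(" ".join(transform(tokens, rng)))
    return variants


if __name__ == "__main__":
    for variant in augment("Tá an aimsir go hálainn inniu", n_variants=4, seed=1):
        print(variant)
```

Each call turns one authentic sentence into several altered ones, which is the basic trade-off of this approach: more training text, at the cost of some noise.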
“In addition to this, there is some research to suggest that building models on related languages can equate to a major boost in data. As a result, I am building a model using other Goidelic languages like Scottish Gaelic and Manx,” he said. “A combination of all these augmentation techniques should help to overcome the challenge of limited data for this research.”
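One common way to combine related languages, sketched below in Python, is to pool their text into a single training corpus, tag each sentence with its language and repeat the target language so it is not drowned out. This is an illustrative assumption rather than a description of McGiff's model: the function `build_mixed_corpus`, the `<ga>`-style tags (using the ISO 639-1 codes for Irish, Scottish Gaelic and Manx) and the `upsample` factor are all hypothetical, and the greetings are toy data.

```python
import random


def build_mixed_corpus(corpora, target_lang, upsample=2, seed=0):
    """Merge per-language sentence lists into one tagged, shuffled corpus.

    corpora:     dict mapping a language code to a list of sentences
    target_lang: code of the language to repeat `upsample` times
    """
    rng = random.Random(seed)
    mixed = []
    for lang, sentences in corpora.items():
        repeats = upsample if lang == target_lang else 1
        for sentence in sentences:
            # Prefix a language tag so a model can tell the languages apart.
            mixed.extend([f"<{lang}> {sentence}"] * repeats)
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    toy_corpora = {
        "ga": ["Dia duit"],       # Irish
        "gd": ["Madainn mhath"],  # Scottish Gaelic
        "gv": ["Moghrey mie"],    # Manx
    }
    for line in build_mixed_corpus(toy_corpora, target_lang="ga"):
        print(line)
```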
Bringing Irish into the digital age
McGiff said he’s concerned that, since AI models like ChatGPT are dominating as tools across the board, not including Irish properly could lock Gaeilgeoirí out of many services.
“I believe that Gaeilgeoirí should have the right to use this kind of technology in their own language. Otherwise, Irish speakers could be excluded from these productivity-enhancing tools,” he added.
“Moreover, existing AI chatbots have not accurately modelled the language. If applications are powered by these existing AI tools, then the inaccuracies could erode the Irish language over time.”
McGiff also said that building an AI model to accurately represent Irish will not only preserve it, but could help to grow it as a language.
In practical terms, this could mean Irish speakers engaging with more digital services in their own language, businesses integrating AI tools for Irish into their products, and TV shows and video games using AI tools to localise content for Gaeilge.
“Overall, this research could bridge the gap in linguistic equality, empowering Irish speakers to use their language seamlessly in daily life and reducing reliance on English,” he said.
“An AI model for the Irish language, built by Gaeilgeoirí, will be the key to empowering its speakers and propelling the language into the digital age.”
Oh, and in case you’re as curious as I was, the Irish for artificial intelligence is ‘intleacht shaorga’.