HPLT: High Performance Language Technologies
Mikko Aulamo, Nikolay Bogoychev, Shaoxiong Ji, Graeme Nail, Gema Ramírez-Sánchez, Jörg Tiedemann, Jelmer van der Linde, Jaume Zaragoza
Abstract
We describe the High Performance Language Technologies project (HPLT), a 3-year EU-funded project started in September 2022. HPLT will build a space combining petabytes of natural language data with large-scale model training. It will derive monolingual and bilingual datasets from the Internet Archive and CommonCrawl and build efficient and solid machine translation (MT) as well as large language models (LLMs). HPLT aims at providing free, sustainable and reusable datasets, models and workflows at scale using high-performance computing (HPC).
- Anthology ID:
- 2023.eamt-1.61
- Volume:
- Proceedings of the 24th Annual Conference of the European Association for Machine Translation
- Month:
- June
- Year:
- 2023
- Address:
- Tampere, Finland
- Editors:
- Mary Nurminen, Judith Brenner, Maarit Koponen, Sirkku Latomaa, Mikhail Mikhailov, Frederike Schierl, Tharindu Ranasinghe, Eva Vanmassenhove, Sergi Alvarez Vidal, Nora Aranberri, Mara Nunziatini, Carla Parra Escartín, Mikel Forcada, Maja Popovic, Carolina Scarton, Helena Moniz
- Venue:
- EAMT
- Publisher:
- European Association for Machine Translation
- Pages:
- 517–518
- URL:
- https://aclanthology.org/2023.eamt-1.61/
- Bibkey:
- aulamo-etal-2023-hplt
- Cite (ACL):
- Mikko Aulamo, Nikolay Bogoychev, Shaoxiong Ji, Graeme Nail, Gema Ramírez-Sánchez, Jörg Tiedemann, Jelmer van der Linde, and Jaume Zaragoza. 2023. HPLT: High Performance Language Technologies. In Proceedings of the 24th Annual Conference of the European Association for Machine Translation, pages 517–518, Tampere, Finland. European Association for Machine Translation.
- Cite (Informal):
- HPLT: High Performance Language Technologies (Aulamo et al., EAMT 2023)
- PDF:
- https://aclanthology.org/2023.eamt-1.61.pdf
Export citation
@inproceedings{aulamo-etal-2023-hplt,
title = "{HPLT}: High Performance Language Technologies",
author = {Aulamo, Mikko and
Bogoychev, Nikolay and
Ji, Shaoxiong and
Nail, Graeme and
Ram{\'i}rez-S{\'a}nchez, Gema and
Tiedemann, J{\"o}rg and
van der Linde, Jelmer and
Zaragoza, Jaume},
editor = "Nurminen, Mary and
Brenner, Judith and
Koponen, Maarit and
Latomaa, Sirkku and
Mikhailov, Mikhail and
Schierl, Frederike and
Ranasinghe, Tharindu and
Vanmassenhove, Eva and
Vidal, Sergi Alvarez and
Aranberri, Nora and
Nunziatini, Mara and
Escart{\'i}n, Carla Parra and
Forcada, Mikel and
Popovic, Maja and
Scarton, Carolina and
Moniz, Helena",
booktitle = "Proceedings of the 24th Annual Conference of the European Association for Machine Translation",
month = jun,
year = "2023",
address = "Tampere, Finland",
publisher = "European Association for Machine Translation",
url = "https://aclanthology.org/2023.eamt-1.61/",
pages = "517--518",
abstract = "We describe the High Performance Language Technologies project (HPLT), a 3-year EU-funded project started in September 2022. HPLT will build a space combining petabytes of natural language data with large-scale model training. It will derive monolingual and bilingual datasets from the Internet Archive and CommonCrawl and build efficient and solid machine translation (MT) as well as large language models (LLMs). HPLT aims at providing free, sustainable and reusable datasets, models and workflows at scale using high-performance computing (HPC)."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="aulamo-etal-2023-hplt">
<titleInfo>
<title>HPLT: High Performance Language Technologies</title>
</titleInfo>
<name type="personal">
<namePart type="given">Mikko</namePart>
<namePart type="family">Aulamo</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Nikolay</namePart>
<namePart type="family">Bogoychev</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Shaoxiong</namePart>
<namePart type="family">Ji</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Graeme</namePart>
<namePart type="family">Nail</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Gema</namePart>
<namePart type="family">Ramírez-Sánchez</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jörg</namePart>
<namePart type="family">Tiedemann</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jelmer</namePart>
<namePart type="family">van der Linde</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jaume</namePart>
<namePart type="family">Zaragoza</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2023-06</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 24th Annual Conference of the European Association for Machine Translation</title>
</titleInfo>
<name type="personal">
<namePart type="given">Mary</namePart>
<namePart type="family">Nurminen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Judith</namePart>
<namePart type="family">Brenner</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Maarit</namePart>
<namePart type="family">Koponen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sirkku</namePart>
<namePart type="family">Latomaa</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mikhail</namePart>
<namePart type="family">Mikhailov</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Frederike</namePart>
<namePart type="family">Schierl</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tharindu</namePart>
<namePart type="family">Ranasinghe</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Eva</namePart>
<namePart type="family">Vanmassenhove</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sergi</namePart>
<namePart type="given">Alvarez</namePart>
<namePart type="family">Vidal</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Nora</namePart>
<namePart type="family">Aranberri</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mara</namePart>
<namePart type="family">Nunziatini</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Carla</namePart>
<namePart type="given">Parra</namePart>
<namePart type="family">Escartín</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mikel</namePart>
<namePart type="family">Forcada</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Maja</namePart>
<namePart type="family">Popovic</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Carolina</namePart>
<namePart type="family">Scarton</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Helena</namePart>
<namePart type="family">Moniz</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>European Association for Machine Translation</publisher>
<place>
<placeTerm type="text">Tampere, Finland</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>We describe the High Performance Language Technologies project (HPLT), a 3-year EU-funded project started in September 2022. HPLT will build a space combining petabytes of natural language data with large-scale model training. It will derive monolingual and bilingual datasets from the Internet Archive and CommonCrawl and build efficient and solid machine translation (MT) as well as large language models (LLMs). HPLT aims at providing free, sustainable and reusable datasets, models and workflows at scale using high-performance computing (HPC).</abstract>
<identifier type="citekey">aulamo-etal-2023-hplt</identifier>
<location>
<url>https://aclanthology.org/2023.eamt-1.61/</url>
</location>
<part>
<date>2023-06</date>
<extent unit="page">
<start>517</start>
<end>518</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T HPLT: High Performance Language Technologies
%A Aulamo, Mikko
%A Bogoychev, Nikolay
%A Ji, Shaoxiong
%A Nail, Graeme
%A Ramírez-Sánchez, Gema
%A Tiedemann, Jörg
%A van der Linde, Jelmer
%A Zaragoza, Jaume
%Y Nurminen, Mary
%Y Brenner, Judith
%Y Koponen, Maarit
%Y Latomaa, Sirkku
%Y Mikhailov, Mikhail
%Y Schierl, Frederike
%Y Ranasinghe, Tharindu
%Y Vanmassenhove, Eva
%Y Vidal, Sergi Alvarez
%Y Aranberri, Nora
%Y Nunziatini, Mara
%Y Escartín, Carla Parra
%Y Forcada, Mikel
%Y Popovic, Maja
%Y Scarton, Carolina
%Y Moniz, Helena
%S Proceedings of the 24th Annual Conference of the European Association for Machine Translation
%D 2023
%8 June
%I European Association for Machine Translation
%C Tampere, Finland
%F aulamo-etal-2023-hplt
%X We describe the High Performance Language Technologies project (HPLT), a 3-year EU-funded project started in September 2022. HPLT will build a space combining petabytes of natural language data with large-scale model training. It will derive monolingual and bilingual datasets from the Internet Archive and CommonCrawl and build efficient and solid machine translation (MT) as well as large language models (LLMs). HPLT aims at providing free, sustainable and reusable datasets, models and workflows at scale using high-performance computing (HPC).
%U https://aclanthology.org/2023.eamt-1.61/
%P 517-518
Markdown (Informal)
[HPLT: High Performance Language Technologies](https://aclanthology.org/2023.eamt-1.61/) (Aulamo et al., EAMT 2023)