
Transcoding Unicode Characters with AVX-512 Instructions
Daniel Lemire
Université TÉLUQ

Corresponding author: daniel.lemire@teluq.ca

Robert Clausecker
Konrad-Zuse-Zentrum für Informationstechnik Berlin

Abstract

Recent Intel processors include AVX-512, a powerful set of instructions that operate on 512-bit registers with a single instruction. Some of these instructions have no equivalent in earlier instruction sets. We leverage them to efficiently transcode strings between the two most common Unicode formats: UTF-8 and UTF-16. With our novel algorithms, we are often twice as fast as the previous best solutions. For example, we transcode Chinese text from UTF-8 to UTF-16 at more than 5 GiB/s using fewer than 2 CPU instructions per character. To ensure reproducibility, we make our software freely available as an open-source library.
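To pin down what "transcoding UTF-8 to UTF-16" requires, the C sketch below shows a conventional byte-at-a-time scalar conversion. It is not the authors' AVX-512 algorithm, and the function name `utf8_to_utf16_scalar` and constant `MIN_CODE_POINT` are hypothetical. It assumes the caller wants strict validation: it rejects truncated sequences, overlong encodings, surrogate code points, and values above U+10FFFF, and it emits a surrogate pair for code points outside the Basic Multilingual Plane.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest code point allowed for a sequence of n bytes (indices 1..4);
   anything smaller is an overlong encoding. */
static const uint32_t MIN_CODE_POINT[5] = {0, 0, 0x80, 0x800, 0x10000};

/* Hypothetical scalar UTF-8 -> UTF-16 transcoder (not the AVX-512 method).
   `out` must have room for at least `len` code units.
   Returns the number of UTF-16 code units written, or (size_t)-1 if the
   input is not valid UTF-8. */
size_t utf8_to_utf16_scalar(const uint8_t *in, size_t len, uint16_t *out) {
    assert(len == 0 || (in != NULL && out != NULL));
    size_t i = 0, o = 0;
    while (i < len) {
        uint8_t lead = in[i];
        uint32_t cp;
        size_t n;
        /* The lead byte determines the sequence length and its payload bits. */
        if (lead < 0x80)                { cp = lead;        n = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; n = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; n = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; n = 4; }
        else return (size_t)-1; /* stray continuation byte or invalid lead byte */
        if (n > len - i) return (size_t)-1; /* truncated sequence */
        /* Each continuation byte (10xxxxxx) contributes six payload bits. */
        for (size_t k = 1; k < n; k++) {
            uint8_t b = in[i + k];
            if ((b & 0xC0) != 0x80) return (size_t)-1;
            cp = (cp << 6) | (b & 0x3F);
        }
        /* Reject overlong forms, UTF-16 surrogates and values above U+10FFFF. */
        if (cp < MIN_CODE_POINT[n] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return (size_t)-1;
        if (cp < 0x10000) {
            out[o++] = (uint16_t)cp; /* BMP: a single code unit */
        } else {
            cp -= 0x10000; /* supplementary plane: high then low surrogate */
            out[o++] = (uint16_t)(0xD800 | (cp >> 10));
            out[o++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
        i += n;
    }
    return o;
}
```

An output buffer of `len` code units always suffices: a UTF-8 sequence of one to three bytes yields one UTF-16 code unit, and a four-byte sequence yields two. For example, the bytes of "Aé中😀" (10 bytes) become five code units, the last two being the surrogate pair 0xD83D 0xDE00. The per-byte branching is what makes a scalar loop like this cost many instructions per character, whereas the figure quoted above (fewer than 2 instructions per character for Chinese text) comes from processing many bytes at once in 512-bit registers.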
Submission history
  • 09 Dec 2022: Submitted to Software: Practice and Experience
  • 09 Dec 2022: Submission Checks Completed
  • 09 Dec 2022: Assigned to Editor
  • 22 Dec 2022: Review(s) Completed, Editorial Evaluation Pending
  • 08 Feb 2023: Reviewer(s) Assigned
  • 20 Apr 2023: Editorial Decision: Revise Major
  • 23 May 2023: 1st Revision Received
  • 29 May 2023: Submission Checks Completed
  • 29 May 2023: Assigned to Editor
  • 29 May 2023: Review(s) Completed, Editorial Evaluation Pending
  • 29 May 2023: Reviewer(s) Assigned
  • 19 Jun 2023: Editorial Decision: Revise Major
  • 11 Jul 2023: 2nd Revision Received
  • 12 Jul 2023: Submission Checks Completed
  • 12 Jul 2023: Assigned to Editor
  • 12 Jul 2023: Review(s) Completed, Editorial Evaluation Pending
  • 17 Jul 2023: Reviewer(s) Assigned
  • 04 Aug 2023: Editorial Decision: Accept