Library idna

Library methods for handling IDNA domains.

Internationalized Domain Names (IDNs) are processed by a mechanism called Internationalizing Domain Names in Applications (IDNA), which handles characters outside the ASCII repertoire in a standard fashion. IDNs use characters drawn from a large repertoire (Unicode), but IDNA allows the non-ASCII characters to be represented using only the ASCII characters already allowed in host names today. This backward-compatible representation is required in existing protocols like DNS, so that IDNs can be introduced with no changes to the existing infrastructure. IDNA is only meant for processing domain names, not free text.
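
The ASCII-only representation described above can be illustrated with a minimal sketch. It is written in Python so that it runs outside Nmap, and it uses Python's built-in punycode codec rather than this library. The helper name label_to_ace is hypothetical, and the sketch performs no IDNA mapping or validation; it only shows the encoding of a single label.

```python
# Illustration only: Python's built-in "punycode" codec, not the NSE idna library.
ACE_PREFIX = "xn--"  # prefix that marks an ASCII-compatible encoded label


def label_to_ace(label: str) -> str:
    """Return an ASCII-only form of a single domain label (hypothetical helper)."""
    if label.isascii():
        return label  # already usable in DNS as-is
    # Punycode turns the Unicode label into a string of ASCII letters, digits and hyphens.
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


print(label_to_ace("bücher"))   # xn--bcher-kva
print(label_to_ace("example"))  # example
```

A label that is already ASCII passes through unchanged, which is why existing DNS infrastructure does not need to change.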

Client software, such as browsers and emailers, faces a difficult transition from the version of international domain names approved in 2003 (IDNA2003) to the revision approved in 2010 (IDNA2008). The following functions allow developers and end users to access domains that are valid under either system; the default conversion is IDNA2008.

The IDNA specification solves the problem of extending the repertoire of characters that can be used in domain names to include the Unicode repertoire (with some restrictions).

Applications can use IDNA to support internationalized domain names anywhere that ASCII domain names are already supported, including DNS master files and resolver interfaces. The IDNA protocol is contained completely within applications. It is not a client-server or peer-to-peer protocol: everything is done inside the application itself. When used with a DNS resolver library, IDNA is inserted as a "shim" between the application and the resolver library. When used for writing names into a DNS zone, IDNA is used just before the name is committed to the zone.
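
The "shim" arrangement can be sketched as follows. This is an assumption-laden illustration in Python, not this library's API: idna_shim is a hypothetical name, and Python's built-in idna codec implements IDNA2003 (RFC 3490), whereas this library defaults to IDNA2008. The resolver is passed in as a parameter so the example can run without network access.

```python
import socket
from typing import Callable


def idna_shim(hostname: str, resolve: Callable[[str], object] = socket.gethostbyname):
    """Convert the name to ASCII inside the application, then call the resolver.

    Hypothetical helper: the resolver itself never sees non-ASCII characters.
    """
    ascii_name = hostname.encode("idna").decode("ascii")  # IDNA2003 codec (assumption)
    return resolve(ascii_name)


# Stand-in resolver that only records what it was asked to look up (no network).
seen = []
idna_shim("bücher.example", resolve=seen.append)
print(seen)  # ['xn--bcher-kva.example']
```

Because the conversion happens before the resolver is called, the resolver library and the DNS servers behind it need no IDNA support of their own.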

TODO: Add support for mapping right-to-left scripts in the IDNA library.

Author:

  • Rewanth Cool

Copyright © Same as Nmap--See https://nmap.org/book/man-legal.html

Source: https://svn.nmap.org/nmap/nselib/idna.lua

Functions

map (decoded_tbl, useSTD3ASCIIRules, transitionalProcessing, viewDisallowedCodePoints)

Maps the codepoints of the input to their respective codepoints based on the latest IDNA version mapping.

toASCII (codepoints, transitionalProcessing, checkHyphens, checkBidi, checkJoiners, useSTD3ASCIIRules, tbl)

Converts the input codepoints into ASCII text based on IDNA rules.

toUnicode (codepoints, transitionalProcessing, checkHyphens, checkBidi, checkJoiners, useSTD3ASCIIRules)

Converts the input into Unicode codepoints based on IDNA rules.

validate (tableOfTables, checkHyphens)

Validates the input based on IDNA codepoint validation rules.

Functions

map (decoded_tbl, useSTD3ASCIIRules, transitionalProcessing, viewDisallowedCodePoints)

Maps the codepoints of the input to their respective codepoints based on the latest IDNA version mapping.

Parameters

decoded_tbl
Table of Unicode decoded codepoints.
useSTD3ASCIIRules
Boolean value that selects the mapping rules. useSTD3ASCIIRules=true selects IDNA2008 rules; useSTD3ASCIIRules=false selects IDNA2003 rules.
transitionalProcessing
Processing option to handle deviation codepoints. transitionalProcessing=true maps the deviation codepoints found in the input; transitionalProcessing=false leaves them as they appear in the original input.
viewDisallowedCodePoints
Boolean value to see the list of disallowed codepoints.

Return value:

Returns a table with the list of mapped codepoints.
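
The effect of transitionalProcessing on deviation codepoints can be sketched as follows. This is a Python illustration rather than the library's Lua code; map_deviations and DEVIATION_MAP are hypothetical names. It covers only the four deviation characters defined in UTS #46, not the full IDNA mapping table that map applies.

```python
# The four deviation characters and their transitional mappings (UTS #46).
DEVIATION_MAP = {
    0x00DF: [0x0073, 0x0073],  # ß -> "ss"
    0x03C2: [0x03C3],          # ς (final sigma) -> σ
    0x200C: [],                # ZERO WIDTH NON-JOINER -> removed
    0x200D: [],                # ZERO WIDTH JOINER -> removed
}


def map_deviations(codepoints, transitional_processing):
    """Return a new list with deviation codepoints mapped when requested."""
    if not transitional_processing:
        return list(codepoints)  # keep the original input unchanged
    mapped = []
    for cp in codepoints:
        mapped.extend(DEVIATION_MAP.get(cp, [cp]))
    return mapped


fass = [ord(ch) for ch in "faß"]
print(map_deviations(fass, True))   # [102, 97, 115, 115]
print(map_deviations(fass, False))  # [102, 97, 223]
```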

toASCII (codepoints, transitionalProcessing, checkHyphens, checkBidi, checkJoiners, useSTD3ASCIIRules, tbl)

Converts the input codepoints into ASCII text based on IDNA rules.

Parameters

codepoints
Table of codepoints of decoded input.
transitionalProcessing
Boolean value. Default: true.
checkHyphens
Boolean flag to check for hyphens in the input. Default: true.
checkBidi
Boolean flag indicating whether the input is bidirectional (Bidi) text. Default: false.
checkJoiners
Boolean flag to check for ContextJ rules in input. Default: false.
useSTD3ASCIIRules
Boolean value that enables the STD3 ASCII rules. Default: true.
tbl
Table of optional params.

Return values:

  1. Returns the IDNA ASCII format of the input.
  2. Returns nil if there is any error in conversion.

toUnicode (codepoints, transitionalProcessing, checkHyphens, checkBidi, checkJoiners, useSTD3ASCIIRules)

Converts the input into Unicode codepoints based on IDNA rules.

Note that the input should already be a table of Unicode code points. If your input is an ASCII string, convert it by using unicode.decode with the unicode.utf8_dec decoder.

Parameters

codepoints
A domain name as a list of code points.
transitionalProcessing
Boolean value. Default: true.
checkHyphens
Boolean flag to check for hyphens in the input. Default: true.
checkBidi
Boolean flag indicating whether the input is bidirectional (Bidi) text. Default: false.
checkJoiners
Boolean flag to check for ContextJ rules in input. Default: false.
useSTD3ASCIIRules
Boolean value that enables the STD3 ASCII rules. Default: true.

Return values:

  1. Returns the Unicode format of the input based on IDNA rules.
  2. Returns nil if there is any error in conversion.
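
The general idea of converting an ASCII-encoded name back to Unicode code points can be sketched in Python. This is not the library's implementation: to_unicode_codepoints is a hypothetical name, the sketch relies on Python's built-in punycode codec, and it skips the mapping, Bidi and joiner checks controlled by the parameters above. It mirrors the documented behaviour only in that it yields a list of code points on success and a nil-like value (None) on error.

```python
ACE_PREFIX = "xn--"  # marks a Punycode-encoded label


def to_unicode_codepoints(domain: str):
    """Decode each xn-- label and return the whole name as a list of code points."""
    labels = []
    try:
        for label in domain.split("."):
            if label.lower().startswith(ACE_PREFIX):
                # Strip the prefix and decode the remaining ASCII with Punycode.
                label = label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
            labels.append(label)
    except UnicodeError:
        return None  # conversion failed
    return [ord(ch) for ch in ".".join(labels)]


cps = to_unicode_codepoints("xn--bcher-kva.example")
print("".join(map(chr, cps)))  # bücher.example
```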

validate (tableOfTables, checkHyphens)

Validates the input based on IDNA codepoint validation rules.

Parameters

tableOfTables
Table of codepoints of the split input.
checkHyphens
Boolean flag that enables checks for hyphens (0x002D) in unusual places.
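
The hyphen checks can be sketched as follows, assuming "unusual places" means the positions named by the CheckHyphens rule in UTS #46: the start of a label, the end of a label, and the third and fourth positions together. This is a Python illustration with hypothetical names (hyphens_ok, HYPHEN), not the library's validate function, and it takes a list of lists of code points in place of a Lua table of tables.

```python
HYPHEN = 0x002D  # HYPHEN-MINUS


def hyphens_ok(labels):
    """Return True if no label has a hyphen in a position CheckHyphens forbids."""
    for label in labels:
        if not label:
            continue  # empty labels are out of scope for this sketch
        if label[0] == HYPHEN or label[-1] == HYPHEN:
            return False  # label begins or ends with a hyphen
        if len(label) >= 4 and label[2] == HYPHEN and label[3] == HYPHEN:
            return False  # hyphens in both the third and fourth positions
    return True


def to_labels(name):
    """Split a dotted name into lists of code points, one list per label."""
    return [[ord(ch) for ch in part] for part in name.split(".")]


print(hyphens_ok(to_labels("my-site.example")))  # True
print(hyphens_ok(to_labels("-bad.example")))     # False
print(hyphens_ok(to_labels("ab--cd.example")))   # False
```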