Library unicode

Library methods for handling unicode strings.

Author:
Daniel Miller

Copyright © Same as Nmap. See http://nmap.org/book/man-legal.html

Source: http://nmap.org/svn/nselib/unicode.lua

Functions

cp437_dec (buf, pos)

Decodes a CP437 character.

cp437_enc (cp)

Encode a Unicode code point to CP437.

decode (buf, decoder, bigendian)

Decode a buffer containing Unicode data.

encode (list, encoder, bigendian)

Encode a list of Unicode code points.

transcode (buf, decoder, encoder, bigendian_dec, bigendian_enc)

Transcode a string from one format to another.

utf16_dec (buf, pos, bigendian)

Decodes a UTF-16 character.

utf16_enc (cp, bigendian)

Encode a Unicode code point to UTF-16. See RFC 2781.

utf16to8 (from)

Helper function for the common case of UTF-16 to UTF-8 transcoding, such as from a Windows/SMB unicode string to a printable ASCII (subset of UTF-8) string.

utf8_dec (buf, pos)

Decodes a UTF-8 character.

utf8_enc (cp)

Encode a Unicode code point to UTF-8. See RFC 3629.

utf8to16 (from)

Helper function for the common case of UTF-8 to UTF-16 transcoding, such as from a printable ASCII (subset of UTF-8) string to a Windows/SMB unicode string.



Functions

cp437_dec (buf, pos)

Decodes a CP437 character.

Parameters

  • buf: A string containing the character
  • pos: The index in the string where the character begins

Return values:

  1. pos The index in the string where the character ended
  2. cp The code point of the character as a number
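
The decoding step can be illustrated outside of NSE. The following is a minimal Python sketch, not the library's Lua code: it relies on Python's built-in "cp437" codec, uses 0-based indexing (Lua strings are 1-based), and assumes the returned position is the index just past the decoded character.

```python
def cp437_dec(buf: bytes, pos: int):
    """Decode the single CP437 byte at buf[pos] (0-based index)."""
    # Every CP437 character is exactly one byte wide.
    cp = ord(buf[pos:pos + 1].decode("cp437"))
    # Assumption: the returned position is the index just past the character.
    return pos + 1, cp

print(cp437_dec(b"\x82", 0))  # (1, 233): CP437 byte 0x82 is U+00E9
```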

cp437_enc (cp)

Encode a Unicode code point to CP437.

Returns nil if the code point cannot be found in CP437.

Parameters

  • cp: The Unicode code point as a number

Return value:

A string containing the related CP437 character
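
The nil-on-failure behavior can be sketched in Python, with None standing in for Lua's nil. This is an illustration built on Python's "cp437" codec, not the library's own lookup table.

```python
def cp437_enc(cp: int):
    """Return the CP437 byte for a code point, or None if CP437 lacks it."""
    try:
        return chr(cp).encode("cp437")
    except UnicodeEncodeError:
        # The code point has no CP437 representation.
        return None

print(cp437_enc(0xE9))    # b'\x82'
print(cp437_enc(0x20AC))  # None: the euro sign is not in CP437
```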

decode (buf, decoder, bigendian)

Decode a buffer containing Unicode data.

Parameters

  • buf: The string/buffer to be decoded
  • decoder: A Unicode decoder function (such as utf8_dec)
  • bigendian: For encodings that care about byte-order (such as UTF-16), set this to true to force big-endian byte order. Default: false (little-endian)

Return value:

A list-table containing the code points as numbers
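
The way a decoder function drives the loop can be sketched as follows. This is a Python illustration under stated assumptions, not the library's implementation: ucs2_dec is a hypothetical stand-in decoder that reads one 16-bit unit without surrogate handling, positions are 0-based, and each decoder call is assumed to return the index of the next character.

```python
import struct

def ucs2_dec(buf, pos, bigendian=False):
    """Hypothetical stand-in decoder: one 16-bit unit, no surrogates."""
    (cp,) = struct.unpack_from(">H" if bigendian else "<H", buf, pos)
    return pos + 2, cp

def decode(buf, decoder, bigendian=False):
    """Call the decoder repeatedly until the buffer is consumed."""
    cps = []
    pos = 0
    while pos < len(buf):
        pos, cp = decoder(buf, pos, bigendian)
        cps.append(cp)
    return cps

print(decode(b"N\x00m\x00", ucs2_dec))        # [78, 109]
print(decode(b"\x00N\x00m", ucs2_dec, True))  # [78, 109]
```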

encode (list, encoder, bigendian)

Encode a list of Unicode code points.

Parameters

  • list: A list-table of code points as numbers
  • encoder: A Unicode encoder function (such as utf8_enc)
  • bigendian: For encodings that care about byte-order (such as UTF-16), set this to true to force big-endian byte order. Default: false (little-endian)

Return value:

An encoded string
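
The encoding direction mirrors decoding: each code point is passed to the encoder and the results are concatenated. A minimal Python sketch, where ucs2_enc is a hypothetical stand-in encoder that handles only code points up to 0xFFFF:

```python
import struct

def ucs2_enc(cp, bigendian=False):
    """Hypothetical stand-in encoder: one 16-bit unit, BMP only."""
    return struct.pack(">H" if bigendian else "<H", cp)

def encode(cps, encoder, bigendian=False):
    """Encode each code point and join the pieces into one byte string."""
    return b"".join(encoder(cp, bigendian) for cp in cps)

print(encode([78, 109], ucs2_enc))        # b'N\x00m\x00'
print(encode([78, 109], ucs2_enc, True))  # b'\x00N\x00m'
```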

transcode (buf, decoder, encoder, bigendian_dec, bigendian_enc)

Transcode a string from one format to another.

The string is decoded and re-encoded in one pass. This saves some overhead compared with passing the output of unicode.decode to unicode.encode.

Parameters

  • buf: The string/buffer to be transcoded
  • decoder: A Unicode decoder function (such as utf16_dec)
  • encoder: A Unicode encoder function (such as utf8_enc)
  • bigendian_dec: Set this to true to force big-endian decoding.
  • bigendian_enc: Set this to true to force big-endian encoding.

Return value:

An encoded string
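
The single-pass idea is that each code point is re-encoded as soon as it is decoded, so no intermediate list of code points is built. A Python sketch under stated assumptions: ucs2_dec and ucs2_enc are hypothetical 16-bit stand-in codecs, and positions are 0-based.

```python
import struct

def ucs2_dec(buf, pos, bigendian=False):
    """Hypothetical stand-in decoder: one 16-bit unit, no surrogates."""
    (cp,) = struct.unpack_from(">H" if bigendian else "<H", buf, pos)
    return pos + 2, cp

def ucs2_enc(cp, bigendian=False):
    """Hypothetical stand-in encoder: one 16-bit unit, BMP only."""
    return struct.pack(">H" if bigendian else "<H", cp)

def transcode(buf, decoder, encoder, bigendian_dec=False, bigendian_enc=False):
    """Decode and immediately re-encode one character at a time."""
    out = []
    pos = 0
    while pos < len(buf):
        pos, cp = decoder(buf, pos, bigendian_dec)
        out.append(encoder(cp, bigendian_enc))
    return b"".join(out)

# Swap byte order: little-endian input, big-endian output.
print(transcode(b"N\x00m\x00", ucs2_dec, ucs2_enc, False, True))  # b'\x00N\x00m'
```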

utf16_dec (buf, pos, bigendian)

Decodes a UTF-16 character.

Does not check that the returned code point is a real character. Specifically, it can be fooled by out-of-order lead- and trail-surrogate characters.

Parameters

  • buf: A string containing the character
  • pos: The index in the string where the character begins
  • bigendian: Set this to true to decode big-endian UTF-16. Default is false (little-endian)

Return values:

  1. pos The index in the string where the character ended
  2. cp The code point of the character as a number
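
Surrogate-pair decoding without order checking can be sketched in Python as shown here. This is an illustration, not the library's Lua code: positions are 0-based, and the returned position is assumed to be the index just past the character.

```python
import struct

def utf16_dec(buf, pos, bigendian=False):
    """Decode one UTF-16 character starting at buf[pos]."""
    fmt = ">H" if bigendian else "<H"
    (cp,) = struct.unpack_from(fmt, buf, pos)
    pos += 2
    if 0xD800 <= cp <= 0xDFFF:
        # Any surrogate is treated as a lead; the order is not verified.
        high = (cp - 0xD800) << 10
        (trail,) = struct.unpack_from(fmt, buf, pos)
        pos += 2
        cp = 0x10000 + high + (trail - 0xDC00)
    return pos, cp

pos, cp = utf16_dec(b"\x3d\xd8\x00\xde", 0)
print(pos, hex(cp))  # 4 0x1f600
```

In this sketch, feeding the two surrogates in the wrong order still returns a number, but one outside the valid Unicode range, which shows why the caller has to validate the result.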

utf16_enc (cp, bigendian)

Encode a Unicode code point to UTF-16. See RFC 2781.

Windows versions prior to Windows 2000 support only UCS-2, so be careful when using this function to encode code points above 0xFFFF.

Parameters

  • cp: The Unicode code point as a number
  • bigendian: Set this to true to encode big-endian UTF-16. Default is false (little-endian)

Return value:

A string containing the code point in UTF-16 encoding.
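
The RFC 2781 encoding rule can be sketched in Python as follows. It is an illustration rather than the library's Lua code: code points below 0x10000 become a single 16-bit unit, and larger ones are split into a lead and trail surrogate.

```python
import struct

def utf16_enc(cp, bigendian=False):
    """Encode one code point as UTF-16 following RFC 2781."""
    fmt = ">H" if bigendian else "<H"
    if cp < 0x10000:
        return struct.pack(fmt, cp)
    v = cp - 0x10000               # 20-bit value to split
    lead = 0xD800 | (v >> 10)      # upper 10 bits
    trail = 0xDC00 | (v & 0x3FF)   # lower 10 bits
    return struct.pack(fmt, lead) + struct.pack(fmt, trail)

print(utf16_enc(0x41))     # b'A\x00'
print(utf16_enc(0x1F600))  # b'=\xd8\x00\xde'
```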

utf16to8 (from)

Helper function for the common case of UTF-16 to UTF-8 transcoding, such as from a Windows/SMB unicode string to a printable ASCII (subset of UTF-8) string.

Parameters

  • from: A string in UTF-16, little-endian

Return value:

The string in UTF-8
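
The same conversion can be reproduced with Python's standard codecs, which helps when checking expected values by hand. This is a sketch of the conversion only; unlike the library's decoder, Python's strict codec rejects unpaired surrogates.

```python
def utf16to8(data: bytes) -> bytes:
    """Convert little-endian UTF-16 bytes to UTF-8 bytes."""
    return data.decode("utf-16-le").encode("utf-8")

print(utf16to8(b"N\x00m\x00a\x00p\x00"))  # b'Nmap'
```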

utf8_dec (buf, pos)

Decodes a UTF-8 character.

Does not check that the returned code point is a real character.

Parameters

  • buf: A string containing the character
  • pos: The index in the string where the character begins

Return values:

  1. pos The index in the string where the character ended, or nil on error
  2. cp The code point of the character as a number, or an error string on error
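
The two-value error convention can be sketched in Python, with None standing in for nil. This is an illustration, not the library's Lua code: positions are 0-based, the returned position is assumed to be the index just past the character, and the error strings are invented for the example.

```python
def utf8_dec(buf: bytes, pos: int):
    """Decode one UTF-8 character; return (None, message) on error."""
    b0 = buf[pos]
    if b0 < 0x80:
        return pos + 1, b0
    if 0xC0 <= b0 < 0xE0:
        n, cp = 1, b0 & 0x1F
    elif 0xE0 <= b0 < 0xF0:
        n, cp = 2, b0 & 0x0F
    elif 0xF0 <= b0 < 0xF8:
        n, cp = 3, b0 & 0x07
    else:
        return None, "invalid lead byte"
    tail = buf[pos + 1:pos + 1 + n]
    if len(tail) < n:
        return None, "truncated sequence"
    for b in tail:
        if b & 0xC0 != 0x80:
            return None, "invalid continuation byte"
        cp = (cp << 6) | (b & 0x3F)
    # No check that cp is a real character (surrogates, overlongs pass).
    return pos + 1 + n, cp

print(utf8_dec(b"\xc3\xa9", 0))  # (2, 233)
print(utf8_dec(b"\xff", 0))      # (None, 'invalid lead byte')
```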

utf8_enc (cp)

Encode a Unicode code point to UTF-8. See RFC 3629.

Does not check that cp is a real character; that is, doesn't exclude the surrogate range U+D800 - U+DFFF and a handful of others.

Parameters

  • cp: The Unicode code point as a number

Return value:

A string containing the code point in UTF-8 encoding.
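
The RFC 3629 bit layout can be sketched in Python as shown here. It is an illustration rather than the library's Lua code, and like the function described above it performs no validity check, so a surrogate code point is encoded as a three-byte sequence.

```python
def utf8_enc(cp: int) -> bytes:
    """Encode one code point as UTF-8 following the RFC 3629 bit layout."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

print(utf8_enc(0xE9))    # b'\xc3\xa9'
print(utf8_enc(0x20AC))  # b'\xe2\x82\xac'
```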

utf8to16 (from)

Helper function for the common case of UTF-8 to UTF-16 transcoding, such as from a printable ASCII (subset of UTF-8) string to a Windows/SMB unicode string.

Parameters

  • from: A string in UTF-8

Return value:

The string in UTF-16, little-endian
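
For checking expected values by hand, the same conversion can be reproduced with Python's standard codecs. This is a sketch of the conversion only; the "utf-16-le" codec writes no byte-order mark, which matches the little-endian output described above.

```python
def utf8to16(data: bytes) -> bytes:
    """Convert UTF-8 bytes to little-endian UTF-16 bytes (no BOM)."""
    return data.decode("utf-8").encode("utf-16-le")

print(utf8to16(b"Nmap"))  # b'N\x00m\x00a\x00p\x00'
```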
