nkf

NAME

nkf − Network Kanji Filter

SYNOPSIS

  nkf [-butjnesliohrTVvwWJESZxXFfmMBOcdILg] [file ...]

DESCRIPTION

Nkf is yet another kanji code converter for use among networks, hosts and terminals. It converts input kanji code to a designated kanji code such as ISO−2022−JP, Shift_JIS, EUC−JP, UTF−8 or UTF−16.

One of the distinctive features of nkf is that it guesses the input kanji encoding. It currently recognizes ISO−2022−JP, Shift_JIS, EUC−JP, UTF−8 and UTF−16, so users needn’t set the input kanji code explicitly.
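
The idea of guessing an encoding can be sketched by trial decoding. This is a minimal illustration in Python, not nkf's detection logic: the candidate list, its order, and the name guess_encoding are assumptions made for the example, and UTF−16 is left out.

```python
# Candidate order is an assumption for this sketch, not nkf's algorithm.
CANDIDATES = [
    ("ISO-2022-JP", "iso2022_jp"),
    ("UTF-8", "utf_8"),
    ("EUC-JP", "euc_jp"),
    ("Shift_JIS", "shift_jis"),
]


def guess_encoding(data: bytes):
    """Return the name of the first candidate that decodes data, else None."""
    for name, codec in CANDIDATES:
        try:
            data.decode(codec)
        except UnicodeDecodeError:
            continue
        return name
    return None
```

Short or ambiguous input can decode under more than one candidate, so a real detector needs more evidence than a successful decode.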

By default, X0201 kana is converted into X0208 kana. For X0201 kana, the SO/SI, SSO and ESC−(−I methods are supported. For automatic code detection, nkf assumes no X0201 kana in Shift_JIS. To accept X0201 in Shift_JIS, use −X, −x or −S.

OPTIONS

−b −u

Output is buffered (DEFAULT) or unbuffered, respectively.

−j −s −e −w −w16

Output code is ISO−2022−JP (7bit JIS), Shift_JIS, EUC−JP, UTF−8N or UTF−16BE, respectively. Without one of these options and without a compile-time default, ISO−2022−JP is assumed.

−J −S −E −W −W16

Input assumption is JIS 7 bit, Shift_JIS, EUC−JP, UTF−8 or UTF−16LE, respectively.

−J

Assume JIS input. It also accepts EUC−JP. This is the default. This flag does not exclude Shift_JIS.

−S

Assume Shift_JIS and X0201 kana input. It also accepts JIS. EUC-JP is recognized as X0201 kana. Without the −x flag, X0201 kana (halfwidth kana) is converted into X0208.

−E

Assume EUC-JP input. It also accepts JIS. Same as −J.

−t

No conversion.

−i[@B]

Specify the escape sequence for JIS X 0208−1978/83. (DEFAULT B)

−o[BJH]

Specify the escape sequence for ASCII/Roman. (DEFAULT B)

−r

Encrypt or decrypt with ROT13/47.
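
Both ciphers are their own inverse, which is why a single flag serves for encryption and decryption. The sketch below shows the two rotations on ASCII text only; the function names are hypothetical, and how nkf chooses between ROT13 and ROT47 for a given byte is not covered here.

```python
def rot13(text: str) -> str:
    """Rotate ASCII letters by 13 places; other characters pass through."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + 13) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + 13) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


def rot47(text: str) -> str:
    """Rotate the 94 printable ASCII characters (33..126) by 47 places."""
    out = []
    for ch in text:
        code = ord(ch)
        if 33 <= code <= 126:
            out.append(chr((code - 33 + 47) % 94 + 33))
        else:
            out.append(ch)
    return "".join(out)
```

Applying either function twice returns the original text.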

−h[123] −−hiragana −−katakana −−katakana−hiragana

−h1 −−hiragana

Katakana to Hiragana conversion.

−h2 −−katakana

Hiragana to Katakana conversion.

−h3 −−katakana−hiragana

Katakana to Hiragana and Hiragana to Katakana conversion.
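
The three modes can be illustrated on Unicode text, where the Hiragana and Katakana blocks sit a fixed distance apart. This is a sketch under that assumption; nkf itself works on JIS X 0208 codes, and the function names are hypothetical.

```python
KANA_OFFSET = 0x60  # distance between the Hiragana and Katakana blocks in Unicode


def _shift(ch: str, lo: str, hi: str, delta: int) -> str:
    """Shift ch by delta code points if it lies in [lo, hi]."""
    return chr(ord(ch) + delta) if lo <= ch <= hi else ch


def katakana_to_hiragana(text: str) -> str:
    # Like -h1: U+30A1..U+30F6 map onto U+3041..U+3096.
    return "".join(_shift(c, "\u30a1", "\u30f6", -KANA_OFFSET) for c in text)


def hiragana_to_katakana(text: str) -> str:
    # Like -h2: U+3041..U+3096 map onto U+30A1..U+30F6.
    return "".join(_shift(c, "\u3041", "\u3096", KANA_OFFSET) for c in text)


def swap_kana(text: str) -> str:
    # Like -h3: each character is converted in whichever direction applies.
    out = []
    for c in text:
        if "\u30a1" <= c <= "\u30f6":
            out.append(chr(ord(c) - KANA_OFFSET))
        elif "\u3041" <= c <= "\u3096":
            out.append(chr(ord(c) + KANA_OFFSET))
        else:
            out.append(c)
    return "".join(out)
```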

−T

Text mode output (MS−DOS)

−l

ISO8859−1 (Latin−1) support

−f[m[−n]]

Fold lines at length m with margin n. When m and n are not given, the fold length is 60 and the fold margin is 10.

−F

Line folding that preserves existing newlines.

−Z[0−3]

Convert X0208 alphabet (Fullwidth Alphabets) to ASCII.

−Z −Z0

Convert X0208 alphabet to ASCII.

−Z1

Convert X0208 kankaku (the fullwidth space) to a single ASCII space.

−Z2

Convert X0208 kankaku (the fullwidth space) to two ASCII spaces.

−Z3

Replace fullwidth >, <, ", & with ’&gt;’, ’&lt;’, ’&quot;’, ’&amp;’ as in HTML.
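
The −Z family can be sketched on Unicode text, assuming the X0208 alphabet corresponds to the Unicode fullwidth forms U+FF01..U+FF5E and kankaku to U+3000. The function name and its parameters are hypothetical, and the sketch assumes the space and HTML variants build on the basic alphabet conversion.

```python
FULLWIDTH_OFFSET = 0xFEE0  # U+FF01..U+FF5E minus this offset gives ASCII 0x21..0x7E
HTML_ENTITIES = {">": "&gt;", "<": "&lt;", '"': "&quot;", "&": "&amp;"}


def fullwidth_to_ascii(text: str, space=None, html: bool = False) -> str:
    """Map fullwidth forms to ASCII.

    space: replacement for the fullwidth space U+3000 (None leaves it alone),
           e.g. " " for the -Z1 behavior or "  " for the -Z2 behavior.
    html:  if True, emit HTML entities for >, <, " and & as -Z3 describes.
    """
    out = []
    for ch in text:
        if "\uff01" <= ch <= "\uff5e":
            ascii_ch = chr(ord(ch) - FULLWIDTH_OFFSET)
            out.append(HTML_ENTITIES.get(ascii_ch, ascii_ch) if html else ascii_ch)
        elif ch == "\u3000" and space is not None:
            out.append(space)
        else:
            out.append(ch)
    return "".join(out)
```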

−X −x

Assume X0201 kana in MS−Kanji. With −X or without this option, X0201 is converted into X0208 Kana. With −x, try to preserve X0208 kana and do not convert X0201 kana to X0208. In JIS output, ESC−(−I is used. In EUC output, SSO is used.

−B[0−2]

Assume broken JIS-Kanji input that has lost its ESC characters. Useful when your site is using the old B−News Nihongo patch.

−B1

Allow any character after ESC−( or ESC−$.

−B2

Force ASCII after NL.

−I

Replace characters that are not in ISO−2022−JP with a geta character (the substitute character in Japanese).

−m[BQN0]

MIME ISO−2022−JP/ISO8859−1 decode. (DEFAULT) To see ISO8859−1 (Latin−1), −l is necessary.

−mB

Decode MIME base64 encoded stream. Remove header or other part before conversion.

−mQ

Decode MIME quoted stream. ’_’ in quoted stream is converted to space.

−mN

Non-strict decoding. It allows line breaks in the middle of the base64 encoding.

−m0

No MIME decode.
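
What decoding a MIME encoded-word involves can be sketched with Python's standard email.header module, used here as a stand-in for nkf's own decoder. The helper names make_b_word and decode_mime_words are hypothetical; make_b_word exists only to produce demonstration input.

```python
import base64
from email.header import decode_header


def make_b_word(text: str, charset: str = "iso-2022-jp") -> str:
    """Build a base64 ('B') encoded-word to use as sample input."""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return f"=?{charset.upper()}?B?{payload}?="


def decode_mime_words(header: str) -> str:
    """Decode B- and Q-encoded words in a header value to a Python string."""
    parts = []
    for data, charset in decode_header(header):
        if isinstance(data, bytes):
            # Unencoded fragments arrive as bytes with charset None.
            parts.append(data.decode(charset or "ascii"))
        else:
            parts.append(data)
    return "".join(parts)
```

In the Q form, an underscore stands for a space, matching the note under −mQ.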

−M

MIME encode. Header style. All ASCII code and control characters are left intact.

−MB

MIME encode Base64 stream. Kanji conversion is performed before encoding, so this cannot be used as a picture encoder.

−MQ

Perform quoted encoding.

−l

Input and output code is ISO8859−1 (Latin−1) and ISO−2022−JP. −s, −e and −x are not compatible with this option.

−L[uwm] −d −c

Convert line breaks.

−Lu −d

unix (LF)

−Lw −c

windows (CRLF)

−Lm

mac (CR)

Without this option, nkf doesn’t convert line breaks.
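
The three targets can be sketched as a single substitution that treats CRLF, CR and LF as equivalent on input. The mode letters mirror the −L suffixes; the function name is hypothetical and this is not nkf's implementation.

```python
import re

# Keys mirror the -L suffixes: u (unix), w (windows), m (mac).
LINE_BREAKS = {"u": "\n", "w": "\r\n", "m": "\r"}

# CRLF must be tried first so it is not split into CR followed by LF.
_ANY_BREAK = re.compile(r"\r\n|\r|\n")


def convert_line_breaks(text: str, mode: str) -> str:
    """Rewrite every line break in text to the style selected by mode."""
    target = LINE_BREAKS[mode]
    return _ANY_BREAK.sub(lambda _match: target, text)
```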

−−fj −−unix −−mac −−msdos −−windows

Convert for these systems.

−−jis −−euc −−sjis −−mime −−base64

Convert to the named code.

−−jis−input −−euc−input −−sjis−input −−mime−input −−base64−input

Assume the named input code.

−−ic=input codeset −−oc=output codeset

Set the input or output codeset. NKF supports the following codesets; codeset names are case insensitive.

ISO−2022−JP

a.k.a. RFC1468, 7bit JIS, JUNET

EUC-JP (eucJP−nkf)

a.k.a. AT&T JIS, Japanese EUC, UJIS

eucJP-ascii

eucJP-ms

CP51932

Microsoft Version of EUC−JP.

Shift_JIS

a.k.a. SJIS, MS-Kanji

CP932

a.k.a. Windows−31J

UTF−8

same as UTF−8N

UTF−8N

UTF−8 without BOM

UTF−8−BOM

UTF−8 with BOM

UTF−16

same as UTF−16BE

UTF−16BE

UTF−16 Big Endian without BOM

UTF−16BE−BOM

UTF−16 Big Endian with BOM

UTF−16LE

UTF−16 Little Endian without BOM

UTF−16LE−BOM

UTF−16 Little Endian with BOM

UTF8−MAC (input only)
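
The BOM and byte-order variants listed above differ only in the first bytes and the byte order of the output. The following sketch reproduces those differences with Python's codecs; the function name and lookup table are hypothetical, and only the UTF codesets are covered.

```python
import codecs

# Codeset name -> (Python codec, BOM to prepend), following the list above.
UTF_CODESETS = {
    "UTF-8": ("utf-8", b""),                      # same as UTF-8N
    "UTF-8N": ("utf-8", b""),
    "UTF-8-BOM": ("utf-8", codecs.BOM_UTF8),
    "UTF-16": ("utf-16-be", b""),                 # same as UTF-16BE
    "UTF-16BE": ("utf-16-be", b""),
    "UTF-16BE-BOM": ("utf-16-be", codecs.BOM_UTF16_BE),
    "UTF-16LE": ("utf-16-le", b""),
    "UTF-16LE-BOM": ("utf-16-le", codecs.BOM_UTF16_LE),
}


def encode_as(text: str, codeset: str) -> bytes:
    """Encode text under a codeset name from the list (case insensitive)."""
    codec, bom = UTF_CODESETS[codeset.upper()]
    return bom + text.encode(codec)
```

Note that Python's own "utf-16" codec writes a BOM in native byte order, which is why the table names the byte order explicitly.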

−−fb−{skip, html, xml, perl, java, subchar}

Specify the way that nkf handles unassigned characters. Without this option, −−fb−skip is assumed.
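
Fallback handling of this kind has a close analogue in Python's encoding error handlers, shown below as an illustration. The mapping from option names to handlers is an assumption made for this sketch, and the exact text nkf emits for each mode may differ.

```python
# Assumed correspondence between three of the modes and Python error handlers.
FALLBACK_HANDLERS = {
    "skip": "ignore",               # drop the character
    "html": "xmlcharrefreplace",    # numeric character reference
    "subchar": "replace",           # substitute character ("?" in Python)
}


def encode_with_fallback(text: str, codec: str, mode: str = "skip") -> bytes:
    """Encode text, handling characters the target codec cannot represent."""
    return text.encode(codec, errors=FALLBACK_HANDLERS[mode])
```

For example, the euro sign U+20AC has no Shift_JIS code, so it is dropped, referenced or substituted depending on the mode.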

−−prefix=<escape character><target character>..

When nkf converts to Shift_JIS, it adds the specified escape character to the specified 2nd bytes of Shift_JIS characters. The 1st byte of the argument is the escape character and the following bytes are the target characters.
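
The effect can be sketched on already-encoded Shift_JIS bytes. This sketch assumes the escape character is placed immediately before the matching 2nd byte, which the text above does not state; the function names are hypothetical.

```python
def is_lead_byte(byte: int) -> bool:
    """True for the first byte of a two-byte Shift_JIS character."""
    return 0x81 <= byte <= 0x9F or 0xE0 <= byte <= 0xFC


def add_prefix(sjis: bytes, spec: bytes) -> bytes:
    """Insert spec[0] before any 2nd byte that appears in spec[1:].

    Assumption: the escape goes between the 1st and 2nd byte.
    """
    escape, targets = spec[:1], spec[1:]
    out = bytearray()
    i = 0
    while i < len(sjis):
        byte = sjis[i]
        if is_lead_byte(byte) and i + 1 < len(sjis):
            trail = sjis[i + 1]
            out.append(byte)
            if trail in targets:
                out += escape
            out.append(trail)
            i += 2
        else:
            # Single-byte character (ASCII or halfwidth kana): copy unchanged.
            out.append(byte)
            i += 1
    return bytes(out)
```

A typical case is a character whose 2nd byte is 0x5C (backslash), which many tools would otherwise read as an escape.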

−−no−cp932ext

Handle the characters extended in CP932 as unassigned characters.

−−no−best−fit−chars

In Unicode to encoded-byte conversion, don’t convert characters that are not round-trip safe. In Unicode to Unicode conversion, with this and the −x option, nkf can be used as a UTF converter. (In other words, without this and the −x option, nkf doesn’t preserve some characters.)

When nkf converts strings related to paths, you should use this option.

−−cap−input

Decode hex encoded characters.

−−url−input

Unescape percent escaped characters.

−−numchar−input

Decode character reference, such as "&#....;".
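
Two of these input decodings have direct counterparts that are easy to sketch. The function names are hypothetical, the percent-decoding assumes the escaped bytes are UTF−8, and this is an illustration rather than nkf's code.

```python
import re
from urllib.parse import unquote

# Decimal (&#12354;) or hexadecimal (&#x3042;) numeric character references.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def url_input(text: str, encoding: str = "utf-8") -> str:
    """Unescape percent-escaped bytes, as --url-input is described to do."""
    return unquote(text, encoding=encoding)


def numchar_input(text: str) -> str:
    """Decode numeric character references; named entities are left alone."""
    def _replace(match):
        hex_digits, dec_digits = match.group(1), match.group(2)
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        return chr(code)

    return _NUMERIC_REF.sub(_replace, text)
```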

−−in−place[=SUFFIX] −−overwrite[=SUFFIX]

Overwrite the listed original files with the filtered result.

Note that −−overwrite preserves the timestamps of the original files.

−−guess

Print guessed encoding.

−−help

Print nkf’s help.

−−version

Print nkf’s version.

−−

Ignore the rest of the −options.

AUTHOR

Copyright (C) 1987, FUJITSU LTD. (I.Ichikawa), 2000 S. Kono, COW
Copyright (C) 2002−2006 Kono, Furukawa, Naruse, mastodon
