
Converting File Encodings with the iconv Command on macOS

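Today a colleague sent me a txt file, and when I opened it the Chinese text was garbled. Lao Wang's first thought was the file encoding. After all, the colleague uses a Windows PC, and Chinese-language Windows has traditionally saved plain text as GBK by default, while Lao Wang's work computer runs macOS, which defaults to UTF-8.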

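I had planned to fix it in VS Code, but no matter how I adjusted the settings, the text never displayed correctly. A quick web search showed that macOS can convert a file's encoding directly with the iconv command, so this post shares how.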
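Before converting anything, it can help to confirm the diagnosis. A minimal sketch, assuming a POSIX shell: converting a file from UTF-8 to UTF-8 forces iconv to validate it, and iconv exits with a non-zero status when it meets a byte sequence that is not valid UTF-8. The helper name `is_utf8` and the sample file names are made up for illustration; the two samples hold the word "中文" encoded as GBK and as UTF-8.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if file "$1" is valid UTF-8.
# iconv reports an error and exits non-zero on the first invalid byte sequence.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Made-up sample files: "中文" as GBK bytes (D6 D0 CE C4) and as UTF-8 bytes.
printf '\326\320\316\304\n' > gbk_sample.txt
printf '\344\270\255\346\226\207\n' > utf8_sample.txt

for f in gbk_sample.txt utf8_sample.txt; do
    if is_utf8 "$f"; then
        echo "$f: valid UTF-8"
    else
        echo "$f: not valid UTF-8, possibly GBK"
    fi
done
```

A file that passes this check is not guaranteed to be UTF-8 (pure ASCII passes too), but a file that fails it is certainly not UTF-8.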

1. The iconv command

Below is the help text (man page) for iconv:

NAME
       iconv - character set conversion

SYNOPSIS
       iconv [OPTION...] [-f encoding] [-t encoding] [inputfile ...]
       iconv -l

DESCRIPTION
       The  iconv  program converts text from one encoding to another encoding.  More precisely, it converts from
       the encoding given for the -f option to the encoding given for the -t option. Either  of  these  encodings
       defaults  to  the encoding of the current locale. All the inputfiles are read and converted in turn; if no
       inputfile is given, the standard input is used. The converted text is printed to standard output.

       The encodings permitted are system dependent. For the libiconv implementation,  they  are  listed  in  the
       iconv_open(3) manual page.

       Options controlling the input and output format:

       -f encoding, --from-code=encoding
              Specifies the encoding of the input.

       -t encoding, --to-code=encoding
              Specifies the encoding of the output.

       Options controlling conversion problems:

       -c     When  this  option is given, characters that cannot be converted are silently discarded, instead of
              leading to a conversion error.

       --unicode-subst=formatstring
              When this option is given, Unicode characters that cannot be represented in the target encoding are
              replaced  with a placeholder string that is constructed from the given formatstring, applied to the
              Unicode code point. The formatstring must be a format string in the same format as for  the  printf
              command  or  the printf() function, taking either no argument or exactly one unsigned integer argument.

       --byte-subst=formatstring
              When this option is given, bytes in the input that  are  not  valid  in  the  source  encoding  are
              replaced  with a placeholder string that is constructed from the given formatstring, applied to the
              byte's value. The formatstring must be a format string in the same format as for the printf command
              or the printf() function, taking either no argument or exactly one unsigned integer argument.

       --widechar-subst=formatstring
              When  this  option is given, wide characters in the input that are not valid in the source encoding
              are replaced with a placeholder string that is constructed from the given formatstring, applied  to
              the  byte's  value.  The  formatstring must be a format string in the same format as for the printf
              command or the printf() function, taking either no argument or exactly one unsigned  integer  argument.

       Options controlling error output:

       -s, --silent
              When  this  option  is given, error messages about invalid or unconvertible characters are omitted,
              but the actual converted text is unaffected.

       The iconv -l or iconv --list command lists the names of the supported encodings,  in  a  system  dependent
       format. For the libiconv implementation, the names are printed in upper case, separated by whitespace, and
       alias names of an encoding are listed on the same line as the encoding itself.

EXAMPLES
       iconv -f ISO-8859-1 -t UTF-8
              converts input from the old West-European encoding ISO-8859-1 to Unicode.

       iconv -f KOI8-R --byte-subst="<0x%x>" \
                       --unicode-subst="<U+%04X>"
              converts input from the old Russian encoding KOI8-R to the locale encoding, substituting  an  angle
              bracket notation with hexadecimal numbers for invalid bytes and for valid but unconvertible characters.

       iconv --list
              lists the supported encodings.

SEE ALSO
       iconv_open(3)
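The commands in this post pass -c. To see what that option changes, here is a small sketch, assuming a POSIX shell, that feeds text through standard input as the DESCRIPTION above allows. The sample word "café" and the ASCII target are arbitrary choices for the demo: é has no ASCII equivalent, and the same applies to any character missing from whatever target encoding you pick.

```shell
#!/bin/sh
# Without -c: é (UTF-8 bytes C3 A9) cannot be represented in ASCII, so iconv fails.
if printf 'caf\303\251\n' | iconv -f UTF-8 -t ASCII > /dev/null 2>&1; then
    echo "converted without -c"
else
    echo "conversion failed without -c"
fi

# With -c: the unconvertible character is silently dropped.
# Some iconv builds may still exit non-zero after dropping characters, hence "|| true".
stripped=$(printf 'caf\303\251\n' | iconv -c -f UTF-8 -t ASCII 2>/dev/null || true)
echo "$stripped"    # caf
```

Because -c discards data without any message, running a conversion once without -c is a cheap way to find out whether anything would be lost.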

2. Converting a single file

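# iconv -f <source encoding> -t <target encoding> <input file> > <output file>
# -c silently drops characters that cannot be converted.
# Never redirect onto the input file itself: the shell truncates it before iconv reads it.
iconv -c -f GBK -t UTF-8 上品寒士.txt > 上品寒士UTF8.txt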
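To try the single-file command without a real GBK document, the sketch below (file names made up for the demo) writes the two-character word "中文" as raw GBK bytes, converts it with the same -f GBK -t UTF-8 options, and uses od to show the resulting UTF-8 bytes.

```shell
#!/bin/sh
# Bytes D6 D0 CE C4 spell "中文" in GBK (written here as octal escapes).
printf '\326\320\316\304\n' > demo_gbk.txt

# Same conversion pattern as the command above.
iconv -f GBK -t UTF-8 demo_gbk.txt > demo_utf8.txt

# Dump the output bytes: e4 b8 ad e6 96 87 0a ("中文" in UTF-8 plus a newline).
od -An -tx1 demo_utf8.txt
```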

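3. Batch-converting multiple files

GB18030 is a superset of GBK, so it also decodes GBK files correctly. The command below converts every foo.txt in the current directory into fooUTF8.txt and skips files it has already produced:

# Passing the name as "$1" (not splicing {} into the script) keeps names with spaces intact
find . -maxdepth 1 -name '*.txt' ! -name '*UTF8.txt' -exec sh -c 'iconv -c -f GB18030 -t UTF-8 "$1" > "${1%.txt}UTF8.txt"' sh {} \;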
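If find feels awkward, a plain for loop can do the same job. This is a sketch, not the author's command: the function name `convert_dir` is hypothetical, and it assumes the same GB18030 to UTF-8 conversion and the ...UTF8.txt output naming of the single-file example.

```shell
#!/bin/sh
# Hypothetical helper: convert every *.txt file in directory "$1" from GB18030
# to UTF-8. foo.txt is written to fooUTF8.txt and the original is kept.
convert_dir() {
    for f in "$1"/*.txt; do
        [ -e "$f" ] || continue                   # the glob matched nothing
        case "$f" in *UTF8.txt) continue ;; esac  # skip earlier output files
        iconv -c -f GB18030 -t UTF-8 "$f" > "${f%.txt}UTF8.txt"
    done
}

# Usage: convert the .txt files in the current directory
# convert_dir .
```

Skipping names that already end in UTF8.txt means the helper can be run again on the same folder without converting its own output a second time.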

References:

  1. https://www.jianshu.com/p/f379f6a8e3ed
  2. https://stackoverflow.com/questions/29922866/why-iconv-cannot-convert-from-utf-8-to-iso-8859-1
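Copyright notice: This article is licensed under the Creative Commons Attribution 4.0 International License [BY-NC-SA].
Article title: Converting File Encodings with the iconv Command on macOS (《macOS系统使用iconv命令进行文件编码格式转换》)
Article link: https://wph.im/150.html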