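I recently found that the srt subtitles for some old movies are encoded in GB2312. Played on a non-Chinese system, they show up as garbled text. There are far too many subtitle files (more than 100) to convert by hand.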
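To illustrate why the text turns into garbage, here is a small Python 3 sketch. It assumes the player falls back to a Western code page such as cp1252; the actual fallback depends on the system and the player.

```python
# "ni hao" (hello) in Chinese, written with escapes so the source stays ASCII.
text = "\u4f60\u597d"

# The bytes as they would be stored in a GB2312-encoded .srt file.
raw = text.encode("gb2312")

# Assumption: the player decodes those bytes with a Western code page (cp1252).
garbled = raw.decode("cp1252", "replace")

# Decoding with the correct codec restores the original text.
restored = raw.decode("gb2312")

print(repr(garbled))   # accented Latin letters instead of Chinese characters
print(restored == text)
```

The bytes on disk are fine; only the codec used to read them is wrong, which is why re-encoding the files as UTF-8 fixes playback.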
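After searching online, I found that the Python Script plugin for Notepad++ handles this nicely.
The steps are simple. Open Notepad++, go to Plugins > Plugins Admin, and install Python Script.
Once it is installed, choose New Script from the Python Script menu, give the script a name, and paste in the following code: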
import os
from Npp import notepad, console, MENUCOMMAND

# Folder containing the files to convert (use an ASCII-only path)
filePathSrc = "f:\\Temp\\UTF8"

# Binary and archive types that must not be re-encoded
skipExtensions = ('.jar', '.ear', '.gif', '.jpg', '.jpeg', '.xls', '.png', '.cab', '.ico')

for root, dirs, files in os.walk(filePathSrc):
    for fn in files:
        # Compare extensions case-insensitively so .JPG, .Png, etc. are skipped too
        if not fn.lower().endswith(skipExtensions):
            path = os.path.join(root, fn)
            notepad.open(path)
            console.write(path + "\r\n")
            # Does not work --> notepad.runMenuCommand("Encoding", "Character sets", "Chinese", "GB2312 (Simplified)")
            # Tell Notepad++ to read the file as GB2312
            notepad.menuCommand(MENUCOMMAND.FORMAT_GB2312)
            # notepad.runMenuCommand("Encoding", "Convert to UTF-8-BOM")
            notepad.menuCommand(MENUCOMMAND.FORMAT_CONV2_UTF_8)
            # Reference: https://github.com/bruderstein/PythonScript/blob/master/PythonScript/src/NotepadPython.cpp
            notepad.save()
            notepad.close()
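Change filePathSrc to the folder that holds the files you want to convert.
Save the script and run it. It converts every file under that directory, subdirectories included, from GB2312 to UTF-8, apart from the binary types the script skips. One more thing: the path must not contain Chinese characters.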
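For comparison, the same conversion can be sketched in plain Python 3, without Notepad++. This is not the method used in this post, only an illustration of what the two menu commands amount to: decode the bytes as GB2312, then write them back as UTF-8. The function names convert_file and convert_tree and the .srt filter are hypothetical, and the sketch assumes every matching file really is GB2312.

```python
import os


def convert_file(path, src_encoding="gb2312", dst_encoding="utf-8"):
    """Re-encode one file in place (illustrative helper, not from the post)."""
    with open(path, "rb") as f:
        raw = f.read()
    # Decode before reopening for writing: if the bytes are not valid
    # GB2312, this raises UnicodeDecodeError and the file is left untouched.
    text = raw.decode(src_encoding)
    with open(path, "wb") as f:
        f.write(text.encode(dst_encoding))


def convert_tree(folder, suffix=".srt"):
    """Convert every file under folder whose name ends with suffix."""
    converted = []
    for root, dirs, files in os.walk(folder):
        for fn in files:
            if fn.lower().endswith(suffix):
                path = os.path.join(root, fn)
                convert_file(path)
                converted.append(path)
    return converted
```

Calling convert_tree("f:\\Temp\\UTF8") would process the same folder as the script above. Passing dst_encoding="utf-8-sig" to convert_file writes a byte order mark, which is closer to the "Convert to UTF-8-BOM" menu entry mentioned in the script's comments.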
Ref: https://pw999.wordpress.com/2013/08/19/mass-convert-a-project-to-utf-8-using-notepad/