Wednesday, October 14, 2015

Unzipping archives with special-character file names in JDK 1.6 and earlier



When I had to unzip a file whose entry names contained French accented characters, using the standard java.util.zip API of JDK 1.6, I was surprised to get the error below. The same program unzipped all the files successfully on Linux; the problem showed up only on Windows.

[unzip] java.io.IOException: Stream closed
[unzip] at java.io.BufferedInputStream.getInIfOpen(BufferedInputStream.java:134)
[unzip] at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
[unzip] at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
[unzip] at java.security.DigestInputStream.read(DigestInputStream.java:144)

Investigating further opened up a whole new arena of encoding mysteries involving archives, the zip specification, a JDK bug and archiving tools.

Several archive tools, including the latest version of WinZip (19.x), 7z, WinRAR and TrueZIP, can encode file names in UTF-8. However, JDK 1.6 fails to convert the Unicode names to the platform encoding while unzipping them. This put us in serious trouble, as we could not upgrade to the next version of Java quickly and we had no tools that worked compatibly with Java.
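For readers who are able to run the unzip step on Java 7 or later, the encoding of the entry names can be stated explicitly, because Java 7 added ZipFile and ZipInputStream constructors that accept a Charset. The following is only a minimal sketch of that idea, not what I used on JDK 1.6; the class name ZipNameLister and the method listNames are hypothetical, and the right charset (UTF-8, or a legacy code page such as Cp437) depends on the tool that produced the archive.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: requires Java 7+, it does not compile on JDK 1.6.
public class ZipNameLister {

    // Returns the entry names of the archive, decoded with the given charset.
    static List<String> listNames(String zipPath, Charset nameCharset) throws IOException {
        List<String> names = new ArrayList<String>();
        // ZipFile(String, Charset) exists since Java 7.
        try (ZipFile zip = new ZipFile(zipPath, nameCharset)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: ZipNameLister <archive.zip> [charset]");
            return;
        }
        // Assumption: UTF-8 unless a charset name is passed as second argument.
        Charset charset = args.length > 1 ? Charset.forName(args[1]) : StandardCharsets.UTF_8;
        for (String name : listNames(args[0], charset)) {
            System.out.println(name);
        }
    }
}
```

If the printed names look garbled, trying a different charset name as the second argument is a quick way to find out how the archive tool encoded them.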

Finally, I found a rescuer in the jar utility. JAR files are portable across different platforms and locale environments, and the tool seems to store the encoding of the entry names within the zip itself. So I used the jar command below to zip the required files, and the resulting zip is opened by JDK 1.6 with no hassles:

jar -cvf filename.zip folder1 folder2

This makes Java understand the file name encoding at the ZipEntry level. It is not a general-purpose replacement for a compression tool (for one thing, jar adds a META-INF/MANIFEST.MF entry to the archive unless the M option is given), but it worked for my use case, where file compression and size were not a constraint. We can use this solution in a Windows environment where JDK 1.6 fails to convert the entry names to the native encoding.
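The same effect should be achievable from code, since the jar tool is built on the same java.util.zip classes, which write entry names in UTF-8 by default. Below is a rough sketch under that assumption, written without Java 7 syntax so that it stays compatible with JDK 1.6; the class name Utf8Zipper and its methods are hypothetical, and empty folders are simply skipped.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: zips files and folders using java.util.zip only.
public class Utf8Zipper {

    // Adds a file, or a folder recursively, under the given entry name.
    static void addTree(ZipOutputStream out, File file, String entryName) throws IOException {
        if (file.isDirectory()) {
            File[] children = file.listFiles();
            if (children == null) {
                return; // folder could not be read
            }
            for (File child : children) {
                // Zip entry names always use '/' as the separator.
                addTree(out, child, entryName + "/" + child.getName());
            }
            return;
        }
        out.putNextEntry(new ZipEntry(entryName));
        InputStream in = new FileInputStream(file);
        try {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        } finally {
            in.close();
        }
        out.closeEntry();
    }

    // Writes all roots into the target zip; at least one file is expected.
    static void zip(File target, File... roots) throws IOException {
        ZipOutputStream out = new ZipOutputStream(new FileOutputStream(target));
        try {
            for (File root : roots) {
                addTree(out, root, root.getName());
            }
        } finally {
            out.close();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: Utf8Zipper <target.zip> <file-or-folder>...");
            return;
        }
        File[] roots = new File[args.length - 1];
        for (int i = 1; i < args.length; i++) {
            roots[i - 1] = new File(args[i]);
        }
        zip(new File(args[0]), roots);
    }
}
```

Running it as java Utf8Zipper filename.zip folder1 folder2 mirrors the jar command above, minus the manifest entry.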


Cool quick references:

https://marcosc.com/2008/12/zip-files-and-encoding-i-hate-you/

https://bugs.openjdk.java.net/browse/JDK-4244499

http://www.siao2.com/2008/05/13/8498184.aspx
