Package org.freebsd.file
Class FileEncoding
java.lang.Object
org.freebsd.file.FileEncoding
Tries to guess the encoding of the byte sequence.
Original code taken from https://github.com/file/file/blob/master/src/encoding.c
-
Field Summary
Fields -
Constructor Summary
Constructors -
Method Summary
Modifier and Type  Method                                       Description
private byte[]     from_ebcdic(byte[] buf, int nbytes)
                   getCode()
                   getType()
boolean            guessFileEncoding(byte[] buf)                Try to determine whether text is in some character code we can identify.
private boolean    looks_ascii(byte[] buf, int nbytes)
private boolean    looks_extended(byte[] buf, int nbytes)
private boolean    looks_latin1(byte[] buf, int nbytes)
private int        looks_ucs16(byte[] buf, int nbytes)
private boolean    looks_utf7(byte[] buf, int nbytes)
protected int      looks_utf8(byte[] buf, int nbytes)
private boolean    looks_utf8_with_BOM(byte[] buf, int nbytes)
private int        unsignedByte(byte value)
-
Field Details
-
type
-
code
-
code_mime
-
F
private static final byte F
-
T
private static final byte T
-
I
private static final byte I
-
X
private static final byte X
-
text_chars
private byte[] text_chars
-
ebcdic_to_ascii
private static final char[] ebcdic_to_ascii
-
ebcdic_1047_to_8859
private static final char[] ebcdic_1047_to_8859
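The F, T, I, and X constants and the text_chars table resemble the byte-class table in the original encoding.c, where each of the 256 byte values is tagged as never appearing in text, plain ASCII text, ISO-8859 text, or non-ISO extended ASCII. The sketch below is a hypothetical, self-contained illustration of that idea. The class name TextCharsSketch, the constant values, the classify helper, and the camel-case method names are assumptions for illustration and are not taken from this class.

```java
/** Hypothetical illustration of byte classes; values assumed from encoding.c. */
final class TextCharsSketch {
    static final byte F = 0; // byte never appears in text
    static final byte T = 1; // byte appears in plain ASCII text
    static final byte I = 2; // byte appears in ISO-8859 text
    static final byte X = 3; // byte appears in non-ISO extended ASCII

    /** Classifies an unsigned byte value (0..255) by range instead of a 256-entry table. */
    static byte classify(int b) {
        if (b >= 0x07 && b <= 0x0D) return T; // BEL, BS, TAB, LF, VT, FF, CR
        if (b == 0x1B) return T;              // ESC
        if (b >= 0x20 && b <= 0x7E) return T; // printable ASCII
        if (b < 0x80) return F;               // other control bytes and DEL
        if (b == 0x85) return I;              // NEL
        if (b <= 0x9F) return X;              // rest of 0x80..0x9F
        return I;                             // 0xA0..0xFF
    }

    /** True if every one of the first nbytes bytes is plain ASCII text. */
    static boolean looksAscii(byte[] buf, int nbytes) {
        for (int i = 0; i < nbytes; i++) {
            if (classify(buf[i] & 0xFF) != T) return false;
        }
        return true;
    }

    /** True if every byte is ASCII text or ISO-8859 text. */
    static boolean looksLatin1(byte[] buf, int nbytes) {
        for (int i = 0; i < nbytes; i++) {
            byte c = classify(buf[i] & 0xFF);
            if (c != T && c != I) return false;
        }
        return true;
    }

    /** True if no byte belongs to the never-in-text class. */
    static boolean looksExtended(byte[] buf, int nbytes) {
        for (int i = 0; i < nbytes; i++) {
            if (classify(buf[i] & 0xFF) == F) return false;
        }
        return true;
    }
}
```

Under these assumptions the three checks are nested: any buffer accepted by looksAscii is also accepted by looksLatin1, and any buffer accepted by looksLatin1 is also accepted by looksExtended.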
-
-
Constructor Details
-
FileEncoding
public FileEncoding()
-
-
Method Details
-
getCodeMime
-
getType
-
getCode
-
guessFileEncoding
public boolean guessFileEncoding(byte[] buf)
Try to determine whether text is in some character code we can identify. EBCDIC is also identified, by converting it to ISO-8859-1 first.
Returns:
true if it could guess an encoding.
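To illustrate the general idea of trying a series of checks until one matches, here is a simplified, hypothetical cascade. It is not the implementation of guessFileEncoding: the class name EncodingGuessSketch, the returned labels, and the order of checks are assumptions, the ASCII test only looks at the high bit, and strict UTF-8 validation is delegated to the standard java.nio.charset.CharsetDecoder rather than a hand-written scanner. UTF-7, Latin-1, extended ASCII, and EBCDIC are left out.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical, simplified detector; not the FileEncoding implementation. */
final class EncodingGuessSketch {

    /** Returns a label for the first check that matches, or null if none does. */
    static String guess(byte[] buf) {
        if (isSevenBit(buf)) return "us-ascii";
        if (startsWith(buf, 0xEF, 0xBB, 0xBF) && isValidUtf8(buf)) return "utf-8 (with BOM)";
        if (isValidUtf8(buf)) return "utf-8";
        if (startsWith(buf, 0xFF, 0xFE)) return "utf-16le";
        if (startsWith(buf, 0xFE, 0xFF)) return "utf-16be";
        return null;
    }

    /** Simplification: accepts any byte without the high bit, including control bytes. */
    static boolean isSevenBit(byte[] buf) {
        for (byte b : buf) {
            if (b < 0) return false; // high bit set
        }
        return true;
    }

    /** Strict UTF-8 check using the JDK decoder, which reports malformed input. */
    static boolean isValidUtf8(byte[] buf) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(buf));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Compares the leading bytes of buf, as unsigned values, with the given prefix. */
    static boolean startsWith(byte[] buf, int... prefix) {
        if (buf.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((buf[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

The narrowest check runs first so that plain ASCII is not reported as UTF-8, even though every 7-bit buffer is also valid UTF-8.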
-
looks_ascii
private boolean looks_ascii(byte[] buf, int nbytes)
-
looks_latin1
private boolean looks_latin1(byte[] buf, int nbytes)
-
looks_extended
private boolean looks_extended(byte[] buf, int nbytes)
-
looks_utf8
protected int looks_utf8(byte[] buf, int nbytes)
-
looks_utf8_with_BOM
private boolean looks_utf8_with_BOM(byte[] buf, int nbytes)
-
looks_utf7
private boolean looks_utf7(byte[] buf, int nbytes)
-
looks_ucs16
private int looks_ucs16(byte[] buf, int nbytes)
-
from_ebcdic
private byte[] from_ebcdic(byte[] buf, int nbytes)
-
unsignedByte
private int unsignedByte(byte value)
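Java's byte type is signed (-128 to 127), while the C code this class is ported from works with unsigned bytes (0 to 255), so a helper with this signature presumably widens a byte to its unsigned value. A minimal sketch of one common way to do that follows; the class name UnsignedByteSketch and the method body are assumptions, since the documentation does not show the implementation.

```java
/** Hypothetical sketch of an unsigned-byte helper. */
final class UnsignedByteSketch {

    /** Widens a signed Java byte to an int in the range 0..255. */
    static int unsignedByte(byte value) {
        // Masking with 0xFF discards the sign extension added when the byte is promoted to int.
        return value & 0xFF;
    }
}
```

Since Java 8 the standard library offers Byte.toUnsignedInt(byte), which yields the same result.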
-