Class RawText
Elements of the sequence are the lines of the file, as delimited by the UNIX newline character ('\n'). The file content is treated as 8 bit binary text, with no assumptions or requirements on character encoding.
Note that the first line of the file is element 0, as defined by the Sequence interface API. Traditionally, in a text editor or a patch file, the first line is line number 1. Callers may need to subtract 1 prior to invoking methods if they are converting from "line number" to "element index".
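The off-by-one conversion described above can be sketched as a pair of small helpers. The class and method names here (LineNumbers, toElementIndex, toLineNumber) are hypothetical and are not part of RawText; this only illustrates the arithmetic.

```java
/** Hypothetical helper; not part of RawText. */
final class LineNumbers {
    private LineNumbers() {
    }

    /** Converts a 1-based editor or patch line number to a 0-based element index. */
    static int toElementIndex(int lineNumber) {
        if (lineNumber < 1) {
            throw new IllegalArgumentException("line numbers start at 1: " + lineNumber);
        }
        return lineNumber - 1;
    }

    /** Converts a 0-based element index back to a 1-based line number. */
    static int toLineNumber(int elementIndex) {
        if (elementIndex < 0) {
            throw new IllegalArgumentException("element indexes start at 0: " + elementIndex);
        }
        return elementIndex + 1;
    }
}
```

A caller holding a line number from a patch hunk header would pass toElementIndex(lineNumber) to methods such as getString(int i).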
-
Field Summary
Fields
Modifier and Type | Field | Description
private static final AtomicInteger | BUFFER_SIZE | Number of bytes to check for heuristics in isBinary(byte[]).
protected final byte[] | content | The file content for this sequence.
static final RawText | EMPTY_TEXT | A RawText of length 0.
private static final int | FIRST_FEW_BYTES | Default and minimum for BUFFER_SIZE.
protected final IntList | lines | Map of line number to starting position within content.
-
Constructor Summary
Constructors
Constructor | Description
RawText(byte[] input) | Create a new sequence from an existing content byte array.
RawText(byte[] input, IntList lineMap) | Create a new sequence from the existing content byte array and the line map indicating line boundaries.
RawText(File file) | Create a new sequence from a file.
-
Method Summary
Modifier and Type | Method | Description
protected String | decode(int start, int end) | Decode a region of the text into a String.
static int | getBufferSize() | Obtains the buffer size to use for analyzing whether certain content is text or binary, or what line endings are used if it's text.
private int | getEnd(int i) |
String | getLineDelimiter() | Get the line delimiter for the first line.
byte[] | getRawContent() |
ByteBuffer | getRawString(int i) | Get the raw text for a single line.
private int | getStart(int i) |
String | getString(int i) | Get the text for a single line.
String | getString(int begin, int end, boolean dropLF) | Get the text for a region of lines.
static boolean | isBinary(byte[] raw) | Determine heuristically whether a byte array represents binary (as opposed to text) content.
static boolean | isBinary(byte[] raw, int length) | Determine heuristically whether a byte array represents binary (as opposed to text) content.
static boolean | isBinary(byte[] raw, int length, boolean complete) | Determine heuristically whether a byte array represents binary (as opposed to text) content.
static boolean | isBinary(byte curr, byte prev) | Determines from the last two bytes read from a source if it looks like binary content.
static boolean | isBinary(InputStream raw) | Determine heuristically whether the bytes contained in a stream represent binary (as opposed to text) content.
static boolean | isCrLfText(byte[] raw) | Determine heuristically whether a byte array represents text content using CR-LF as line separator.
static boolean | isCrLfText(byte[] raw, int length) | Determine heuristically whether a byte array represents text content using CR-LF as line separator.
static boolean | isCrLfText(byte[] raw, int length, boolean complete) | Determine heuristically whether a byte array represents text content using CR-LF as line separator.
static boolean | isCrLfText(InputStream raw) | Determine heuristically whether the bytes contained in a stream represent text content using CR-LF as line separator.
boolean | isMissingNewlineAtEnd() | Determine if the file ends with a LF ('\n').
static RawText | load(ObjectLoader ldr, int threshold) | Read a blob object into RawText, or throw BinaryBlobException if the blob is binary.
static int | setBufferSize(int bufferSize) | Sets the buffer size to use for analyzing whether certain content is text or binary, or what line endings are used if it's text.
int | size() | Get size.
void | writeLine(OutputStream out, int i) | Write a specific line to the output stream, without its trailing LF.
-
Field Details
-
EMPTY_TEXT
A RawText of length 0
-
FIRST_FEW_BYTES
private static final int FIRST_FEW_BYTES
Default and minimum for BUFFER_SIZE.
-
BUFFER_SIZE
Number of bytes to check for heuristics in isBinary(byte[]).
-
content
protected final byte[] content
The file content for this sequence.
-
lines
Map of line number to starting position within content.
-
-
Constructor Details
-
RawText
public RawText(byte[] input)
Create a new sequence from an existing content byte array. The entire array (indexes 0 through length-1) is used as the content.
- Parameters:
input - the content array. The object retains a reference to this array, so the caller should not modify it afterwards.
-
RawText
Create a new sequence from the existing content byte array and the line map indicating line boundaries.
- Parameters:
input - the content array. The object retains a reference to this array, so it should be immutable.
lineMap - an array with 1-based offsets for the start of each line. The first and last entries should be Integer.MIN_VALUE and an offset one past the end of the last line, respectively.
- Since:
- 5.0
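The line map layout described for lineMap can be illustrated with a plain int[] instead of an IntList. This is a minimal sketch under an assumption the text does not spell out: "1-based" is read here as "entry i+1 holds the byte offset within the content where 0-based line i starts". The class LineMapSketch and its methods are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch of the line map layout; uses int[] rather than IntList. */
final class LineMapSketch {
    private LineMapSketch() {
    }

    /**
     * Builds a map whose first entry is Integer.MIN_VALUE, followed by the
     * start offset of every line, followed by one entry just past the end
     * of the last line.
     */
    static int[] buildLineMap(byte[] content) {
        // Worst case: every byte is an LF, giving one line per byte,
        // plus the leading sentinel and the trailing end offset.
        int[] map = new int[content.length + 2];
        int n = 0;
        map[n++] = Integer.MIN_VALUE;
        int ptr = 0;
        while (ptr < content.length) {
            map[n++] = ptr;
            // Advance to the next LF, then step over it.
            while (ptr < content.length && content[ptr] != '\n') {
                ptr++;
            }
            if (ptr < content.length) {
                ptr++;
            }
        }
        map[n++] = content.length;
        return Arrays.copyOf(map, n);
    }

    /** Number of lines described by a map built by buildLineMap. */
    static int lineCount(int[] map) {
        return map.length - 2;
    }
}
```

With this layout, the bytes of 0-based line i would span map[i + 1] (inclusive) to map[i + 2] (exclusive), including any trailing LF.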
-
RawText
Create a new sequence from a file. The entire file contents are used.
- Parameters:
file - the text file.
- Throws:
IOException - if an error occurs while reading the file.
-
-
Method Details
-
getRawContent
public byte[] getRawContent()
- Returns:
- the raw, unprocessed content read.
- Since:
- 4.11
-
size
public int size()
Get the number of lines (elements) in the sequence.
-
writeLine
Write a specific line to the output stream, without its trailing LF. The specified line is copied as-is, with no character encoding translation performed.
If the specified line ends with an LF ('\n'), the LF is not copied. It is up to the caller to write the LF, if desired, between output lines.
- Parameters:
out - stream to copy the line data onto.
i - index of the line to extract. Note this is 0-based, so line number 1 is actually index 0.
- Throws:
IOException - the stream write operation failed.
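Because writeLine omits the trailing LF, the caller decides where line breaks go. The following is a minimal sketch of that caller-side loop. It does not use RawText itself: the lines are modeled as a List of byte arrays with the LF already stripped, and LineWriterSketch with all of its methods is a hypothetical stand-in.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.List;

/** Hypothetical stand-in showing who is responsible for writing the LF. */
final class LineWriterSketch {
    private LineWriterSketch() {
    }

    /** Mimics the documented behavior: copies the bytes of line i, never an LF. */
    static void writeLine(OutputStream out, List<byte[]> lines, int i) throws IOException {
        out.write(lines.get(i));
    }

    /** Writes every line, inserting the LF between lines on the caller's side. */
    static void writeAll(OutputStream out, List<byte[]> lines, boolean endWithLf) throws IOException {
        for (int i = 0; i < lines.size(); i++) {
            if (i > 0) {
                out.write('\n');
            }
            writeLine(out, lines, i);
        }
        if (endWithLf && !lines.isEmpty()) {
            out.write('\n');
        }
    }

    /** Convenience wrapper that collects the output in memory. */
    static byte[] toBytes(List<byte[]> lines, boolean endWithLf) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            writeAll(out, lines, endWithLf);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

The endWithLf flag is one way a caller could reproduce a file that does or does not end with a newline.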
-
isMissingNewlineAtEnd
public boolean isMissingNewlineAtEnd()
Determine if the file ends with a LF ('\n').
- Returns:
- true if the last line is missing its LF (the file does not end with '\n'); false otherwise.
-
getString
Get the text for a single line.
- Parameters:
i - index of the line to extract. Note this is 0-based, so line number 1 is actually index 0.
- Returns:
- the text for the line, without a trailing LF.
-
getRawString
Get the raw text for a single line.
- Parameters:
i - index of the line to extract. Note this is 0-based, so line number 1 is actually index 0.
- Returns:
- the text for the line, without a trailing LF, as a ByteBuffer that is backed by a slice of the raw content, with the buffer's position on the start of the line and the limit at the end.
- Since:
- 5.12
-
getString
Get the text for a region of lines.
- Parameters:
begin - index of the first line to extract. Note this is 0-based, so line number 1 is actually index 0.
end - index of one past the last line to extract.
dropLF - if true the trailing LF ('\n') of the last returned line is dropped, if present.
- Returns:
- the text for lines [begin, end).
-
decode
Decode a region of the text into a String. The default implementation of this method tries to guess the character set by considering UTF-8, the platform default, and falling back on ISO-8859-1 if neither of those can correctly decode the region given.
- Parameters:
start - first byte of the content to decode.
end - one past the last byte of the content to decode.
- Returns:
- the region [start, end) decoded as a String.
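The fallback order described for decode (UTF-8, then the platform default, then ISO-8859-1) can be sketched with the standard java.nio.charset API. This is an illustration of the idea, not the library's implementation; DecodeSketch is a hypothetical class, and the use of strict decoders that report malformed input is an assumption about how "correctly decode" might be checked.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a charset-guessing decode over a byte region. */
final class DecodeSketch {
    private DecodeSketch() {
    }

    /** Decodes content[start, end) using the first candidate charset that accepts it. */
    static String decode(byte[] content, int start, int end) {
        Charset[] candidates = { StandardCharsets.UTF_8, Charset.defaultCharset() };
        for (Charset cs : candidates) {
            try {
                // REPORT makes the decoder throw instead of silently
                // substituting replacement characters.
                return cs.newDecoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT)
                        .decode(ByteBuffer.wrap(content, start, end - start))
                        .toString();
            } catch (CharacterCodingException e) {
                // This charset cannot represent the region; try the next one.
            }
        }
        // ISO-8859-1 maps every byte value to a character, so it cannot fail.
        return new String(content, start, end - start, StandardCharsets.ISO_8859_1);
    }
}
```

ISO-8859-1 works as the last resort because it assigns a character to all 256 byte values, so some String is always produced even when the guess is wrong.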
-
getStart
private int getStart(int i)
-
getEnd
private int getEnd(int i)
-
getBufferSize
public static int getBufferSize()
Obtains the buffer size to use for analyzing whether certain content is text or binary, or what line endings are used if it's text.
- Returns:
- the buffer size, by default FIRST_FEW_BYTES bytes
- Since:
- 6.0
-
setBufferSize
public static int setBufferSize(int bufferSize)
Sets the buffer size to use for analyzing whether certain content is text or binary, or what line endings are used if it's text. If the given bufferSize is smaller than FIRST_FEW_BYTES, the buffer size is set to FIRST_FEW_BYTES instead.
- Parameters:
bufferSize - Size to set
- Returns:
- the size actually set
- Since:
- 6.0
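The clamping rule for setBufferSize can be sketched as follows. The numeric value used for FIRST_FEW_BYTES here (8 * 1024) is an assumption made for the example only; the text above does not state it. BufferSizeSketch is a hypothetical class that only mirrors the documented contract.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a clamped, globally shared buffer size. */
final class BufferSizeSketch {
    /** Assumed value for illustration; the documentation does not give the number. */
    static final int FIRST_FEW_BYTES = 8 * 1024;

    private static final AtomicInteger BUFFER_SIZE = new AtomicInteger(FIRST_FEW_BYTES);

    private BufferSizeSketch() {
    }

    /** Returns the current buffer size, FIRST_FEW_BYTES by default. */
    static int getBufferSize() {
        return BUFFER_SIZE.get();
    }

    /** Stores the requested size, raised to FIRST_FEW_BYTES if it is smaller. */
    static int setBufferSize(int bufferSize) {
        int newSize = Math.max(FIRST_FEW_BYTES, bufferSize);
        BUFFER_SIZE.set(newSize);
        return newSize;
    }
}
```

Returning the effective size lets a caller detect that a too-small request was raised to the minimum.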
-
isBinary
Determine heuristically whether the bytes contained in a stream represent binary (as opposed to text) content. Note: Do not further use this stream after having called this method! The stream may not be fully read and will be left at an unknown position after consuming an unknown number of bytes. The caller is responsible for closing the stream.
- Parameters:
raw - input stream containing the raw file content.
- Returns:
- true if raw is likely to be a binary file, false otherwise
- Throws:
IOException - if input stream could not be read
-
isBinary
public static boolean isBinary(byte[] raw)
Determine heuristically whether a byte array represents binary (as opposed to text) content.
- Parameters:
raw - the raw file content.
- Returns:
- true if raw is likely to be a binary file, false otherwise
-
isBinary
public static boolean isBinary(byte[] raw, int length)
Determine heuristically whether a byte array represents binary (as opposed to text) content.
- Parameters:
raw - the raw file content.
length - number of bytes in raw to evaluate. This should be raw.length unless raw was over-allocated by the caller.
- Returns:
- true if raw is likely to be a binary file, false otherwise
-
isBinary
public static boolean isBinary(byte[] raw, int length, boolean complete)
Determine heuristically whether a byte array represents binary (as opposed to text) content.
- Parameters:
raw - the raw file content.
length - number of bytes in raw to evaluate. This should be raw.length unless raw was over-allocated by the caller.
complete - whether raw contains the whole data
- Returns:
- true if raw is likely to be a binary file, false otherwise
- Since:
- 6.0
-
isBinary
public static boolean isBinary(byte curr, byte prev)
Determines from the last two bytes read from a source if it looks like binary content.
- Parameters:
curr - the last byte, read after prev
prev - the previous byte, read before curr
- Returns:
- true if either byte is NUL, or if prev is CR and curr is not LF, false otherwise
- Since:
- 6.0
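The two-byte rule stated in the Returns clause translates directly into code. The isBinary method below restates that rule; the looksBinary scan built on top of it is an added illustration and an assumption about how such a check could be applied across a buffer. BinaryPairSketch is a hypothetical class.

```java
/** Hypothetical sketch of the documented two-byte binary check. */
final class BinaryPairSketch {
    private BinaryPairSketch() {
    }

    /** True if either byte is NUL, or if prev is CR and curr is not LF. */
    static boolean isBinary(byte curr, byte prev) {
        return curr == '\0' || prev == '\0' || (prev == '\r' && curr != '\n');
    }

    /**
     * Applies the pair check across the first length bytes of raw.
     * Simplification: a lone CR as the very last byte is not flagged,
     * because there is no following byte to compare it with.
     */
    static boolean looksBinary(byte[] raw, int length) {
        byte prev = 'x'; // neutral starting value: neither NUL nor CR
        for (int i = 0; i < length; i++) {
            byte curr = raw[i];
            if (isBinary(curr, prev)) {
                return true;
            }
            prev = curr;
        }
        return false;
    }
}
```

Under this rule a CR is only acceptable as the first half of a CR-LF pair, so a bare CR in the middle of the data counts as a binary indicator.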
-
isCrLfText
public static boolean isCrLfText(byte[] raw)
Determine heuristically whether a byte array represents text content using CR-LF as line separator.
- Parameters:
raw - the raw file content.
- Returns:
- true if raw is likely to be CR-LF delimited text, false otherwise
- Since:
- 5.3
-
isCrLfText
Determine heuristically whether the bytes contained in a stream represent text content using CR-LF as line separator. Note: Do not further use this stream after having called this method! The stream may not be fully read and will be left at an unknown position after consuming an unknown number of bytes. The caller is responsible for closing the stream.
- Parameters:
raw - input stream containing the raw file content.
- Returns:
- true if raw is likely to be CR-LF delimited text, false otherwise
- Throws:
IOException - if input stream could not be read
- Since:
- 5.3
-
isCrLfText
public static boolean isCrLfText(byte[] raw, int length)
Determine heuristically whether a byte array represents text content using CR-LF as line separator.
- Parameters:
raw - the raw file content.
length - number of bytes in raw to evaluate.
- Returns:
- true if raw is likely to be CR-LF delimited text, false otherwise
- Since:
- 5.3
-
isCrLfText
public static boolean isCrLfText(byte[] raw, int length, boolean complete)
Determine heuristically whether a byte array represents text content using CR-LF as line separator.
- Parameters:
raw - the raw file content.
length - number of bytes in raw to evaluate.
complete - whether raw contains the whole data
- Returns:
- true if raw is likely to be CR-LF delimited text, false otherwise
- Since:
- 6.0
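The documentation does not spell out the CR-LF heuristic, so the following is only one plausible reading, stated as an assumption: content counts as CR-LF text if it shows no binary indicator (a NUL byte or a CR not followed by LF) and contains at least one CR-LF pair. CrLfSketch is a hypothetical class, and the sketch ignores the complete flag by treating a CR at the very end of the examined bytes as a lone CR.

```java
/** Hypothetical sketch of a CR-LF text heuristic; the rule itself is an assumption. */
final class CrLfSketch {
    private CrLfSketch() {
    }

    /** True if the first length bytes look like text and contain at least one CR-LF pair. */
    static boolean isCrLfText(byte[] raw, int length) {
        boolean sawCrLf = false;
        for (int i = 0; i < length; i++) {
            byte b = raw[i];
            if (b == '\0') {
                return false; // NUL: treat as binary
            }
            if (b == '\r') {
                if (i + 1 < length && raw[i + 1] == '\n') {
                    sawCrLf = true;
                    i++; // skip the LF that completes the pair
                } else {
                    return false; // lone CR: treat as binary
                }
            }
        }
        return sawCrLf;
    }
}
```

A file with only LF line endings returns false here because no CR-LF pair is ever seen, not because it looks binary.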
-
getLineDelimiter
Get the line delimiter for the first line.
- Returns:
- the line delimiter or null
- Since:
- 2.0
-
load
Read a blob object into RawText, or throw BinaryBlobException if the blob is binary.
- Parameters:
ldr - the ObjectLoader for the blob
threshold - if the blob is larger than this size, it is always assumed to be binary.
- Returns:
- the RawText representing the blob.
- Throws:
BinaryBlobException - if the blob contains binary data.
IOException - if the input could not be read.
- Since:
- 4.10
-