public class GtfInputStream
extends java.io.InputStream
FlatFileParser
and other tools for parsing the resulting stream. The first line in the
file is used as a template line. The first 8 columns are fixed. The 9th column
contains attributes as key/value pairs, which are converted to additional
columns in the output. The GTF specification requires that gene_id and
transcript_id are present, which means that the output will contain at
least 10 columns. Subsequent lines are parsed in the same way, and their
attributes are lined up with those of the first line. Note that any
attributes that are not present in the first line are skipped. The parser
also has an option to skip lines with a transcript_id+seqname combination
that is not unique. Normally, a GTF file contains multiple entries with the
same IDs, but in most cases we are not interested in this when importing
data to BASE. This option also removes the feature, start, end, score,
strand and frame columns from the output. Lines that can't be split into at
least 9 columns (e.g. comment lines starting with #) are not parsed and are
forwarded without modification.
Modifier and Type | Class and Description |
---|---|
(package private) static class | GtfInputStream.Attribute |
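The template mechanism in the class description can be illustrated with a small standalone sketch. It does not use the BASE classes: `GtfTemplateSketch`, its method names and its regular expression are hypothetical, and the attribute syntax (`key "value";`, with the quotes optional) is an assumption about typical GTF files rather than the pattern actually stored in `ATTRIBUTE_PATTERN`. Emitting an empty string for an attribute that a later line lacks is also an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GtfTemplateSketch {
    // Assumed attribute syntax: key "value"; (quotes optional).
    private static final Pattern ATTRIBUTE =
        Pattern.compile("(\\S+)\\s+\"?([^\";]*)\"?\\s*(?:;|$)");

    // Parse the 9th GTF column into key/value pairs, keeping file order.
    static Map<String, String> parseAttributes(String column9) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = ATTRIBUTE.matcher(column9);
        while (m.find()) {
            result.put(m.group(1), m.group(2).trim());
        }
        return result;
    }

    // The attribute names of the first data line define the extra output columns.
    static List<String> templateKeys(String firstDataLine) {
        String[] cols = firstDataLine.split("\t", -1);
        return new ArrayList<>(parseAttributes(cols[8]).keySet());
    }

    // Convert one line: 8 fixed columns followed by the template attributes.
    static String convert(String line, List<String> keys) {
        String[] cols = line.split("\t", -1);
        if (cols.length < 9) {
            return line; // e.g. a comment line: forwarded unchanged
        }
        Map<String, String> attrs = parseAttributes(cols[8]);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 8; i++) {
            if (i > 0) sb.append('\t');
            sb.append(cols[i]);
        }
        for (String key : keys) {
            // Attributes that are not in the template are never looked up,
            // so they are dropped from the output.
            sb.append('\t').append(attrs.getOrDefault(key, ""));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String first = "chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id \"G1\"; transcript_id \"T1\";";
        List<String> keys = templateKeys(first);
        System.out.println(convert(first, keys));
    }
}
```

In this sketch the order of the extra columns follows the first data line, so a later line that lists `transcript_id` before `gene_id` still produces its values in the template order.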
Modifier and Type | Field and Description |
---|---|
private java.util.regex.Pattern | ATTRIBUTE_PATTERN |
private GtfInputStream.Attribute[] | attributes |
private byte[] | buffer |
private java.nio.charset.Charset | charset |
private int | geneIdIndex |
private int | index |
private int | lineNum |
private java.io.InputStream | master |
private java.io.BufferedReader | reader |
private boolean | skipRepeatedTranscriptIds |
private int | transcriptIdIndex |
private java.util.Set<java.lang.String> | transcriptIds |
Constructor and Description |
---|
GtfInputStream(java.io.InputStream master, java.lang.String charset, boolean skipRepeatedTranscriptIds) Create a new input stream reading from the master. |
Modifier and Type | Method and Description |
---|---|
private java.lang.StringBuffer | appendLine(java.lang.StringBuffer sb, java.lang.String[] columns, GtfInputStream.Attribute[] attr) Append columns to the buffer and separate each with a tab. |
int | available() |
void | close() |
private java.lang.String[] | getNextLine() Read the next line from the GTF file and split on tab character. |
int | getNumLines() Get the number of lines parsed so far. |
int | getNumUniqueTranscriptIds() Get the number of unique transcript ids found so far. |
boolean | markSupported() |
private void | parseAttributes(java.lang.String template) Parse attributes from the given template string. |
int | read() |
int | read(byte[] b) |
int | read(byte[] b, int off, int len) |
private byte[] | readMore() Read more data from the GTF file. |
void | reset() |
private final java.io.InputStream master
private final java.io.BufferedReader reader
private final java.nio.charset.Charset charset
private final java.util.regex.Pattern ATTRIBUTE_PATTERN
private byte[] buffer
private int index
private int lineNum
private GtfInputStream.Attribute[] attributes
private int geneIdIndex
private int transcriptIdIndex
private final boolean skipRepeatedTranscriptIds
private final java.util.Set<java.lang.String> transcriptIds
public GtfInputStream(java.io.InputStream master, java.lang.String charset, boolean skipRepeatedTranscriptIds) throws java.io.IOException
Parameters:
master - The master input stream
charset - The character set used in the file
skipRepeatedTranscriptIds - TRUE to skip lines with non-unique values for transcript_id+seqname
Throws: java.io.IOException
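The effect of `skipRepeatedTranscriptIds` can be sketched without the BASE classes. The example below is a rough illustration only: the class `RepeatedTranscriptFilterSketch`, the way the key is built, and the choice to keep the first occurrence and drop later ones are all assumptions. What comes from the description is that uniqueness is judged on transcript_id+seqname and that the feature, start, end, score, strand and frame columns are removed when the option is enabled.

```java
import java.util.HashSet;
import java.util.Set;

class RepeatedTranscriptFilterSketch {
    // Keys already seen; comparable in spirit to the transcriptIds field.
    private final Set<String> seen = new HashSet<>();

    // Returns the reduced output line, or null if this transcript_id+seqname
    // combination has been seen before (assumption: first occurrence wins).
    String accept(String seqname, String source, String geneId, String transcriptId) {
        String key = transcriptId + "\t" + seqname; // hypothetical key format
        if (!seen.add(key)) {
            return null;
        }
        // feature, start, end, score, strand and frame are left out on purpose.
        return String.join("\t", seqname, source, geneId, transcriptId);
    }

    // Number of distinct transcript_id+seqname combinations accepted so far.
    int uniqueCount() {
        return seen.size();
    }

    public static void main(String[] args) {
        RepeatedTranscriptFilterSketch filter = new RepeatedTranscriptFilterSketch();
        System.out.println(filter.accept("chr1", "src", "G1", "T1"));
        System.out.println(filter.accept("chr1", "src", "G1", "T1"));
        System.out.println(filter.uniqueCount());
    }
}
```

Using a `Set` keeps the duplicate check at roughly constant time per line, at the cost of holding every distinct key in memory for the lifetime of the stream.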
public int read() throws java.io.IOException
Overrides: read in class java.io.InputStream
Throws: java.io.IOException
public int read(byte[] b) throws java.io.IOException
Overrides: read in class java.io.InputStream
Throws: java.io.IOException
public int read(byte[] b, int off, int len) throws java.io.IOException
Overrides: read in class java.io.InputStream
Throws: java.io.IOException
public int available() throws java.io.IOException
Overrides: available in class java.io.InputStream
Throws: java.io.IOException
public void close() throws java.io.IOException
Specified by: close in interface java.io.Closeable
Specified by: close in interface java.lang.AutoCloseable
Overrides: close in class java.io.InputStream
Throws: java.io.IOException
public boolean markSupported()
Overrides: markSupported in class java.io.InputStream
public void reset() throws java.io.IOException
Overrides: reset in class java.io.InputStream
Throws: java.io.IOException
public int getNumLines()
public int getNumUniqueTranscriptIds()
private byte[] readMore() throws java.io.IOException
Throws: java.io.IOException
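The `reader`, `buffer` and `index` fields, together with `readMore()`, suggest a familiar pattern: read one line of text, transform it, encode it to bytes, and serve those bytes until the buffer is exhausted. The sketch below shows that pattern in isolation. It is not the BASE implementation: `LineRewritingInputStream` and its `rewrite` hook are hypothetical, and the per-line transformation is a placeholder for the GTF conversion.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class LineRewritingInputStream extends InputStream {
    private final BufferedReader reader;
    private final Charset charset;
    private byte[] buffer = new byte[0]; // bytes of the current output line
    private int index = 0;               // next unread position in buffer

    LineRewritingInputStream(InputStream master, Charset charset) {
        this.charset = charset;
        this.reader = new BufferedReader(new InputStreamReader(master, charset));
    }

    // Placeholder transformation (hypothetical): turn spaces into tabs.
    protected String rewrite(String line) {
        return line.replace(' ', '\t');
    }

    // Refill the buffer from the next input line; false at end of input.
    private boolean readMore() throws IOException {
        String line = reader.readLine();
        if (line == null) {
            return false;
        }
        buffer = (rewrite(line) + "\n").getBytes(charset);
        index = 0;
        return true;
    }

    @Override
    public int read() throws IOException {
        while (index >= buffer.length) {
            if (!readMore()) {
                return -1;
            }
        }
        return buffer[index++] & 0xFF; // return as an unsigned byte value
    }

    @Override
    public void close() throws IOException {
        reader.close(); // also closes the wrapped master stream
    }

    public static void main(String[] args) throws IOException {
        byte[] input = "a b\nc d".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new LineRewritingInputStream(
                new ByteArrayInputStream(input), StandardCharsets.UTF_8)) {
            int c;
            while ((c = in.read()) != -1) {
                System.out.print((char) c);
            }
        }
    }
}
```

Only `read()` is overridden here because the inherited `read(byte[], int, int)` falls back to it; a production stream would normally override the bulk method as well to copy directly from the buffer.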
private java.lang.String[] getNextLine() throws java.io.IOException
Throws: java.io.IOException
private void parseAttributes(java.lang.String template) throws java.io.IOException
Throws: java.io.IOException
private java.lang.StringBuffer appendLine(java.lang.StringBuffer sb, java.lang.String[] columns, GtfInputStream.Attribute[] attr)
Parameters:
sb - The buffer to append to
columns - The regular columns (must be at least 8)
attr - The attributes to add