public class GtfInputStream extends InputStream
An input stream that converts a GTF file into tab-separated text that can be read with FlatFileParser
and other tools for parsing the resulting stream. The first line in the
file is used as a template line. The first 8 columns are fixed. The 9th column
contains attributes as key/value pairs, which are converted to additional
columns in the output. The GTF specification requires that gene_id
and transcript_id
are present, which means that the output will
contain at least 10 columns. Subsequent lines are parsed in the same way and
their attributes are lined up with those of the first line. Note that any attributes
that are not present in the first line are skipped. The parser also has an
option to skip lines whose transcript_id+seqname combination
is not unique.
Normally, a GTF file will contain multiple entries with the same IDs, but
in most cases we are not interested in this when importing data to BASE.
This option also removes the feature, start, end, score, strand and frame
columns from the output. Lines that can't be split into at least 9 columns
(e.g. comment lines starting with #) are not parsed and are forwarded without modification.
Modifier and Type | Class and Description |
---|---|
(package private) static class | GtfInputStream.Attribute |
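The attribute handling described in the class overview can be illustrated with a small, self-contained sketch. It is not the BASE implementation: the class name `GtfAttributeSketch`, the regular expression, and the choice to emit an empty string when a later line lacks a template attribute are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of turning GTF attributes into extra columns. */
final class GtfAttributeSketch {
    // Assumed pattern: key, whitespace, optionally quoted value, semicolon.
    private static final Pattern ATTRIBUTE =
        Pattern.compile("\\s*(\\w+)\\s+\"?([^\";]*)\"?\\s*;");

    /** Parse the 9th GTF column into key/value pairs, keeping file order. */
    static Map<String, String> parseAttributes(String column9) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = ATTRIBUTE.matcher(column9);
        while (m.find()) {
            result.put(m.group(1), m.group(2).trim());
        }
        return result;
    }

    /** Convert one line, emitting only the attributes named by the template. */
    static String convert(String line, List<String> templateKeys) {
        String[] cols = line.split("\t", -1);
        if (cols.length < 9) {
            return line; // e.g. comment lines: forwarded unchanged
        }
        Map<String, String> attrs = parseAttributes(cols[8]);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 8; i++) {
            if (i > 0) sb.append('\t');
            sb.append(cols[i]);
        }
        for (String key : templateKeys) {
            // Assumption: a missing attribute becomes an empty column.
            sb.append('\t').append(attrs.getOrDefault(key, ""));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String first = "chr1\tsrc\texon\t1\t10\t.\t+\t.\tgene_id \"G1\"; transcript_id \"T1\";";
        String second = "chr1\tsrc\texon\t20\t30\t.\t+\t.\tgene_id \"G2\"; exon_number 3; transcript_id \"T2\";";
        // The first line acts as the template that fixes the attribute columns.
        List<String> keys = new ArrayList<>(parseAttributes(first.split("\t", -1)[8]).keySet());
        System.out.println(convert(first, keys));
        System.out.println(convert(second, keys)); // exon_number is not in the template, so it is skipped
    }
}
```

Because the template keys come from the first line only, the `exon_number` attribute on the second line does not produce a column, which mirrors the behaviour described in the overview.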
Modifier and Type | Field and Description |
---|---|
private Pattern | ATTRIBUTE_PATTERN |
private GtfInputStream.Attribute[] | attributes |
private byte[] | buffer |
private Charset | charset |
private int | geneIdIndex |
private int | index |
private int | lineNum |
private InputStream | master |
private BufferedReader | reader |
private boolean | skipRepeatedTranscriptIds |
private int | transcriptIdIndex |
private Set<String> | transcriptIds |
Constructor and Description |
---|
GtfInputStream(InputStream master, String charset, boolean skipRepeatedTranscriptIds) - Create a new input stream reading from the master. |
Modifier and Type | Method and Description |
---|---|
private StringBuffer | appendLine(StringBuffer sb, String[] columns, GtfInputStream.Attribute[] attr) - Append columns to the buffer and separate each with a tab. |
int | available() |
void | close() |
private String[] | getNextLine() - Read the next line from the GTF file and split on tab character. |
int | getNumLines() - Get the number of lines parsed so far. |
int | getNumUniqueTranscriptIds() - Get the number of unique transcript ids found so far. |
boolean | markSupported() |
private void | parseAttributes(String template) - Parse attributes from the given template string. |
int | read() |
int | read(byte[] b) |
int | read(byte[] b, int off, int len) |
private byte[] | readMore() - Read more data from the GTF file. |
void | reset() |
Methods inherited from class InputStream: mark, skip
private final InputStream master
private final BufferedReader reader
private final Charset charset
private final Pattern ATTRIBUTE_PATTERN
private byte[] buffer
private int index
private int lineNum
private GtfInputStream.Attribute[] attributes
private int geneIdIndex
private int transcriptIdIndex
private final boolean skipRepeatedTranscriptIds
public GtfInputStream(InputStream master, String charset, boolean skipRepeatedTranscriptIds) throws IOException
Parameters:
master - The master input stream
charset - The character set used in the file
skipRepeatedTranscriptIds - TRUE to skip lines with non-unique values for transcript_id+seqname
Throws: IOException
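The effect of skipRepeatedTranscriptIds can be sketched with a set of already-seen keys, in the spirit of the transcriptIds field. This is an illustration only: the class `RepeatedTranscriptFilter`, its method names, and the way transcript_id and seqname are combined into one key are assumptions, not the BASE code.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical filter that accepts each transcript_id+seqname pair once. */
final class RepeatedTranscriptFilter {
    private final Set<String> seen = new HashSet<>();

    /** Returns true the first time a pair is seen, false for repeats. */
    boolean accept(String seqname, String transcriptId) {
        // Assumption: a tab cannot occur inside either value, so it is a safe separator.
        return seen.add(transcriptId + "\t" + seqname);
    }

    /** Number of distinct pairs accepted so far. */
    int size() {
        return seen.size();
    }

    public static void main(String[] args) {
        RepeatedTranscriptFilter filter = new RepeatedTranscriptFilter();
        System.out.println(filter.accept("chr1", "T1")); // first occurrence
        System.out.println(filter.accept("chr1", "T1")); // repeat on the same seqname
        System.out.println(filter.accept("chr2", "T1")); // same id, different seqname
        System.out.println(filter.size());
    }
}
```

Keying on both values means the same transcript_id on a different seqname still counts as a new entry.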
public int read() throws IOException
Specified by: read in class InputStream
Throws: IOException
public int read(byte[] b) throws IOException
Overrides: read in class InputStream
Throws: IOException
public int read(byte[] b, int off, int len) throws IOException
Overrides: read in class InputStream
Throws: IOException
public int available() throws IOException
Overrides: available in class InputStream
Throws: IOException
public void close() throws IOException
Specified by: close in interface Closeable
Specified by: close in interface AutoCloseable
Overrides: close in class InputStream
Throws: IOException
public boolean markSupported()
Overrides: markSupported in class InputStream
public void reset() throws IOException
Overrides: reset in class InputStream
Throws: IOException
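The read methods above, together with the buffer and index fields and the private readMore() method, suggest a pattern in which converted lines are served from a byte buffer that is refilled one line at a time. The following is a minimal sketch of that general pattern under stated assumptions; `LineConvertingInputStream` and its converter parameter are hypothetical, and the real class may buffer differently.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.function.UnaryOperator;

/** Hypothetical stream that converts its source line by line. */
class LineConvertingInputStream extends InputStream {
    private final BufferedReader reader;
    private final Charset charset;
    private final UnaryOperator<String> converter;
    private byte[] buffer = new byte[0]; // bytes of the current converted line
    private int index = 0;               // next unread position in buffer

    LineConvertingInputStream(InputStream master, Charset charset,
                              UnaryOperator<String> converter) {
        this.charset = charset;
        this.converter = converter;
        this.reader = new BufferedReader(new InputStreamReader(master, charset));
    }

    /** Refill the buffer from the next line; returns false at end of input. */
    private boolean readMore() throws IOException {
        String line = reader.readLine();
        if (line == null) {
            return false;
        }
        // Every output line is terminated with a newline, even the last one.
        buffer = (converter.apply(line) + "\n").getBytes(charset);
        index = 0;
        return true;
    }

    @Override
    public int read() throws IOException {
        while (index >= buffer.length) {
            if (!readMore()) return -1;
        }
        return buffer[index++] & 0xFF;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) return 0;
        while (index >= buffer.length) {
            if (!readMore()) return -1;
        }
        // Copy at most the remainder of the current line; callers loop for more.
        int n = Math.min(len, buffer.length - index);
        System.arraycopy(buffer, index, b, off, n);
        index += n;
        return n;
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }
}
```

Returning fewer bytes than requested is permitted by the InputStream contract, so handing back one line per call keeps the refill logic simple.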
public int getNumLines()
public int getNumUniqueTranscriptIds()
private byte[] readMore() throws IOException
Throws: IOException
private String[] getNextLine() throws IOException
Throws: IOException
private void parseAttributes(String template) throws IOException
Throws: IOException
private StringBuffer appendLine(StringBuffer sb, String[] columns, GtfInputStream.Attribute[] attr)
Parameters:
sb - The buffer to append to
columns - The regular columns (must be at least 8)
attr - The attributes to add