3.2.4: 2013-12-06
java.lang.Object
  java.io.InputStream
    net.sf.basedb.util.gtf.GtfInputStream
public class GtfInputStream
extends InputStream
Input stream implementation that reads from a GTF file and converts it to a simple tab-separated file with a single line of column headers. This is useful since it means that we can use the regular FlatFileParser and other tools for parsing the resulting stream.

The first line in the file is used as a template line. The first 8 columns are fixed. The 9th column contains attributes as key/value pairs, which are converted to additional columns in the output. The GTF specification requires that gene_id and transcript_id are present, which means that the output will contain at least 10 columns. Subsequent lines are parsed in the same way and their attributes are lined up with those of the first line. Note that any attributes that are not present in the first line are skipped.

The parser also has an option to skip lines with a transcript_id+seqname combination that is not unique. Normally, a GTF file contains multiple entries with the same IDs, but in most cases we are not interested in this when importing data to BASE. This option also removes the feature, start, end, score, strand and frame columns from the output.
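The conversion described above can be illustrated with a small, self-contained Java sketch that does not depend on BASE. It is not the GtfInputStream source: the class GtfToTsvSketch, the attribute regex and the header names for the 8 fixed columns (the GTF field names seqname, source, feature, start, end, score, strand and frame) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the BASE implementation: converts GTF lines into
// tab-separated lines with a single header line, using the first line as template.
class GtfToTsvSketch {

    // Assumed attribute syntax: key "value"; (quotes optional). Not the real ATTRIBUTE_PATTERN.
    private static final Pattern ATTRIBUTE = Pattern.compile("(\\w+)\\s+\"?([^\";]*)\"?");

    // Assumed header names for the 8 fixed GTF columns.
    private static final String[] FIXED = {
        "seqname", "source", "feature", "start", "end", "score", "strand", "frame"
    };

    // Parses the 9th column into an insertion-ordered key/value map.
    static Map<String, String> parseAttributes(String column9) {
        Map<String, String> attributes = new LinkedHashMap<>();
        Matcher m = ATTRIBUTE.matcher(column9);
        while (m.find()) {
            attributes.put(m.group(1), m.group(2).trim());
        }
        return attributes;
    }

    static List<String> convert(List<String> gtfLines) {
        List<String> out = new ArrayList<>();
        if (gtfLines.isEmpty()) {
            return out;
        }
        // The attribute keys of the first line define the extra output columns.
        String[] first = gtfLines.get(0).split("\t", 10);
        List<String> keys = new ArrayList<>(parseAttributes(first[8]).keySet());

        List<String> header = new ArrayList<>(Arrays.asList(FIXED));
        header.addAll(keys);
        out.add(String.join("\t", header));

        for (String line : gtfLines) {
            String[] cols = line.split("\t", 10);
            Map<String, String> attributes = parseAttributes(cols[8]);
            List<String> row = new ArrayList<>(Arrays.asList(cols).subList(0, 8));
            // Line up with the template: keys not in the first line are skipped,
            // and keys missing from this line become empty values.
            for (String key : keys) {
                row.add(attributes.getOrDefault(key, ""));
            }
            out.add(String.join("\t", row));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> gtf = List.of(
            "chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id \"g1\"; transcript_id \"t1\";",
            "chr1\tsrc\tCDS\t120\t180\t.\t+\t0\tgene_id \"g1\"; transcript_id \"t1\"; note \"x\";");
        convert(gtf).forEach(System.out::println);
    }
}
```

With the two sample lines, the output is a header plus two rows of 10 columns each; the note attribute on the second line is dropped because the first line does not have it.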
Nested Class Summary
Modifier and Type | Class
---|---
(package private) static class | GtfInputStream.Attribute
Field Summary
Modifier and Type | Field
---|---
private Pattern | ATTRIBUTE_PATTERN
private GtfInputStream.Attribute[] | attributes
private byte[] | buffer
private Charset | charset
private int | geneIdIndex
private int | index
private int | lineNum
private InputStream | master
private BufferedReader | reader
private boolean | skipRepeatedTranscriptIds
private int | transcriptIdIndex
private Set<String> | transcriptIds
Constructor Summary
Constructor | Description
---|---
GtfInputStream(InputStream master, String charset, boolean skipRepeatedTranscriptIds) | Create a new input stream reading from the master.
Method Summary
Modifier and Type | Method | Description
---|---|---
private StringBuffer | appendLine(StringBuffer sb, String[] columns, GtfInputStream.Attribute[] attr) | Append the first 8 columns to the buffer and then add all values from the attributes.
int | available() |
void | close() |
private String[] | getNextLine() | Read the next line from the GTF file and split it on the tab character into 9 or 10 columns.
int | getNumLines() | Get the number of lines parsed so far.
int | getNumUniqueTranscriptIds() | Get the number of unique transcript IDs found so far.
private void | init() | Initialize the converter by reading the first line from the GTF file.
boolean | markSupported() |
private void | parseAttributes(String template) | Parse attributes from the given template string.
int | read() |
int | read(byte[] b) |
int | read(byte[] b, int off, int len) |
private byte[] | readMore() | Read more data from the GTF file.
void | reset() |
Methods inherited from class java.io.InputStream: mark, skip
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
private final InputStream master
private final BufferedReader reader
private final Charset charset
private final Pattern ATTRIBUTE_PATTERN
private byte[] buffer
private int index
private int lineNum
private GtfInputStream.Attribute[] attributes
private int geneIdIndex
private int transcriptIdIndex
private final boolean skipRepeatedTranscriptIds
private final Set<String> transcriptIds
Constructor Detail
public GtfInputStream(InputStream master, String charset, boolean skipRepeatedTranscriptIds) throws IOException
Parameters:
master - The master input stream
charset - The character set used in the file
skipRepeatedTranscriptIds - TRUE to skip lines with non-unique values for transcript_id+seqname
Throws:
IOException
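The skipRepeatedTranscriptIds option can be pictured as a set of already-seen transcript_id+seqname keys, similar in spirit to the transcriptIds field. The following is a minimal sketch under that assumption; the class RepeatedTranscriptFilter, its methods and the tab used to join the two values into one key are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the skipRepeatedTranscriptIds filter; not the BASE code.
class RepeatedTranscriptFilter {

    // transcript_id+seqname keys seen so far
    private final Set<String> seen = new HashSet<>();

    /**
     * Returns the fixed columns to output for this line, or null if the
     * transcript_id+seqname combination has been seen before.
     * Assumed column order: seqname, source, feature, start, end, score, strand, frame.
     */
    List<String> filter(String[] fixedColumns, String transcriptId) {
        // GTF columns are tab-separated, so a tab cannot occur inside a value
        // and is a safe separator for the combined key.
        String key = transcriptId + "\t" + fixedColumns[0];
        if (!seen.add(key)) {
            return null; // repeated combination: skip the whole line
        }
        List<String> kept = new ArrayList<>();
        kept.add(fixedColumns[0]); // seqname
        kept.add(fixedColumns[1]); // source
        // feature, start, end, score, strand and frame (indexes 2-7) are dropped
        return kept;
    }

    // Comparable to getNumUniqueTranscriptIds()
    int numUnique() {
        return seen.size();
    }
}
```

Using a HashSet keeps each check at expected constant time, at the cost of holding every unique key in memory for the lifetime of the stream.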
Method Detail
public int read() throws IOException
Specified by:
read in class InputStream
Throws:
IOException
public int read(byte[] b) throws IOException
Overrides:
read in class InputStream
Throws:
IOException
public int read(byte[] b, int off, int len) throws IOException
Overrides:
read in class InputStream
Throws:
IOException
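The buffer, index and reader fields, together with readMore(), suggest the usual way of adapting line-based conversion to the byte-oriented InputStream API: convert one line, encode it with the charset, and serve read() calls from the resulting byte array until it is used up. Below is a minimal sketch of that pattern, assuming an Iterator<String> in place of the BufferedReader and passing lines through unconverted; the class name and the null-at-end contract of readMore() are assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of the buffer/index/readMore() pattern; not the BASE source.
class LineConvertingInputStream extends InputStream {

    private final Iterator<String> lines; // stands in for the BufferedReader over the GTF file
    private final Charset charset;
    private byte[] buffer = new byte[0];  // bytes of the current converted line
    private int index = 0;                // next position to return from buffer

    LineConvertingInputStream(Iterator<String> lines, Charset charset) {
        this.lines = lines;
        this.charset = charset;
    }

    // Returns the bytes of the next converted line, or null at end of input.
    private byte[] readMore() {
        if (!lines.hasNext()) {
            return null;
        }
        // The real class would convert a GTF line here; this sketch passes it through.
        return (lines.next() + "\n").getBytes(charset);
    }

    // Makes sure the buffer has unread bytes; returns false at end of input.
    private boolean fill() {
        while (index >= buffer.length) {
            byte[] more = readMore();
            if (more == null) {
                return false;
            }
            buffer = more;
            index = 0;
        }
        return true;
    }

    @Override
    public int read() throws IOException {
        if (!fill()) {
            return -1;
        }
        return buffer[index++] & 0xFF; // bytes are returned as 0-255
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (off < 0 || len < 0 || len > b.length - off) {
            throw new IndexOutOfBoundsException();
        }
        if (len == 0) {
            return 0;
        }
        int count = 0;
        while (count < len && fill()) {
            int n = Math.min(len - count, buffer.length - index);
            System.arraycopy(buffer, index, b, off + count, n);
            index += n;
            count += n;
        }
        return count == 0 ? -1 : count;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new LineConvertingInputStream(
            List.of("seqname\tsource", "chr1\tsrc").iterator(), StandardCharsets.UTF_8);
        System.out.print(new String(in.readAllBytes(), StandardCharsets.UTF_8));
    }
}
```

Only read() and read(byte[], int, int) are overridden here; the inherited InputStream.read(byte[]) delegates to the latter.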
public int available() throws IOException
Overrides:
available in class InputStream
Throws:
IOException
public void close() throws IOException
Specified by:
close in interface Closeable
Overrides:
close in class InputStream
Throws:
IOException
public boolean markSupported()
Overrides:
markSupported in class InputStream
public void reset() throws IOException
Overrides:
reset in class InputStream
Throws:
IOException
public int getNumLines()
public int getNumUniqueTranscriptIds()
private void init() throws IOException
Throws:
IOException
private byte[] readMore() throws IOException
Throws:
IOException
private String[] getNextLine() throws IOException
Throws:
IOException
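The GTF format defines 8 fixed fields followed by the attributes field and, in GTF 2.2, an optional trailing comments field, which is one plausible reason for the "9 or 10 columns" in the getNextLine() description. A hedged sketch of such a split; the class GtfLineSplitter and the error message are hypothetical.

```java
import java.io.IOException;

// Hypothetical sketch of splitting one GTF line; not the BASE implementation.
class GtfLineSplitter {

    static String[] split(String line, int lineNum) throws IOException {
        // A limit of 10 keeps any tabs inside an optional trailing comments
        // field in the last column instead of producing more than 10 columns.
        String[] columns = line.split("\t", 10);
        if (columns.length < 9) {
            throw new IOException("Line " + lineNum + ": expected 9 or 10 columns, found "
                + columns.length);
        }
        return columns;
    }

    public static void main(String[] args) throws IOException {
        String[] cols = split("chr1\tsrc\texon\t1\t10\t.\t+\t.\tgene_id \"g1\"; transcript_id \"t1\";", 1);
        System.out.println(cols.length);
    }
}
```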
private void parseAttributes(String template) throws IOException
Throws:
IOException
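parseAttributes(String) is declared to throw IOException; one plausible cause, given that the GTF specification requires gene_id and transcript_id, is a template line that lacks either of them. Below is a sketch of parsing the template into ordered attribute names and recording the positions that fields like geneIdIndex and transcriptIdIndex would hold; the regex, the class AttributeTemplate and the failure condition are assumptions.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of parsing the template attributes; not the BASE source.
class AttributeTemplate {

    // Assumed syntax: key "value"; (quotes optional)
    private static final Pattern ATTRIBUTE_PATTERN = Pattern.compile("(\\w+)\\s+\"?([^\";]*)\"?");

    final String[] names;
    final int geneIdIndex;
    final int transcriptIdIndex;

    AttributeTemplate(String template) throws IOException {
        List<String> found = new ArrayList<>();
        Matcher m = ATTRIBUTE_PATTERN.matcher(template);
        while (m.find()) {
            found.add(m.group(1)); // keep the attribute names in template order
        }
        names = found.toArray(new String[0]);
        geneIdIndex = found.indexOf("gene_id");
        transcriptIdIndex = found.indexOf("transcript_id");
        // Both attributes are required by the GTF specification.
        if (geneIdIndex < 0 || transcriptIdIndex < 0) {
            throw new IOException("Template line must contain gene_id and transcript_id: " + template);
        }
    }

    public static void main(String[] args) throws IOException {
        AttributeTemplate t = new AttributeTemplate("gene_id \"g1\"; transcript_id \"t1\"; exon_number 1;");
        System.out.println(String.join(",", t.names) + " " + t.geneIdIndex + " " + t.transcriptIdIndex);
    }
}
```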
private StringBuffer appendLine(StringBuffer sb, String[] columns, GtfInputStream.Attribute[] attr)
Parameters:
sb - The buffer to append to
columns - The regular columns (must be at least 8)
attr - The attributes to add