Lund, June 3, 2008


BASE support for storing data in files in the analysis domain should
be extended. Currently, analysis is performed on information residing
in the dynamic database, with a fairly strict specification of what
the data should look like. The specification allows for creating
additional columns in the database and for creating items, such as
files, that are associated with a specific analysis step. The
additional column information is available to the next analysis step.
This may also be true for attached files, but I have not seen
associated files used in BASE. However, for analysis steps where only
files are created and associated with the transformation, there is no
possibility to run further analysis, at least not through the web
interface. This is probably because analysis in BASE can only be done
on bioassay sets.

In general, it should be possible to run plug-ins on nodes that
contain only files, or files together with some other data. This can
be accomplished by making it possible to attach files to bioassay sets
and extra values. Since we try to keep plug-ins context sensitive, a
scheme for creating a context is needed. A straightforward approach
would be to create a "FILE" context that is used to trigger file-based
plug-ins.

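The "FILE" context idea can be illustrated with a minimal sketch. It
is written in Python for brevity and does not use the BASE API: the
names AnalysisNode, PlugIn, contexts_for, runnable_plugins and the
"BIOASSAYSET" context label are all hypothetical.

```python
from dataclasses import dataclass, field

# All names in this sketch are hypothetical; none are part of the BASE API.


@dataclass
class AnalysisNode:
    """An analysis node that may hold database data, attached files, or both."""
    name: str
    has_bioassay_set_data: bool = False
    attached_files: list = field(default_factory=list)


@dataclass
class PlugIn:
    """A plug-in that is triggered by exactly one context."""
    name: str
    required_context: str


def contexts_for(node):
    """Return the set of contexts a node offers to plug-ins."""
    contexts = set()
    if node.has_bioassay_set_data:
        contexts.add("BIOASSAYSET")
    if node.attached_files:
        # Any node with at least one attached file offers the "FILE" context.
        contexts.add("FILE")
    return contexts


def runnable_plugins(node, plugins):
    """List the names of the plug-ins whose required context the node offers."""
    offered = contexts_for(node)
    return [p.name for p in plugins if p.required_context in offered]
```

With this scheme a node that holds only files still offers one
context, so file-based plug-ins can be listed for it even though no
bioassay set data exists.
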
Once file storage is possible, one can imagine situations where a
"store to database" plug-in is needed to copy file information into
the dynamic database. This will enable the user to use BASE features
that do not work with files, as well as other non-file plug-ins. Of
course, it may not always be possible to store information from files
in the database.


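As a rough illustration of what a "store to database" step does, the
sketch below copies a tab-separated file into a database table. It
rests on several assumptions: SQLite stands in for the dynamic
database, the table file_values and the function store_to_database
are hypothetical, and the first column of the file is assumed to
identify the reporter.

```python
import csv
import io
import sqlite3

# Hypothetical sketch: SQLite stands in for the BASE dynamic database,
# and the table layout below is an assumption made for illustration.


def store_to_database(conn, text):
    """Copy tab-separated file content into the file_values table.

    The first column is taken as the reporter identifier; every other
    column is stored as one (reporter, column_name, value) row.
    Returns the number of values stored.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_values "
        "(reporter TEXT, column_name TEXT, value REAL)"
    )
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    rows = [
        # float() raises ValueError for non-numeric content, which is one
        # way a file can turn out not to be storable in the database.
        (row[0], name, float(value))
        for row in reader
        for name, value in zip(header[1:], row[1:])
    ]
    conn.executemany("INSERT INTO file_values VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

A file whose values cannot be converted is rejected with an error,
which matches the caveat that not every file can be stored in the
database.
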
There are two use cases already at hand that show what kind of support
should be built into BASE. These are straightforward use cases; no
strange features are required except that the analysis steps are based
on file storage. The issue is rather how to store information about
files, information that enables plug-ins to decide whether or not they
can handle the stored files. The 'Data file' type can be used to tag
files created and stored by plug-ins. Downstream plug-ins can then use
the data file type to decide whether they can handle a file. Remember
that the files have more metadata associated with them, since they
reside in a specific experiment (array platform, number of channels),
and plug-ins can also use this metadata when deciding whether they can
handle a file.

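The decision described above can be sketched as follows, again in
Python and without reference to the BASE API. StoredFile, FilePlugIn
and can_handle are hypothetical names, and the type and platform
labels used with them are made up for illustration.

```python
from dataclasses import dataclass

# All names in this sketch are hypothetical; none are part of the BASE API.


@dataclass(frozen=True)
class StoredFile:
    path: str
    data_file_type: str  # tag set by the plug-in that created the file
    platform: str        # array platform of the enclosing experiment
    channels: int        # number of channels of the enclosing experiment


@dataclass(frozen=True)
class FilePlugIn:
    name: str
    accepted_types: frozenset
    accepted_platforms: frozenset
    max_channels: int

    def can_handle(self, stored_file):
        """Check the data file type first, then the experiment metadata."""
        return (
            stored_file.data_file_type in self.accepted_types
            and stored_file.platform in self.accepted_platforms
            and stored_file.channels <= self.max_channels
        )
```

The data file type acts as the primary filter; the experiment metadata
only narrows the choice further, so a plug-in never has to open a file
to find out that it cannot handle it.
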
For any added data file type we should provide a file format
validation scheme and also metadata extractor functionality, in the
same way as for other types such as the Affymetrix file formats.


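A validator and a metadata extractor for a new data file type could be
combined roughly as in the sketch below. The file layout is an
assumption made for illustration (tab-separated text with a header
line and a reporter_id column), and validate_and_extract is a
hypothetical name, not a BASE function.

```python
import csv
import io

# Hypothetical layout: tab-separated text, one header line, and a
# mandatory "reporter_id" column. None of this is defined by BASE.
REQUIRED_COLUMNS = ("reporter_id",)


def validate_and_extract(text):
    """Validate the file content and return its metadata as a dict.

    Raises ValueError if the content does not follow the assumed layout.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader, None)
    if header is None:
        raise ValueError("file is empty")
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        raise ValueError("missing required columns: " + ", ".join(missing))
    rows = 0
    for line_no, row in enumerate(reader, start=2):
        # Every data line must have exactly as many fields as the header.
        if len(row) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} fields, got {len(row)}"
            )
        rows += 1
    data_columns = [c for c in header if c not in REQUIRED_COLUMNS]
    return {"columns": header, "data_columns": data_columns, "rows": rows}
```

Doing both in one pass means a file is only tagged with extracted
metadata after it has been read completely and found valid.
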
Illumina SNP data

SNP data generate very large files with many reporters, which will be
very demanding for the database backend to store and retrieve. To ease
the heavy database load, it will be beneficial to support information
storage in files. Currently Illumina SNP data is "3-channel" data, but
there are already user requests to extend this to more channels. In
the traditional BASE sense the extra columns should be stored as extra
values, but with files there is no need to create these extra values.
Of course, if someone wants to import the extra values into the
database, then extra values must be created.

A 'Data file' type probably needs to be defined for analyzed Illumina
SNP data, or perhaps the original data type could be used.


MeV

Today MeV users can start MeV from BASE and download a certain
bioassay set. (There is a modified MeV developed in France that allows
the user to connect to BASE to browse and download data.) Once the
data has been downloaded there is no further interaction with BASE,
but it would be nice and beneficial to be able to store MeV analysis
results back into BASE. MeV supports storage of results in a MeV
proprietary file format.

It should be possible to start MeV as is done currently, but also by
selecting an analysis node that has a MeV result file attached to it.

Further into the future, one could imagine that MeV analyses could be
run as separate plug-ins in BASE. The results would be stored in BASE
and the user could visualize them using MeV.

The MeV case is complicated in the sense that web services and
external client communication need to be further developed.

A 'MeV' data file type needs to be defined.