Ticket #1029: files_in_analysis-0.1.txt

File files_in_analysis-0.1.txt, 4.1 KB (added by Jari Häkkinen, 13 years ago)
Lund, June 3, 2008


BASE support for storing data in files in the analysis domain should
be extended. Currently, analysis is performed on information residing
in the dynamic database, with a fairly strict specification of what
the data should look like. The specification allows for creating
additional columns in the database and for creating items, such as
files, that are associated with a specific analysis step. The
additional column information is available to the next analysis step.
This may also be true for attached files, but I have not seen
associated files used in BASE. However, for analysis steps where only
files are created and associated with the transformation, there is no
possibility to run further analysis, at least not through the web
interface. This is probably because analysis in BASE can only be done
on bioassay sets.

In general, it should be possible to run plug-ins on nodes that
contain only files, or files together with some other data. This can
be accomplished by making it possible to attach files to bioassay sets
and extra values. Since we try to keep plug-ins context sensitive, a
scheme for creating the context is needed. A straightforward approach
would be to create a "FILE" context that triggers file-based
plug-ins.
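As an illustration only, the context idea could be sketched as below.
This is not BASE code (BASE itself is written in Java); the class
names, the "BIOASSAYSET" context name and the plug-in names are
assumptions made up for the example. Only the "FILE" context comes
from the proposal above.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class AnalysisNode:
    """Hypothetical analysis node: may hold a bioassay set, files, or both."""
    name: str
    has_bioassay_set: bool = False
    files: List[str] = field(default_factory=list)

    def contexts(self) -> Set[str]:
        # Derive the contexts this node offers to plug-ins.
        offered = set()
        if self.has_bioassay_set:
            offered.add("BIOASSAYSET")  # assumed name for the existing context
        if self.files:
            offered.add("FILE")  # the proposed file context
        return offered


@dataclass
class Plugin:
    """Hypothetical plug-in that is triggered by exactly one context."""
    name: str
    required_context: str


def plugins_for(node: AnalysisNode, plugins: List[Plugin]) -> List[str]:
    """Return the names of the plug-ins whose context the node offers."""
    offered = node.contexts()
    return [p.name for p in plugins if p.required_context in offered]


plugins = [Plugin("normalizer", "BIOASSAYSET"), Plugin("file_viewer", "FILE")]
files_only = AnalysisNode("step 2", files=["result.txt"])
print(plugins_for(files_only, plugins))  # prints ['file_viewer']
```

With this scheme a node that holds only files still offers a context,
so file-based plug-ins can be listed for it even though no bioassay
set exists.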

Once file storage is possible, one can imagine situations where a
"store to database" plug-in is needed to copy file information into
the dynamic database. This would let the user apply BASE features that
do not work with files, as well as other non-file plug-ins. Of course,
it may not always be possible to store information from files in the
database.


There are two use cases already at hand that show what kind of support
should be built into BASE. These are straightforward use cases; no
strange features are required except that analysis steps are based on
file storage. The issue is rather how to store information about
files, information that enables plug-ins to decide whether they can
handle the stored files or not. The 'Data file' type can be used to
tag files created and stored by plug-ins. Plug-ins further downstream
can then use the data file type to decide whether they can handle the
file. Remember that the files have more metadata associated with them,
since they reside in a specific experiment (array platform, number of
channels), and plug-ins can use this metadata when deciding whether
they can handle a file.
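A minimal sketch of such a decision, assuming a file record that
carries its data file type together with the experiment's platform and
channel count. All names here, including the type identifiers
"illumina.snp.analyzed" and "mev.result", are hypothetical and are not
taken from BASE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredFile:
    """Hypothetical record of a file stored by an analysis step."""
    path: str
    data_file_type: str  # tag set by the plug-in that created the file
    platform: str        # inherited from the experiment
    channels: int        # inherited from the experiment


@dataclass(frozen=True)
class FilePlugin:
    """Hypothetical downstream plug-in with its acceptance criteria."""
    name: str
    accepted_types: frozenset
    accepted_platforms: frozenset
    min_channels: int = 1

    def can_handle(self, f: StoredFile) -> bool:
        # The data file type is the primary test; the experiment
        # metadata narrows the decision further.
        return (
            f.data_file_type in self.accepted_types
            and f.platform in self.accepted_platforms
            and f.channels >= self.min_channels
        )


snp_plugin = FilePlugin(
    name="snp_caller",
    accepted_types=frozenset({"illumina.snp.analyzed"}),
    accepted_platforms=frozenset({"illumina"}),
    min_channels=3,
)
snp_file = StoredFile("snp.txt", "illumina.snp.analyzed", "illumina", 3)
mev_file = StoredFile("result.mev", "mev.result", "illumina", 3)
print(snp_plugin.can_handle(snp_file))  # prints True
print(snp_plugin.can_handle(mev_file))  # prints False
```

The point of the sketch is that the file contents never have to be
opened to make the decision; the tag and the experiment metadata are
enough.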

For any added data file type we should provide a file format
validation scheme and also metadata extractor functionality, in the
same way as for other types such as the Affymetrix file formats.
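To make the idea concrete, here is a rough sketch of a combined
validator and metadata extractor. It assumes a made-up tab-separated
format with a header row and a required "reporter_id" column; the
format, the function name and the returned fields are illustrative
assumptions, not an existing BASE or Affymetrix interface.

```python
import csv
import io

REQUIRED_COLUMNS = {"reporter_id"}  # hypothetical requirement


def validate_and_extract(stream, required=REQUIRED_COLUMNS):
    """Validate a tab-separated data file and return simple metadata.

    Raises ValueError if the header is missing, lacks a required
    column, or if a data row has the wrong number of fields.
    """
    reader = csv.reader(stream, delimiter="\t")
    header = next(reader, None)
    if not header:
        raise ValueError("no header row found")
    missing = set(required) - set(header)
    if missing:
        raise ValueError("missing required columns: %s" % sorted(missing))
    rows = 0
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            raise ValueError(
                "line %d: expected %d fields, got %d"
                % (line_no, len(header), len(row))
            )
        rows += 1
    # Metadata a plug-in could inspect without re-reading the file.
    return {"columns": header, "rows": rows}


sample = "reporter_id\tch1\tch2\nr1\t1.0\t2.0\nr2\t3.0\t4.0\n"
meta = validate_and_extract(io.StringIO(sample))
print(meta["rows"], meta["columns"])
```

Validation and extraction share one pass over the file here, which
matters for the very large files discussed in the use cases.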


Illumina SNP data

SNP data generate very large files with many reporters, which are
very demanding for the database backend to store and retrieve. To ease
the heavy database load, it would be beneficial to support information
storage in files. Currently Illumina SNP data are "3-channel" data,
but there are already user requests to extend this to more channels.
In the traditional BASE sense the extra columns should be stored as
extra values, but with files there is no need to create these extra
values. Of course, if someone wants to import the extra columns into
the database, then extra values must be created.

A 'Data file' type probably needs to be defined for analyzed Illumina
SNP data, or perhaps the original data type could be used.


MeV

Today MeV users can start MeV from BASE and download a certain
bioassay set. (There is a modified MeV, developed in France, that
allows the user to connect to BASE to browse and download data.) Once
data has been downloaded by MeV there is no more interaction with
BASE, but it would be nice and beneficial to be able to store MeV
analysis results back into BASE. MeV supports storage of results in a
MeV proprietary file format.

It should be possible to start MeV as it is currently done, but also
by selecting an analysis node that has a MeV result file attached to
it.

Further into the future, one could imagine that MeV analyses could be
run as separate plug-ins in BASE. The results would be stored in BASE,
and the user could visualize them using MeV.

The MeV case is complicated in the sense that web services and
external client communication need to be further developed.

A 'MeV' data file type needs to be defined.