Lund, June 3, 2008

Support for file-based data storage in the analysis domain of BASE should be extended. Currently, analysis is performed on information residing in the dynamic database, with a fairly strict specification of what the data should look like. The specification allows additional columns to be created in the database, and items such as files to be created and associated with a specific analysis step. The additional column information is available to the next analysis step. This may also be true for attached files, but I have not seen associated files used in BASE. However, for analysis steps where only files are created and associated with the transformation, there is no possibility to run further analysis, at least not through the web interface. This is probably because analysis in BASE can only be done on bioassay sets.

In general, it should be possible to run plug-ins on nodes that contain only files, or files together with some other data. This can be accomplished by making it possible to attach files to bioassay sets and extra values. Since we try to keep plug-ins context sensitive, a scheme for creating the context is needed. A straightforward approach would be to create a "FILE" context that triggers file-based plug-ins.

Once file storage is possible, one can imagine situations where a "store to database" plug-in is needed to copy file information into the dynamic database. This would enable the user to use BASE features that do not work with files, as well as other non-file plug-ins. Of course, it may not always be possible to store information from files in the database.

There are two use cases already at hand that show what kind of support should be built into BASE. These are straightforward use cases; no strange features are required except that the analysis steps are based on file storage. The issue is rather how to store information about the files, information that enables plug-ins to decide whether or not they can handle the stored files.

The 'Data file' type can be used to tag files created and stored by plug-ins. Downstream plug-ins can then use the data file type to decide whether they can handle a file. Remember that the files have more metadata associated with them, since they reside in a specific experiment (array platform, number of channels); plug-ins can use this as well when deciding whether they can handle a file. For any added data file type we should provide a file format validation scheme and also metadata extractor functionality, in the same way as for other types such as the Affymetrix file formats.

Illumina SNP data

SNP data generate very large files with many reporters, which will be very demanding for the database backend to store and retrieve. To ease the heavy database load, it will be beneficial to support information storage in files. Currently, Illumina SNP data is "3-channel" data, but there are already user requests to extend this to more channels. In the traditional BASE sense, the extra columns would be stored as extra values, but with files there is no need to create these extra values. Of course, if someone wants to import the extra values into the database, then the extra values must be created. A 'Data file' type probably needs to be defined for analyzed Illumina SNP data, or the original data type could even be used.

MeV

Today, MeV users can start MeV from BASE and download a certain bioassay set. (There is a modified MeV developed in France that allows users to connect to BASE to browse and download data.) Once the data has been downloaded, there is no further interaction with BASE, but it would be nice and beneficial to be able to store MeV analysis results back into BASE. MeV supports storing results in a MeV proprietary file format. It should be possible to start MeV as is currently done, but also by selecting an analysis node that has a MeV result file attached to it. Further into the future, one could imagine that MeV analyses could be run as separate plug-ins in BASE.
The results would then be stored in BASE, and the user could visualize them using MeV. The MeV case is complicated in the sense that web services and external client communication need to be developed further. A 'MeV' data file type needs to be defined.
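
To make the plug-in selection idea from this note more concrete, the following is a minimal sketch of how a "FILE" context and the 'Data file' type tag could be combined with experiment metadata (array platform, number of channels) to decide whether a plug-in can handle a node. It is not BASE code: the class names, the "BIOASSAYSET" context name, and the data file type identifiers such as "illumina.snp" and "mev.result" are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set

# NOTE: every name in this sketch is hypothetical; none of it is the BASE API.


@dataclass(frozen=True)
class StoredFile:
    name: str
    data_file_type: str  # made-up identifier, e.g. "illumina.snp" or "mev.result"


@dataclass
class AnalysisNode:
    platform: str                  # array platform of the surrounding experiment
    channels: int                  # number of channels of the surrounding experiment
    files: List[StoredFile] = field(default_factory=list)
    has_database_data: bool = False

    def contexts(self) -> Set[str]:
        """Contexts this node offers to plug-ins (context names are assumptions)."""
        ctx = set()
        if self.has_database_data:
            ctx.add("BIOASSAYSET")
        if self.files:
            ctx.add("FILE")
        return ctx


@dataclass(frozen=True)
class FilePlugin:
    name: str
    accepted_types: FrozenSet[str]  # data file types the plug-in understands
    platforms: FrozenSet[str]       # platforms the plug-in supports
    min_channels: int = 1

    def usable_files(self, node: AnalysisNode) -> List[StoredFile]:
        """Return the files on the node that this plug-in could process."""
        if "FILE" not in node.contexts():
            return []
        if node.platform not in self.platforms or node.channels < self.min_channels:
            return []
        return [f for f in node.files if f.data_file_type in self.accepted_types]

    def can_handle(self, node: AnalysisNode) -> bool:
        return bool(self.usable_files(node))


# A file-only node holding analyzed Illumina SNP data.
snp_node = AnalysisNode(
    platform="illumina",
    channels=3,
    files=[StoredFile("calls.txt", "illumina.snp")],
)

snp_plugin = FilePlugin(
    "SNP analysis", frozenset({"illumina.snp"}), frozenset({"illumina"}), min_channels=3
)
mev_plugin = FilePlugin(
    "MeV launcher", frozenset({"mev.result"}), frozenset({"illumina", "affymetrix"})
)

for plugin in (snp_plugin, mev_plugin):
    print(plugin.name, plugin.can_handle(snp_node))
```

In this sketch the data file type does the main filtering, and the experiment metadata only narrows the choice further. A "store to database" plug-in could be modeled the same way, as one more file-based plug-in whose output is a node with database data.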