File upload and disk quota
1. Files and directories
- BASE should be able to store files related to experiments and
- A file may have a type, for example, raw data, protocol, reporter list,
etc. The type of the file is only used for giving client applications
a better way to filter files, not for stopping a user from using
a file wherever it can be used.
- BASE should keep track of whether a file has been used. By "used"
we mean that it has been linked to another item, for example a
- It should be possible to use a file multiple times.
- A user should be able to delete the "physical" file from disk, but the
information about the file should still remain in the database.
- A deleted file can be re-uploaded in case it is needed again.
- BASE may rename an uploaded file to avoid overwriting an existing one.
- The core should calculate and store a unique value, for example
the MD5 sum, for each file. This value is used to warn a user
who is re-uploading a file. The user is not prevented from uploading,
since errors in the file may have been corrected.
- Files brought back from secondary storage should, however, be checked
against the stored MD5 value.
- It should be possible to create a directory structure. The directory
structure doesn't have to be physically represented on the disk.
- Each directory may contain multiple files, but a single file can only
appear inside one directory.
- The directory structure must not limit how a file can be used; it
is only a means for users to organise their files.
- A client application may ignore the directory structure and
display all files as if they were part of the root directory.
- It is not possible to delete directories that contain files.
[NOTE] Alternatively, if a directory that contains files is deleted,
the files are moved to the root directory, but this is a client issue and not
a core issue.
- Some client applications, for example an FTP client, require that a file
can be uniquely identified by name. This implies that all file names in a
directory must be unique.
- [QUESTION] How do we handle sharing of files
with users and groups? Should we require that all parent directories
must also be shared? Or do we magically "create" a parallel directory
structure like: /shared/johan, /shared/nicklas
[ANSWER] This is mainly a client issue, but the core must allow
a client to traverse the path leading to the file.
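The checksum and renaming requirements above can be illustrated with a
minimal Python sketch. It is not the BASE implementation: FileRegistry,
upload, verify_restored and the "_1" renaming scheme are hypothetical names
and choices made for this example, and an in-memory dictionary stands in
for the database.

```python
import hashlib
import io
import os


def md5_of_stream(stream, chunk_size=64 * 1024):
    """Return the hex MD5 digest of a binary stream, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


class FileRegistry:
    """Hypothetical in-memory stand-in for the file table in the database."""

    def __init__(self):
        self._md5_by_path = {}  # (directory, name) -> hex MD5

    def unique_name(self, directory, name):
        """Pick a name that is not yet used in the directory (assumed scheme)."""
        stem, ext = os.path.splitext(name)
        candidate, counter = name, 0
        while (directory, candidate) in self._md5_by_path:
            counter += 1
            candidate = f"{stem}_{counter}{ext}"
        return candidate

    def upload(self, directory, name, stream):
        """Store the checksum; return the stored name and any duplicates.

        Duplicates only produce a warning list. The upload is never refused.
        """
        md5 = md5_of_stream(stream)
        duplicates = [p for p, m in self._md5_by_path.items() if m == md5]
        stored_name = self.unique_name(directory, name)
        self._md5_by_path[(directory, stored_name)] = md5
        return stored_name, duplicates

    def verify_restored(self, directory, name, stream):
        """Check a file brought back from secondary storage against its MD5."""
        return self._md5_by_path[(directory, name)] == md5_of_stream(stream)
```

Uploading the same bytes twice to the same directory would, in this sketch,
store the second copy under a new name and return the first copy's path as
a warning, leaving the decision to the client.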
2. Secondary storage
- BASE can be configured to support a secondary storage, where files
that are rarely used can be placed, for example on tape-backup.
[NOTE] The secondary storage is intended to be used for large
files that are not regularly used once they have been parsed after
the upload, for example images and raw data files. Such files may be
moved to cheaper long-term storage.
- A user may flag that a file should be moved to the secondary storage.
Information about the file should remain in the database.
- A user may flag that a file placed in the secondary storage should be
retrieved and placed in the primary storage again.
- The BASE core will only handle the flagging of files to be moved.
It is the responsibility of an external application to actually move
the files between the primary and secondary storage.
- The external application should check at regular intervals whether
files need to be moved, for example once every night.
- A file that is placed in secondary storage can also be flagged
to be deleted.
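The division of labour described above, where the core only flags files and
an external application performs the moves, might be sketched as follows.
All names (Location, Action, FileRecord, flag, run_mover) are hypothetical,
and the actual copying to tape or disk is left out.

```python
from dataclasses import dataclass
from enum import Enum


class Location(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"
    DELETED = "deleted"  # physical file gone, database record kept


class Action(Enum):
    NONE = "none"
    TO_SECONDARY = "to_secondary"
    TO_PRIMARY = "to_primary"
    DELETE = "delete"


# Which flags make sense for which current location (an assumption of this
# sketch; only the cases named in the requirements are listed).
ALLOWED = {
    (Location.PRIMARY, Action.TO_SECONDARY),
    (Location.SECONDARY, Action.TO_PRIMARY),
    (Location.SECONDARY, Action.DELETE),
}

TARGET = {
    Action.TO_SECONDARY: Location.SECONDARY,
    Action.TO_PRIMARY: Location.PRIMARY,
    Action.DELETE: Location.DELETED,
}


@dataclass
class FileRecord:
    name: str
    location: Location = Location.PRIMARY
    action: Action = Action.NONE


def flag(record, action):
    """Core side: record the request only; nothing is moved here."""
    if (record.location, action) not in ALLOWED:
        raise ValueError(f"cannot flag {action.value} for a {record.location.value} file")
    record.action = action


def run_mover(records):
    """External application side: process all pending flags, e.g. nightly."""
    processed = []
    for record in records:
        if record.action is Action.NONE:
            continue
        # The real copy, restore or delete on the storage medium goes here.
        record.location = TARGET[record.action]
        record.action = Action.NONE
        processed.append(record.name)
    return processed
```

Because the database record survives in every state, including DELETED, the
information about the file remains available to clients throughout.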
3. Disk quota
- A user must be assigned a disk quota that may not be exceeded.
- The quota is checked at the beginning of an operation, i.e. before
uploading a file. If the check is successful the operation is
allowed to proceed, even if the quota is exceeded after the operation.
[NOTE] This is because a plugin that has run for
several hours should not be rejected when it saves its result.
- The quota applies to uploaded files, and other data that takes
a lot of disk space. What we mean by "other data" and "a lot of
disk space" is decided for each case and should not matter to
the quota system.
- Quota values may be specified as a total sum, or with values
for each type of data or file.
- It should be possible to have independent quota settings for
primary and secondary storage.
- Files that have been deleted should not be counted.
- [IMPLEMENTATION NOTE] Checking against quota values
is done fairly often. Used disk space should not
have to be recalculated each time. A cache holding the most recent
values should be considered.
- A group may also be assigned quota values.
- A user may be configured to use the quota from one
of the groups where the user is a member. Then, both the user's
individual quota and the group's quota are checked.
- The amount of disk space used should be stored per user,
item and type. It will then be possible to generate reports on
disk usage for groups and projects as well.
- [QUESTION] How do we handle removing a user from
a group from which the user has used quota? How do we handle adding
a user to a group? How do we handle changing owner on an item
that uses quota?
[ANSWER] We do not make any automatic changes that require batch
updates. If a user
is removed from or added to a quota group, the disk usage is still
counted against the original group. Changing the owner of an
item will cause its disk usage to be taken over by
the new owner.
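The quota rules above (check only at the start of an operation, optional
per-type limits, a cached usage total, and a combined user and group check)
could look roughly like the Python sketch below. Quota, UsageCache,
check_quota, begin_operation and record_usage are hypothetical names, and
treating "usage has reached the limit" as a failed check is an assumption
of this sketch.

```python
from collections import defaultdict


class QuotaExceeded(Exception):
    pass


class Quota:
    """Limits in bytes. None means unlimited (assumed convention)."""

    def __init__(self, total=None, per_type=None):
        self.total = total
        self.per_type = dict(per_type or {})


class UsageCache:
    """Cached disk usage per (owner, data type), so checks avoid recounting."""

    def __init__(self):
        self._used = defaultdict(int)

    def add(self, owner, data_type, size):
        self._used[(owner, data_type)] += size

    def used(self, owner, data_type=None):
        if data_type is None:
            return sum(v for (o, _), v in self._used.items() if o == owner)
        return self._used.get((owner, data_type), 0)


def check_quota(cache, owner, quota, data_type):
    """Fail only if the owner is already at or above a limit."""
    if quota.total is not None and cache.used(owner) >= quota.total:
        raise QuotaExceeded(f"{owner}: total quota reached")
    limit = quota.per_type.get(data_type)
    if limit is not None and cache.used(owner, data_type) >= limit:
        raise QuotaExceeded(f"{owner}: quota for {data_type} reached")


def begin_operation(cache, user, user_quota, data_type, group=None, group_quota=None):
    """Check the user's quota and, if configured, the group's quota as well."""
    check_quota(cache, user, user_quota, data_type)
    if group is not None and group_quota is not None:
        check_quota(cache, group, group_quota, data_type)


def record_usage(cache, user, data_type, size, group=None):
    """Called when the operation saves its result; never rejects."""
    cache.add(user, data_type, size)
    if group is not None:
        cache.add(group, data_type, size)
```

In this sketch a long-running plugin calls begin_operation once, before it
starts, and record_usage when it saves its result. The result is stored even
if it pushes the total over the limit, and only the next operation is
rejected.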