#612 closed (fixed)
Add support for compressed files
Reported by: | Fredrik Levander | Owned by: | olle |
---|---|---|---|
Milestone: | Proteios SE 2.10 | Keywords: | |
Cc: |
Description
Proteios SE should handle gzip-compressed files, similarly to BASE. The BASE core File code could be merged into the Proteios core File code. When files are downloaded, they should be decompressed on the fly. This also applies to remote files, which are downloaded from a URI. Example: the remote file test.mzML.gz could be registered in Proteios as test.mzML with the compressed flag set to true. It will appear as uncompressed to the user. One could also imagine letting users choose to download files as compressed, but only through the GUI (not through FTP).
Change History (22)
comment:1 Changed 14 years ago by
Description: | modified (diff) |
---|---|
Owner: | Gregory Vincic deleted |
comment:2 Changed 14 years ago by
Owner: | set to olle |
---|
comment:4 Changed 14 years ago by
Traceability note:
Proteios SE sister project BASE (http://base.thep.lu.se/) has had two major tickets related to compressed files:
- BASE2 ticket #366 (Batch upload using zip-files). Mainly intended for uploading several files combined into a single compressed file, which was uncompressed when uploaded. This also reduced the bandwidth required for the file transfer.
- BASE2 ticket #411 (Extending compressed file support and storage). Mainly intended for internal storage in compressed format, in order to reduce disk storage space.
The two cases complement each other, and both include a performance penalty related to the compression and/or decompression of data.
This ticket will be mostly concerned with case 2, BASE2 ticket #411 (Extending compressed file support and storage), but might differ in what features are implemented.
comment:5 Changed 14 years ago by
Design discussion.
BASE2 ticket #411 (Extending compressed file support and storage) adds two ways to automatically store files internally in compressed format:
- Some types of files are known to have their size substantially reduced when compressed. BASE could therefore be configured to automatically store files of specific MIME types in compressed format internally.
- A directory could be configured to store all files uploaded to it in compressed format internally.
In addition to this, a user could control internal compressed file storage in BASE in other ways:
- When uploading a new file, a user could select to store it internally in compressed format.
- Methods were added to the File class allowing the internal storage of a file to be changed, i.e. a file previously stored uncompressed to be stored compressed, and vice versa.
Design of first version of support in Proteios SE for storing files internally in compressed format (management of remote files defined through a URI is not included here):
- The first version of support in Proteios SE for storing files internally in compressed format will not include any automatic internal storage in compressed format (cases 1 and 2 above), but only cases where the user can control the storage mode (cases 3 and 4 above).
- When uploading a file to Proteios SE, the user should be able to select that the file should be stored in compressed format internally. Default is to store a file in uncompressed format internally.
- An extension should be developed to allow a user to select files that should have their internal storage format changed, i.e. a file previously stored uncompressed to be stored compressed, and vice versa. This will allow the storage format of files uploaded through FTP to be changed, since they will be stored uncompressed by default.
- In order for previous functionality to work without major code changes, public method InputStream getDownloadStream(long offset) in class File should be modified to decompress the contents of a file stored in compressed format internally.
- For cases when an unmodified input stream of an internally stored file is desired, i.e. without decompression of the contents of a file stored in compressed format internally, a new public method InputStream getRawDownloadStream(long offset) should be added to class File.
Note that if an already compressed file is uploaded to Proteios SE and stored internally without (extra) compression, which is the default, its "compress" flag will have value false, even though the file may have a ".gz" file extension. When downloaded, no decompression will therefore be applied, and the original (already compressed) file will result.
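The intended difference between the two download methods can be sketched as follows. This is a minimal illustration, not the Proteios implementation: the class StoredFileSketch, its in-memory byte array, and the helper gzip(...) are hypothetical, and it assumes that the offset passed to getDownloadStream(long offset) refers to a position in the uncompressed content.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical stand-in for class File; the stored bytes live in memory here.
class StoredFileSketch {
    private final byte[] storedBytes;  // bytes exactly as kept in internal storage
    private final boolean compressed;  // the "compressed" flag of the file

    StoredFileSketch(byte[] storedBytes, boolean compressed) {
        this.storedBytes = storedBytes;
        this.compressed = compressed;
    }

    // Unmodified stream of the stored bytes, starting at the given offset.
    InputStream getRawDownloadStream(long offset) throws IOException {
        InputStream in = new ByteArrayInputStream(storedBytes);
        skipFully(in, offset);
        return in;
    }

    // Stream of the logical content; decompresses only if the flag is set.
    // Assumption: offset counts bytes of the uncompressed content.
    InputStream getDownloadStream(long offset) throws IOException {
        if (!compressed) {
            return getRawDownloadStream(offset);
        }
        InputStream in = new GZIPInputStream(getRawDownloadStream(0));
        skipFully(in, offset);
        return in;
    }

    // skip() may skip fewer bytes than requested, so loop until done.
    private static void skipFully(InputStream in, long count) throws IOException {
        long remaining = count;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped <= 0) {
                if (in.read() < 0) {
                    return; // end of stream reached before the offset
                }
                skipped = 1;
            }
            remaining -= skipped;
        }
    }

    // Helper for building gzip-compressed test data.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(data);
        }
        return buffer.toByteArray();
    }
}
```

With the flag set to false, getDownloadStream(...) returns the stored bytes unchanged, which matches the note above about an uploaded ".gz" file being returned as it was uploaded.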
comment:6 Changed 14 years ago by
Design update:
Differences in BASE2 and Proteios current code regarding public method OutputStream getUploadStream(...) in class File:
- In BASE2, public method OutputStream getUploadStream(boolean checkMd5) in class File was updated 2007-09-12 15:02:59 in changeset [3719] for BASE2 ticket #411 with a new argument "Boolean compress", resulting in: OutputStream getUploadStream(boolean checkMd5, Boolean compress).
- In Proteios, public method OutputStream getUploadStream(boolean checkMd5) in class File was updated 2007-10-30 16:19:58 in changeset [2313] (no ticket referenced) with a new argument "boolean append", resulting in: OutputStream getUploadStream(boolean checkMd5, boolean append).
When these are merged in the Proteios code, the question arises of which call to support for two arguments, e.g. will the call "getUploadStream(checkMd5, true)" mean that variable append has value true and variable compress has a default value, or vice versa?
Keeping the BASE2 and Proteios code as compatible as possible for classes used in both
projects is desirable, as it makes it easier to transfer added features between
them.
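The ambiguity can be made concrete with a small Java illustration. It assumes, purely for demonstration, that both two-argument variants were declared side by side. The class UploadOverloadSketch is hypothetical, and the methods return a description string instead of an OutputStream. Java accepts this, because overload resolution first looks for a match without boxing, but the meaning of the second argument then depends on whether the caller passes a primitive or a boxed value.

```java
// Illustration only: both two-argument variants declared in one class.
class UploadOverloadSketch {
    // Variant as added in Proteios (second argument: append).
    static String getUploadStream(boolean checkMd5, boolean append) {
        return "append=" + append;
    }

    // Variant as added in BASE2 (second argument: compress).
    static String getUploadStream(boolean checkMd5, Boolean compress) {
        return "compress=" + compress;
    }

    public static void main(String[] args) {
        // A primitive argument selects the boolean (append) variant.
        System.out.println(getUploadStream(true, true));         // prints "append=true"
        // A boxed argument selects the Boolean (compress) variant.
        System.out.println(getUploadStream(true, Boolean.TRUE)); // prints "compress=true"
        // null can only be a Boolean, so the compress variant is chosen.
        System.out.println(getUploadStream(true, null));         // prints "compress=null"
    }
}
```

Keeping both variants would therefore compile, but a caller could easily end up setting the wrong option.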
Inspection of the Proteios core code showed that only three classes currently called getUploadStream(...) with a specified value for argument "append", corresponding to files:
- client/servlet/src/org/proteios/Service.java
- plugin/src/org/proteios/plugins/MsInspectSearchResultsToCSVFile.java
- plugin/src/org/proteios/plugins/SearchResultsPepCSVMatch.java
Since these calls could easily be modified to include a default value for an extra argument "Boolean compress", it was decided that the call "OutputStream getUploadStream(boolean checkMd5, Boolean compress)" should be supported in Proteios SE.
comment:7 Changed 14 years ago by
comment:8 Changed 14 years ago by
(In [3473]) Refs #612. Refs #209. Refs #287. Basic support for storing files in compressed format internally added:
- XML file with common queries conf/common-queries.xml in api/core/ updated with new SQL query "SET_COMPRESSED_ON_FILES" for setting the "compressed" property to false and the "compressedSize" property to current size for all files with a null value.
- Class/file core/Install.java in api/core/ updated by changing the value of integer constant NEW_SCHEMA_VERSION from 8 to 9.
- New class/file core/UpdateToSchemaVersion9.java in api/core/ added. Protected method int adjustItemsToNewSchemaVersion(org.hibernate.Session session) executes pre-defined SQL query SET_COMPRESSED_ON_FILES.
- New interface/file core/AbsoluteProgressReporter.java in
api/core/ added.
- New class/file core/SimpleProgressReporter.java in api/core/ added. It implements interface ProgressReporter.
- New class/file core/SimpleAbsoluteProgressReporter.java in api/core/ added. It extends class SimpleProgressReporter and implements interface AbsoluteProgressReporter.
- Class/file core/data/FileData.java in api/core/ updated:
- New private instance variables boolean compressed and long compressedSize added, together with public accessor methods.
- Class/file core/File.java in api/core/ updated:
- New private instance variable boolean uploadCompressed added. It indicates cases where automatic compression of uploaded files should be used. Default value is false. Currently not used, but added to increase similarity with BASE2 code and for possible future use.
- New public method void compress(ProgressReporter progress) added. It compresses the file stored on disk, if not already compressed.
- New public method void decompress(ProgressReporter progress) added. It decompresses the file stored on disk, if it is stored in compressed format.
- New private method void compressOrDecompress(ProgressReporter progress, boolean compress) added. It compresses or decompresses the file depending on the value of boolean argument compress.
- New public method boolean isCompressed() added. It reports if the file is stored in compressed format internally.
- New public method long getCompressedSizeBytes() added. It reports the compressed size in bytes of the file. If the file is not stored in compressed format internally, the normal size in bytes of the file is reported.
- Public method void upload(InputStream in, boolean checkMd5) updated to call new public method void upload(InputStream in, boolean checkMd5, Boolean compress) with argument compress set to null.
- New public method void upload(InputStream in, boolean checkMd5, Boolean compress) added. It calls new public method OutputStream getUploadStream(boolean checkMd5, Boolean compress, boolean append) with argument append set to false.
- Public method OutputStream getUploadStream(boolean checkMd5) updated to call new public method OutputStream getUploadStream(boolean checkMd5, Boolean compress, boolean append) with argument compress set to null and argument append set to false.
- New public method OutputStream getUploadStream(boolean checkMd5, Boolean compress) added. It calls new public method OutputStream getUploadStream(boolean checkMd5, Boolean compress, boolean append) with argument append set to false.
- New public method OutputStream getUploadStream(boolean checkMd5, Boolean compress, boolean append) added. It supports optional storage in compressed format internally.
- Public method InputStream getDownloadStream(long offset) updated to decompress the content of a file stored in compressed format internally.
- New public method InputStream getRawDownloadStream(long offset) added. It does not attempt to decompress the content of a file stored in compressed format internally.
- Private method java.io.File getNewFile() updated with argument boolean compress. If the latter has value true, the name of the internal file is given file extension ".gz".
- Private inner class UploadStream updated with new private instance variables java.io.File file and boolean compress. Constructor UploadStream(java.io.File file, boolean calculateMd5, boolean checkMd5, boolean append) updated with new argument boolean compress to UploadStream(java.io.File file, boolean calculateMd5, boolean checkMd5, boolean compress, boolean append). Supports optional storage in compressed format internally.
- Class/file util/FileUtil.java in api/core/ updated:
- Public static method long copy(InputStream in, OutputStream out) updated to call new public static method long copy(InputStream in, OutputStream out, AbsoluteProgressReporter progress) with argument progress set to value null.
- New public static method long copy(InputStream in, OutputStream out, AbsoluteProgressReporter progress) added. It supports an optional AbsoluteProgressReporter.
- Class/file util/ConsoleProgressReporter.java in api/core/
updated:
- Public method void display(int percent, String message) updated to display an empty message string if the input message is null.
- Class/file Service.java in client/servlet/ updated:
- Private method void createItems(RequestData data) updated to call method getUploadStream(...) in class File with argument compress set to value null.
- Class/file plugins/MsInspectSearchResultsToCSVFile.java in plugin/ updated:
- Private method OutputStream getOutputStream(File f, DbControl dc, List<String> oldcontent) updated to call method getUploadStream(...) in class File with argument compress set to value null.
- Class/file plugins/SearchResultsPepCSVMatch.java in plugin/ updated:
- Private method OutputStream getOutputStream(File f, DbControl dc) updated to call method getUploadStream(...) in class File with argument compress set to value null.
- English dictionary file locale/en/dictionary in client/servlet/
updated with new entries for various string keys.
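As an illustration of how copy(...) with an optional progress reporter and compressOrDecompress(...) might fit together, here is a minimal sketch. It is not the committed code. The interface AbsoluteProgressSketch and its method displayAbsolute(...) are placeholders for AbsoluteProgressReporter, whose actual methods are not listed in this ticket. The class FileCompressionSketch works directly on java.nio.file.Path objects rather than on the Proteios File class, and the rule for naming the decompressed file is an assumption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Placeholder for AbsoluteProgressReporter; the method name is an assumption.
interface AbsoluteProgressSketch {
    void displayAbsolute(long completedBytes, String message);
}

class FileCompressionSketch {
    // Two-argument form delegates with a null reporter, as described above.
    static long copy(InputStream in, OutputStream out) throws IOException {
        return copy(in, out, null);
    }

    // Copies all bytes and reports the running total if a reporter is given.
    static long copy(InputStream in, OutputStream out, AbsoluteProgressSketch progress)
            throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int count;
        while ((count = in.read(buffer)) != -1) {
            out.write(buffer, 0, count);
            total += count;
            if (progress != null) {
                progress.displayAbsolute(total, null);
            }
        }
        out.flush();
        return total;
    }

    // Rewrites a file on disk as gzip (compress == true) or as plain data,
    // then removes the source file. A compressed file gets extension ".gz".
    static Path compressOrDecompress(Path source, AbsoluteProgressSketch progress,
            boolean compress) throws IOException {
        String name = source.getFileName().toString();
        String targetName;
        if (compress) {
            targetName = name + ".gz";
        } else if (name.endsWith(".gz")) {
            targetName = name.substring(0, name.length() - 3);
        } else {
            targetName = name + ".out"; // assumption: fallback name
        }
        Path target = source.resolveSibling(targetName);
        try (InputStream raw = Files.newInputStream(source);
             InputStream in = compress ? raw : new GZIPInputStream(raw);
             OutputStream rawOut = Files.newOutputStream(target);
             OutputStream out = compress ? new GZIPOutputStream(rawOut) : rawOut) {
            copy(in, out, progress);
        }
        Files.delete(source);
        return target;
    }
}
```

Reporting absolute byte counts rather than percentages is convenient here, because the final size of a compressed or decompressed stream is not known in advance.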
comment:9 Changed 14 years ago by
(In [3474]) Refs #612. Class/file core/File.java in api/core/ updated:
- Public method void upload(InputStream in, boolean checkMd5) updated to use boolean constant COMPRESS_NULL instead of explicit null value when calling public method void upload(InputStream in, boolean checkMd5, Boolean compress) with argument compress set to null.
comment:10 Changed 14 years ago by
(In [3475]) Refs #612. Added option for user to request that a file to be uploaded should be stored in compressed format internally:
- Class/file gui/form/FormFactory.java in client/servlet/ updated:
- Public method Form getNewFileForm(DbControl dc) updated to show a check box to request that the file to be uploaded should be stored in compressed format internally. The check box is coupled to valid parameter VBoolean VSTOREDINCOMPRESSEDFORMAT in class SaveFile.
- Class/file action/file/SaveFile.java in client/servlet/ updated:
- New valid parameter VBoolean VSTOREDINCOMPRESSEDFORMAT added. Default value is false.
- Public method void runMe() updated to obtain the value of valid parameter VBoolean VSTOREDINCOMPRESSEDFORMAT. Private method void saveUploadedFile(FileItem item, Integer dirId, boolean compress) is then called with argument compress set to true if that value is true, else false.
- Private method void saveUploadedFile(FileItem item, Integer dirId) updated with new argument boolean compress to private void saveUploadedFile(FileItem item, Integer dirId, boolean compress). The value of the new argument is used when calling public method void upload(InputStream in, boolean checkMd5, Boolean compress) in class File to upload and store the file.
comment:11 Changed 14 years ago by
(In [3476]) Refs #612. Added Proteios extension and plug-in for selecting files to have the internal storage format changed from uncompressed to compressed, and vice versa:
- New class/file action/file/CompressDecompressFilesExtension.java in client/servlet/ added. It obtains the list of selected files and transfers it to a job created for new plug-in class CompressDecompressFilesPlugin.
- New class/file plugins/CompressDecompressFilesPlugin.java in plugin/ added. It obtains the list of files from a job parameter and calls either public method void compress(ProgressReporter progress) or void decompress(ProgressReporter progress) in class File, in both cases with argument progress set to null, as progress is only reported based on how many files have been processed.
comment:12 Changed 14 years ago by
comment:13 Changed 14 years ago by
Design update for remote files:
- The Proteios extension and plug-in for selecting files to have the storage format changed from uncompressed to compressed, and vice versa, should be updated to toggle the value of the boolean "compressed" flag for remote files, not stored internally. This will trigger decompression of the download stream of a remote file with "compressed" flag set to true.
- The filename and size values will not be changed by the plug-in when the value of the "compressed" flag is changed for a remote file.
- The job done message for a compress/decompress job will be updated to specify how many internal and external files were processed.
comment:14 Changed 14 years ago by
(In [3479]) Refs #612. The Proteios extension and plug-in for selecting files to have the storage format changed from uncompressed to compressed, and vice versa, should be updated to toggle the value of the boolean "compressed" flag for remote files, not stored internally. This will trigger decompression of the download stream of a remote file with "compressed" flag set to true:
- Class/file core/File.java in api/core/ updated:
- Public method void compress(ProgressReporter progress) updated to set the value of boolean "compressed" flag for a remote file to true.
- Public method void decompress(ProgressReporter progress) updated to set the value of boolean "compressed" flag for a remote file to false.
- Class/file plugins/CompressDecompressFilesPlugin.java in plugin/ updated in public method void execute(Request request, Response response, ProgressReporter progress) to specify in the job done message how many internal and external files were processed.
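A rough sketch of the counting behaviour described above is given below. The class CompressJobSketch, its nested Item type, and the message wording are all hypothetical. It also assumes that files already in the requested state are skipped and not counted, which the ticket does not state explicitly.

```java
import java.util.List;

// Hypothetical model of a compress/decompress job over selected files.
class CompressJobSketch {
    // Minimal stand-in for a file item: remote or internal, plus its flag.
    static final class Item {
        final boolean remote;
        boolean compressed;

        Item(boolean remote, boolean compressed) {
            this.remote = remote;
            this.compressed = compressed;
        }
    }

    // Applies the requested state and returns a job done message.
    static String run(List<Item> files, boolean compress) {
        int internal = 0;
        int external = 0;
        for (Item file : files) {
            if (file.compressed == compress) {
                continue; // assumption: nothing to do, not counted
            }
            // For a remote file only the flag changes; for an internal file
            // the stored data would be rewritten as well (omitted here).
            file.compressed = compress;
            if (file.remote) {
                external++;
            } else {
                internal++;
            }
        }
        return "Processed " + internal + " internal and " + external + " external file(s)";
    }
}
```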
comment:15 Changed 14 years ago by
Design update:
- It was decided to update class File with public setter methods void setSizeInBytes(long size) and void setMd5(String Md5). These values are set automatically for internally stored files, and the new methods should therefore only change the value of remote files.
comment:16 Changed 14 years ago by
(In [3480]) Refs #612. Class/file core/File.java in api/core/ updated with public setter methods to set size and MD5 value for remote files:
- Class/file core/File.java in api/core/ updated:
- New public method void setSizeInBytes(long size) added. It sets the size in bytes, provided that the location of the file is different from Location.PRIMARY, i.e. it is a remote file.
- New public method void setMd5(String Md5) added. It sets the MD5 value, provided that the location of the file is different from Location.PRIMARY, i.e. it is a remote file.
comment:17 Changed 14 years ago by
Design update:
- It was decided to update class File with public setter methods void setCompressed(boolean compressed) and void setCompressedSizeInBytes(long size). These values are set automatically for internally stored files, and the new methods should therefore only change the value of remote files (at least in the first version).
comment:18 Changed 14 years ago by
(In [3481]) Refs #612. Class/file core/File.java in api/core/ updated with public setter methods to set compression flag and compressed size for remote files:
- Class/file core/File.java in api/core/ updated:
- New public method void setCompressed(boolean compressed) added. It sets the flag indicating that the file is stored in compressed format, provided that the location of the file is different from Location.PRIMARY, i.e. it is a remote file.
- New public method void setCompressedSizeInBytes(long size) added. It sets the compressed size in bytes, provided that the location of the file is different from Location.PRIMARY, i.e. it is a remote file.
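The guard used by these setters can be illustrated with a small sketch. The class RemoteFileSketch is hypothetical, the enum value EXTERNAL is a placeholder for any location other than PRIMARY, and silently ignoring the call for an internally stored file is an assumption; the real methods might instead throw an exception.

```java
// Hypothetical sketch of setters that only take effect for remote files.
class RemoteFileSketch {
    // PRIMARY is named in the ticket; EXTERNAL is a placeholder value.
    enum Location { PRIMARY, EXTERNAL }

    private final Location location;
    private boolean compressed;
    private long compressedSizeInBytes;

    RemoteFileSketch(Location location) {
        this.location = location;
    }

    private boolean isRemote() {
        return location != Location.PRIMARY;
    }

    // Sets the flag only for remote files (assumption: otherwise ignored).
    void setCompressed(boolean compressed) {
        if (isRemote()) {
            this.compressed = compressed;
        }
    }

    // Sets the compressed size only for remote files.
    void setCompressedSizeInBytes(long size) {
        if (isRemote()) {
            this.compressedSizeInBytes = size;
        }
    }

    boolean isCompressed() {
        return compressed;
    }

    long getCompressedSizeInBytes() {
        return compressedSizeInBytes;
    }
}
```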
comment:19 Changed 14 years ago by
Design update:
- In order to comply with previous coding principles in Proteios, public getter method long getCompressedSizeBytes() should be renamed long getCompressedSizeInBytes().
comment:20 Changed 14 years ago by
(In [3482]) Refs #612. Refs #287. Class/file core/File.java in api/core/ updated by changing name of public getter method long getCompressedSizeBytes() to long getCompressedSizeInBytes(), in order to comply with previous coding principles in Proteios:
- Class/file core/File.java in api/core/ updated by changing name of public getter method long getCompressedSizeBytes() to long getCompressedSizeInBytes().
- English dictionary file locale/en/dictionary in client/servlet/
updated with new entries for various string keys.
comment:21 Changed 14 years ago by
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Ticket closed as the requested functionality has been added.
Ticket reassigned.