Opened 14 years ago

Closed 14 years ago

Last modified 11 years ago

#612 closed (fixed)

Add support for compressed files

Reported by: Fredrik Levander
Owned by: olle
Milestone: Proteios SE 2.10
Keywords:
Cc:

Description (last modified by Fredrik Levander)

Proteios SE should handle gzip-compressed files, similarly to BASE. The BASE core File code could be merged into the Proteios core File code. When files are downloaded, they should be decompressed on the fly. This also applies to remote files, which are downloaded from a URI. Example: the remote file test.mzML.gz could be registered in Proteios as test.mzML with the compressed flag set to true. It will appear as uncompressed to the user. One could also imagine letting users choose to download files in compressed form, but only through the GUI (not through FTP).

Change History (22)

comment:1 Changed 14 years ago by Fredrik Levander

Description: modified (diff)
Owner: Gregory Vincic deleted

comment:2 Changed 14 years ago by olle

Owner: set to olle

Ticket reassigned.

comment:3 Changed 14 years ago by olle

Status: new → assigned

Ticket accepted.

comment:4 Changed 14 years ago by olle

Traceability note:

Proteios SE sister project BASE (http://base.thep.lu.se/) has had two major tickets related to compressed files:

  1. BASE2 ticket #366 (Batch upload using zip-files). Mainly intended for uploading several files combined into a single compressed file, which was uncompressed when uploaded. This also reduced the bandwidth required for the file transfer.
  2. BASE2 ticket #411 (Extending compressed file support and storage). Mainly intended for internal storage in compressed format, in order to reduce disk storage space.

The two cases complement each other, and both include a performance penalty related to the compression and/or decompression of data.

This ticket will be mostly concerned with case 2, BASE2 ticket #411 (Extending compressed file support and storage), but might differ in what features are implemented.

comment:5 Changed 14 years ago by olle

Design discussion.

BASE2 ticket #411 (Extending compressed file support and storage) adds two ways to automatically store files internally in compressed format:

  1. Some types of files are known to have their size substantially reduced when compressed. BASE could therefore be configured to automatically store files of specific MIME types in compressed format internally.
  2. A directory could be configured to store all files uploaded to it in compressed format internally.

In addition to this, a user could control internal compressed file storage in BASE in other ways:

  3. When uploading a new file, a user could choose to store it internally in compressed format.
  4. Methods were added to the File class allowing the internal storage of a file to be changed, i.e. a file previously stored uncompressed to be stored compressed, and vice versa.

Design of first version of support in Proteios SE for storing files internally in compressed format (management of remote files defined through a URI is not included here):

  • The first version of support in Proteios SE for storing files internally in compressed format will not include any automatic internal storage in compressed format (cases 1 and 2 above), but only cases where the user can control the storage mode (cases 3 and 4 above).
  • When uploading a file to Proteios SE, the user should be able to select that the file should be stored in compressed format internally. Default is to store a file in uncompressed format internally.
  • An extension should be developed to allow a user to select files that should have their internal storage format changed, i.e. a file previously stored uncompressed to be stored compressed, and vice versa. This will allow the storage format of files uploaded through FTP to be changed, since they will be stored uncompressed by default.
  • In order for previous functionality to work without major code changes, public method InputStream getDownloadStream(long offset) in class File should be modified to decompress the contents of a file stored in compressed format internally.
  • For cases when an unmodified input stream of an internally stored file is desired, i.e. without decompression of the contents of a file stored in compressed format internally, a new public method InputStream getRawDownloadStream(long offset) should be added to class File.

Note that if an already compressed file is uploaded to Proteios SE and stored internally without (extra) compression, which is the default, its "compress" flag will have value false, even though the file may have a ".gz" file extension. When downloaded, no decompression will therefore be applied, and the original (already compressed) file will result.
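
The intended difference between the two download methods, including the note above about an already compressed upload, could be sketched roughly as follows. This is only an illustration, not the Proteios implementation: the class StoredFileSketch, its in-memory byte storage and the helper methods are hypothetical, and it is an assumption here that the offset of getDownloadStream counts decompressed bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical stand-in for the core File class; only the stream logic is sketched. */
final class StoredFileSketch {
    private final byte[] storedBytes; // bytes exactly as stored internally
    private final boolean compressed; // the "compressed" flag discussed in the ticket

    StoredFileSketch(byte[] storedBytes, boolean compressed) {
        this.storedBytes = storedBytes.clone();
        this.compressed = compressed;
    }

    /** Never decompresses; the offset counts stored bytes. */
    InputStream getRawDownloadStream(long offset) throws IOException {
        InputStream in = new ByteArrayInputStream(storedBytes);
        skipFully(in, offset);
        return in;
    }

    /** Decompresses only when the flag is set; the offset counts decompressed bytes (assumption). */
    InputStream getDownloadStream(long offset) throws IOException {
        InputStream in = new ByteArrayInputStream(storedBytes);
        if (compressed) {
            in = new GZIPInputStream(in);
        }
        skipFully(in, offset);
        return in;
    }

    /** Skips n bytes, or fewer if the stream ends first. */
    private static void skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped <= 0) {
                if (in.read() < 0) {
                    break; // end of stream reached
                }
                skipped = 1;
            }
            remaining -= skipped;
        }
    }

    /** Helper: gzip a byte array in memory. */
    static byte[] gzip(byte[] plain) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(plain);
        }
        return buffer.toByteArray();
    }

    /** Helper: read a stream to its end. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] gz = gzip("spectrum data".getBytes(StandardCharsets.UTF_8));
        // Flag set: the user sees the uncompressed content.
        StoredFileSketch flagged = new StoredFileSketch(gz, true);
        System.out.println(new String(readAll(flagged.getDownloadStream(0)), StandardCharsets.UTF_8));
        // Flag not set (already compressed upload): the gzip bytes come back unchanged.
        StoredFileSketch unflagged = new StoredFileSketch(gz, false);
        System.out.println(readAll(unflagged.getDownloadStream(0)).length == gz.length);
    }
}
```

Keeping the decompression inside getDownloadStream means existing callers need no changes, while callers that want the stored bytes as they are can switch to getRawDownloadStream.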

comment:6 Changed 14 years ago by olle

Design update:

Differences in BASE2 and Proteios current code regarding public method OutputStream getUploadStream(...) in class File:

  • In BASE2, public method OutputStream getUploadStream(boolean checkMd5) in class File was updated 2007-09-12 15:02:59 in changeset [3719] for BASE2 ticket #411 with a new argument "Boolean compress", resulting in:

    OutputStream getUploadStream(boolean checkMd5, Boolean compress).

  • In Proteios, public method OutputStream getUploadStream(boolean checkMd5) in class File was updated 2007-10-30 16:19:58 in changeset [2313] (no ticket referenced) with a new argument "boolean append", resulting in:

    OutputStream getUploadStream(boolean checkMd5, boolean append).

When these are merged in the Proteios code, the question is which two-argument call to support: should the call "getUploadStream(checkMd5, true)" mean that append is true and compress takes a default value, or vice versa? Keeping the BASE2 and Proteios code as compatible as possible for classes used in both projects is desirable, as it makes it easier to transfer added features between them.

Inspection of the Proteios core code showed that only three classes currently called getUploadStream(...) with a specified value for argument "append", corresponding to files:

  1. client/servlet/src/org/proteios/Service.java
  2. plugin/src/org/proteios/plugins/MsInspectSearchResultsToCSVFile.java
  3. plugin/src/org/proteios/plugins/SearchResultsPepCSVMatch.java

Since these calls could easily be modified to include a default value for an extra argument "Boolean compress", it was decided that the call "OutputStream getUploadStream(boolean checkMd5, Boolean compress)" should be supported in Proteios SE.
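
One way to keep the BASE2-style two-argument call while still supporting "append" is to let the shorter overloads delegate to a single three-argument method, which is the shape the changeset in comment:8 describes. The sketch below is illustrative only: the class UploadTargetSketch and its in-memory storage are hypothetical, checkMd5 is accepted but ignored, and treating a null compress value as "fall back to the uploadCompressed default" is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

/** Hypothetical stand-in for class File; stores the upload in memory instead of on disk. */
final class UploadTargetSketch {
    private final ByteArrayOutputStream stored = new ByteArrayOutputStream();
    private final boolean uploadCompressed = false; // default described in the ticket
    private boolean compressed;

    /** Original one-argument call: compress left undecided, no append. */
    OutputStream getUploadStream(boolean checkMd5) throws IOException {
        return getUploadStream(checkMd5, null, false);
    }

    /** BASE2-compatible two-argument call: the second argument is always "compress". */
    OutputStream getUploadStream(boolean checkMd5, Boolean compress) throws IOException {
        return getUploadStream(checkMd5, compress, false);
    }

    /** Full form; callers that need "append" must use this one. */
    OutputStream getUploadStream(boolean checkMd5, Boolean compress, boolean append) throws IOException {
        // Assumption: null means "use the configured default".
        boolean useGzip = (compress != null) ? compress : uploadCompressed;
        if (!append) {
            stored.reset(); // start over unless appending
        }
        compressed = useGzip;
        // checkMd5 is ignored in this sketch; appending across storage formats is not handled.
        return useGzip ? new GZIPOutputStream(stored) : stored;
    }

    boolean isCompressed() {
        return compressed;
    }

    byte[] storedBytes() {
        return stored.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        UploadTargetSketch file = new UploadTargetSketch();
        try (OutputStream out = file.getUploadStream(false, Boolean.TRUE)) {
            out.write("peak list".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("compressed=" + file.isCompressed()
                + ", stored bytes=" + file.storedBytes().length);
    }
}
```

With only one two-argument overload there is no ambiguity left: "getUploadStream(checkMd5, true)" can only mean compress, and the three existing callers that used "append" pass it explicitly as the third argument.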

comment:7 Changed 14 years ago by olle

Traceability note:

  • Both plug-ins MsInspectSearchResultsToCSVFile and SearchResultsPepCSVMatch were introduced in Ticket #209 (Enable LC-MS comparisons and include list generation) in changeset [2319].

comment:8 Changed 14 years ago by olle

(In [3473]) Refs #612. Refs #209. Refs #287. Basic support for storing files in compressed format internally added:

  1. XML file with common queries conf/common-queries.xml in api/core/ updated with new SQL query "SET_COMPRESSED_ON_FILES" for setting the "compressed" property to false and the "compressedSize" property to current size for all files with a null value.
  2. Class/file core/Install.java in api/core/ updated by changing the value of integer constant NEW_SCHEMA_VERSION from 8 to 9.
  3. New class/file core/UpdateToSchemaVersion9.java in api/core/ added. Protected method int adjustItemsToNewSchemaVersion(org.hibernate.Session session) executes pre-defined SQL query SET_COMPRESSED_ON_FILES.
  4. New interface/file core/AbsoluteProgressReporter.java in api/core/ added.
  5. New class/file core/SimpleProgressReporter.java in api/core/ added. It implements interface ProgressReporter.
  6. New class/file core/SimpleAbsoluteProgressReporter.java in api/core/ added. It extends class SimpleProgressReporter and implements interface AbsoluteProgressReporter.
  7. Class/file core/data/FileData.java in api/core/ updated:
       • New private instance variables boolean compressed and long compressedSize added, together with public accessor methods.
  8. Class/file core/File.java in api/core/ updated:
       • New private instance variable boolean uploadCompressed added. It indicates cases where automatic compression of uploaded files should be used. Default value is false. Currently not used, but added to increase similarity with the BASE2 code and for possible future use.
       • New public method void compress(ProgressReporter progress) added. It compresses the file stored on disk, if not already compressed.
       • New public method void decompress(ProgressReporter progress) added. It decompresses the file stored on disk if it is stored in compressed format.
       • New private method void compressOrDecompress(ProgressReporter progress, boolean compress) added. It compresses or decompresses the file depending on the value of boolean argument compress.
       • New public method boolean isCompressed() added. It reports if the file is stored in compressed format internally.
       • New public method long getCompressedSizeBytes() added. It reports the compressed size in bytes of the file. If the file is not stored in compressed format internally, the normal size in bytes of the file is reported.
       • Public method void upload(InputStream in, boolean checkMd5) updated to call new public method void upload(InputStream in, boolean checkMd5, Boolean compress) with argument compress set to null.
       • New public method void upload(InputStream in, boolean checkMd5, Boolean compress) added. It calls new public method OutputStream getUploadStream(boolean checkMd5, Boolean compress, boolean append) with argument append set to false.
       • Public method OutputStream getUploadStream(boolean checkMd5) updated to call new public method OutputStream getUploadStream(boolean checkMd5, Boolean compress, boolean append) with argument compress set to null and argument append set to false.
       • New public method OutputStream getUploadStream(boolean checkMd5, Boolean compress) added. It calls new public method OutputStream getUploadStream(boolean checkMd5, Boolean compress, boolean append) with argument append set to false.
       • New public method OutputStream getUploadStream(boolean checkMd5, Boolean compress, boolean append) added. It supports optional storage in compressed format internally.
       • Public method InputStream getDownloadStream(long offset) updated to decompress the content of a file stored in compressed format internally.
       • New public method InputStream getRawDownloadStream(long offset) added. It does not attempt to decompress the content of a file stored in compressed format internally.
       • Private method java.io.File getNewFile() updated with argument boolean compress. If the latter has value true, the name of the internal file is given file extension ".gz".
       • Private inner class UploadStream updated with new private instance variables java.io.File file and boolean compress. Constructor UploadStream(java.io.File file, boolean calculateMd5, boolean checkMd5, boolean append) updated with new argument boolean compress to UploadStream(java.io.File file, boolean calculateMd5, boolean checkMd5, boolean compress, boolean append). Supports optional storage in compressed format internally.
  9. Class/file util/FileUtil.java in api/core/ updated:
       • Public static method long copy(InputStream in, OutputStream out) updated to call new public static method long copy(InputStream in, OutputStream out, AbsoluteProgressReporter progress) with argument progress set to value null.
       • New public static method long copy(InputStream in, OutputStream out, AbsoluteProgressReporter progress) added. It supports an optional AbsoluteProgressReporter.
  10. Class/file util/ConsoleProgressReporter.java in api/core/ updated:
       • Public method void display(int percent, String message) updated to display an empty message string if the input message is null.
  11. Class/file Service.java in client/servlet/ updated:
       • Private method void createItems(RequestData data) updated to call method getUploadStream(...) in class File with argument compress set to value null.
  12. Class/file plugins/MsInspectSearchResultsToCSVFile.java in plugin/ updated:
       • Private method OutputStream getOutputStream(File f, DbControl dc, List<String> oldcontent) updated to call method getUploadStream(...) in class File with argument compress set to value null.
  13. Class/file plugins/SearchResultsPepCSVMatch.java in plugin/ updated:
       • Private method OutputStream getOutputStream(File f, DbControl dc) updated to call method getUploadStream(...) in class File with argument compress set to value null.
  14. English dictionary file locale/en/dictionary in client/servlet/ updated with new entries for various string keys.
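
As an illustration of how compressing a file on disk and the copy method with an optional progress reporter fit together, a minimal sketch is given below. It is not the committed code: the class CompressOnDiskSketch is hypothetical, the nested interface AbsoluteProgress is only loosely modelled on AbsoluteProgressReporter (whose actual methods are not listed in this ticket), and deleting the uncompressed file after a successful compression is an assumption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPOutputStream;

final class CompressOnDiskSketch {
    /** Hypothetical callback reporting the absolute number of bytes handled so far. */
    interface AbsoluteProgress {
        void displayAbsolute(long bytesDone, String message);
    }

    /** Copies in to out, reporting progress if a reporter is given; returns bytes copied. */
    static long copy(InputStream in, OutputStream out, AbsoluteProgress progress) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
            if (progress != null) {
                progress.displayAbsolute(total, null);
            }
        }
        return total;
    }

    /** Convenience overload without progress reporting. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        return copy(in, out, null);
    }

    /**
     * Gzips the source file into a sibling file with an added ".gz" extension,
     * then removes the uncompressed file (assumption) and returns the new path.
     */
    static Path compress(Path source, AbsoluteProgress progress) throws IOException {
        Path target = source.resolveSibling(source.getFileName() + ".gz");
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = new GZIPOutputStream(Files.newOutputStream(target))) {
            copy(in, out, progress);
        }
        Files.delete(source);
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path source = Files.createTempFile("sketch", ".mzML");
        Files.write(source, new byte[100_000]);
        Path target = compress(source, (done, message) -> { /* e.g. update a progress bar */ });
        System.out.println(target.getFileName() + ": " + Files.size(target) + " bytes on disk");
        Files.delete(target);
    }
}
```

Decompression would mirror this by wrapping the input side in a GZIPInputStream and writing to a file without the ".gz" extension.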

comment:9 Changed 14 years ago by olle

(In [3474]) Refs #612. Class/file core/File.java in api/core/ updated:

  1. Public method void upload(InputStream in, boolean checkMd5) updated to use boolean constant COMPRESS_NULL instead of explicit null value when calling public method void upload(InputStream in, boolean checkMd5, Boolean compress) with argument compress set to null.

comment:10 Changed 14 years ago by olle

(In [3475]) Refs #612. Added option for user to request that a file to be uploaded should be stored in compressed format internally:

  1. Class/file gui/form/FormFactory.java in client/servlet/ updated:
       • Public method Form getNewFileForm(DbControl dc) updated to show a check box to request that the file to be uploaded should be stored in compressed format internally. The check box is coupled to valid parameter VBoolean VSTOREDINCOMPRESSEDFORMAT in class SaveFile.
  2. Class/file action/file/SaveFile.java in client/servlet/ updated:
       • New valid parameter VBoolean VSTOREDINCOMPRESSEDFORMAT added. Default value is false.
       • Public method void runMe() updated to obtain value of valid parameter VBoolean VSTOREDINCOMPRESSEDFORMAT, and if the value of the former is true, private method void saveUploadedFile(FileItem item, Integer dirId, boolean compress) is called with argument compress set to true, else false.
       • Private method void saveUploadedFile(FileItem item, Integer dirId) updated with new argument boolean compress to private void saveUploadedFile(FileItem item, Integer dirId, boolean compress). The value of the new argument is used when calling public method void upload(InputStream in, boolean checkMd5, Boolean compress) in class File to upload and store the file.

comment:11 Changed 14 years ago by olle

(In [3476]) Refs #612. Added Proteios extension and plug-in for selecting files to have the internal storage format changed from uncompressed to compressed, and vice versa:

  1. New class/file action/file/CompressDecompressFilesExtension.java in client/servlet/ added. It obtains the list of selected files and transfers it to a job created for new plug-in class CompressDecompressFilesPlugin.
  2. New class/file plugins/CompressDecompressFilesPlugin.java in plugin/ added. It obtains the list of files from a job parameter and calls either public method void compress(ProgressReporter progress) or void decompress(ProgressReporter progress) in class File, in both cases with argument progress set to null, as progress is reported only in terms of how many files have been processed.

comment:12 Changed 14 years ago by olle

Traceability note:

  • The basic support for storing files in compressed format internally in Proteios SE added 2009-11-06 12:30:35 is based on updates added to BASE2 2007-09-12 15:02:59 in changeset [3719] for BASE2 ticket #411.

comment:13 Changed 14 years ago by olle

Design update for remote files:

  • The Proteios extension and plug-in for selecting files to have the storage format changed from uncompressed to compressed, and vice versa, should be updated to toggle the value of the boolean "compressed" flag for remote files, which are not stored internally. This will trigger decompression of the download stream of a remote file with "compressed" flag set to true.
  • The filename and size values will not be changed by the plug-in when the value of the "compressed" flag is changed for a remote file.
  • The job done message for a compress/decompress job will be updated to specify how many internal and external files were processed.

comment:14 Changed 14 years ago by olle

(In [3479]) Refs #612. The Proteios extension and plug-in for selecting files to have the storage format changed from uncompressed to compressed, and vice versa, should be updated to toggle the value of the boolean "compressed" flag for remote files, which are not stored internally. This will trigger decompression of the download stream of a remote file with "compressed" flag set to true:

  1. Class/file core/File.java in api/core/ updated:
       • Public method void compress(ProgressReporter progress) updated to set the value of boolean "compressed" flag for a remote file to true.
       • Public method void decompress(ProgressReporter progress) updated to set the value of boolean "compressed" flag for a remote file to false.
  2. Class/file plugins/CompressDecompressFilesPlugin.java in plugin/ updated in public method void execute(Request request, Response response, ProgressReporter progress) to specify in the job done message how many internal and external files were processed.
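
The behaviour for remote files can be illustrated with the sketch below. All names in it are hypothetical (CompressJobSketch, FileSketch, the REMOTE location value and the wording of the job message); only Location.PRIMARY and the rule that a remote file just has its "compressed" flag toggled, with name and size left unchanged, come from this ticket.

```java
import java.util.Arrays;
import java.util.List;

final class CompressJobSketch {
    /** PRIMARY is named in the ticket; REMOTE is a hypothetical value for "not PRIMARY". */
    enum Location { PRIMARY, REMOTE }

    /** Hypothetical, much reduced stand-in for class File. */
    static final class FileSketch {
        final String name;
        final Location location;
        final long sizeInBytes;
        boolean compressed;

        FileSketch(String name, Location location, long sizeInBytes) {
            this.name = name;
            this.location = location;
            this.sizeInBytes = sizeInBytes;
        }

        void compress() {
            // An internal file would also be rewritten on disk here (omitted in this sketch).
            // A remote file only has its flag set; name and size stay as they are.
            compressed = true;
        }

        void decompress() {
            compressed = false;
        }
    }

    /** Processes all files and builds a job message counting internal and external files. */
    static String run(List<FileSketch> files, boolean compress) {
        int internal = 0;
        int external = 0;
        for (FileSketch file : files) {
            if (compress) {
                file.compress();
            } else {
                file.decompress();
            }
            if (file.location == Location.PRIMARY) {
                internal++;
            } else {
                external++;
            }
        }
        return String.format("%d internal and %d external file(s) %s",
                internal, external, compress ? "compressed" : "decompressed");
    }

    public static void main(String[] args) {
        List<FileSketch> files = Arrays.asList(
                new FileSketch("local.mzML", Location.PRIMARY, 1000),
                new FileSketch("test.mzML", Location.REMOTE, 2000));
        System.out.println(run(files, true));
    }
}
```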

comment:15 Changed 14 years ago by olle

Design update:

  • It was decided to update class File with public setter methods void setSizeInBytes(long size) and void setMd5(String Md5). These values are set automatically for internally stored files, and the new methods should therefore only take effect for remote files.

comment:16 Changed 14 years ago by olle

(In [3480]) Refs #612. Class/file core/File.java in api/core/ updated with public setter methods to set size and MD5 value for remote files:

  1. Class/file core/File.java in api/core/ updated:
       • New public method void setSizeInBytes(long size) added. It sets the size in bytes, provided that the location of the file is different from Location.PRIMARY, i.e. it is a remote file.
       • New public method void setMd5(String Md5) added. It sets the MD5 value, provided that the location of the file is different from Location.PRIMARY, i.e. it is a remote file.

comment:17 Changed 14 years ago by olle

Design update:

  • It was decided to update class File with public setter methods void setCompressed(boolean compressed) and void setCompressedSizeInBytes(long size). These values are set automatically for internally stored files, and the new methods should therefore only take effect for remote files (at least in the first version).

comment:18 Changed 14 years ago by olle

(In [3481]) Refs #612. Class/file core/File.java in api/core/ updated with public setter methods to set compression flag and compressed size for remote files:

  1. Class/file core/File.java in api/core/ updated:
       • New public method void setCompressed(boolean compressed) added. It sets the flag indicating that the file is stored in compressed format, provided that the location of the file is different from Location.PRIMARY, i.e. it is a remote file.
       • New public method void setCompressedSizeInBytes(long size) added. It sets the compressed size in bytes, provided that the location of the file is different from Location.PRIMARY, i.e. it is a remote file.
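
The guard shared by the four setters (size, MD5, compressed flag and compressed size) might look roughly like the sketch below. The class RemoteMetadataSketch and the REMOTE location value are hypothetical, and silently ignoring a call on an internally stored file is an assumption; the ticket does not say whether the real methods ignore such calls or raise an error.

```java
final class RemoteMetadataSketch {
    /** PRIMARY is named in the ticket; REMOTE is a hypothetical value for "not PRIMARY". */
    enum Location { PRIMARY, REMOTE }

    private final Location location;
    private long sizeInBytes;
    private String md5;
    private boolean compressed;
    private long compressedSizeInBytes;

    RemoteMetadataSketch(Location location) {
        this.location = location;
    }

    /** Values of internally stored files are maintained automatically, so only remote files qualify. */
    private boolean isRemote() {
        return location != Location.PRIMARY;
    }

    void setSizeInBytes(long size) {
        if (isRemote()) {
            this.sizeInBytes = size;
        } // else: ignored (assumption)
    }

    void setMd5(String md5) {
        if (isRemote()) {
            this.md5 = md5;
        }
    }

    void setCompressed(boolean compressed) {
        if (isRemote()) {
            this.compressed = compressed;
        }
    }

    void setCompressedSizeInBytes(long size) {
        if (isRemote()) {
            this.compressedSizeInBytes = size;
        }
    }

    long getSizeInBytes() { return sizeInBytes; }
    String getMd5() { return md5; }
    boolean isCompressed() { return compressed; }
    long getCompressedSizeInBytes() { return compressedSizeInBytes; }

    public static void main(String[] args) {
        RemoteMetadataSketch remote = new RemoteMetadataSketch(Location.REMOTE);
        remote.setCompressed(true);
        remote.setCompressedSizeInBytes(1234);
        System.out.println(remote.isCompressed() + " " + remote.getCompressedSizeInBytes());
    }
}
```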

comment:19 Changed 14 years ago by olle

Design update:

  • In order to comply with previous coding principles in Proteios, public getter method long getCompressedSizeBytes() should be renamed long getCompressedSizeInBytes().

comment:20 Changed 14 years ago by olle

(In [3482]) Refs #612. Refs #287. Class/file core/File.java in api/core/ updated by changing name of public getter method long getCompressedSizeBytes() to long getCompressedSizeInBytes(), in order to comply with previous coding principles in Proteios:

  1. Class/file core/File.java in api/core/ updated by changing name of public getter method long getCompressedSizeBytes() to long getCompressedSizeInBytes().
  2. English dictionary file locale/en/dictionary in client/servlet/ updated with new entries for various string keys.

comment:21 Changed 14 years ago by olle

Resolution: fixed
Status: assigned → closed

Ticket closed as the requested functionality has been added.

comment:22 Changed 11 years ago by olle

(In [4411]) Refs #612. Class/file plugins/CompressDecompressFilesPlugin.java in plugin/ updated by fix of typo in debug message.
