API Docs¶
Extension¶
Submission Information Package store for Invenio.
API¶
API for Invenio-SIPStore.
-
class
invenio_sipstore.api.
RecordSIP
(recordsip, sip)[source]¶ API for managing SIPRecords.
Constructor.
Parameters:
- recordsip (invenio_sipstore.models.RecordSIP) – the RecordSIP model to manage
- sip (invenio_sipstore.api.SIP) – the associated SIP
-
classmethod
create
(pid, record, archivable, create_sip_files=True, user_id=None, agent=None)[source]¶ Create a SIP, from the PID and the Record.
Apart from the SIP itself, it also creates a RecordSIP for the SIP-PID-Record relationship, as well as SIPFile objects for each of the files in the record, along with SIPMetadata for the metadata. Those objects are not returned by this function but can be fetched through the corresponding RecordSIP attributes sip, sip.files and sip.metadata.
Parameters:
- pid (invenio_pidstore.models.PersistentIdentifier) – PID of the published record (‘recid’).
- record (invenio_records.api.Record) – Record for which the SIP should be created.
- archivable (bool) – tells if the record should be archived. Useful when Invenio-Archivematica is installed.
- create_sip_files (bool) – If True, the SIPFiles will be created.
Returns: RecordSIP object.
Return type: invenio_sipstore.api.RecordSIP
-
sip
¶ Return the SIP corresponding to this record.
Return type: invenio_sipstore.api.SIP
-
class
invenio_sipstore.api.
SIP
(sip)[source]¶ API for managing SIPs.
Constructor.
Parameters: sip (invenio_sipstore.models.SIP) – the associated SIP model
-
agent
¶ Return the agent of the associated model.
-
archivable
¶ Tell if the SIP should be archived.
-
archived
¶ Tell if the SIP has been archived.
-
attach_file
(file)[source]¶ Add a file to the SIP.
Parameters: file – the file to attach. It must at least implement a key and a valid file_id. See invenio_files_rest.models.ObjectVersion.
Returns: the created SIPFile
Return type: invenio_sipstore.models.SIPFile
-
attach_metadata
(type, metadata)[source]¶ Add metadata to the SIP.
Parameters:
- type (str) – the type of metadata (a valid invenio_sipstore.models.SIPMetadataType name)
- metadata (str) – the metadata to attach.
Returns: the created SIPMetadata
Return type: invenio_sipstore.models.SIPMetadata
-
classmethod
create
(archivable, files=None, metadata=None, user_id=None, agent=None)[source]¶ Create a SIP, from the PID and the Record.
Apart from the SIP itself, it also creates SIPFile objects for each of the files in the record, along with SIPMetadata for the metadata. Those objects are not returned by this function but can be fetched through the corresponding SIP attributes ‘files’ and ‘metadata’. The created model is stored in the attribute ‘model’.
Parameters:
- archivable (bool) – tells if the SIP should be archived or not. Useful if Invenio-Archivematica is installed.
- files – The list of files to associate with the SIP. See invenio_sipstore.api.SIP.attach_file()
- metadata (dict) – A dictionary of metadata. The keys are the type (valid invenio_sipstore.models.SIPMetadataType name) and the values are the content (string)
- user_id – the ID of the user. If not given, automatically computed
- agent – If not given, automatically computed
Returns: API SIP object.
Return type: invenio_sipstore.api.SIP
-
files
¶ Return the list of files associated with the SIP.
Return type: list(invenio_sipstore.models.SIPFile)
-
id
¶ Return the ID of the associated model.
-
metadata
¶ Return the list of metadata associated with the SIP.
Return type: list(invenio_sipstore.models.SIPMetadata)
-
user
¶ Return the user of the associated model.
-
Models¶
Invenio-SIPStore database models.
-
class
invenio_sipstore.models.
RecordSIP
(**kwargs)[source]¶ An association table for Records and SIPs.
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in kwargs.
Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
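The keyword-only constructor described above can be illustrated with a plain Python class. This is a minimal sketch, not the ORM's implementation: `KwargsModelSketch` and its two class attributes are hypothetical stand-ins for a mapped class and its columns.

```python
class KwargsModelSketch(object):
    """Hypothetical stand-in for a mapped class; attributes mimic columns."""

    pid_id = None
    sip_id = None

    def __init__(self, **kwargs):
        cls = type(self)
        for key, value in kwargs.items():
            # Only names that already exist on the class are accepted.
            if not hasattr(cls, key):
                raise TypeError(
                    '%r is an invalid keyword argument for %s'
                    % (key, cls.__name__))
            setattr(self, key, value)


link = KwargsModelSketch(pid_id=1, sip_id='abc')
```

Passing a name that is not an attribute of the class, such as `KwargsModelSketch(bogus=1)`, raises `TypeError` in this sketch.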
-
pid
¶ Relation to the PID associated with the record SIP.
-
pid_id
¶ Id of the PID pointing to the record.
-
sip
¶ Relation to the SIP associated with the record.
-
sip_id
¶ Id of SIP.
-
-
class
invenio_sipstore.models.
SIP
(**kwargs)[source]¶ Submission Information Package model.
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in kwargs.
Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
-
agent
¶ Agent information regarding given SIP.
-
archivable
¶ Boolean stating if the SIP should be archived or not.
-
archived
¶ Boolean stating if the SIP has been archived or not.
-
classmethod
create
(user_id=None, agent=None, id_=None, archivable=True, archived=False)[source]¶ Create a Submission Information Package object.
Parameters:
-
id
¶ Id of SIP.
-
user
¶ Relation to the User responsible for the SIP.
-
user_id
¶ User responsible for the SIP.
-
-
class
invenio_sipstore.models.
SIPFile
(**kwargs)[source]¶ Extra SIP info regarding files.
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in kwargs.
Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
-
checksum
¶ Return the checksum of the file.
-
file
¶ Relation to the FileInstance of the submitted file.
-
file_id
¶ Id of the FileInstance.
-
filepath
¶ Filepath of submitted file within the SIP record.
-
sip
¶ Relation to the SIP along which given file was submitted.
-
sip_id
¶ Id of SIP.
-
size
¶ Return the size of the file.
-
storage_location
¶ Return the location of the file in the current storage.
-
-
class
invenio_sipstore.models.
SIPMetadata
(**kwargs)[source]¶ Extra SIP info regarding metadata.
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in kwargs.
Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
-
content
¶ Text blob of the metadata content.
-
sip
¶ Relation to the SIP along which given metadata was submitted.
-
sip_id
¶ Id of SIP.
-
type
¶ Relation to the SIPMetadataType.
-
type_id
¶ ID of the metadata type.
-
-
class
invenio_sipstore.models.
SIPMetadataType
(**kwargs)[source]¶ Type of the metadata added to an SIP.
The type describes the type of file along with an optional schema used to validate the structure of the content.
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in kwargs.
Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
-
format
¶ The format of the metadata (xml, json, txt...).
This is used as the extension of the created file during an export.
-
id
¶ ID of the SIPMetadataType object.
-
name
¶ The unique name tag of the metadata type.
-
schema
¶ URI to a schema that describes the metadata (json or xml schema).
-
title
¶ The title of the metadata type (e.g. ‘Invenio JSON Record v1.0.0’).
-
Errors¶
Errors for Submission Information Packages.
Proxies¶
Proxy definitions.
-
invenio_sipstore.proxies.
current_sipstore
= <LocalProxy unbound>¶ Helper proxy to access the SIPStore state object.
Signals¶
Signals for the module.
-
invenio_sipstore.signals.
sipstore_archiver_status
= <blinker.base.NamedSignal object at 0x7f8b3de60210; 'sipstore_archiver_status'>¶ Signal sent during the archiving processing.
Sends a dict with the following information inside:
- total_files: the total number of files to copy
- total_size: the total size to copy
- copied_files: the number of copied files
- copied_size: the size copied
- current_filename: the name of the last copied file
- current_filesize: the size of the last copied file
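A subscriber might turn the status dict into a progress line. The sketch below only assumes the keys listed above; `describe_progress` and the sample values are hypothetical and not part of the module.

```python
def describe_progress(status):
    """Summarise one archiver status dict as a short progress string."""
    total = status['total_size']
    # Guard against an empty archive, where nothing has to be copied.
    percent = 100.0 * status['copied_size'] / total if total else 100.0
    return '{}/{} files ({:.1f}%), last: {}'.format(
        status['copied_files'], status['total_files'],
        percent, status['current_filename'])


# Hypothetical payload, shaped like the dict described above.
sample = {
    'total_files': 4,
    'total_size': 1024,
    'copied_files': 1,
    'copied_size': 256,
    'current_filename': 'a.txt',
    'current_filesize': 256,
}
print(describe_progress(sample))
```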
-
invenio_sipstore.signals.
sipstore_created
= <blinker.base.NamedSignal object at 0x7f8b3de601d0; 'sipstore_created'>¶ Signal sent each time a SIP has been created.
Send the SIP as a parameter:
invenio_sipstore.api.SIP
Example subscriber
def listener(sender, *args, **kwargs):
    # sender is the SIP being archived
    for f in sender.files:
        print(f.filepath)

from invenio_sipstore.signals import sipstore_created
sipstore_created.connect(listener)
Utilities¶
SIPStore utility functions.
-
invenio_sipstore.utils.
load_or_import_from_config
(key, app=None, default=None)[source]¶ Load or import value from config.
Returns: The loaded value.
-
invenio_sipstore.utils.
obj_or_import_string
(value, default=None)[source]¶ Import string or return object.
Parameters:
- value – Import path or class object to instantiate.
- default – Default object to return if the import fails.
Returns: The imported object.
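The "import string or return object" idea can be sketched with the standard library alone. This is an illustration under assumptions, not the module's code: `obj_or_import_string_sketch` is a hypothetical name, it only understands dotted paths, and the real helper may resolve paths or handle failures differently.

```python
import importlib


def obj_or_import_string_sketch(value, default=None):
    """Resolve a dotted import path, or pass a non-string object through."""
    if isinstance(value, str):
        # Split 'package.module.attr' into module path and attribute name.
        module_name, _, attr = value.rpartition('.')
        try:
            return getattr(importlib.import_module(module_name), attr)
        except (ImportError, AttributeError, ValueError):
            return default
    return value if value is not None else default


join = obj_or_import_string_sketch('os.path.join')
```

An object that is not a string is returned unchanged, so configuration values can hold either an import path or the object itself.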
Utilities for SIPStore archivers.
-
invenio_sipstore.archivers.utils.
chunks
(iterable, n)[source]¶ Yield iterable split into chunks.
If ‘n’ is an integer, yield the iterable as n-sized chunks. If ‘n’ is a list of integers, yield chunks of sizes: n[0], n[1], ..., len(iterable) - sum(n)
>>> from invenio_sipstore.archivers.utils import chunks
>>> list(chunks('abcdefg', 3))
['abc', 'def', 'g']
>>> list(chunks('abcdefg', [1, ]))
['a', 'bcdefg']
>>> list(chunks('abcdefg', [1, 2, 3]))
['a', 'bc', 'def', 'g']
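For readers who want to see the chunking rule spelled out, the following is a minimal re-implementation sketch that reproduces the documented results for sliceable inputs. `chunks_sketch` is a hypothetical name and the library's own implementation may differ.

```python
def chunks_sketch(iterable, n):
    """Yield n-sized chunks, or chunks sized by a list plus the remainder."""
    if isinstance(n, int):
        for start in range(0, len(iterable), n):
            yield iterable[start:start + n]
    else:
        start = 0
        for size in n:
            yield iterable[start:start + size]
            start += size
        # Whatever is left after the listed sizes forms the last chunk.
        if start < len(iterable):
            yield iterable[start:]


print(list(chunks_sketch('abcdefg', [1, 2, 3])))
```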
-
invenio_sipstore.archivers.utils.
default_archive_directory_builder
(sip)[source]¶ Build a directory structure for the archived SIP.
Creates a structure that is based on the SIP’s UUID. ‘abcdefgh-1234-1234-1234-1234567890ab’ -> [‘ab’, ‘cd’, ‘efgh-1234-1234-1234-1234567890ab’]
Parameters: sip – SIP which is to be archived
Returns: list of str
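The UUID split shown above can be sketched as follows. The function takes the identifier as a string rather than a SIP object, and `archive_directory_sketch` is a hypothetical name used only for illustration.

```python
def archive_directory_sketch(sip_id):
    """Split a UUID string into a three-level directory list."""
    text = str(sip_id)
    # Two 2-character levels, then the rest of the identifier.
    return [text[:2], text[2:4], text[4:]]


parts = archive_directory_sketch('abcdefgh-1234-1234-1234-1234567890ab')
print('/'.join(parts))
```

Spreading archives over two short prefix levels keeps any single directory from accumulating every SIP.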
-
invenio_sipstore.archivers.utils.
default_sipfile_name_formatter
(sipfile)[source]¶ Default generator for the SIPFile filenames.
Writes down the file in the archive under the original filename.
WARNING: This can potentially cause security and portability issues if the SIPFile filenames come from the users.
-
invenio_sipstore.archivers.utils.
default_sipmetadata_name_formatter
(sipmetadata)[source]¶ Default generator for the SIPMetadata filenames.
-
invenio_sipstore.archivers.utils.
secure_sipfile_name_formatter
(sipfile)[source]¶ Secure filename generator for the SIPFiles.
Since the filenames can be potentially dangerous, incompatible with the underlying file system, or not portable across operating systems, this formatter writes the files under a generic name: UUID-<secure_filename>, where <secure_filename> is the original filename stripped of any malicious parts (UNIX directory navigation ‘.’, ‘..’, ‘/’), special protocol parts (‘ftp://’, ‘http://’), special device names on Windows systems, etc., and, for maximum portability, contains only ASCII characters. Since this operation can cause name collisions, the UUID of the underlying FileInstance is prepended to the filename. For more information on the secure_filename function visit: http://werkzeug.pocoo.org/docs/utils/#werkzeug.utils.secure_filename
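The UUID-<secure_filename> naming scheme can be sketched without Werkzeug. The sanitiser below is a simplified stand-in written for this example, not `secure_filename` itself: it does not handle Windows device names, and `secure_name_sketch` is a hypothetical name.

```python
import re
import unicodedata


def secure_name_sketch(file_uuid, filename):
    """Build '<uuid>-<sanitised name>' from an untrusted filename."""
    # Decompose accented characters, then drop anything non-ASCII.
    ascii_name = unicodedata.normalize('NFKD', filename)
    ascii_name = ascii_name.encode('ascii', 'ignore').decode('ascii')
    # Treat path separators as word breaks so no directory part survives.
    ascii_name = ascii_name.replace('/', ' ').replace('\\', ' ')
    # Keep only letters, digits, dot, dash and underscore.
    cleaned = re.sub(r'[^A-Za-z0-9_.-]', '', '_'.join(ascii_name.split()))
    # Leading or trailing dots would allow names such as '..'.
    cleaned = cleaned.strip('._')
    return '{}-{}'.format(file_uuid, cleaned)


print(secure_name_sketch('1234', '../../etc/passwd'))
```

The UUID prefix is what keeps two different originals that sanitise to the same text from colliding in the archive.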
Archivers¶
Archivers for SIPStore module.
An archiver is a controller that can serialize a SIP to disk according to a specific format. Currently Invenio-SIPStore comes with a BagIt archiver that can write packages according to “The BagIt File Packaging Format (V0.97)”.
New formats can be implemented by subclassing BaseArchiver.
Base archiver¶
Base archiver for SIPs.
The base archiver implements a basic API so that subclasses do not have to worry about details such as writing files to disk.
-
class
invenio_sipstore.archivers.base_archiver.
BaseArchiver
(sip, data_dir=u'files', metadata_dir=u'metadata', extra_dir=u'', storage_factory=None, filenames_mapping_file=None)[source]¶ Base archiver.
The archiving is done in two steps:
- Generation of a list containing file information which contains all relevant information for writing down each file.
- Actual IO operation on the storage, which takes the previously generated list as input and writes it down to disk.
The first step contains all archiver-specific information on the archive structure and all relevant archive metadata that is to be written in addition to the “core” files, which are the SIPFile and SIPMetadata files.
The first step does not produce any side effects on the system. Specific archivers which inherit from this class are expected to primarily override the BaseArchiver.get_all_files() method to implement the archiver-specific structure and any additional archived files.
Relevant public method:
Relevant protected methods:
The second step writes the generated file information to disk using the configured storage class. By default it uses the file storage factory specified in the SIPSTORE_FILE_STORAGE_FACTORY configuration variable. This behaviour can be overridden with the storage_factory parameter of this class’s constructor.
Relevant public method:
Relevant protected methods:
Base archiver constructor.
Parameters:
- sip (invenio_sipstore.api.SIP) – the SIP to archive.
- data_dir – Subdirectory in archive where the SIPFiles will be written.
- metadata_dir – Subdirectory in archive where the SIPMetadata files will be written.
- extra_dir – Subdirectory where any extra files that are specific to an archive standard should be written.
- storage_factory – Storage factory used to create a new storage class instance.
- filenames_mapping_file – Mapping of file names.
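The two-step design described for the base archiver (first build a side-effect-free list of file information, then write it out) can be sketched with plain dictionaries. Everything here is a hypothetical illustration: the function names, the dict keys and the in-memory dict used as "storage" are assumptions, and only the ‘files’ and ‘metadata’ subdirectory names come from the constructor defaults.

```python
def generate_files_info(files, metadata):
    """Step 1: describe what to write without touching any storage."""
    info = []
    for name, payload in sorted(files.items()):
        info.append({'filepath': 'files/' + name, 'content': payload})
    for name, payload in sorted(metadata.items()):
        info.append({'filepath': 'metadata/' + name, 'content': payload})
    return info


def write_files_info(files_info, storage):
    """Step 2: perform the writes that step 1 described."""
    for entry in files_info:
        storage[entry['filepath']] = entry['content']
    return storage


plan = generate_files_info({'a.txt': b'A'}, {'record.json': b'{}'})
archive = write_files_info(plan, {})
```

Because the first step only returns data, the plan can be inspected, tested or extended before any I/O happens.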
-
_generate_extra_info
(content, filename)[source]¶ Generate the file information dictionary from a raw content.
-
_generate_sipmetadata_info
(sipmetadata)[source]¶ Generate the file information dictionary from a SIP metadata.
-
_get_data_files
()[source]¶ Get the file information for all the data files.
The structure is defined by the JSON Schema sipstore/file-v1.0.0.json.
Returns: list of dict containing file information.
-
_get_extra_files
(data_files, metadata_files)[source]¶ Get file information on any additional files in the archive.
Return any additional files that are to be written. If filenames_mapping_file was set in the constructor, this method will generate a file containing the SIP filenames mapping.
The structure is defined by the JSON Schema sipstore/file-v1.0.0.json.
Parameters:
- data_files – File information on the SIP files.
- metadata_files – File information on the SIP metadata files.
Returns: list of dict containing any additional files information.
-
_get_metadata_files
()[source]¶ Get the file information for the metadata files.
The structure is defined by the JSON Schema sipstore/file-v1.0.0.json.
Returns: list of dict containing file information.
-
_get_sipfile_filename_mapping
(filesinfo)[source]¶ Generate filename mapping for SIPFiles.
For reasons of archive file system compatibility, security and package portability, one might want to write a SIP file under a different name than the one provided in the system (often by the user). In that case it is important to generate a mapping file between the original invenio_sipstore.models.SIPFile.filepath entries and the archived filenames. It is important to include this mapping in the archive if SIPSTORE_ARCHIVER_SIPFILE_NAME_FORMATTER was set to anything other than the default formatter.
See default_sipfile_name_formatter() and secure_sipfile_name_formatter().
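A filename mapping of this kind could be produced along the following lines. This is a sketch under assumptions: the dict keys ‘filename’ and ‘sipfilepath’, the one-pair-per-line text layout and both function names are hypothetical and may not match the archiver's actual output.

```python
def filename_mapping_sketch(files_info):
    """Return (archived name, original path) pairs for the data files."""
    return [(info['filename'], info['sipfilepath']) for info in files_info]


def render_mapping(pairs):
    """Render the pairs as text, one 'archived original' pair per line."""
    return '\n'.join(
        '{} {}'.format(archived, original) for archived, original in pairs)


# Hypothetical file information for two SIP files.
files_info = [
    {'filename': '1111-report.pdf', 'sipfilepath': 'docs/report.pdf'},
    {'filename': '2222-data.csv', 'sipfilepath': 'data.csv'},
]
print(render_mapping(filename_mapping_sketch(files_info)))
```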
-
_write_extra
(fileinfo=None, content=None, filename=None)[source]¶ Write any extra file to the archive.
Requires EITHER `fileinfo` or (`content` AND `filename`).
Parameters:
-
_write_sipfile
(fileinfo=None, sipfile=None)[source]¶ Write a SIP file to disk.
Requires either fileinfo or sipfile to be passed: either the fileinfo parameter with the file information (‘file_uuid’ key required), or sipfile, the SIPFile instance, in which case the relevant file information will be generated on the spot.
Parameters:
- fileinfo (dict) – File information on the SIPFile that is to be written.
- sipfile (invenio_sipstore.models.SIPFile) – SIP file to be written.
-
get_all_files
()[source]¶ Get the complete list of files in the archive.
Returns: the list of all relative final paths
-
get_archive_base_uri
()[source]¶ Get the base URI (absolute path) for the archive location.
To configure the URI, set the configuration variable SIPSTORE_ARCHIVER_LOCATION_NAME to the name of the Location object which will be used as the archive base URI.
Returns the absolute path to the archive location, e.g.:
- /data/archive/
- root://eospublic.cern.ch//eos/archive
-
get_archive_subpath
()[source]¶ Generate the relative directory path of the archived SIP.
The behaviour of this method can be changed through the SIPSTORE_ARCHIVER_DIRECTORY_BUILDER configuration variable.
Generates the relative directory path for the archived SIP, which should be unique for a given SIP and is usually built from the SIP information and/or its assigned objects, e.g.:
- /ab/cd/ab12-abcd-1234-dcba-123412341234 (3-level chunk of SIP UUID identifier).
- /12345/r/5 (/<PID value>/r/<record revision id>)
The return value of this method is a location relative to the base archive URI. The full path that is constructed later can look as follows (based on the examples from BaseArchiver.get_archive_base_uri()):
- /data/archive/ab/cd/ab12-abcd-1234-dcba-123412341234
- root://eospublic.cern.ch//eos/archive/12345/r/5
-
get_fullpath
(filepath)[source]¶ Generate the absolute (full path) to the file in the archive system.
Parameters: filepath (str) – path to the file, relative to the archive subdirectory, e.g. data/myfile.dat.
Returns: Absolute path, e.g. root://eospublic.cern.ch//eos/archive/12345/data/myfile.dat
Return type: str
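How the base URI, the SIP subpath and a relative file path might combine into a full path can be sketched with plain string handling. `fullpath_sketch` is a hypothetical helper; the archiver itself may join the parts differently.

```python
def fullpath_sketch(base_uri, subpath_parts, filepath):
    """Join base URI, SIP subpath and relative file path with '/'."""
    # Only the trailing slash is stripped, so 'root://host//eos' stays intact.
    pieces = [base_uri.rstrip('/')]
    pieces.extend(part.strip('/') for part in subpath_parts)
    pieces.append(filepath.lstrip('/'))
    return '/'.join(pieces)


print(fullpath_sketch(
    'root://eospublic.cern.ch//eos/archive',
    ['12345', 'r', '5'],
    'data/myfile.dat'))
```

Plain string joining is used here instead of `os.path.join` because the base may be a URI whose double slashes are significant.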
-
write_all_files
(filesinfo=None)[source]¶ Write all files to the archive.
The only parameter of this method, filesinfo, is a list of dict, each containing information on the files that are to be written. Three types of file-information dict are recognized:
- SIPFile-originated, which copy the related FileInstance bytes.
- SIPMetadata-originated, which write down the content of the metadata to the archive.
- Extra files, which write down short text files that are usually specific to the archiver format, e.g. manifest file, README, archive creation timestamp, etc.
By default, when ‘filesinfo’ is omitted, the base archiver will generate the file info for all attached SIPFiles and SIPMetadata files (but only those whose SIPMetadata.type.name was specified in SIPSTORE_ARCHIVER_METADATA_TYPES). Specific archivers are expected to override the self.get_all_files method, or craft the filesinfo parameter of this method externally.
For more information on the structure of the file-info dict, see JSON Schema: invenio_sipstore.jsonschemas.sipstore.file-v1.0.0.json.
Parameters: filesinfo – A list of dict, specifying the file information that is to be written down to the archive. If not specified, will execute the self.get_all_files to build the files list.
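One way to picture the three kinds of file-information dict is a small classifier that decides which writer would handle an entry. Only the ‘file_uuid’ key is mentioned in this documentation; the ‘metadata_id’ and ‘content’ keys and the function name are assumptions made for this sketch, and the JSON Schema referenced above remains the authority on the real structure.

```python
def classify_fileinfo(fileinfo):
    """Guess which kind of archive entry a file-information dict describes."""
    if 'file_uuid' in fileinfo:
        # SIPFile-originated: bytes are copied from the FileInstance.
        return 'sipfile'
    if 'metadata_id' in fileinfo:
        # SIPMetadata-originated: the metadata content is written out.
        return 'sipmetadata'
    if 'content' in fileinfo:
        # Extra file: short text supplied directly in the dict.
        return 'extra'
    raise ValueError('unrecognised file information: %r' % (fileinfo,))


kinds = [
    classify_fileinfo({'file_uuid': 'f-1', 'filepath': 'files/a.txt'}),
    classify_fileinfo({'metadata_id': 7, 'filepath': 'metadata/r.json'}),
    classify_fileinfo({'content': 'hello', 'filepath': 'README'}),
]
```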
BagIt archiver¶
Archivers for SIP.
-
class
invenio_sipstore.archivers.bagit_archiver.
BagItArchiver
(sip, data_dir=u'data/files', metadata_dir=u'data/metadata', extra_dir=u'', patch_of=None, include_all_previous=False, tags=None, filenames_mapping_file=u'data/filenames.txt')[source]¶ BagIt archiver for SIPs.
Archives the SIP in the BagIt archive format (v0.97). For more information on the BagIt standard visit: https://tools.ietf.org/html/draft-kunze-bagit
Constructor of the BagIt Archiver.
When specifying ‘patch_of’ parameter the ‘include_all_previous’ flag determines whether files that are missing in the archived SIP (w.r.t. the SIP specified in ‘patch_of’) should be treated as explicitly deleted (include_all_previous=False) or if they should still be included in the manifest.
Example:
- include_all_previous = True
  - SIP_1:
    - SIPFiles: a.txt, b.txt
    - BagIt Manifest: a.txt, b.txt
  - SIP_2 (Bagged with patch_of=SIP_1):
    - SIPFiles: b.txt, c.txt
    - BagIt Manifest: a.txt, b.txt, c.txt
    - fetch.txt: a.txt, b.txt
- include_all_previous = False
  - SIP_1:
    - SIPFiles: a.txt, b.txt
    - BagIt Manifest: a.txt, b.txt
  - SIP_2 (Bagged with patch_of=SIP_1):
    - SIPFiles: b.txt, c.txt
    - BagIt Manifest: b.txt, c.txt
    - fetch.txt: b.txt
Parameters:
- sip (invenio_sipstore.api.SIP or invenio_sipstore.models.SIP) – API instance of the SIP that is to be archived.
- data_dir – directory where the SIPFiles will be written.
- metadata_dir – directory where the SIPMetadata will be written.
- extra_dir – directory where all extra files will be written, including the BagIt-specific files.
- patch_of (invenio_sipstore.api.SIP or invenio_sipstore.models.SIP) – Write a ‘lightweight’ bag, which will archive only the new SIPFiles, and refer to the repeated ones in the “fetch.txt” file. The provided argument is a SIP API, which will be taken as a base for determining the “diff” between two bags.
- tags – a list of 2-tuples containing the tags of the bag, which will be written to the ‘bag-info.txt’ file.
- filenames_mapping_file – path of the file in the archive which contains all of the SIPFile mappings. If this parameter evaluates to False, the file will not be created.
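The effect of include_all_previous on the manifest and on fetch.txt, as given in the example for this constructor, can be reproduced with set operations. This sketch compares files by name only, which is an assumption; the archiver may identify repeated files differently, and `patch_bag_sketch` is a hypothetical helper.

```python
def patch_bag_sketch(previous, current, include_all_previous):
    """Return (manifest entries, fetch.txt entries) for a patch bag."""
    previous, current = set(previous), set(current)
    # With the flag set, files missing from the new SIP stay listed.
    listed = previous | current if include_all_previous else current
    # Listed files already present in the previous bag are fetched,
    # not copied again.
    fetched = listed & previous
    return sorted(listed), sorted(fetched)


sip_1 = ['a.txt', 'b.txt']
sip_2 = ['b.txt', 'c.txt']
print(patch_bag_sketch(sip_1, sip_2, include_all_previous=True))
print(patch_bag_sketch(sip_1, sip_2, include_all_previous=False))
```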
-
classmethod
_get_bagit_metadata_type
()[source]¶ Return the SIPMetadataType for the BagIt metadata files.
-
static
_get_checksum
(checksum, expected=u'md5')[source]¶ Return the checksum if the type is the expected.
-
archiver_version
= u'SIPBagIt-v1.0.0'¶ Specification of the SIP bag structure.
This name will be formatted as the External-Identifier tag:
External-Identifier: <SIP.id>/<archiver_version>
-
bagit_metadata_type_name
= u'bagit'¶ Name of the SIPMetadataType for internal use of BagItArchiver.
-
get_bagit_file
()[source]¶ Create the bagit.txt file which specifies the version and encoding.
Returns: File information dictionary
Return type: dict
-
classmethod
get_bagit_metadata
(sip, as_dict=False)[source]¶ Fetch the BagIt metadata information (SIPMetadata).
Parameters: sip – SIP for which to fetch the metadata.
Returns: the BagIt metadata information (SIPMetadata) instance, or None if the object does not exist.
-
get_manifest_file
(filesinfo)[source]¶ Create the manifest file specifying the checksum of the files.
Returns: the name of the file and its content
Return type: tuple
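A BagIt manifest lists one checksum and file path per line, and the file is named after the checksum algorithm. The sketch below builds such a (name, content) tuple. The ‘<algorithm>:<digest>’ checksum format, the dict keys and both function names are assumptions made for illustration, not the archiver's code.

```python
def digest_if_expected(checksum, expected='md5'):
    """Return the digest of '<algorithm>:<digest>' if the algorithm matches."""
    algorithm, _, digest = checksum.partition(':')
    return digest if algorithm == expected and digest else None


def manifest_sketch(files_info):
    """Return (file name, content) for an md5 BagIt manifest."""
    lines = []
    for info in files_info:
        digest = digest_if_expected(info['checksum'])
        if digest is None:
            raise ValueError('md5 checksum required: %r' % info['checksum'])
        lines.append('{} {}'.format(digest, info['filepath']))
    return 'manifest-md5.txt', '\n'.join(lines)


name, content = manifest_sketch([
    {'checksum': 'md5:0cc175b9c0f1b6a831c399e269772661',
     'filepath': 'data/files/a.txt'},
])
```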
-
get_tagmanifest_file
(filesinfo)[source]¶ Create the tagmanifest file using the files info.
Returns: the name of the file and its content
Return type: tuple