This document explains the ontology for the Workflow Motif catalogue described in [Workflow Catalogue]. The catalogue presents the results of a manual analysis of a set of real-world scientific workflows from Taverna [Taverna], Wings [Wings], Galaxy [Galaxy] and Vistrails [Vistrails]. Workflow Motifs outline the kinds of data-intensive activities that are observed in workflows (data-operation motifs) and the different manners in which activities are implemented within workflows (workflow-oriented motifs). These motifs help to identify the functionality of the steps in a given workflow, to develop best practices for workflow design, and to develop approaches for the automated generation of workflow abstractions.
The latest OWL encoding of the Workflow Motifs Ontology can be found here.
Most of the content displayed in this document has been retrieved from [Workflow Catalogue].
Scientific workflows have been increasingly used in the last decade as an instrument for data-intensive science. Workflows serve a dual function: first, as detailed documentation of the scientific method used for an experiment (i.e., the input sources and processing steps taken for the derivation of a certain data item), and second, as re-usable, executable artifacts for data-intensive analysis. Scientific workflows are composed of a variety of data manipulation activities such as data movement, data transformation, data analysis and data visualization to serve the goals of the scientific study. The composition is done through the constructs made available by the workflow system used, and is largely shaped by the function undertaken by the workflow and the environment in which the system operates.
A major difficulty in understanding workflows is their complex nature. A workflow may contain several scientifically significant analysis steps, combined with other data preparation or result delivery activities, and in different implementation styles depending on the environment and context in which the workflow is executed. This difficulty in understanding stands in the way of reusing workflows.
As a first step towards addressing this issue, [Workflow Catalogue] describes a catalogue of domain-independent conceptual abstractions for workflow steps called scientific Workflow Motifs. The catalogue was built based on an empirical analysis performed over 260 workflow descriptions from Taverna [Taverna], Wings [Wings], Galaxy [Galaxy] and Vistrails [Vistrails]. Motifs are provided through i) a characterization of the kinds of data-operation activities that are carried out within workflows, which are referred to as data-operation motifs, and ii) a characterization of the different manners in which those activity motifs are realized/implemented within workflows, referred to as workflow-oriented motifs.
This document specifies the classes and properties of the Workflow Motifs ontology, the OWL 2 encoding of the aforementioned motif catalogue. The goal of this ontology is to provide the means to annotate workflows and their steps with the motifs of the vocabulary, without setting any restriction on how the workflows themselves are defined.
|ex||A prefix used for the examples. It could be, for instance, <http://example.org#>|
The classes and properties of the ontology can be found below. It is important to note that, in order to keep the ontology simple, the properties that link a workflow step with a motif have no domain specified. This decision was taken because workflows can be represented in many different ways, and how workflows are represented is out of the scope of this ontology. Examples of how to annotate workflows can be found in Section 3.1.
The classes identify the different motifs obtained after performing the manual analysis. They correspond to those shown in the previous hierarchy.
The object properties relate a workflow or step of a workflow to the motif or motifs describing its functionality.
The goal of this document is to define an OWL 2 ontology for the motif catalogue proposed in [Workflow Catalogue]. The Workflow Motif ontology provides the means to annotate scientific workflows, helping creators and curators to describe their functionality and facilitating the search for workflows with a particular purpose (e.g., retrieving workflows with merging, analysis and filtering steps).
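The kind of search mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the annotations are held as plain tuples rather than in an RDF store, and every identifier (the ex: workflows and steps, and the wfm: motif class names) is a hypothetical placeholder rather than a name taken from the ontology.

```python
from collections import defaultdict

# (workflow, step, motif) records. All identifiers are made-up placeholders;
# the wfm: class names are assumptions, not confirmed ontology terms.
step_motifs = [
    ("ex:workflowA", "ex:mergeStep", "wfm:Merging"),
    ("ex:workflowA", "ex:analysisStep", "wfm:DataAnalysis"),
    ("ex:workflowA", "ex:filterStep", "wfm:Filtering"),
    ("ex:workflowB", "ex:analysisStep2", "wfm:DataAnalysis"),
]


def workflows_with_motifs(records, required):
    """Return the workflows whose steps cover every motif in `required`."""
    found = defaultdict(set)
    for workflow, _step, motif in records:
        found[workflow].add(motif)
    return sorted(w for w, motifs in found.items() if set(required) <= motifs)


matches = workflows_with_motifs(
    step_motifs, ["wfm:Merging", "wfm:DataAnalysis", "wfm:Filtering"]
)
print(matches)
```

In a real deployment the same question would more likely be asked as a query over the annotated RDF data; the tuple list here only stands in for that data.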
The Workflow Motif ontology can also be used to annotate other types of scientific processes such as laboratory protocols or business workflows. Since these scientific processes may be defined according to different models, we define no domain for the object properties that bind a motif to a process (or a workflow step).
Figure 1 shows an example of how the Workflow Motif ontology can be used, considering a workflow specification that has workflows (ex:Workflow) and workflow steps (ex:WorkflowStep). In this case we also use the notion of processes (ex:Process) to identify a generic class used to refer to both workflow steps and workflows. To annotate a workflow, the user can associate it with the corresponding Workflow Motif using the wfm:hasWorkflowMotif property. Similarly, to annotate a given workflow step, the user can associate it with the corresponding data operation motif using the wfm:hasDataOperationMotif property. Finally, in order to simplify the annotation process, the user may use the more general property wfm:hasMotif to associate a workflow or a workflow step with a motif (represented by the class wfm:Motif).
Section 3.1.1 and Section 3.1.2 show how to annotate workflows from Taverna and Wings with the Workflow Motif ontology. The two systems use different workflow specification ontologies (wfdesc [Wfdesc] and OPMW [OPMW]), but both are easily annotated with the Workflow Motif ontology.
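The relationship between the three properties can be illustrated with a small Python sketch. It assumes that wfm:hasWorkflowMotif and wfm:hasDataOperationMotif are sub-properties of wfm:hasMotif, which is how we read "more general" here; the ex:workflow1 and ex:step1 resources and the blank node labels are hypothetical, and plain tuples stand in for an RDF library.

```python
# Assumed sub-property relations (the text calls wfm:hasMotif "more general").
SUBPROPERTY_OF = {
    "wfm:hasWorkflowMotif": "wfm:hasMotif",
    "wfm:hasDataOperationMotif": "wfm:hasMotif",
}

# Hypothetical annotations following the pattern described for Figure 1.
triples = [
    ("ex:workflow1", "rdf:type", "ex:Workflow"),
    ("ex:step1", "rdf:type", "ex:WorkflowStep"),
    ("ex:workflow1", "wfm:hasWorkflowMotif", "_:m1"),
    ("ex:step1", "wfm:hasDataOperationMotif", "_:m2"),
]


def with_inferred_has_motif(triples):
    """Add a wfm:hasMotif triple for every use of a more specific property."""
    inferred = list(triples)
    for s, p, o in triples:
        parent = SUBPROPERTY_OF.get(p)
        if parent is not None and (s, parent, o) not in inferred:
            inferred.append((s, parent, o))
    return inferred


def motifs_of(triples, subject):
    """List the motif nodes reachable from `subject` through wfm:hasMotif."""
    return sorted(
        o
        for s, p, o in with_inferred_has_motif(triples)
        if s == subject and p == "wfm:hasMotif"
    )


print(motifs_of(triples, "ex:workflow1"))
print(motifs_of(triples, "ex:step1"))
```

Under that assumption, a client that only asks for wfm:hasMotif still finds annotations made with either of the more specific properties.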
Figure 2 shows a workflow created in the Taverna workflow system for functional genomics, where different motifs have been identified in the workflow steps (dotted boxes). Three processes are stateful invocations of web services (getJobState, sleep and warp2D), two are moving data to external servers (DataUpload and DownloadResults), one performs the data analysis of the workflow (warp2D) and one augments the input for the warp2D from several input parameters of the workflow (warp2D_input).
The workflow is defined according to the wfdesc ontology [Wfdesc], where all the workflow steps are encoded as instances of wfdesc:Process, each with its own URI. Each wfdesc:Process has one or more inputs and produces an output.
More details about the ontology are available on the wfdesc specification web page.
Figure 3 illustrates how to annotate four of the processors of the workflow in Figure 2 with their corresponding motif instances (the rest have
been excluded for simplicity). Each processor (of type
wfdesc:Process) is taken as subject for the wfm:hasDataOperationMotif or
wfm:hasWorkflowMotif properties, which link them to the appropriate instances of the motifs of the catalog. In this example the motif instances
are blank nodes, but any other identifier would be valid as well.
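The annotation pattern just described (a wfdesc:Process as subject, a motif property, and a blank node typed with a motif class) can be sketched as a small Python generator of Turtle-like statements. The step names come from the description of Figure 2; the ex: prefix, the wfm: motif class names and the blank node labels are assumptions made for illustration, prefix declarations are left out, and the output is not claimed to reproduce Figure 3.

```python
# Step names are taken from the description of Figure 2. The ex: prefix and
# the wfm: motif class names are assumptions made for this sketch.
STEP_MOTIFS = [
    ("ex:getJobState", "wfm:hasWorkflowMotif", "wfm:StatefulInvocation"),
    ("ex:DataUpload", "wfm:hasDataOperationMotif", "wfm:DataMovement"),
    ("ex:warp2D", "wfm:hasDataOperationMotif", "wfm:DataAnalysis"),
    ("ex:warp2D_input", "wfm:hasDataOperationMotif", "wfm:Augmentation"),
]


def to_turtle(step_motifs):
    """Render each annotation as Turtle-like lines with a fresh blank node."""
    lines = []
    for index, (step, prop, motif_class) in enumerate(step_motifs, start=1):
        node = f"_:mtf{index}"  # blank node standing for the motif instance
        lines.append(f"{step} a wfdesc:Process ;")
        lines.append(f"    {prop} {node} .")
        lines.append(f"{node} a {motif_class} .")
    return "\n".join(lines)


turtle_text = to_turtle(STEP_MOTIFS)
print(turtle_text)
```

Replacing the generated blank node labels with URIs would work equally well, since any identifier is valid for the motif instances.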
Figure 4 shows a workflow created in the Wings workflow system for performing a ligand binding site comparison of the inputs. Eight motifs have been identified in the workflow: two perform the data analysis of the input datasets (both instances of SMAPV2), two sort the obtained results (both instances of ResultSorter), two merge both branches of the workflow (Merger and SMAPAlignementMerger) and two identify a repetitive sequence (the sequence SMAPV2 plus ResultSorter occurs twice).
The workflow is defined according to the OPMW ontology [OPMW], where each step of the workflow is defined as an opmw:WorkflowTemplateProcess that uses one or more inputs and produces an output. More information about the OPMW ontology can be found on the ontology specification web page.
Figure 5 shows how some of the motifs identified in Figure 4 can be annotated with the Workflow Motif ontology
(the rest have been omitted for clarity). The way the annotations are performed is similar to the method followed in Section 3.1.1.
Each opmw:WorkflowTemplateProcess is bound to a motif instance with the wfm:hasDataOperationMotif or the wfm:hasWorkflowMotif object properties.
A special case is the binding to the internal macro (_:mtf3), where the subject of the property is the sequence of the :SMAP_V2_1 and
the :SMAPResultSorter_1 steps. In this case a named graph is used to group the steps (:namedGraph1) and bind them to the internal macro motif instance (_:mtf3).
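The named graph binding can be sketched in Python by storing quads, that is, triples plus the graph they belong to. The identifiers :SMAP_V2_1, :SMAPResultSorter_1, :namedGraph1 and _:mtf3 come from the text above; the wfm:InternalMacro class name and the choice of wfm:hasWorkflowMotif for this binding are assumptions, and plain tuples stand in for an RDF dataset.

```python
# Quads are (subject, predicate, object, graph); None marks the default graph.
quads = [
    (":SMAP_V2_1", "rdf:type", "opmw:WorkflowTemplateProcess", ":namedGraph1"),
    (":SMAPResultSorter_1", "rdf:type", "opmw:WorkflowTemplateProcess", ":namedGraph1"),
    # The named graph itself is the subject of the motif annotation.
    (":namedGraph1", "wfm:hasWorkflowMotif", "_:mtf3", None),
    ("_:mtf3", "rdf:type", "wfm:InternalMacro", None),  # class name assumed
]


def steps_for_motif(quads, motif_node):
    """Return the steps grouped in any named graph bound to `motif_node`."""
    graphs = {
        s for s, p, o, _g in quads
        if p == "wfm:hasWorkflowMotif" and o == motif_node
    }
    return sorted({s for s, _p, _o, g in quads if g in graphs})


print(steps_for_motif(quads, "_:mtf3"))
```

Grouping the steps in a graph keeps the annotation on the sequence as a whole, so neither step is individually marked as an internal macro.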
We would like to thank Silvio Peroni for developing the LODE framework, partially used for the cross-reference section of this document, and Raul Alcazar and Miguel Angel García Delgado for their technical support.