Title: | R Provenance Tracking |
---|---|
Description: | Provide methods to record data provenance about R script executions. Provenance data includes files that were read and written by the script, along with information about the execution, such as start time, end time, the R modules loaded during the execution, and other information describing the execution environment. |
Authors: | Peter Slaughter [aut, cre], Matthew Jones [aut], Chris Jones [ctb], Lauren Palmer [ctb] |
Maintainer: | Peter Slaughter <[email protected]> |
License: | Apache License 2.0 |
Version: | 1.0.3.9000 |
Built: | 2024-10-30 05:01:58 UTC |
Source: | https://github.com/NCEAS/recordr |
Change the recordr home directory
changeHome(recordr, currentDir, newDir = as.character(NA), copy, ...)
recordr |
A recordr object |
currentDir |
A character value specifying the current recordr home directory |
newDir |
A character value, specifying the new recordr home directory |
copy |
A logical value. A value of TRUE causes data to be copied from the old directory to the new one. A default value is not set. |
... |
Additional arguments |
Create a coverage element
coverageElement(gc, tempc)
gc |
An EML::geographicCoverage object |
tempc |
An EML::temporalCoverage object |
An EML::Coverage object
The execution metadata and all archived files associated with each matching run are permanently deleted from the file system. No backup is maintained by the recordr package, so this deletion is irreversible, unless the user maintains their own backup.
deleteRuns(recordr, ...)
## S4 method for signature 'Recordr'
deleteRuns(recordr, id = as.character(NA), file = as.character(NA), start = as.character(NA), end = as.character(NA), tag = as.character(NA), error = as.character(NA), seq = as.integer(NA), noop = FALSE, quiet = FALSE)
recordr |
A Recordr instance |
... |
additional arguments |
id |
An execution identifier |
file |
The name of the script to match. |
start |
A one or two element character list specifying a date range to match for run start time |
end |
A one or two element character list specifying a date range to match for run end time |
tag |
The text of the tags to match. |
error |
The text of the error message to match. |
seq |
The run sequence number (can be a single value or a range, e.g |
noop |
Don't delete any data, just show what would be deleted. |
quiet |
A |
A data.frame containing execution metadata for the runs that were deleted.
Recordr
class description
startRecord()
The recording session started by the startRecord() method is terminated and all provenance collecting is discontinued. A log of all the console commands is saved.
endRecord(recordr)
## S4 method for signature 'Recordr'
endRecord(recordr)
recordr |
A Recordr instance |
id The execution identifier that uniquely identifies this execution.
Recordr
class description
## Not run:
rc <- new("Recordr")
startRecord(rc, tag="my first console run")
x <- read.csv(file="./test.csv")
runIdentifier <- endRecord(rc)
## End(Not run)
A class representing a script execution with the run manager
executionId
A character
containing the unique identifier for this execution.
metadataId
A character
containing the unique identifier for the associated metadata object.
tag
A character
vector containing text associated with this execution.
datapackageId
A character
containing the unique identifier for an uploaded package.
user
A character
containing the user name that ran the execution.
subject
A character
containing the user identity that uploaded the package.
hostId
A character
containing the host identifier to which the package was uploaded.
startTime
A character
containing the start time of the execution.
operatingSystem
A character
containing the operating system name.
runtime
A character
containing R build and version information.
softwareApplication
A character
containing the software application used, e.g. ("R")
moduleDependencies
A character
containing the modules used by the software application.
endTime
A character
containing the end time of the execution.
errorMessage
A character
containing any error messages captured during the execution.
publishTime
A character
containing the time that the execution package was uploaded to a repository.
publishNodeId
A character
containing the node name that the execution was published to.
publishId
A character
containing the identifier for the uploaded package.
console
A logical
indicating whether this was a console session, i.e. startRecord() -> endRecord()
seq
An integer
containing a simple integer value associated with the execution.
initialize
: Initialize an execution metadata object
readExecMeta
: Retrieve saved Execution metadata.
writeExecMeta
: Save a single execution metadata object.
updateExecMeta
: Update saved execution metadata.
slaughter
recordr
package description.
A class containing information about a file or group of files
This class is used internally by the recordr package.
fileId
a character
containing the unique identifier for the file entry
executionId
a character
containing the identifier associated with the file entry
filePath
a character
containing the location of the file
sha256
a character
containing the checksum of the file
size
a numeric
containing the size of the file
user
a character
containing the user associated with the file entry.
createTime
a character
containing the file creation time.
modifyTime
a character
containing the file modification time.
access
a character
containing the type of access made to the file ("read", "write", "execute")
format
a character
containing the file format (e.g. "text/csv")
archivedFilePath
a character
containing the location of the archived file
initialize
: Initialize a FileMetadata object
readFileMeta
: Retrieve saved file metadata for one or more files
writeFileMeta
: Save metadata for a single file.
recordr
package description.
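The slots above can be pictured as one record per file access. The following is a minimal base R sketch, not the package's implementation: `describeFile` is a hypothetical helper, it fills only a few of the slots, and it omits the checksum because no particular checksum routine is named in this documentation.

```r
# Hypothetical helper (not part of recordr): gather a few of the
# FileMetadata-style fields for one file using base R only.
describeFile <- function(path, access = "read") {
  info <- file.info(path)
  data.frame(
    filePath   = normalizePath(path),
    size       = info$size,                                 # size in bytes
    modifyTime = format(info$mtime, "%Y-%m-%d %H:%M:%S"),   # stored as character
    access     = access,                                    # "read", "write" or "execute"
    stringsAsFactors = FALSE
  )
}

# Create a small temporary file and describe it as a written file.
tmp <- tempfile(fileext = ".txt")
writeLines("hello", tmp)
meta <- describeFile(tmp, access = "write")
print(meta)
```

Storing times as character values mirrors the slot types listed above; a real implementation would also record the execution identifier and the archived file location.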
Create a geographic coverage element from a description and bounding coordinates
geoCoverage(geoDescription, west, east, north, south)
geoDescription |
a character string containing the description of the geographic coverage |
west |
a character string containing the western most coordinate of the coverage (ex. "-134.32") |
east |
a character string containing the eastern most coordinate of the coverage (ex. "-120.42") |
north |
a character string containing the northern most coordinate of the coverage (ex. "34.32") |
south |
a character string containing the southern most coordinate of the coverage (ex. "30.14") |
Get a database connection
getDBconnection(dbFile)
dbFile |
the path to the recordr database file (default: ~/.recordr/recordr.sqlite) |
When a script or console session is recorded (see record() and startRecord()), a metadata object is created that describes the objects associated with the run, using the Ecological Metadata Language https://knb.ecoinformatics.org/#external//emlparser/docs/index.html. This metadata can be retrieved from the recordr cache for review or editing if desired. If the metadata is updated, it can be re-inserted into the recordr cache using the putMetadata method.
getMetadata(recordr, ...)
## S4 method for signature 'Recordr'
getMetadata(recordr, id = as.character(NA), seq = as.character(NA), as = as.character("text"))
recordr |
a Recordr instance |
... |
additional parameters |
id |
The identifier for a run |
seq |
The sequence number for a run |
as |
Form to return the metadata as. Possible values are: "text", "parsed" (for parsed XML), or "EML" (for an EML R package S4 object) |
A character vector containing the metadata
Initialize an execution metadata object
## S4 method for signature 'ExecMetadata'
initialize(.Object, executionId = as.character(NA), metadataId = as.character(NA), tag = as.character(NA), datapackageId = as.character(NA), user = as.character(NA), subject = as.character(NA), hostId = as.character(NA), startTime = as.character(NA), operatingSystem = as.character(NA), runtime = as.character(NA), moduleDependencies = as.character(NA), programName = as.character(NA), endTime = as.character(NA), errorMessage = as.character(NA), publishTime = as.character(NA), publishNodeId = as.character(NA), publishId = as.character(NA), console = FALSE, seq = as.integer(0))
.Object |
The ExecMetadata object |
executionId |
a |
metadataId |
a |
tag |
A character vector that describes this execution. |
datapackageId |
a |
user |
a |
subject |
a |
hostId |
a |
startTime |
a |
operatingSystem |
a |
runtime |
a |
moduleDependencies |
a |
programName |
a |
endTime |
a |
errorMessage |
a |
publishTime |
a |
publishNodeId |
a |
publishId |
a |
console |
a |
seq |
an |
ExecMetadata
class description
Initialize a file metadata object.
## S4 method for signature 'FileMetadata'
initialize(.Object, file, fileId = as.character(NA), sha256 = as.character(NA), size = as.numeric(0), user = as.character(NA), createTime = as.character(NA), modifyTime = as.character(NA), executionId, access = as.character(NA), format = as.character(NA), archivedFilePath = as.character(NA))
.Object |
a |
file |
a |
fileId |
a |
sha256 |
a |
size |
a |
user |
a |
createTime |
a |
modifyTime |
a |
executionId |
a |
access |
|
format |
a |
archivedFilePath |
a |
This method is used internally by the recordr package.
FileMetadata
class description
Initialize a provenance relationship object.
## S4 method for signature 'ProvRels'
initialize(.Object, executionId = as.character(NA), subject = as.character(NA), predicate = as.character(NA), object = as.character(0), subjectType = as.character(NA), objectType = as.character(NA), dataTypeURI = as.character(NA))
.Object |
a |
executionId |
a |
subject |
a |
predicate |
a |
object |
a |
subjectType |
a |
objectType |
a |
dataTypeURI |
a |
This method is used internally by the recordr package.
ProvRels
class description
Initialize a Recordr object
## S4 method for signature 'Recordr'
initialize(.Object, newDir = as.character(NA), copy = TRUE, ...)
.Object |
The Recordr object |
newDir |
The recordr home directory is changed to the new location. |
copy |
A logical value: if TRUE and |
... |
Additional parameters |
A recordr object is returned that can be used with other recordr
package
methods. When the optional newDir
argument is used, the recordr home directory is
changed to the new value. The default behaviour is to have data copied from the old
home directory to the new one, but this can be changed by using the copy
argument, i.e.
See the recordr vignette 'recordr Package Introduction' for more information about what recordr stores in the recordr home directory.
Recordr
class description
If no search terms are specified, then all runs are listed. The method arguments are search terms that limit the runs listed, with only runs listed that match all arguments.
listRuns(recordr, ...)
## S4 method for signature 'Recordr'
listRuns(recordr, id = as.character(NA), script = as.character(NA), start = as.character(NA), end = as.character(NA), tag = as.character(NA), error = as.character(NA), seq = as.character(NA), orderBy = "-startTime", quiet = FALSE, full = FALSE)
recordr |
A Recordr instance |
... |
additional parameters |
id |
a |
script |
|
start |
|
end |
a |
tag |
|
error |
|
seq |
|
orderBy |
The column that will be used to sort the output. This can include a minus sign before the name, e.g. -startTime |
quiet |
A |
full |
A |
The "start" and "end" parameters can be used to specify a time range to find runs that started and ended execution in the specified time range. For example, specifying start=c("2015-01-01", "2015-01-31") will cause the search to return any execution with a starting time in the first month of 2015.
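The range matching described here can be illustrated with a small base R sketch. It runs against a made-up table of runs rather than the recordr database, and `inRange` is a hypothetical helper. It assumes that a single value is an open-ended lower bound and that the second value of a range covers that whole day; the package's own boundary handling may differ.

```r
# Hypothetical run table; the column names are assumptions for illustration.
runs <- data.frame(
  seq = 1:4,
  startTime = as.POSIXct(c("2014-12-30 09:00:00", "2015-01-01 00:00:00",
                           "2015-01-15 12:30:00", "2015-02-01 08:00:00"),
                         tz = "UTC"),
  stringsAsFactors = FALSE
)

# Return TRUE for times inside a one- or two-element character range.
# A single value is treated as an open-ended lower bound.
inRange <- function(times, range) {
  lower <- as.POSIXct(range[1], tz = "UTC")
  keep <- times >= lower
  if (length(range) == 2) {
    # Assumption: the upper bound includes the whole of that day.
    upper <- as.POSIXct(range[2], tz = "UTC") + 24 * 60 * 60
    keep <- keep & times < upper
  }
  keep
}

# Runs that started in January 2015
january <- runs[inRange(runs$startTime, c("2015-01-01", "2015-01-31")), ]
print(january)
```

With the sample data, the runs with sequence numbers 2 and 3 are selected.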
A data frame containing information for each run
Recordr
class description
## Not run:
rc <- new("Recordr")
# List runs that started in January 2015
listRuns(rc, start=c("2015-01-01", "2015-01-31"))
# List runs that started on or after March 1, 2014
listRuns(rc, start="2014-03-01")
# List runs that contain a tag with the string "analysis v1.3"
listRuns(rc, tag="analysis v1.3")
## End(Not run)
An EML document is created from the values passed in.
makeEML(recordr, id, system, title, creators, abstract = NA, methodDescription = NA, geo_coverage = NA, temp_coverage = NA, endpoint = NA)
recordr |
A Recordr object. |
id |
The identifier for the EML document. |
system |
The system for the document. |
title |
The document title. |
creators |
A list of creator elements. |
abstract |
The document abstract. |
methodDescription |
The dataset method description. |
geo_coverage |
The geographic coverage element. |
temp_coverage |
The temporal coverage element. |
endpoint |
The online distribution URL. |
A data processing workflow might include multiple processing steps, with
each step being performed by a separate R script. These multiple steps are linked by
the files that one step writes and the next step in the workflow reads. The plotRuns
method finds these connections between executions to determine the executions that
comprise a processing workflow.
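The linking idea can be sketched in base R: one execution is connected to another when it writes a file that the other reads. The table below is made up and matches files by path only, which is an assumption of this sketch; the package may identify files differently, for example by checksum.

```r
# Hypothetical file access records: one row per file touched by an execution.
fileAccess <- data.frame(
  executionId = c("run-1", "run-1", "run-2", "run-2", "run-3"),
  filePath    = c("raw.csv", "clean.csv", "clean.csv", "model.rds", "other.csv"),
  access      = c("read", "write", "read", "write", "read"),
  stringsAsFactors = FALSE
)

writes <- fileAccess[fileAccess$access == "write", c("executionId", "filePath")]
reads  <- fileAccess[fileAccess$access == "read",  c("executionId", "filePath")]

# Join writers to readers on the shared file path.
links <- merge(writes, reads, by = "filePath",
               suffixes = c(".writer", ".reader"))

# Ignore executions that read back their own output.
links <- links[links$executionId.writer != links$executionId.reader, ]
print(links)
```

Here run-1 writes clean.csv and run-2 reads it, so those two executions form one workflow; run-3 touches no shared file and stays unconnected.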
plotRuns(recordr, ...)
## S4 method for signature 'Recordr'
plotRuns(recordr, id = as.character(NA), file = as.character(NA), start = as.character(NA), end = as.character(NA), tag = as.character(NA), error = as.character(NA), seq = as.character(NA), orderBy = "-startTime", direction = "both", quiet = TRUE, ...)
recordr |
a Recordr instance |
... |
additional parameters |
id |
The identifier for a run. Either |
file |
The name of script to match |
start |
Match runs that started in this time range (inclusive). Times must be entered in the form 'YYYY-MM-DD HH:MM:SS' but can be shortened to not less than "YYYY" |
end |
Match runs that ended in this time range (inclusive). Times must be entered in the form 'YYYY-MM-DD HH:MM:SS' but can be shortened to not less than "YYYY" |
tag |
The text of tag to match |
error |
The text of error message to match. |
seq |
The sequence number for a run. |
orderBy |
Sort the results according to the specified column. A hyphen ('-') prepended to the column name denotes a descending sort. The default value is "-startTime" |
direction |
The direction to trace the lineage, either |
quiet |
A |
If the run id or seq number is known for the run to be traced, then one or the other of these values can be used. Alternatively, other run attributes can be used to determine the run to be traced, such as file, start, etc. If these other search parameters are used and multiple runs are selected, only the first run selected will be traced. These search parameters can be used together to easily find certain runs, for example, the latest run of a particular script, the latest run with a specified tag, etc. (see examples).
A list of the execution identifiers that are in the processing workflow.
Recordr
class description
## Not run:
# Plot processing workflow for the run with sequence number '101'
plotRuns(recordr, seq=101)
# Plot processing workflow for the last execution of script "runModel.R"
plotRuns(recordr, file="runModel.R", orderBy="-startTime")
# Plot processing workflow for the last execution with the tag 'best run yet!' specified.
plotRuns(recordr, tag="best run yet!", orderBy="-startTime")
## End(Not run)
A class containing information about a provenance relationship
This class is used internally by the recordr package.
executionId
a character
containing the identifier associated with the file entry
subject
a character
containing the subject of a provenance relationship
predicate
a character
containing the predicate of a provenance relationship
object
a character
containing the object of a provenance relationship
subjectType
a character
containing the RDF node type of the subject, values can be 'uri', 'blank'
objectType
a character
containing the RDF node type of the object, each value can be 'uri', 'blank', or 'literal'
dataTypeURI
The RDF data type that specifies the type of the object
initialize
: Initialize a ProvRels object
readProvRels
: Retrieve saved provenance relationships.
writeProvRel
: Save a provenance relationship object.
recordr
package description.
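Each relationship described by this class is a subject, predicate and object triple with RDF node types. The base R sketch below shows two such triples as rows of a data frame. The identifiers are invented, and the use of the W3C PROV predicates `prov:used` and `prov:wasGeneratedBy` is an assumption for illustration, not a statement of which predicates the package records.

```r
# One row per provenance relationship, using the slot names of this class.
# All identifiers are hypothetical; the predicates are W3C PROV-O terms
# chosen only for illustration.
provRels <- data.frame(
  executionId = "urn:uuid:exec-1",
  subject     = c("urn:uuid:exec-1", "urn:uuid:output-1"),
  predicate   = c("http://www.w3.org/ns/prov#used",
                  "http://www.w3.org/ns/prov#wasGeneratedBy"),
  object      = c("urn:uuid:input-1", "urn:uuid:exec-1"),
  subjectType = "uri",
  objectType  = "uri",
  stringsAsFactors = FALSE
)

# Find everything the execution used as input.
used <- provRels[provRels$predicate == "http://www.w3.org/ns/prov#used", "object"]
print(used)
```

Keeping the node types alongside each triple is what allows an object to be either a URI or a literal value with a data type.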
Publish a recordr'd execution to DataONE
publishRun(recordr, ...)
## S4 method for signature 'Recordr'
publishRun(recordr, id = as.character(NA), seq = as.character(NA), assignDOI = FALSE, update = FALSE, quiet = TRUE, retPkg = FALSE)
recordr |
a Recordr instance |
... |
additional parameters |
id |
the run identifier for the execution to upload to DataONE |
seq |
The sequence number for the execution to upload to DataONE |
assignDOI |
a boolean value: if TRUE, assign DOI values for system metadata, otherwise assign uuid values |
update |
a boolean value: if TRUE, republish a previously published execution |
quiet |
A boolean value: if TRUE, informational messages are not printed (default=TRUE) |
retPkg |
A boolean value: if TRUE, then the package that was uploaded is returned, if FALSE then the identifier of the package is returned (default=FALSE). |
The published identifier of the uploaded package
Put a metadata document into the recordr cache for a run, replacing the existing metadata object for the specified run, if one exists.
putMetadata(recordr, ...)
## S4 method for signature 'Recordr'
putMetadata(recordr, id = as.character(NA), seq = as.character(NA), metadata = as.character(NA), asText = TRUE)
recordr |
a Recordr instance |
... |
additional parameters |
id |
The identifier for a run |
seq |
The sequence number for a run |
metadata |
The replacement metadata, as the actual text, or as a filename containing the metadata |
asText |
A logical. See 'Details'.
If TRUE, then the |
The metadata parameter can specify either a character vector that contains the metadata or a filename that contains the metadata. The asText parameter is used to specify which type of value is given. If asText is TRUE, then the metadata parameter is a character vector; if it is FALSE, then the metadata parameter is a filename.
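The text-or-filename switch can be sketched in base R as follows. `resolveMetadata` is a hypothetical helper written for this illustration and is not a recordr function; it only shows how a flag like asText can decide whether the value is used directly or read from disk.

```r
# Hypothetical helper (not part of recordr): return the metadata text
# whether it was supplied directly or as a path to a file.
resolveMetadata <- function(metadata, asText = TRUE) {
  if (asText) {
    # The value already holds the metadata itself.
    return(paste(metadata, collapse = "\n"))
  }
  # Otherwise the value names a file that holds the metadata.
  stopifnot(file.exists(metadata))
  paste(readLines(metadata, warn = FALSE), collapse = "\n")
}

emlText <- "<eml><dataset><title>demo</title></dataset></eml>"
tmp <- tempfile(fileext = ".xml")
writeLines(emlText, tmp)

fromText <- resolveMetadata(emlText, asText = TRUE)
fromFile <- resolveMetadata(tmp, asText = FALSE)
print(identical(fromText, fromFile))
```

Both calls yield the same string, so downstream code does not need to know which form the caller used.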
A character vector containing the metadata
Recordr
class description
Execution metadata is retrieved from recordr database table _execmeta_ based on search parameters.
readExecMeta(recordr, ...)
## S4 method for signature 'Recordr'
readExecMeta(recordr, executionId = as.character(NA), script = as.character(NA), startTime = as.character(NA), endTime = as.character(NA), tag = as.character(NA), errorMessage = as.character(NA), seq = as.integer(NA), orderBy = as.character(NA), sortOrder = "ascending", delete = FALSE, ...)
recordr |
A Recordr object |
... |
additional parameters |
executionId |
A character value that specifies an execution identifier to search for. |
script |
A character value that specifies a script name to search for. |
startTime |
A character value that specifies the start of a time range. This value must be entered in the form 'YYYY-MM-DD HH:MM:SS' but can be shortened to "YYYY-MM-DD" |
endTime |
A character value that specifies the end of a time range to search. This value must be entered in the form 'YYYY-MM-DD HH:MM:SS' but can be shortened to "YYYY-MM-DD" |
tag |
A tag value to search for |
errorMessage |
An execution error message to search for. |
seq |
An execution sequence number |
orderBy |
The column to sort the result set by. |
sortOrder |
The sort order. Values include "ascending", "descending". |
delete |
a |
The "startTime" and "endTime" parameters are used to specify a time range to find runs that started execution between the start and end times that are specified.
A list of ExecMetadata objects
ExecMetadata
class description
File metadata is retrieved from the recordr database table filemeta based on search parameters.
readFileMeta(recordr, ...)
## S4 method for signature 'Recordr'
readFileMeta(recordr, fileId = as.character(NA), executionId = as.character(NA), filePath = as.character(NA), sha256 = as.character(NA), user = as.character(NA), access = as.character(NA), format = as.character(NA), orderBy = as.character(NA), sortOrder = "ascending", delete = FALSE, ...)
recordr |
A recordr object |
... |
Additional parameters |
fileId |
The id of the file to search for |
executionId |
A character value that specifies an execution identifier to search for. |
filePath |
The path name of the file to search for. |
sha256 |
The sha256 checksum value for the uncompressed file. |
user |
The user that ran the execution that created or accessed the file. |
access |
The type of access for the file. Values include "read", "write", "execute" |
format |
The format type of the object, e.g. "text/plain" |
orderBy |
The column to sort the result set by. |
sortOrder |
The sort type. Values include ("ascending", "descending") |
delete |
a |
This method is used internally by the recordr package.
A dataframe containing file metadata objects
FileMetadata
class description
Provenance relationships are retrieved from the recordr database based on search parameters.
readProvRels(recordr, ...)
## S4 method for signature 'Recordr'
readProvRels(recordr, executionId = as.character(NA), subject = as.character(NA), predicate = as.character(NA), object = as.character(NA), subjectType = as.character(NA), objectType = as.character(NA), dataTypeURI = as.character(NA), orderBy = as.character(NA), sortOrder = "ascending", delete = FALSE, ...)
recordr |
A recordr object |
... |
Additional parameters |
executionId |
A character value that specifies an execution identifier to search for. |
subject |
The subject of the provenance relationships to match |
predicate |
The predicate of the provenance relationships to match |
object |
The object of the provenance relationships to match |
subjectType |
A character value containing the subject type of the relationship to match |
objectType |
A character value containing the object type of the relationship to match |
dataTypeURI |
A character value containing the data type of the relationship to match |
orderBy |
The column to sort the result set by. |
sortOrder |
The sort type. Values include ("ascending", "descending") |
delete |
a |
This method is used internally by the recordr package.
A dataframe containing file metadata objects
ProvRels
class description
The R script is executed and information about file reads and writes is recorded.
record(recordr, file, ...)
## S4 method for signature 'Recordr'
record(recordr, file, tag = "", ...)
recordr |
a Recordr instance |
file |
The name of the R script to run and collect provenance information for |
... |
additional parameters that will be passed to the R |
tag |
A string that will be associated with this run |
Input files, the script itself and generated files are archived. Information about the execution environment is also saved.
The execution identifier for this run
Recordr
class description
## Not run:
rc <- new("Recordr")
executionId <- record(rc, file="myscript.R", tag="first run of myscript.R")
## End(Not run)
The R package recordr provides methods to easily record data provenance about R script executions, such as the files that were read and written by the script, along with information about the execution, such as start time, end time, the R modules loaded during the execution, etc. This provenance information along with any files created by the script can then be combined into a data package and uploaded to a data repository such as DataONE.
An overview of the recordr package is available with the R command: 'vignette("recordr_overview")'.
Recordr
: A class containing methods to record, review and publish data provenance
Peter Slaughter (NCEAS), Matthew B. Jones (NCEAS), Christopher Jones (NCEAS)
## Not run:
# This example shows how to record provenance for an R script and view the recorded information.
library(recordr)
rc <- new("Recordr")
record(rc, "./myScript.R", tag="Simple script recording #1")
listRuns(rc, tag="recording #1")
viewRuns(rc, tag="recording #1")
## End(Not run)
Override the dataone::createObject method and record a provenance relationship for the object created.
recordr_createObject()
This function is not intended to be called directly by a user.
Override the dataone::getObject method and record a provenance relationship for the object that was downloaded.
recordr_getObject()
This function is not intended to be called directly by a user.
Override the ggplot2::ggsave function and record a provenance relationship for the file that was written.
recordr_ggsave()
This function is not intended to be called directly by a user.
Override the raster::raster function and record a provenance relationship for the file read.
recordr_raster()
This function is not intended to be called directly by a user.
Override the readr::read_csv function and record a provenance relationship for the file that was read.
recordr_read_csv()
... |
function parameters |
This function is not intended to be called directly by a user.
Override the utils::read.csv function and record a provenance relationship for the file that was read.
recordr_read.csv()
... |
function parameters |
This function is not intended to be called directly by a user.
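The override pattern used by these functions can be sketched with a plain wrapper: note the file being read, then delegate to the original function. This is a minimal base R illustration, not the package's code. `traced_read_csv` and `provLog` are hypothetical names, and the in-memory log stands in for the database that recordr uses.

```r
# Hypothetical in-memory log; recordr itself stores provenance in a database.
provLog <- new.env()
provLog$reads <- character(0)

# Wrapper that notes the file being read, then calls utils::read.csv
# with the arguments unchanged.
traced_read_csv <- function(file, ...) {
  if (is.character(file)) {
    provLog$reads <- c(provLog$reads, normalizePath(file, mustWork = FALSE))
  }
  utils::read.csv(file, ...)
}

# Write a small CSV file, then read it back through the wrapper.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:3), tmp, row.names = FALSE)
df <- traced_read_csv(tmp)
print(provLog$reads)
```

Because the wrapper forwards every argument, a script behaves the same as it would with the original function while its file reads are captured as a side effect.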
Override the base::readLines function and record a provenance relationship for the file read.
recordr_readLines()
This function is not intended to be called directly by a user.
Override the rgdal::readOGR function and record a provenance relationship for the file read.
recordr_readOGR()
This function is not intended to be called directly by a user.
Override the png::readPNG function and record a provenance relationship for the file read.
recordr_readPNG()
This function is not intended to be called directly by a user.
Override the base::scan function and record a provenance relationship for the scanned file.
recordr_scan()
This function is not intended to be called directly by a user.
Override the dataone::updateObject method and record a provenance relationship for the object uploaded.
recordr_updateObject()
This function is not intended to be called directly by a user.
Override the readr::write_csv function and record a provenance relationship for the written file.
recordr_write_csv()
This function is not intended to be called directly by a user.
Override the utils::write.csv function and record a provenance relationship for the written file.
recordr_write.csv()
This function is not intended to be called directly by a user.
Override the base::writeLines function and record a provenance relationship for the file that was written.
recordr_writeLines()
This function is not intended to be called directly by a user.
Override the rgdal::writeOGR function and record a provenance relationship for the file that was written.
recordr_writeOGR()
This function is not intended to be called directly by a user.
Override the png::writePNG function and record a provenance relationship for the file that was written.
recordr_writePNG()
This function is not intended to be called directly by a user.
Override the raster::writeRaster function and record a provenance relationship for the file that was written.
recordr_writeRaster()
The name of the output file
This function is not intended to be called directly by a user.
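The override functions above share one pattern: note the file that was touched, then delegate to the original function. A minimal sketch of that pattern in base R follows. The names `prov_log` and `traced_read_csv` are hypothetical and are not part of recordr, and the in-memory log is an assumption made for illustration; recordr stores provenance in its own database.

```r
# Hypothetical in-memory log of files that were read (illustration only).
prov_log <- new.env()
prov_log$used <- character(0)

# Wrapper around utils::read.csv: note the file path, then delegate unchanged.
traced_read_csv <- function(file, ...) {
  if (is.character(file)) {
    prov_log$used <- c(prov_log$used, normalizePath(file, mustWork = FALSE))
  }
  utils::read.csv(file, ...)
}

# Usage: write a small file, then read it back through the wrapper.
tmp <- tempfile(fileext = ".csv")
utils::write.csv(data.frame(a = 1:3), tmp, row.names = FALSE)
df <- traced_read_csv(tmp)
```

After the call, `prov_log$used` holds the path of the file that was read, and `df` is the same data frame that `utils::read.csv` would have returned.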
The Recordr class provides methods to record, search, review and publish data provenance about R script executions. Information about the files read and written by a script, and about the execution environment, can be captured for each script execution. Script executions can then be reviewed and selected for publication to the DataONE data repository. Publishing a run retrieves archived copies of the R script, the files read and written by the script, and a description of the provenance relationships between objects in the run; these are combined into a package and uploaded to the requested member node.
recordrDir
A value of type "character"
containing a path to the Recordr working directory
dbConn
A value of type "SQLiteConnection"
that contains the connection of the recordr database
dbFile
A value of type "character"
that contains the location of the recordr database file
initialize
: Initialize a Recordr object
startRecord
: Begin recording provenance for an R session
endRecord
: Complete the recording session started by startRecord
record
: Record provenance for the execution of an R script
listRuns
: Output a list of recorded runs to the console
viewRuns
: Print detailed information for matching runs to the console
deleteRuns
: Delete the execution metadata and archived files of matching runs
publishRun
: Upload all objects associated with a run to a repository
traceRuns
: Trace processing lineage by finding related executions.
plotRuns
: Trace processing lineage for a run and plot it.
recordr
package description.
This method is used to retrieve execution metadata for runs that match the search parameters.
selectRuns(recordr, ...)
## S4 method for signature 'Recordr'
selectRuns(recordr, runId = as.character(NA), script = as.character(NA),
  startTime = as.character(NA), endTime = as.character(NA),
  tag = as.character(NA), errorMessage = as.character(NA),
  seq = as.integer(NA), orderBy = "-startTime", delete = FALSE)
recordr |
A Recordr instance |
... |
additional parameters |
runId |
An execution identifier |
script |
The file name of the script to match. |
startTime |
Match executions that started after this time (inclusive) |
endTime |
Match executions that ended before this time (inclusive) |
tag |
The text of the tag to match. |
errorMessage |
The text of error message to match. |
seq |
The run sequence number |
orderBy |
The column that will be used to sort the output. This can include a minus sign before the name, e.g. -startTime |
delete |
A logical value, if TRUE then the selected runs are deleted from the Recordr database. |
This method is used internally by the recordr package.
A data.frame that contains execution metadata for executions that matched the search criteria
Recordr
class description
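The orderBy convention described above (a column name, optionally prefixed with a minus sign for a descending sort) can be sketched in base R as follows. The helper `order_runs` and the toy `runs` table are hypothetical and only illustrate the convention; they do not show how selectRuns queries the recordr database.

```r
# Sort a data.frame of runs by a column spec such as "-startTime".
order_runs <- function(runs, orderBy = "-startTime") {
  descending <- startsWith(orderBy, "-")   # leading minus sign: descending sort
  column <- sub("^-", "", orderBy)         # strip the sign to get the column name
  stopifnot(column %in% names(runs))
  runs[order(runs[[column]], decreasing = descending), , drop = FALSE]
}

# Toy execution metadata (illustration only).
runs <- data.frame(
  seq = 1:3,
  startTime = c("2016-01-02 10:00:00", "2016-03-05 09:30:00", "2016-02-11 14:45:00"),
  stringsAsFactors = FALSE
)

newest_first <- order_runs(runs, "-startTime")
oldest_first <- order_runs(runs, "startTime")
```

Because the timestamps use a fixed 'YYYY-MM-DD HH:MM:SS' layout, sorting them as character strings gives chronological order.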
Standardise a function call
standardizeCall(call, env = parent.frame())
call |
A call |
env |
Environment in which to look up call value. |
Adapted from standardise_call in Hadley Wickham's pryr package.
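Standardising a call means rewriting it so that every argument carries its full formal name. A minimal sketch using base R's match.call is shown below. The function name `standardize_sketch` is hypothetical, and the body is an approximation of the idea, not the recordr source.

```r
# Rewrite a call so that each argument is matched to its full formal name.
standardize_sketch <- function(call, env = parent.frame()) {
  stopifnot(is.call(call))
  f <- eval(call[[1]], env)          # look up the function being called
  if (is.primitive(f)) return(call)  # primitives have no formals to match against
  match.call(f, call)
}

std <- standardize_sketch(quote(mean(1:10, na.rm = TRUE)))
print(std)
# mean(x = 1:10, na.rm = TRUE)
```

With named arguments, code that inspects a call (for example, to find the `file` argument of a read function) does not depend on the order in which the caller supplied them.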
This method starts the recording process and the method endRecord() completes it.
startRecord(recordr, ...)
## S4 method for signature 'Recordr'
startRecord(recordr, tag = as.character(NA), .file = as.character(NA),
  .console = TRUE, log = as.character(NA))
recordr |
a Recordr instance |
... |
additional parameters |
tag |
a string that is associated with this run |
.file |
the filename for the script to run (only used internally when startRecord() is called from record()) |
.console |
a logical argument that is used internally by the recordr package |
log |
A character string. If .console=TRUE, the file to log console commands to. The default is 'console.log'. |
The startRecord() method can be called from the R console to begin a recording session during which provenance is captured for any functions that are inspected by Recordr. This recordr session can be closed by calling the endRecord() method. When the record() function is called to record a script, the startRecord() function is called automatically.
execution identifier that uniquely identifies this recorded session
Recordr
class description
## Not run:
rc <- new("Recordr")
startRecord(rc, tag="my first console run")
x <- read.csv(file="./test.csv")
runIdentifier <- endRecord(rc)
## End(Not run)
A data processing workflow might include multiple processing steps, with
each step being performed by a separate R script. These multiple steps are linked by
the files that one step writes and the next step in the workflow reads. The traceRuns
method finds these connections between executions to determine the executions that
comprise a processing workflow, and returns information for each run in the processing workflow
including all files that were read and written by each script.
traceRuns(recordr, ...)
## S4 method for signature 'Recordr'
traceRuns(recordr, id = as.character(NA), file = as.character(NA),
  start = as.character(NA), end = as.character(NA), tag = as.character(NA),
  error = as.character(NA), seq = as.character(NA), orderBy = "-startTime",
  direction = "both", quiet = TRUE, ...)
recordr |
a Recordr instance |
... |
additional parameters |
id |
The identifier for a run. Either |
file |
The name of the script to match |
start |
Match runs that started in this time range (inclusive). Times must be entered in the form 'YYYY-MM-DD HH:MM:SS' but can be shortened to no less than "YYYY" |
end |
Match runs that ended in this time range (inclusive). Times must be entered in the form 'YYYY-MM-DD HH:MM:SS' but can be shortened to no less than "YYYY" |
tag |
The text of the tag to match |
error |
The text of the error message to match. |
seq |
The sequence number for a run. |
orderBy |
Sort the results according to the specified column. A hyphen ('-') prepended to the column name denotes a descending sort. The default value is "-startTime" |
direction |
The direction to trace the lineage, either |
quiet |
A |
If the run id or seq number is known for the run to be traced, then either of these values can be used. Alternatively, other run attributes such as file, start, etc. can be used to determine the run to be traced. If these other search parameters are used and multiple runs are selected, only the first run selected will be traced. These search parameters can be combined to find certain runs easily, for example the latest run of a particular script or the latest run with a specified tag (see examples).
A list of the execution identifiers that are in the processing workflow.
Recordr
class description
## Not run:
# Trace lineage for the run with sequence number '101'
linkedRuns <- traceRuns(recordr, seq=101)
# Trace lineage for the last execution of script "runModel.R"
linkedRuns <- traceRuns(recordr, file="runModel.R", orderBy="-startTime")
# Trace lineage for the last execution with the tag 'best run yet!' specified.
linkedRuns <- traceRuns(recordr, tag="best run yet!", orderBy="-startTime")
## End(Not run)
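The idea behind lineage tracing, that runs are linked when one run reads a file that another run wrote, can be sketched in base R with a toy table of file accesses. Everything in this sketch is hypothetical: the `file_access` table, the `trace_runs_sketch` helper, and the direction names "forward" and "backward" are assumptions for illustration and do not reflect the internal data model or the accepted values of traceRuns.

```r
# Toy record of which run read or wrote which file (illustration only).
file_access <- data.frame(
  run    = c("run1", "run1", "run2", "run2", "run3", "run3", "run4"),
  file   = c("raw.csv", "clean.csv", "clean.csv", "model.rds",
             "model.rds", "plot.png", "other.csv"),
  access = c("read", "write", "read", "write", "read", "write", "read"),
  stringsAsFactors = FALSE
)

# Breadth-first search over runs connected through shared files.
trace_runs_sketch <- function(start, fa,
                              direction = c("both", "forward", "backward")) {
  direction <- match.arg(direction)
  found <- start
  queue <- start
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    wrote <- fa$file[fa$run == current & fa$access == "write"]
    read  <- fa$file[fa$run == current & fa$access == "read"]
    linked <- character(0)
    if (direction %in% c("both", "forward")) {
      # Later steps: runs that read a file the current run wrote.
      linked <- c(linked, fa$run[fa$access == "read" & fa$file %in% wrote])
    }
    if (direction %in% c("both", "backward")) {
      # Earlier steps: runs that wrote a file the current run read.
      linked <- c(linked, fa$run[fa$access == "write" & fa$file %in% read])
    }
    new_runs <- setdiff(unique(linked), found)
    found <- c(found, new_runs)
    queue <- c(queue, new_runs)
  }
  found
}

workflow <- trace_runs_sketch("run2", file_access)
downstream <- trace_runs_sketch("run2", file_access, "forward")
```

In this toy data, run1, run2 and run3 form one workflow through clean.csv and model.rds, while run4 shares no files with them and is not part of the traced lineage.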
Remove a file from the recordr archive directory
unArchiveFile(recordr, fileId)
recordr |
A Recordr object |
fileId |
The fileId to remove from the archive |
A logical value - TRUE if the file is removed, FALSE if not
This function is intended to run only during a record() session, i.e. the recordr environment needs to be available.
Update an existing execution metadata entry with the values supplied.
updateExecMeta(recordr, ...)
## S4 method for signature 'Recordr'
updateExecMeta(recordr, executionId = as.character(NA), subject = as.character(NA),
  endTime = as.character(NA), errorMessage = as.character(NA),
  publishTime = as.character(NA), publishNodeId = as.character(NA),
  publishId = as.character(NA))
recordr |
A Recordr object |
... |
additional arguments |
executionId |
The execution id of the execution to be updated |
subject |
The authorized subject, i.e. from the client certificate. |
endTime |
The ending time of the execution. |
errorMessage |
An error message generated by the execution. |
publishTime |
The date and time that the execution was published |
publishNodeId |
The node identifier, e.g. "urn:node:testKNB" that the execution was published to. |
publishId |
The identifier that the execution was published with. In DataONE, this can be the identifier of the metadata object describing the datasets that were uploaded. |
Execution metadata is typically first stored when an execution begins, then updated at the end of the run (with error messages and the ending time, for example). The execution metadata can also be updated when a run is published, with information about the publishing process.
ExecMetadata
class description
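The store-then-update lifecycle described above can be sketched with a plain list standing in for a stored metadata entry. The helper `update_exec_meta_sketch` is hypothetical, and treating an NA argument as "leave this field unchanged" is an assumption suggested by the NA defaults in the usage, not confirmed behavior of updateExecMeta.

```r
# Apply only the fields that were actually supplied (non-NA); assumption for illustration.
update_exec_meta_sketch <- function(meta,
                                    endTime = NA_character_,
                                    errorMessage = NA_character_,
                                    publishTime = NA_character_) {
  updates <- list(endTime = endTime, errorMessage = errorMessage,
                  publishTime = publishTime)
  for (field in names(updates)) {
    if (!is.na(updates[[field]])) meta[[field]] <- updates[[field]]
  }
  meta
}

# Entry as it might look when an execution begins (values are made up).
meta <- list(executionId = "urn:uuid:example",
             startTime = "2016-01-02 10:00:00",
             endTime = NA_character_,
             errorMessage = NA_character_,
             publishTime = NA_character_)

# End of run: fill in the ending time.
meta <- update_exec_meta_sketch(meta, endTime = "2016-01-02 10:05:00")
# Later, at publish time: add publishing information without touching endTime.
meta <- update_exec_meta_sketch(meta, publishTime = "2016-01-03 08:00:00")
```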
Update the recordr database to the current version
upgradeRecordr(recordr)
recordr |
A recordr object |
logical TRUE if the upgrade was successful, FALSE if a problem was encountered.
Detailed information for an execution is printed to the display.
viewRuns(recordr, ...)
## S4 method for signature 'Recordr'
viewRuns(recordr, id = as.character(NA), file = as.character(NA),
  start = as.character(NA), end = as.character(NA), tag = as.character(NA),
  error = as.character(NA), seq = as.character(NA), orderBy = "-startTime",
  sections = c("details", "used", "generated"), verbose = FALSE,
  page = TRUE, output = TRUE)
recordr |
A Recordr instance |
... |
additional parameters |
id |
The execution identifier of a run to view |
file |
The name of the script to match |
start |
Match runs that started in this time range (inclusive). Times must be entered in the form 'YYYY-MM-DD HH:MM:SS' but can be shortened to no less than "YYYY" |
end |
Match runs that ended in this time range (inclusive). Times must be entered in the form 'YYYY-MM-DD HH:MM:SS' but can be shortened to no less than "YYYY" |
tag |
The text of the tag to match |
error |
The text of the error message to match. |
seq |
A run sequence number (can be a range, e.g |
orderBy |
Sort the results according to the specified column. A hyphen ('-') prepended to the column name denotes a descending sort. The default value is "-startTime" |
sections |
Print the specified sections of the output. Default=c("details", "used", "generated") |
verbose |
a |
page |
A logical value - if TRUE then pause after each run is displayed. |
output |
a |
The execution and file information for runs that match the search criteria is
printed to the console. The output is divided into three sections: "details", "used"
and "generated". The "details" section shows execution information such as the start and end time
of the run, run identifier, etc. The "used" section lists files that were read by a run. The
"generated" section lists files that were created by a run. The list that is returned from "viewRuns"
contains two elements - a data.frame with the execution information, and a data.frame that contains
file information.
A list that contains information about all selected runs.
Recordr
class description
## Not run:
rc <- new("Recordr")
# View the tenth run that was recorded
viewRuns(rc, seq=10)
# View the first ten runs, with only the files "generated" section displayed
info <- viewRuns(rc, seq="1:10", sections="generated")
nrow(info$runs)
nrow(info$files)
## End(Not run)
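The start and end arguments accept times in the form 'YYYY-MM-DD HH:MM:SS', shortened to no less than "YYYY". A small base R check for such shortened time strings is sketched below. The helper `is_time_prefix` is hypothetical, and the assumption that a time may be cut off after any complete component (year, month, day, hour, minute) is made for illustration; the package may be stricter or more lenient.

```r
# TRUE if x is 'YYYY-MM-DD HH:MM:SS' or that form cut off after a complete component.
is_time_prefix <- function(x) {
  pattern <- "^[0-9]{4}(-[0-9]{2}(-[0-9]{2}( [0-9]{2}(:[0-9]{2}(:[0-9]{2})?)?)?)?)?$"
  grepl(pattern, x)
}

candidates <- c("2015", "2015-06", "2015-06-01 12:30:00", "15", "2015-6")
valid <- is_time_prefix(candidates)
```

Here the first three candidates pass the check, while "15" (shorter than a four-digit year) and "2015-6" (month not written with two digits) do not. The check is purely syntactic and does not reject impossible dates such as month 13.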
Save the metadata for a single execution.
writeExecMeta(recordr, ...)
## S4 method for signature 'Recordr'
writeExecMeta(recordr, execMeta, ...)
recordr |
A Recordr object |
... |
Not yet used. |
execMeta |
an ExecMetadata object to save. |
ExecMetadata
class description
Metadata for a file is written to an RSQLite database.
writeFileMeta(recordr, fileMeta, ...)
## S4 method for signature 'Recordr,FileMetadata'
writeFileMeta(recordr, fileMeta, ...)
recordr |
A recordr object |
fileMeta |
A FileMetadata object |
... |
(Not yet used) |
This method is used internally by the recordr package.
FileMetadata
class description
Metadata for a provenance relationship is written to the recordr RSQLite database.
writeProvRel(recordr, provRels, ...)
## S4 method for signature 'Recordr'
writeProvRel(recordr, provRels, ...)
recordr |
A recordr object |
provRels |
A ProvRels object. |
... |
(Not yet used) |
This method is used internally by the recordr package.
ProvRels
class description