This demo serves only to illustrate the technical capabilities of the Memento Tracer Framework for creating high-quality captures of web publications. It was selected because it shows that the framework can capture resources that would be very difficult, if not impossible, to capture with regular web crawling approaches.
https://github.com/gorilla/mux.
The extension intercepts all of the curator's interactions (e.g., clicks) with this GitHub repository and
records them as a Trace, serialized in JSON. In this Trace, each interaction is expressed
such that the entity that is subject to the interaction is uniquely and abstractly identified.
That is, the interaction is not
tied to the specific repository on which the Trace is recorded. As a result, the recorded Trace is reusable in Step 3 to
automatically interact with other, similar repositories in order to generate quality captures.
https://github.com/gorilla/mux
and activates the Memento Tracer browser extension.
github.json
portal_url_match, which
has as its value the URL pattern (expressed as a regular expression) to which this Trace applies.
{
  "portal_url_match": "(github.com)\/([^\/]+)\/([^\/]+)",
  "actions": [
    {
      "action_order": "1",
      "value": "summary.btn.btn-sm.btn-primary",
      "type": "CSSSelector",
      "action": "click"
    },
    {
      "action_order": "2",
      "value": "id(\"js-repo-pjax-container\")/div[2]/div[1]/div[5]/details[1]/div[1]/div[1]/div[1]/div[2]/a[2]",
      "type": "XPath",
      "action": "click"
    },
    {
      "action_order": "3",
      "value": "table.files.js-navigation-container.js-active-navigation-container a",
      "type": "CSSSelector",
      "action": "click"
    }
  ],
  "action_count": 3,
  "resource_url": "https://github.com/gorilla/mux",
  "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3431.0 Safari/537.36"
}
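To make the structure of the Trace concrete, the following Python sketch parses a copy of it and lists the recorded interactions in replay order. It is only an illustration, not part of the Memento Tracer code: the function name load_actions is hypothetical, and it assumes that action_order (stored as a string) defines the replay order and that action_count equals the number of entries in actions.

```python
import json

# Copy of the Trace fields used here; the actions are listed out of order
# on purpose to show that sorting relies on action_order.
TRACE_TEXT = r'''
{
  "portal_url_match": "(github.com)\/([^\/]+)\/([^\/]+)",
  "actions": [
    {"action_order": "3",
     "value": "table.files.js-navigation-container.js-active-navigation-container a",
     "type": "CSSSelector", "action": "click"},
    {"action_order": "1",
     "value": "summary.btn.btn-sm.btn-primary",
     "type": "CSSSelector", "action": "click"},
    {"action_order": "2",
     "value": "id(\"js-repo-pjax-container\")/div[2]/div[1]/div[5]/details[1]/div[1]/div[1]/div[1]/div[2]/a[2]",
     "type": "XPath", "action": "click"}
  ],
  "action_count": 3
}
'''


def load_actions(trace_text):
    """Parse a Trace and return its actions sorted by numeric action_order.

    Assumption: action_order gives the replay order and action_count
    matches the number of recorded actions.
    """
    trace = json.loads(trace_text)
    # action_order is a string in the Trace, so convert it before sorting.
    actions = sorted(trace["actions"], key=lambda a: int(a["action_order"]))
    if len(actions) != trace["action_count"]:
        raise ValueError("action_count does not match the number of actions")
    return actions


for step in load_actions(TRACE_TEXT):
    print(step["action_order"], step["action"], step["type"], step["value"])
```

Each entry pairs a selector type (CSSSelector or XPath) with a selector value and an action, which is what lets the same Trace be replayed on a different page with a similar layout.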
https://github.com/mementoweb/node-solid-server
that must be crawled.
Note that this is a different GitHub repository than the one used to create the Trace in Step 1.
(see portal_url_match
above)
that matches the URL at hand. In this case, the crawler matches the URL to the Trace shown in Step 2
and starts executing its sequence of interactions on https://github.com/mementoweb/node-solid-server.
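The matching step described here can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the crawler's actual implementation: the TRACES list and the find_trace function are hypothetical, the pattern is the JSON-decoded form of portal_url_match (the escaped "\/" becomes "/"), and searching anywhere in the URL, rather than anchoring at its start, is an assumption.

```python
import re

# Hypothetical registry of available Traces. Only the pattern comes from the
# Trace above, written in its JSON-decoded form.
TRACES = [
    {"label": "github", "portal_url_match": "(github.com)/([^/]+)/([^/]+)"},
]


def find_trace(url, traces):
    """Return (trace, match) for the first Trace whose pattern matches url.

    Returns None when no Trace applies. Assumption: the pattern may match
    anywhere in the URL, so re.search is used instead of re.match.
    """
    for trace in traces:
        match = re.search(trace["portal_url_match"], url)
        if match:
            return trace, match
    return None


result = find_trace("https://github.com/mementoweb/node-solid-server", TRACES)
if result is not None:
    trace, match = result
    # Groups 2 and 3 capture the two path segments after the host.
    print(trace["label"], match.group(2), match.group(3))
```

Because the pattern captures only the host and the first two path segments, it applies to any repository URL of that shape, which is why a Trace recorded on gorilla/mux also matches mementoweb/node-solid-server. Note that the "." in the pattern is unescaped, so it matches any character rather than only a literal dot.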
The many URLs that are being crawled are shown in log messages output by the Storm-Crawler.
https://github.com/mementoweb/node-solid-server
captured according to the above Trace, is saved.
Last update: May 23 2018