This demo serves only to illustrate the technical capabilities of the Memento Tracer Framework for creating high-quality captures of web publications. It was selected because it shows that the framework can capture resources that would be very difficult, if not impossible, to capture using regular web crawling approaches.
The SlideShare presentation used in Step 1 is https://www.slideshare.net/hvdsomp/paul-evan-peters-lecture.
The extension intercepts all interactions (e.g., clicks) of the curator with this SlideShare presentation and
records them as a Trace, serialized in JSON. In this Trace, each interaction is expressed
such that the entity that is subject to interaction is uniquely and abstractly identified.
That is, the interaction is not tied to the specific presentation on which the Trace is recorded.
As a result, the recorded Trace can be reused in Step 3 to automatically interact with other,
similar presentations in order to generate quality captures.
The curator navigates to https://www.slideshare.net/hvdsomp/paul-evan-peters-lecture
and activates the Memento Tracer browser extension.
The Trace, slideshare.json, includes the property portal_url_match, which
has as its value the URL pattern (expressed as a regular expression) to which this Trace applies.
{
  "portal_url_match": "(slideshare.net)\/([^\/]+)\/([^\/]+)",
  "actions": [
    {
      "action_order": "1",
      "value": "div.j-next-btn.arrow-right",
      "type": "CSSSelector",
      "action": "repeated_click",
      "repeat_until": {
        "condition": "changes",
        "type": "resource_url"
      }
    }
  ],
  "resource_url": "https://www.slideshare.net/hvdsomp/paul-evan-peters-lecture",
  "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3417.0 Safari/537.36"
}
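As a rough illustration of how portal_url_match can decide whether a Trace applies to a URL, the following Python sketch tests the pattern from the Trace above against a few URLs. The function name trace_applies, the example.org URL, and the use of search semantics (a match anywhere in the URL) are assumptions made for this sketch; the demo does not state how the crawler applies the pattern.

```python
import json
import re

# Abbreviated copy of the Trace shown above; only the fields needed here.
TRACE_JSON = r'''
{
  "portal_url_match": "(slideshare.net)\/([^\/]+)\/([^\/]+)",
  "resource_url": "https://www.slideshare.net/hvdsomp/paul-evan-peters-lecture"
}
'''


def trace_applies(trace: dict, url: str) -> bool:
    """Return True if the Trace's URL pattern is found in the URL.

    Assumption: the pattern is applied with search semantics; the demo
    does not specify this.
    """
    return re.search(trace["portal_url_match"], url) is not None


trace = json.loads(TRACE_JSON)  # JSON decoding turns "\/" into "/"

recorded = "https://www.slideshare.net/hvdsomp/paul-evan-peters-lecture"
other = "https://pt.slideshare.net/elfpavlik/api-standardization-work-in-w3c-groups/"
unrelated = "https://example.org/hvdsomp/paul-evan-peters-lecture"  # hypothetical

for url in (recorded, other, unrelated):
    print(url, "->", trace_applies(trace, url))
```

Under these assumptions the pattern matches both SlideShare URLs, including the one on the pt subdomain, and does not match the unrelated host.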
https://pt.slideshare.net/elfpavlik/api-standardization-work-in-w3c-groups/
is the URL that must be crawled.
Note that this is a different SlideShare presentation than the one used to create the Trace in Step 1.
The crawler looks for the Trace (using portal_url_match, above)
that matches the URL at hand. In this case, the crawler will match the URL to the Trace shown in Step 2,
and will start executing its sequence of interactions on https://pt.slideshare.net/elfpavlik/api-standardization-work-in-w3c-groups/.
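To make "executing its sequence of interactions" more concrete, here is a minimal Python sketch of one possible reading of the Trace's single action: click the element identified by the CSS selector repeatedly until the resource URL changes. This is not the crawler's actual code. FakePage, run_trace, and max_clicks are hypothetical names, and the fake page simply assumes that clicking past the last slide changes the URL.

```python
from dataclasses import dataclass


@dataclass
class FakePage:
    """Hypothetical stand-in for a browser page showing a slide deck."""
    url: str
    slides: int
    position: int = 1
    clicks: int = 0

    def click(self, css_selector: str) -> None:
        # Simulate the "next" button: advance one slide; clicking past the
        # last slide is assumed to change the resource URL.
        self.clicks += 1
        if self.position < self.slides:
            self.position += 1
        else:
            self.url = self.url.rstrip("/") + "-next-deck"


def run_trace(trace: dict, page, max_clicks: int = 1000) -> None:
    """Replay the Trace's actions on a page, in action_order."""
    actions = sorted(trace["actions"], key=lambda a: int(a["action_order"]))
    for action in actions:
        if action["action"] != "repeated_click":
            raise ValueError(f"unsupported action: {action['action']}")
        until = action["repeat_until"]
        if (until["type"], until["condition"]) != ("resource_url", "changes"):
            raise ValueError("unsupported repeat_until condition")
        start_url = page.url
        for _ in range(max_clicks):  # safety cap; not part of the Trace
            page.click(action["value"])
            if page.url != start_url:
                break


# The action from the Trace shown in Step 2.
trace = {
    "actions": [{
        "action_order": "1",
        "value": "div.j-next-btn.arrow-right",
        "type": "CSSSelector",
        "action": "repeated_click",
        "repeat_until": {"condition": "changes", "type": "resource_url"},
    }]
}

start = "https://pt.slideshare.net/elfpavlik/api-standardization-work-in-w3c-groups/"
page = FakePage(url=start, slides=5)  # the slide count is made up
run_trace(trace, page)
print(page.clicks, page.position, page.url != start)
```

Because the action names a CSS selector rather than anything specific to one presentation, the same loop can run against any page that the URL pattern matches.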
The many URLs that are being crawled are shown in log messages output by the Storm-Crawler.
https://pt.slideshare.net/elfpavlik/api-standardization-work-in-w3c-groups/,
captured according to the above Trace, is saved.
Last update: May 23 2018