weblock.io — Robust Linking and Archiving of Websites Cited in Scholarly, Scientific and Legal Documents

https://weblock.io
Weblock is currently in beta. You can already use the service, but things may still change a bit.

Contents

  1. Supported Formats
  2. Authentication
    1. Obtaining a Token
    2. Including the Token
  3. Accessing Data
    1. Snapshots Index Method
    2. Snapshots View Method
  4. Sending Data
    1. Snapshots Add Method

Weblock API Documentation

1. Supported Formats

The Weblock API supports the application/json and text/xml formats. We recommend using JSON. For simplicity, the following documentation and examples all use JSON. To use XML instead, replace application/json with text/xml, and post data as XML wherever you have to post data to the API.

To test the API functionality, you can use the RestClient, which is available as a plug-in for Firefox, Chrome and Safari browsers. Additionally, if you are signed in to Weblock, you can explore index and view methods directly through your browser.

2. Authentication

Weblock API traffic is encrypted via HTTPS / SSL, and the API uses JSON Web Tokens (JWT) to authenticate API users and communicate with them securely. Tokens expire after 7 days.

2a. Obtaining a Token

To start an API session, obtain a token from the /api/users/token method by posting your apikey and apisecret as properly formatted JSON in the body of the request. Your apikey and apisecret are available from your user account. Because tokens expire after 7 days, they must be renewed at least once per week.

https://weblock.io/api/users/token
POST
Accept: application/json
Content-Type: application/json
{ "apikey" : "YOUR_API_KEY" , "apisecret" : "YOUR_API_SECRET" }

On success, the API responds with the token that your application should subsequently use to identify itself when communicating with the API.

{
  "success": true,
  "data": {
    "token": "...(this will be your token)..."
  }
}
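
As a minimal sketch, the token request above could be built with Python's standard library roughly as follows. The function names build_token_request, extract_token and obtain_token are illustrative and not part of the Weblock API, and the response is assumed to have exactly the shape shown above.

```python
import json
import urllib.request

TOKEN_URL = "https://weblock.io/api/users/token"


def build_token_request(apikey, apisecret):
    """Build (but do not send) the POST request for /api/users/token."""
    body = json.dumps({"apikey": apikey, "apisecret": apisecret}).encode("utf-8")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        method="POST",
        headers={"Accept": "application/json", "Content-Type": "application/json"},
    )


def extract_token(response_text):
    """Return the token from a response shaped like the example above."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise RuntimeError("token request was not successful")
    return payload["data"]["token"]


def obtain_token(apikey, apisecret):
    """Send the request and return the token (performs a network call)."""
    with urllib.request.urlopen(build_token_request(apikey, apisecret)) as resp:
        return extract_token(resp.read().decode("utf-8"))
```

Keeping request construction and response parsing separate from the network call makes both easy to check without contacting the service.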

2b. Including the Token

Include the token obtained in 2a. as an authorization header in every subsequent request (note the word "Bearer" in the header, as well as the absence of any quotation marks around the token).

Accept: application/json
Authorization: Bearer YOUR_TOKEN
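
The header format and the 7-day lifetime can be sketched in Python as below. This is only an illustration: auth_headers and token_needs_renewal are hypothetical helper names, and tracking the issue time on the client side is an assumption, since the documentation does not say how expiry is reported.

```python
from datetime import datetime, timedelta, timezone

# Token lifetime as stated in the Authentication section.
TOKEN_LIFETIME = timedelta(days=7)


def auth_headers(token):
    """Headers for an authenticated JSON request.

    Surrounding whitespace and quotation marks are removed, because the
    token must appear unquoted after the word "Bearer".
    """
    token = token.strip().strip('"')
    return {"Accept": "application/json", "Authorization": "Bearer " + token}


def token_needs_renewal(issued_at, now=None):
    """True once a token issued at issued_at (timezone-aware) is 7 days old."""
    now = now or datetime.now(timezone.utc)
    return now - issued_at >= TOKEN_LIFETIME
```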

3. Accessing Data

Within Weblock.io, archived websites are called snapshots. Through the API you can currently access the list of snapshots that have been saved using your account ("index"), and the details for each single snapshot ("view"). For creating new snapshots via the API, please see the section on adding new snapshots further down.

3a. Snapshots Index Method

The snapshots index method returns a paginated result set of all website snapshots that have been taken from within your account, with a default of 100 results per page, ordered by date (newest first).

https://weblock.io/api/snapshots
GET
Accept: application/json
Authorization: Bearer {YOUR_TOKEN}

{
  "success": true,
  "data": {
    "snapshots": [
      {
        "id": ...,
        "url": ...,
        "url_domain": ...,
        "content_type": ...,
        "title": ...,
        "author": ...,
        "description": ...,
        "cache_date": ...,
        "cache_ip": ...,
        "weblock_url": ...
      },
      {
        "id": ...,
        "url": ...,
        ...
      }
    ]
  }
}

The response also includes a pagination block that describes the current position within the result set. limit is the current number of results per page. page is the current page within the result set. page_prev and page_next indicate whether there is a previous or a next page. results is the total number of results (i.e. snapshots) found. pages is the total number of pages in the result set at the current limit.

{
  "success": true,
  "data": {
    "snapshots": [
      ...
    ],
    "pagination": {
      "limit": 100,
      "page": 1,
      "page_prev": false,
      "page_next": true,
      "results": 542,
      "pages": 6
    }
  }
}

To retrieve more than 100 results per page, set the URL query parameter limit (the maximum is 1,000):

https://weblock.io/api/snapshots?limit=...

To retrieve a page other than the first page of the result set, use the URL query parameter page:

https://weblock.io/api/snapshots?page=...
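
The limit and page parameters, together with the page_next flag, are enough to walk through the whole result set. The following Python sketch shows one way to do this. The names snapshots_url, iter_snapshots and fetch_json are hypothetical; fetch_json stands for any caller-supplied function that sends an authenticated GET request and returns the decoded JSON. The lower bound of 1 for limit is an assumption, as only the maximum is documented.

```python
from urllib.parse import urlencode

SNAPSHOTS_URL = "https://weblock.io/api/snapshots"


def snapshots_url(page=1, limit=100):
    """Build the index URL with the limit and page query parameters."""
    # The documented maximum is 1,000; the minimum of 1 is assumed.
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    return SNAPSHOTS_URL + "?" + urlencode({"limit": limit, "page": page})


def iter_snapshots(fetch_json, limit=100):
    """Yield every snapshot, following pagination until page_next is false.

    fetch_json takes a URL and returns the decoded JSON response as a dict.
    """
    page = 1
    while True:
        data = fetch_json(snapshots_url(page, limit))["data"]
        yield from data["snapshots"]
        if not data["pagination"]["page_next"]:
            break
        page += 1
```

Because iter_snapshots is a generator, pages are only requested as the caller consumes results.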

3b. Snapshots View Method

To retrieve a single snapshot, append the id of the snapshot to the URL. Note that you can only retrieve snapshots that were created from within your user account. The view method includes the field http_code, which indicates the current HTTP status code of the archived URL. This status code is fetched live, so each view request takes slightly more than one second to complete. To load large amounts of data from the API, use the index method instead.

https://weblock.io/api/snapshots/{id}
GET
Accept: application/json
Authorization: Bearer {YOUR_TOKEN}

{
  "success": true,
  "data": {
    "snapshot": {
      "id": ...,
      "url": ...,
      "url_domain": ...,
      "content_type": ...,
      "title": ...,
      "author": ...,
      "description": ...,
      "cache_date": ...,
      "cache_ip": ...,
      "http_code": ...,
      "http_code_updated": ...,
      "created": ...,
      "weblock_url": ...
    }
  }
}
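
As an illustration of the view method, the sketch below builds the request for one snapshot and uses the live http_code to check whether the original URL still responds successfully. build_view_request and original_is_reachable are hypothetical names, and the code assumes that http_code is an integer or a numeric string, which the documentation does not specify.

```python
import urllib.request


def build_view_request(snapshot_id, token):
    """Build (but do not send) the GET request for a single snapshot."""
    return urllib.request.Request(
        "https://weblock.io/api/snapshots/{}".format(snapshot_id),
        method="GET",
        headers={"Accept": "application/json", "Authorization": "Bearer " + token},
    )


def original_is_reachable(response):
    """True if the live http_code of the archived URL is a 2xx status.

    response is the decoded JSON of a view request; http_code is assumed
    to be an integer or a numeric string.
    """
    code = int(response["data"]["snapshot"]["http_code"])
    return 200 <= code < 300
```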

4. Sending Data

4a. Snapshots Add Method

You can create new snapshots of webpages via the API. To do so, post the URL in JSON (or XML) format to the API. If the URL is accessible over the Internet, the Weblock API responds with a success code and returns the id of the newly created snapshot, along with HTTP response and content information. The API also returns the weblock_url field: this is the URL from which the snapshot can be accessed. Note that, depending on the backlog of the system, it can take from a few seconds up to several minutes until the snapshot of the website is actually created. The weblock_url, however, is available immediately.

https://weblock.io/api/snapshots
POST
Accept: application/json
Content-Type: application/json
Authorization: Bearer {YOUR_TOKEN}
{"url" : "URL_TO_CACHE"}

{
  "success": true,
  "data": {
    "snapshot": {
      "url_domain": ...,
      "url": ...,
      "http_response": ...,
      "http_code": ...,
      "content_type": ...,
      "cache_ip": ...,
      "id": ...,
      "weblock_url": ...
    }
  }
}
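
A minimal Python sketch of the add method, using only the standard library, might look like this. The helper names build_add_request and weblock_url_from are illustrative, and the response is assumed to have the shape shown above.

```python
import json
import urllib.request

SNAPSHOTS_URL = "https://weblock.io/api/snapshots"


def build_add_request(url_to_cache, token):
    """Build (but do not send) the POST request that creates a snapshot."""
    body = json.dumps({"url": url_to_cache}).encode("utf-8")
    return urllib.request.Request(
        SNAPSHOTS_URL,
        data=body,
        method="POST",
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,
        },
    )


def weblock_url_from(response):
    """Return weblock_url from the decoded JSON of a successful add response."""
    if not response.get("success"):
        raise RuntimeError("snapshot was not created")
    return response["data"]["snapshot"]["weblock_url"]
```

Since the weblock_url is returned immediately, it can be stored or cited right away, even if the snapshot itself is only created a little later.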