Natural Language Processing Centre,

Faculty of Informatics, Masaryk University

Botanická 68a, 602 00 Brno, CZECH REPUBLIC

(further as “NLP Centre”)

NLP Centre Web Corpus Agreement

to use the NLP Centre data set.

(further as “Agreement”)

I, _________________________________, am a person engaging in research and development of natural-language-processing, information-retrieval or document-understanding systems.

Official mail address __________________________________________________



Telephone _____________________________________

E-mail ________________________________________

I hereby agree to use the collection designated as ______________________________ data set collected by the NLP Centre (the "Collection"). By signing this Agreement I hereby agree to abide by the following understandings, terms and conditions. These understandings, terms and conditions apply equally to all or to part of the Collection, including any updates or new versions of the Collection supplied under this Agreement.


  1. The Collection has been obtained by crawling the Internet. Due to the size of the Collection it has not been practicable to obtain permission from copyright owners to provide the Collection for the uses permitted under this Agreement (“Permitted Uses”).
  2. I understand that all the documents in the Collection are documents which have been at some time made publicly available on the Internet and which have been collected using a process which respects the commonly accepted methods (such as robots.txt) for indicating that the documents should not be so collected.
  3. Owners of copyright in individual documents may choose to request deletion of these documents from the Collection.
  4. The limitation on permitted use contained in the following section is intended to reduce the risk of any action being brought by copyright owners, but if this happens I agree to bear all associated liability.

Permitted Uses

  1. The Collection may only be used for research and development of natural-language-processing, information-retrieval or document-understanding systems.
  2. Summaries, analyses and interpretations of the linguistic properties of the Collection may be derived and published, provided it is not possible to reconstruct the Collection from these summaries.
  3. Small excerpts of the Collection may be displayed to others or published in a scientific or technical context, solely for the purpose of describing the research and development carried out and related issues.
  4. All efforts must be made not to infringe the rights of any third party including, but not limited to, the authors and publishers of any excerpts used in accordance with clause 3 above in this “Permitted Uses” section.
  5. I must make sure that I only display the Collection to, or share the Collection with, persons who have also signed this Agreement with the NLP Centre.

Agreement to Delete Data on Request

I undertake to delete, within thirty days of receiving notice, all copies of any nominated document that is part of the Collection whenever requested to do so by either:

  1. NLP Centre; or
  2. the owner of copyright for the particular document.

No Warranty

The Collection is provided "as is", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the NLP Centre be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising in any way out of the use of the Collection.


Either the NLP Centre or I may terminate this Agreement at any time by notifying the other party in writing. On termination of the Agreement I shall destroy all copies of the Collection.

Applicable Law

This Agreement is governed by the laws of the Czech Republic.

I hereby execute this Agreement in favor and for the benefit of the NLP Centre.

By the Individual:

Signature _________________________________________

Date ________________________________

Name (please print) _________________________________