Access to resources and services

Linguateca



One of the goals of Linguateca is to significantly improve the conditions for natural language processing (NLP) of Portuguese.

This page links to the main resources, services and programs developed within the scope of Linguateca.

AC/DC

The AC/DC corpora were annotated with Eckhard Bick's PALAVRAS parser, from the VISL project.

CETEMPúblico

CETEMPúblico (Corpus de Extractos de Textos Electrónicos MCT/Público) is a corpus of some 180 million words of European Portuguese, compiled by the Computational Processing of Portuguese project under an agreement between the Portuguese Ministry for Science and Technology (MCT) and the newspaper PÚBLICO.

CETENFolha

CETENFolha (Corpus de Extractos de Textos Electrónicos NILC/Folha de São Paulo) is a corpus of some 24 million words of Brazilian Portuguese, compiled by the Computational Processing of Portuguese project from the Folha de S. Paulo texts in the NILC/São Carlos corpus, which was assembled by the Núcleo Interinstitucional de Lingüística Computacional (NILC).

COMPARA

A Portuguese-English parallel corpus project, carried out in collaboration with Ana Frankenberg-Garcia, that includes a novel interface, DISPARA. COMPARA is an open-ended collection of Portuguese-English and English-Portuguese translations. It can be used to find out how translators have rendered words and expressions from Portuguese into English and from English into Portuguese.

Corpógrafo

Corpógrafo was created by the CLUP/FLUP node of Linguateca to facilitate the creation of specialized, "do-it-yourself" corpora. The system offers text preprocessing, terminology extraction and help in defining concepts. It provides a toolbox that lets users manage their own texts and terminological databases.

Esfinge

Esfinge is a general-domain question answering system that answers questions in Portuguese using the Web as its information source.

Floresta sintá(c)tica

This project, carried out in collaboration with the VISL project, aims to create a manually revised, syntactically annotated treebank for Portuguese, in order to advance computational syntax and to provide a resource for the future evaluation of tools for Portuguese.

METRA

METRA is a meta translator: a service that submits a text to several commercial machine translation engines on the Web and presents their results together. It handles the English-Portuguese and Portuguese-English translation pairs.
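The submit-to-many, show-together idea behind a meta translator can be illustrated with a short sketch. It assumes hypothetical engine functions (`engine_a`, `engine_b`, `engine_failing`) that stand in for Web translation services; it calls no real service and is not METRA's actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical stand-ins for Web translation engines; a real meta translator
# would send requests to each online service instead.
def engine_a(text: str, pair: str) -> str:
    return f"[A:{pair}] {text}"

def engine_b(text: str, pair: str) -> str:
    return f"[B:{pair}] {text}"

def engine_failing(text: str, pair: str) -> str:
    raise RuntimeError("service unavailable")

# The two directions METRA is described as handling.
SUPPORTED_PAIRS = {"en-pt", "pt-en"}

def meta_translate(text: str, pair: str,
                   engines: Dict[str, Callable[[str, str], str]]) -> Dict[str, str]:
    """Submit the same text to every engine and collect all outputs together."""
    if pair not in SUPPORTED_PAIRS:
        raise ValueError(f"unsupported language pair: {pair}")
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        # Send the text to all engines concurrently.
        futures = {name: pool.submit(fn, text, pair) for name, fn in engines.items()}
        for name, fut in futures.items():
            try:
                results[name] = fut.result(timeout=30)
            except Exception as exc:
                # One failing engine should not hide the others' output.
                results[name] = f"<error: {exc}>"
    return results

if __name__ == "__main__":
    out = meta_translate("bom dia", "pt-en",
                         {"A": engine_a, "B": engine_b, "C": engine_failing})
    for name, translation in out.items():
        print(f"{name}: {translation}")
```

Querying the engines concurrently keeps the total wait close to that of the slowest engine rather than the sum of all of them, and recording failures per engine lets the user still compare the translations that did arrive.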

PAPEL

PAPEL is a dictionary-based lexical ontology for Portuguese, built from Porto Editora's Dicionário da Língua Portuguesa, mainly at the Coimbra node of Linguateca. It will be made publicly available.

REPENTINO

REPENTINO is a repository of textual named entity instances, i.e. a set of proper names, each denoting a specific entity and written in Portuguese with at least one capital letter, classified by the kind of entity they denote (e.g., company, book title, place name). REPENTINO is organized into several major categories, which are in turn subdivided into subcategories.

Repositório

This space is a kind of electronic Web shelf for any Portuguese NLP resources that people want us to make available. It gives access to information retrieval (IR) collections, machine translation (MT) lexicons and corpora of summaries, among others.

WebJspell

WebJspell is a Web interface to Jspell, a morphological analyser and spell checker developed by Natura for Portuguese and English. Through WebJspell it is also possible to spell-check entire Web pages simply by submitting their URLs, and to propose new dictionary entries. WebJspell was created by the Braga node of Linguateca.

WPT 03

WPT 03 is a collection of Web pages obtained from a crawl of the entire Portuguese Web, carried out between March and June 2003 by the crawlers of tumba!, a Web search engine for the Portuguese community. To our knowledge, WPT 03 is the first and only collection spanning the entire Web of a country that is freely available for research purposes. In addition, the log of queries submitted to tumba! from 1 October 2003 is also provided, after being run through an anonymization procedure. WPT 03 was created by the XLDB group and is made available here.

See also the Language Resource Catalog. Search for language resources: OLAC.


Last update: 18 March 2010
Send questions, comments and suggestions