Data

Milestone 1: Building the Lemma Bank

The first phase of the LiITA: Linking Italian project focusses on building a Lemma Bank from existing Italian lemma sets that will be meticulously selected and compared. The Lemma Bank is available as Linked Open Data, adhering to the widely accepted vocabulary outlined in the OntoLex-Lemon model for describing lexical resources:

Browse the Lemma Bank: https://liita-lod.github.io/query-interface/
View: http://liita.it/data/id/lemma/LemmaBank
Download: https://github.com/LiITA-LOD/LiITA_LemmaBank

Milestone 2: Linking the Resources

Access the data through our SPARQL endpoint.

The second phase revolves around integrating a set of freely available Italian linguistic resources with the Knowledge Base.

Lexical Resources

CompL-it

Computational Lexicon for Italian. See http://hdl.handle.net/20.500.11752/ILC-1007 for more details on the resource. View: (https://dspace-clarin-it.ilc.cnr.it/repository/xmlui/handle/20.500.11752/ILC-1007), download ttl (https://dspace-clarin-it.ilc.cnr.it/repository/xmlui/bitstream/handle/20.500.11752/ILC-1007/complit.ttl.gz?sequence=1&isAllowed=y).

Vocabolario della Lingua Parmigiana

A bilingual lexicon having Italian entries and the corresponding translations in Parmigiano, edited by Umberto Pavarini and Gruppo di Lavoro Memento Mori. View: (https://liita.it/data/id/DialettoParmigiano/lemma/LemmaBank.html), download ttl (linkhttps://github.com/LiITA-LOD/LocalVarieties/tree/main/Parmigiano).

Sicilian-Italian Lexicon

A Sicilian-Italian lexicon extracted from wikizziunariu.
View: https://liita.it/data/id/LexicalResources/DialettoSiciliano/Wikizziunariu.html
Download ttl: https://github.com/LiITA-LOD/LocalVarieties/tree/main/Siciliano.

Sentix

Sentix is an affective lexicon for the Italian language, incorporating 63,660 entries, with associated polarity scores (ranging from -1 to +1) and categorical polarity classifications (Positive, Neutral, Negative).

ELIta

Emotion Lexicon for Italian, where words are annotated for basic emotions with scores from 0 to 1 and the emotion dimensions from 1 to 9.

Textual Resources

Luigi Pirandello’s Novellas

This corpus contains a series of short stories by Luigi Pirandello. The texts have been tokenised, pos-tagged and lemmatised specifically for linking to the LiITA Knowledge Base.

View: https://liita.it/data/id/corpora/Pirandello/id/corpus.html

Milestone 3: Developing Tools

The project’s third phase brings forth the development of a tool that empowers resource providers to automatically connect their data to the Lemma Bank. Coupled with the project’s commitment to utilising established vocabularies for knowledge representation as LOD, this tool promotes an open-ended approach. As a result, the knowledge base becomes readily extensible and adaptable for future enrichment and expansion.

To ensure long-term sustainability and facilitate widespread dissemination, the project culminates with the integration of the Lemma Bank and its associated LOD resources into prominent infrastructures like CLARIN-IT and the European Language Grid. This final step guarantees the accessibility of the project’s valuable outcomes for the benefit of the wider research community.

Please check back to this page again soon to find more updates on the project’s data output.