Linguistics
Multilingual corpora for the study of new concepts in the social sciences and humanities:
Kyriakoglou, Revekka, Pappa, Anna
This article presents a hybrid methodology for building a multilingual corpus designed to support the study of emerging concepts in the humanities and social sciences (HSS), illustrated here through the case of "non-technological innovation". The corpus relies on two complementary sources: (1) textual content automatically extracted from company websites, cleaned for French and English, and (2) annual reports collected and automatically filtered according to documentary criteria (year, format, duplication). The processing pipeline includes automatic language detection, filtering of irrelevant content, extraction of relevant segments, and enrichment with structural metadata. From this initial corpus, a derived English dataset is created for machine learning purposes. For each occurrence of a term from the expert lexicon, a contextual block of five sentences is extracted: the sentence containing the term, plus the two sentences before it and the two after it. Each occurrence is annotated with the thematic category associated with the term, yielding data suitable for supervised classification tasks. The result is a reproducible and extensible resource, suitable both for analyzing lexical variability around emerging concepts and for generating datasets for natural language processing applications.
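The five-sentence context extraction described above can be sketched in Python. This is a minimal illustration, not the authors' pipeline: it assumes the expert lexicon is a plain dictionary mapping each term to its thematic category and uses a naive regular-expression sentence splitter; `extract_occurrences`, `Occurrence`, and the category label in the example are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Occurrence:
    term: str      # lexicon term found in the sentence
    category: str  # thematic category attached to that term (the label)
    context: str   # matching sentence plus up to two sentences on each side


def split_sentences(text: str) -> list[str]:
    # Naive splitter: cut after ".", "!" or "?" followed by whitespace.
    # A real pipeline would use a language-aware sentence segmenter.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_occurrences(text: str, lexicon: dict[str, str], window: int = 2) -> list[Occurrence]:
    sentences = split_sentences(text)
    results = []
    for i, sentence in enumerate(sentences):
        for term, category in lexicon.items():
            # Whole-word, case-insensitive match of the lexicon term.
            if re.search(r"\b" + re.escape(term) + r"\b", sentence, re.IGNORECASE):
                # Clip the window at document boundaries, so occurrences near
                # the start or end get fewer than five sentences.
                start = max(0, i - window)
                end = min(len(sentences), i + window + 1)
                results.append(Occurrence(term, category, " ".join(sentences[start:end])))
    return results


text = ("S1 one. S2 two. Our firm adopted a new business model. "
        "S4 four. S5 five. S6 six.")
for occ in extract_occurrences(text, {"business model": "organizational innovation"}):
    print(occ.category, "|", occ.context)
```

Each `Occurrence` pairs a text block with a label, which is the shape of input a supervised classifier expects. A term matched twice in the same sentence is recorded once, and overlapping windows from nearby occurrences are kept as separate examples.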
- North America > United States > Maine (0.04)
- Europe > Middle East > Malta > Port Region > Southern Harbour District > Valletta (0.04)
- Europe > Bulgaria > Sofia City Province > Sofia (0.04)
- Asia > South Korea (0.04)
Animer une base de connaissance: des ontologies aux modèles d'I.A. générative
Animating a Knowledge Base: From Ontologies to Generative AI Models. From Expert Systems and the Semantic Web to Generative AI: Model-Driven and Data-Driven Approaches in Area Studies. In a context where the social sciences and humanities are experimenting with non-anthropocentric analytical frames, this article proposes a semiotic (structural) reading of the hybridization between symbolic AI and neural (or sub-symbolic) AI, grounded in one field of application: the design and use of a knowledge base for area studies. We describe the LaCAS ecosystem (Open Archives in Linguistic and Cultural Studies: thesaurus; RDF/OWL ontology; LOD services; harvesting; expertise; publication), deployed at Inalco (National Institute for Oriental Languages and Civilizations) in Paris with the Okapi (Open Knowledge and Annotation Interface) software environment from Ina (National Audiovisual Institute). The ecosystem now holds around 160,000 documentary resources and ten knowledge macro-domains grouping several thousand knowledge objects. We illustrate this approach with the knowledge domain "Languages of the world" (~540 languages) and the knowledge object "Quechua (language)". On this basis, we discuss the controlled integration of neural tools, more specifically generative tools, into the life cycle of a knowledge base: assistance with data localization and qualification, index extraction and aggregation, property suggestion and testing, dynamic file generation, and engineering of contextualized prompts (generic, contextual, explanatory, adjustment, procedural) aligned with a domain ontology. We outline an ecosystem of specialized agents capable of animating the database while respecting its symbolic constraints, by combining model-driven and data-driven methods.
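One way to read "property suggestion and testing" under "symbolic constraints" is to let a generative model propose property/value pairs and accept only those the ontology licenses for the object's class. The sketch below assumes that reading; the `ONTOLOGY` excerpt, its property names, `Suggestion`, and `validate` are hypothetical and do not reproduce the LaCAS schema.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical ontology excerpt: for each class, the properties it admits and a
# predicate a candidate value must satisfy. Not the actual LaCAS schema.
ONTOLOGY: dict[str, dict[str, Callable[[Any], bool]]] = {
    "Language": {
        "languageFamily": lambda v: isinstance(v, str) and v.strip() != "",
        "iso639_3": lambda v: isinstance(v, str) and re.fullmatch(r"[a-z]{3}", v) is not None,
        "spokenIn": lambda v: isinstance(v, str) and v.strip() != "",
    },
}


@dataclass
class Suggestion:
    prop: str
    value: Any


def validate(object_class: str, suggestions: list[Suggestion]) -> tuple[list[Suggestion], list[Suggestion]]:
    """Split suggestions into those licensed by the ontology and those rejected."""
    allowed = ONTOLOGY.get(object_class, {})
    accepted, rejected = [], []
    for s in suggestions:
        check = allowed.get(s.prop)
        # Reject properties the class does not admit and values that fail the check.
        if check is not None and check(s.value):
            accepted.append(s)
        else:
            rejected.append(s)
    return accepted, rejected


# Candidate suggestions for the knowledge object "Quechua (language)".
ok, ko = validate("Language", [
    Suggestion("iso639_3", "que"),
    Suggestion("iso639_3", "Quechua"),  # wrong value shape
    Suggestion("capital", "Lima"),      # property not defined for Language
])
print([s.prop for s in ok], [s.prop for s in ko])
```

Rejected suggestions are returned rather than discarded so that a curator can review them; the ontology stays the final arbiter and the generative model acts only as a proposer.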
- Asia > Singapore (0.04)
- North America > United States > Connecticut > New Haven County > New Haven (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Un modèle de base de connaissances terminologiques (A Model for Terminological Knowledge Bases)
Séguéla, Patrick, Aussenac-Gilles, Nathalie
In this paper, we argue that Terminological Knowledge Bases (TKBs) are all the more useful for meeting a variety of needs precisely because they do not satisfy formal criteria. They aim to clarify the terminology of a given domain by illustrating how terms are used in various contexts. We therefore designed a TKB structure with three linked components: terms, concepts, and texts, the texts showing the specific use of each term in the domain. Concepts are represented as frames whose non-formal description is standardized. Alongside this structure, we defined modeling criteria at the conceptual level. Finally, we discuss the position of TKBs with respect to ontologies and the use of TKBs in developing AI systems.
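The three linked components can be sketched as plain data structures. This is an illustration under stated assumptions, not the authors' model: the slot names in `FRAME_SLOTS` are hypothetical stand-ins for the standardized, non-formal frame description, slot values are free text rather than logical formulas, and the "bug" example is invented.

```python
from dataclasses import dataclass, field

# Hypothetical standardized frame: fixed slot names, free-text (non-formal) values.
FRAME_SLOTS = ("definition", "genus", "characteristics")


@dataclass
class Text:
    source: str   # document the excerpt comes from
    excerpt: str  # passage showing how the term is used in the domain


@dataclass
class Concept:
    name: str
    slots: dict[str, str] = field(default_factory=dict)


@dataclass
class Term:
    label: str
    concepts: list[Concept] = field(default_factory=list)  # a term may denote several concepts
    texts: list[Text] = field(default_factory=list)        # attested uses of the term


def make_concept(name: str, **slots: str) -> Concept:
    # Standardization is enforced on slot names only; slot contents stay informal.
    unknown = set(slots) - set(FRAME_SLOTS)
    if unknown:
        raise ValueError(f"non-standard slots: {sorted(unknown)}")
    return Concept(name, dict(slots))


def terms_for(concept: Concept, terms: list[Term]) -> list[str]:
    # Follow the concept -> term link by identity on the concept object.
    return [t.label for t in terms if any(c is concept for c in t.concepts)]


defect = make_concept("software defect",
                      definition="an error in a program that causes incorrect behaviour",
                      genus="fault")
bug = Term("bug", [defect], [Text("maintenance report", "The bug was fixed in release 2.1.")])
print(terms_for(defect, [bug]))
```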
- Europe > France > Occitanie > Haute-Garonne > Toulouse (0.05)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Europe > France > Pays de la Loire > Loire-Atlantique > Nantes (0.04)
Un modèle générique d'organisation de corpus en ligne: application à la FReeBank (A Generic Model for Organizing Online Corpora: Application to the FReeBank)
Salmon-Alt, Susanne, Romary, Laurent, Pierrel, Jean-Marie
The few French resources available for evaluating linguistic models or algorithms at linguistic levels other than morphosyntax are either insufficient, both quantitatively and qualitatively, or not freely accessible. In response, the FREEBANK project aims to build French corpora from the manually revised output of a hybrid Constraint Grammar parser, annotated at several linguistic levels (structure, morphosyntax, syntax, coreference), and to make them available online for research purposes. The project therefore focuses on standard annotation schemes, integration of existing resources, and maintenance that allows the annotations to be enriched continuously. Before presenting the prototype that has been implemented, this paper describes a generic model for organizing and deploying a linguistic resource archive, in line with ongoing work in international standardization initiatives (TEI and ISO/TC 37/SC 4).
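One common way to let several annotation levels coexist and be enriched independently is stand-off annotation, where each level is stored as a separate layer of spans over a shared token sequence. The sketch below assumes that design, which the abstract does not specify; `Document`, `annotate`, and the feature names are hypothetical and are not the FReeBank or TEI data model.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Annotation:
    level: str                # e.g. "morphosyntax", "syntax", "coreference"
    start: int                # first token index (inclusive)
    end: int                  # last token index (exclusive)
    features: dict[str, Any]


@dataclass
class Document:
    tokens: list[str]
    layers: dict[str, list[Annotation]] = field(default_factory=dict)

    def annotate(self, level: str, start: int, end: int, **features: Any) -> Annotation:
        # Reject spans that fall outside the shared token sequence.
        if not 0 <= start < end <= len(self.tokens):
            raise ValueError(f"invalid span [{start}, {end})")
        ann = Annotation(level, start, end, features)
        # Adding to one layer never modifies the others, so levels can be enriched separately.
        self.layers.setdefault(level, []).append(ann)
        return ann

    def text(self, ann: Annotation) -> str:
        return " ".join(self.tokens[ann.start:ann.end])


doc = Document("Marie voit Paul . Elle sourit .".split())
doc.annotate("morphosyntax", 4, 5, pos="PRON", person=3)
doc.annotate("coreference", 0, 1, chain="c1")
doc.annotate("coreference", 4, 5, chain="c1")
chain = [doc.text(a) for a in doc.layers["coreference"] if a.features["chain"] == "c1"]
print(chain)
```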
- Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
- Europe > Portugal (0.04)
- Europe > Netherlands > South Holland > Dordrecht (0.04)
Sur le statut référentiel des entités nommées (On the Referential Status of Named Entities)
We show in this paper that, on the one hand, a named entity can be designated by different denominations and, on the other hand, names denoting named entities are polysemous. The analysis therefore cannot be limited to reference resolution; it must take into account naming strategies, which rely mainly on two linguistic operations: synecdoche and metonymy. Finally, we present a model that explicitly represents the different denominations in discourse, unifying the representation of linguistic knowledge and world knowledge.
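The two observations (one entity, several denominations; one name, several referents) amount to a many-to-many relation between names and entities, labeled by the naming operation. A minimal sketch, assuming string identifiers stand in for world knowledge; `NameIndex`, the identifiers, and the example readings are hypothetical illustrations, not the authors' model.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Operation(Enum):
    DIRECT = "direct"          # the name designates its usual referent
    METONYMY = "metonymy"
    SYNECDOCHE = "synecdoche"


@dataclass(frozen=True)
class Denomination:
    name: str             # linguistic form, e.g. "Paris"
    entity: str           # hypothetical world-knowledge identifier
    operation: Operation


class NameIndex:
    def __init__(self) -> None:
        self._by_name: dict[str, list[Denomination]] = defaultdict(list)

    def add(self, d: Denomination) -> None:
        self._by_name[d.name].append(d)

    def readings(self, name: str) -> list[Denomination]:
        # More than one reading means the name is polysemous.
        return list(self._by_name.get(name, []))

    def names_of(self, entity: str) -> list[str]:
        # All denominations available for one entity.
        return sorted({d.name for ds in self._by_name.values() for d in ds if d.entity == entity})


index = NameIndex()
index.add(Denomination("Paris", "loc:Paris", Operation.DIRECT))
index.add(Denomination("Paris", "org:French_government", Operation.METONYMY))  # "Paris announced..."
index.add(Denomination("the French government", "org:French_government", Operation.DIRECT))
print(len(index.readings("Paris")), index.names_of("org:French_government"))
```

Keeping the operation on each denomination, rather than on the name or the entity, lets the same form carry a direct reading in one context and a metonymic one in another.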
- North America > United States > California > San Francisco County > San Francisco (0.04)
- North America > United States > Virginia > Fairfax County > Herndon (0.04)
- North America > United States > New York (0.04)
Raisonner avec des diagrammes : perspectives cognitives et computationnelles (Reasoning with Diagrams: Cognitive and Computational Perspectives)
Diagrammatic, analogical, or iconic representations are often contrasted with linguistic or logical representations, in which the shape of the symbols is arbitrary. This paper argues for the usefulness of diagrams in inferential knowledge representation systems. Although commonly used, diagrams have long suffered from a reputation as merely a heuristic tool or a support for intuition. The first part of this paper gives a historical background, paying tribute to the logicians, psychologists, and computer scientists who put an end to this formal prejudice against diagrams. The second part discusses their characteristics as opposed to those of linguistic forms. The last part aims to revive interest in heterogeneous representation systems that combine linguistic and diagrammatic representations.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
Le terme et le concept : fondements d'une ontoterminologie (Term and Concept: Foundations of an Ontoterminology)
Most definitions of ontology, viewed as a "specification of a conceptualization", agree that, although an ontology can take different forms, it necessarily includes a vocabulary of terms and some specification of their meaning in relation to the domain's conceptualization. Since domain knowledge is mainly conveyed through scientific and technical texts, we can hope to extract useful information from them for building ontologies. But is it that simple? In this article we show that the lexical structure, i.e. the network of words linked by linguistic relationships, does not necessarily match the domain conceptualization. We must bear in mind that writing documents is the concern of textual linguistics, one of whose principles is the incompleteness of text, whereas building an ontology, viewed as task-independent knowledge, is concerned with conceptualization based on formal rather than natural languages. Nevertheless, the well-known Sapir-Whorf hypothesis on the interdependence of thought and language also applies to formal languages: the way an ontology is built and a concept is defined depends directly on the formal language used, and different formal languages yield different results. Introducing the notion of ontoterminology makes it possible to take epistemological principles into account when building formal ontologies.
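To make the distinction between lexical structure and conceptualization concrete, the sketch below keeps the two in separate structures: concepts and their is-a links are defined first, independently of any language, and terms in each language only point to concepts. This is a minimal illustration assuming that reading of ontoterminology; `Ontoterminology`, its methods, the identifiers, and the vehicle example are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    cid: str
    parent: Optional[str] = None        # is-a link inside the concept system
    differentiae: tuple[str, ...] = ()  # what distinguishes it from its parent


@dataclass
class Ontoterminology:
    concepts: dict[str, Concept] = field(default_factory=dict)
    terms: dict[tuple[str, str], str] = field(default_factory=dict)  # (lang, term) -> concept id

    def add_concept(self, cid: str, parent: Optional[str] = None,
                    differentiae: tuple[str, ...] = ()) -> None:
        # The concept system is built on its own: a parent must already exist.
        if parent is not None and parent not in self.concepts:
            raise KeyError(f"unknown parent concept: {parent}")
        self.concepts[cid] = Concept(cid, parent, differentiae)

    def add_term(self, lang: str, term: str, cid: str) -> None:
        # Terms are attached afterwards and never define the hierarchy.
        if cid not in self.concepts:
            raise KeyError(f"unknown concept: {cid}")
        self.terms[(lang, term)] = cid

    def ancestors(self, cid: str) -> list[str]:
        chain, parent = [], self.concepts[cid].parent
        while parent is not None:
            chain.append(parent)
            parent = self.concepts[parent].parent
        return chain

    def terms_for(self, cid: str) -> list[str]:
        return sorted(f"{lang}:{t}" for (lang, t), c in self.terms.items() if c == cid)


ot = Ontoterminology()
ot.add_concept("c:vehicle")
ot.add_concept("c:car", parent="c:vehicle", differentiae=("motorized", "for passengers"))
ot.add_term("en", "car", "c:car")
ot.add_term("fr", "voiture", "c:car")
print(ot.ancestors("c:car"), ot.terms_for("c:car"))
```

Because the hierarchy lives only in `concepts`, a lexical relation found in texts (a hyponymy pattern, say) would have to be checked against it rather than copied into it.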
- North America > United States > California > San Mateo County > San Mateo (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)