Adaptive scheduling for adaptive sampling in POS taggers construction

Manuel Vilares Ferro, Victor M. Darriba Bilbao, Jesús Vilares Ferro

arXiv.org Artificial Intelligence 

However, managing large amounts of information is an expensive, time-consuming and non-trivial activity, especially when expert knowledge is needed. Furthermore, having access to vast databases does not imply that ml algorithms must use them all, and a subset is therefore preferred, provided it does not reduce the quality of the mined knowledge. Such subsets can then supply the same learning power at a far lower computational cost, allowing the training process to be speeded up, although their nature and optimal size are rarely obvious. This justifies the interest in developing efficient sampling techniques, which involves anticipating the link between performance and experience, namely how the accuracy of the system being generated evolves as training material accumulates. At this point, correctness with respect to the working hypotheses and robustness against changes to them should be guaranteed in order to supply a practical solution. The former ensures the effectiveness of the proposed strategy in the framework considered, while the latter enables fluctuations in the learning conditions to be assimilated without compromising correctness, thus providing reliability to our calculations. An area of work that is particularly sensitive to these inconveniences is natural language processing (nlp), the components of which are increasingly based on ml [3, 50].
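To make the idea of sampling against a learning curve concrete, the following is a minimal sketch, not the authors' method: a generic classifier (logistic regression on synthetic data) stands in for a POS tagger, and the sample sizes are arbitrary. It trains on growing subsets of the available data, records the accuracy reached at each size, and reports the gains between consecutive sizes, the quantity a sampling strategy would monitor to decide whether adding more data is still worthwhile.

```python
# Minimal illustration (assumptions: synthetic data, logistic regression as a
# stand-in for a POS tagger, arbitrary sample sizes).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data in place of an annotated corpus (for illustration only).
X, y = make_classification(n_samples=20000, n_features=50, n_informative=20,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)

learning_curve = []
for size in (500, 1000, 2000, 4000, 8000, 16000):
    # Train on the first `size` examples only, instead of the full corpus.
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train[:size], y_train[:size])
    acc = accuracy_score(y_test, model.predict(X_test))
    learning_curve.append((size, acc))
    print(f"{size:>6} training examples -> accuracy {acc:.4f}")

# A sampling strategy would stop enlarging the sample once the curve flattens,
# e.g. when the accuracy gain between consecutive sizes drops below a threshold.
gains = [b[1] - a[1] for a, b in zip(learning_curve, learning_curve[1:])]
print("accuracy gains between consecutive sample sizes:", gains)
```

In this toy setting the accuracy gains shrink as the sample grows, which is the behaviour an efficient sampling technique tries to anticipate rather than observe after the fact.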