Collaborating Authors: fernandez


Apple's App Course Runs $20,000 a Student. Is It Really Worth It?

WIRED

Apple, Michigan taxpayers, and one of Detroit's wealthiest families spent roughly $30 million training hundreds of people to build iPhone apps. Two years ago, Lizmary Fernandez took a detour from studying to be an immigration attorney to join a free Apple course for making iPhone apps. The Apple Developer Academy in Detroit launched as part of the company's $200 million response to the Black Lives Matter protests and aims to expand opportunities for people of color in the country's poorest big city. But Fernandez found the program's cost-of-living stipend lacking--"A lot of us got on food stamps," she says--and the coursework insufficient for landing a coding job. "I didn't have the experience or portfolio," says the 25-year-old, who is now a flight attendant and preparing to apply to law school. "Coding is not something I got back to."


BERnaT: Basque Encoders for Representing Natural Textual Diversity

Azurmendi, Ekhi, de Landa, Joseba Fernandez, Bengoetxea, Jaione, Heredia, Maite, Etxaniz, Julen, Zubillaga, Mikel, Soraluze, Ander, Soroa, Aitor

arXiv.org Artificial Intelligence

Language models depend on massive text corpora that are often filtered for quality, a process that can unintentionally exclude non-standard linguistic varieties, reduce model robustness and reinforce representational biases. In this paper, we argue that language models should aim to capture the full spectrum of language variation (dialectal, historical, informal, etc.) rather than relying solely on standardized text. Focusing on Basque, a morphologically rich and low-resource language, we construct new corpora combining standard, social media, and historical sources, and pre-train the BERnaT family of encoder-only models in three configurations: standard, diverse, and combined. We further propose an evaluation framework that separates Natural Language Understanding (NLU) tasks into standard and diverse subsets to assess linguistic generalization. Results show that models trained on both standard and diverse data consistently outperform those trained on standard corpora, improving performance across all task types without compromising standard benchmark accuracy. These findings highlight the importance of linguistic diversity in building inclusive, generalizable language models.
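
The evaluation framework is easy to picture in code. The sketch below is a hypothetical illustration of the standard/diverse split, not the paper's released harness: the task names are invented placeholders and the scorer is a random stub kept only to make the example runnable.

```python
# Hypothetical sketch of the standard/diverse evaluation split described
# above. Task names and the scoring stub are placeholders, not the
# paper's released framework.
import random

STANDARD_TASKS = ["topic_std", "nli_std"]              # hypothetical task ids
DIVERSE_TASKS = ["dialectal_ner", "social_sentiment"]  # hypothetical task ids

def evaluate(model_name: str, task: str) -> float:
    """Stand-in scorer; replace with a real benchmark run (accuracy/F1)."""
    return random.random()

def evaluate_split(model_name: str) -> dict:
    """Average metric per subset, reported side by side so gains on
    diverse text can be checked against regressions on standard tasks."""
    return {
        "standard": sum(evaluate(model_name, t) for t in STANDARD_TASKS) / len(STANDARD_TASKS),
        "diverse": sum(evaluate(model_name, t) for t in DIVERSE_TASKS) / len(DIVERSE_TASKS),
    }

for config in ("BERnaT-standard", "BERnaT-diverse", "BERnaT-combined"):
    print(config, evaluate_split(config))
```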


Motor neuron diseases took their voices. AI is bringing them back.

MIT Technology Review

"A tracheostomy is a scary endeavor for people living with ALS, because it signifies crossing a new stage in life, a stage that is close to the end," Rodriguez tells me using a communication device. "Before the procedure I still had some independence, and I could still speak somewhat, but now I am permanently connected to a machine that breathes for me." Rodriguez and his wife, Maria Fernandez, who live in Miami, thought they would never hear his voice again. After feeding old recordings of Rodriguez's voice into a tool trained on voices from film, television, radio, and podcasts, the couple were able to generate a voice clone--a way for Jules to communicate in his "old voice." "Hearing my voice again, after I hadn't heard it for some time, lifted my spirits," says Rodriguez, who today communicates by typing sentences using a device that tracks his eye movements, which can then be "spoken" in the cloned voice.


EuskañolDS: A Naturally Sourced Corpus for Basque-Spanish Code-Switching

Heredia, Maite, Barnes, Jeremy, Soroa, Aitor

arXiv.org Artificial Intelligence

Code-switching (CS) remains a significant challenge in Natural Language Processing (NLP), mainly due to a lack of relevant data. In the context of the contact between the Basque and Spanish languages in the north of the Iberian Peninsula, CS frequently occurs in both formal and informal spontaneous interactions. However, resources to analyse this phenomenon and support the development and evaluation of models capable of understanding and generating code-switched language for this language pair are almost non-existent. We introduce a first approach to developing a naturally sourced corpus for Basque-Spanish code-switching. Our methodology consists of identifying CS texts from previously available corpora using language identification models, which are then manually validated to obtain a reliable subset of CS instances. We present the properties of our corpus and make it available under the name EuskañolDS.
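
The identification step lends itself to a short sketch. The code below illustrates the general idea, using fastText's public lid.176 model as an assumed stand-in for the paper's (unspecified here) language identification models; flagged texts would still go to human annotators, as the abstract describes.

```python
# Hedged sketch of the corpus-building step: run a language identifier
# over candidate texts and keep those where both Basque (eu) and
# Spanish (es) are detected, pending manual validation. fastText's
# public lid.176 model is an assumption, not the paper's exact tooling.
import fasttext  # pip install fasttext; lid.176.bin downloaded separately

lid = fasttext.load_model("lid.176.bin")

def detected_languages(text: str, min_conf: float = 0.7) -> set:
    """Sentence-level language labels with confidence above min_conf."""
    langs = set()
    for sentence in text.replace("\n", " ").split("."):
        sentence = sentence.strip()
        if not sentence:
            continue
        labels, confs = lid.predict(sentence)
        if confs[0] >= min_conf:
            langs.add(labels[0].replace("__label__", ""))
    return langs

def is_code_switching_candidate(text: str) -> bool:
    """Flag texts mixing Basque and Spanish for human review."""
    return {"eu", "es"} <= detected_languages(text)
```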


A Neighbor-based Approach to Pitch Ownership Models in Soccer

Mendes-Neves, Tiago, Meireles, Luís, Mendes-Moreira, João

arXiv.org Artificial Intelligence

Pitch ownership models enable many types of analysis in soccer and provide valuable assistance to tactical analysts in understanding the game's dynamics. Their advantage over event-based analysis is that tracking data incorporates context that event data lacks, such as player positioning. This paper proposes a novel approach to building pitch ownership models in soccer using the K-Nearest Neighbors (KNN) algorithm. The approach offers fast inference and can model different notions of pitch control within a single algorithm. Despite this flexibility, it exposes only three hyperparameters, which simplifies tuning for different player skill levels and allows it to emulate several methods from the literature, including different levels of uncertainty. In summary, the proposed model provides a new and more flexible strategy for building pitch ownership models, extending beyond replicating existing algorithms, and can offer valuable insights for tactical analysts and open new avenues for future research. We visualize several examples demonstrating the models' strengths and weaknesses. The code is available at github.com/nvsclub/KNNPitchControl.
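
As a rough illustration of the technique, here is a minimal KNN pitch ownership sketch. It is not the released implementation at github.com/nvsclub/KNNPitchControl, and the hyperparameter names (k, decay, temperature) are illustrative stand-ins rather than the paper's exact three.

```python
# Minimal sketch of a KNN-based pitch ownership surface: each grid cell
# is assigned by a distance-weighted vote among its k nearest players.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def pitch_ownership(players_xy, players_team, k=5, decay=2.0, temperature=0.1,
                    pitch=(105.0, 68.0), cells=(105, 68)):
    """Ownership grid in [0, 1]: 1.0 = home team controls, 0.0 = away."""
    xs = np.linspace(0, pitch[0], cells[0])
    ys = np.linspace(0, pitch[1], cells[1])
    grid = np.stack(np.meshgrid(xs, ys), axis=-1).reshape(-1, 2)

    knn = NearestNeighbors(n_neighbors=min(k, len(players_xy))).fit(players_xy)
    dist, idx = knn.kneighbors(grid)

    weights = 1.0 / np.maximum(dist, 1e-6) ** decay  # closer players weigh more
    home = (np.asarray(players_team)[idx] == "home").astype(float)
    score = (weights * home).sum(axis=1) / weights.sum(axis=1)

    # A logistic squash around 0.5 sharpens or softens the surface,
    # mimicking different levels of uncertainty.
    return (1.0 / (1.0 + np.exp(-(score - 0.5) / temperature))).reshape(cells[1], cells[0])

# Toy usage: ten home and ten away players at random positions.
rng = np.random.default_rng(0)
xy = rng.uniform([0, 0], [105, 68], size=(20, 2))
teams = ["home"] * 10 + ["away"] * 10
surface = pitch_ownership(xy, teams)
```

Varying the distance decay and the logistic temperature yields harder or softer ownership surfaces, which is the sense in which a handful of parameters can emulate different published pitch control models.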


Political Leaning Inference through Plurinational Scenarios

de Landa, Joseba Fernandez, Agerri, Rodrigo

arXiv.org Artificial Intelligence

Social media users express their political preferences by interacting with other users, through spontaneous declarations, or by participating in communities within the network. This makes a social network such as Twitter a valuable data source for studying computational approaches to political leaning inference. In this work we focus on three diverse regions in Spain (Basque Country, Catalonia and Galicia) to explore various methods for multi-party categorization, required to analyze evolving and complex political landscapes, and compare them with binary left-right approaches. We use a two-step method involving unsupervised user representations obtained from retweets and their subsequent use for political leaning detection. Comprehensive experimentation on a newly collected and curated dataset comprising labeled users and their interactions demonstrates the effectiveness of Relational Embeddings as a representation method for political ideology detection in both binary and multi-party frameworks, even with limited training data. Finally, data visualization illustrates the ability of the Relational Embeddings to capture intricate intra-group and inter-group political affinities.
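
A minimal sketch of the two-step pipeline follows, with Word2Vec over per-user retweet sequences standing in for the paper's Relational Embeddings; all data below is toy and purely illustrative.

```python
# Hedged sketch of the two-step pipeline: (1) learn unsupervised user
# representations from retweet interactions, then (2) train a supervised
# classifier for multi-party leaning on labeled users. Word2Vec is a
# stand-in for the paper's Relational Embeddings; the data is invented.
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression

# Each "sentence" lists the accounts one user retweeted (toy data).
retweet_sequences = [
    ["party_a_leader", "journalist_1", "party_a_account"],
    ["party_a_account", "party_a_leader", "activist_1"],
    ["party_b_leader", "activist_2", "party_b_account"],
    ["party_b_account", "journalist_2", "party_b_leader"],
]
labels = ["party_a", "party_a", "party_b", "party_b"]  # multi-party, not left-right

emb = Word2Vec(retweet_sequences, vector_size=16, window=5, min_count=1, epochs=100)

def user_vector(retweeted):
    """Represent a user as the mean embedding of accounts they retweeted."""
    return np.mean([emb.wv[a] for a in retweeted if a in emb.wv], axis=0)

X = np.stack([user_vector(seq) for seq in retweet_sequences])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
```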


Twitter's data center knocked out by extreme heat in California

Los Angeles Times

Extreme heat that exhausted California's overworked electric grid on Labor Day knocked out one of Twitter's main data centers in Sacramento, according to a report. While Twitter avoided a shutdown on Sept. 5 by leaning on its other data centers in Portland, Ore., and Atlanta to keep its systems running, a company executive warned that if another center were lost, some users would have been unable to access the social media platform, according to an internal memo obtained by CNN. Temperatures in Sacramento on Labor Day broke the daily record of 114 degrees, reaching 116 by the afternoon. To power their online services, tech companies such as Twitter, Google, and Meta lean on data centers that can demand heavy loads of power and often generate large amounts of heat, requiring cooling systems to keep things running. As climate change continues to heat the planet, Twitter's outage underscores how extreme weather can disrupt the online systems that billions of people rely on daily.


Unravelling cell biology through artificial intelligence

#artificialintelligence

The AI algorithm was able to predict the presence and location of nuclei in more than 8,000 cells. Scientists from the Singapore University of Technology and Design (SUTD), the National University of Singapore, and Nanyang Technological University have used artificial intelligence (AI) to demonstrate a correlation between cytoskeleton organisation and nuclear position. The study was recently published in PLOS. To ensure that the study's parameters would not be limited by human conceptualisation, they developed a unique generative algorithm to interpret the cytoskeleton of eukaryotic cells using qualitative data, without telling the system what it was observing or how to measure it. "We separated the information related to the nucleus and the fibres in independent databases of images, ensuring that there was not any information about the nucleus found in the images of the fibres, so that the system could not cheat. Then we trained the system to find the location of the nucleus using only information specific to fibres. To do so, the system had to take the qualitative data and figure out on its own if there was a relation between the organisation of the fibres and the position of the nucleus. This forced the programme to find the parameters defining the system, free from human interpretation and predefined concepts," Associate Professor Javier G. Fernandez explained.
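
The held-out-nucleus setup the quote describes can be sketched loosely in code. Below, a small supervised CNN regressor stands in for the study's generative algorithm (an assumption for illustration only); the architecture, image sizes, and coordinates are invented.

```python
# Loose sketch of the setup described above: a model receives only
# cytoskeleton (fibre) images, with all nucleus information held out of
# the inputs, and must infer the nucleus position. This regressor is a
# stand-in, not the study's generative algorithm.
import torch
import torch.nn as nn

class NucleusFromFibres(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(nn.Flatten(), nn.LazyLinear(2))  # predicts (x, y)

    def forward(self, fibre_image):
        return self.head(self.features(fibre_image))

model = NucleusFromFibres()
fibre = torch.randn(1, 1, 64, 64)       # dummy fibre-only image
target_xy = torch.tensor([[0.5, 0.5]])  # normalized nucleus position
loss = nn.MSELoss()(model(fibre), target_xy)
```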


How to design an edge computing system for space

#artificialintelligence

In 1962, when astronaut John Glenn was preparing for an orbital mission, mathematician Katherine Johnson was called by the US space agency Nasa for an important task: calculating trajectories. According to Nasa's website, the complexity of the orbital flight, which would make Glenn the first American to orbit Earth, required the construction of a communications network that would link tracking stations around the world to computers in Washington, Florida and Bermuda. These computers were programmed with orbital equations that would control the trajectory of Glenn's Friendship 7 spacecraft. But these machines were also prone to glitches. So, Glenn asked engineers to enlist Johnson to run the same numbers through the same equations programmed into the computer, but by hand, on a desktop mechanical calculating machine.


Artificial Intelligence in Health Care: COVID-Net Aids Triage

#artificialintelligence

As the number of COVID-19 infections again spikes around the U.S., health care workers struggling to stay ahead have a tool with a novel approach to add to their arsenal in COVID-Net, an open source AI-based platform that uses radiological lung images to determine COVID-19-specific lung damage, as well as assess the degree of that damage. The technology was developed in March, during the early days of the pandemic, but has been gaining more notice as an example of artificial intelligence in health care as more organizations have adopted it. Although the nonprofit project is being led by Red Hat, Boston Children's Hospital and DarwinAI (a 3-year-old proprietary artificial intelligence startup headquartered in Waterloo, Ontario), it began as a collaboration between Canada's University of Waterloo and DarwinAI. "COVID-Net was an initiative to try to contribute to the whirlwind of the pandemic in March," DarwinAI CEO Sheldon Fernandez told ITPro Today. "We open sourced it and we didn't want it to be commercial."