Schema Lineage Extraction at Scale: Multilingual Pipelines, Composite Evaluation, and Language-Model Benchmarks

arXiv.org Artificial Intelligence

Enterprise data pipelines, characterized by complex transformations across multiple programming languages, often cause a semantic disconnect between original metadata and downstream data. This "semantic drift" compromises data reproducibility and governance, and impairs the utility of services like retrieval-augmented generation (RAG) and text-to-SQL systems. To address this, a novel framework is proposed for the automated extraction of fine-grained schema lineage from multilingual enterprise pipeline scripts. This method identifies four key components: source schemas, source tables, transformation logic, and aggregation operations, creating a standardized representation of data transformations. For the rigorous evaluation of lineage quality, this paper introduces the Schema Lineage Composite Evaluation (SLiCE), a metric that assesses both structural correctness and semantic fidelity. A new benchmark is also presented, comprising 1,700 manually annotated lineages from real-world industrial scripts. Experiments were conducted with 12 language models, ranging from small language models (SLMs) of 1.3B to 32B parameters to large language models (LLMs) such as GPT-4o and GPT-4.1. The results demonstrate that the performance of schema lineage extraction scales with model size and the sophistication of prompting techniques. Specifically, a 32B open-source model, using a single reasoning trace, can achieve performance comparable to the GPT series under standard prompting. This finding suggests a scalable and economical approach for deploying schema-aware agents in practical applications.
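As a rough illustration of the four-component representation named in the abstract, a lineage record could be sketched as a small data structure. The class name `SchemaLineage`, its field names, and the sample column below are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SchemaLineage:
    """Hypothetical record holding the four components listed in the abstract."""
    target_column: str                      # illustrative name, not from the paper
    source_schemas: List[str] = field(default_factory=list)
    source_tables: List[str] = field(default_factory=list)
    transformation: str = ""                # transformation logic as free text or code
    aggregation: Optional[str] = None       # aggregation operation, if any


# Made-up example: a revenue column derived from an orders table.
lineage = SchemaLineage(
    target_column="daily_revenue",
    source_schemas=["orders.amount", "orders.order_date"],
    source_tables=["orders"],
    transformation="amount grouped by order_date",
    aggregation="SUM",
)
```

Keeping each component in its own field would allow structural checks (are the right tables listed?) to be separated from semantic ones (does the transformation text mean the same thing?), which is the distinction the abstract draws for its metric.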


Samsung Teases Z Fold Ultra, Bing Gets AI Video, and Nothing Sets A Date--Your Gear News of the Week

WIRED

Bing has added a new AI-powered video generation tool to its mobile app, built on OpenAI's Sora text-to-video model. Sora is still exclusive to ChatGPT subscribers--but Bing users will get it for free. The vertical videos are 5 seconds long and aren't generated instantly--once you type in a prompt, you'll get a notification when the video is ready. The Standard generation speed is free, and you can use the "Fast" option 10 times before you need to cough up 100 Microsoft Reward points to keep using it at that speed. You can share these videos anywhere, and they'll be stored in the Bing app for 90 days.


Staying up to Date with Online Content Changes Using Reinforcement Learning for Scheduling

Neural Information Processing Systems

From traditional Web search engines to virtual assistants and Web accelerators, services that rely on online information need to continually keep track of remote content changes by explicitly requesting content updates from remote sources (e.g., web pages). We propose a novel optimization objective for this setting that has several practically desirable properties, and efficient algorithms for it with optimality guarantees even in the face of mixed content change observability and initially unknown change model parameters. Experiments on 18.5M URLs crawled daily for 14 weeks show significant advantages of this approach over prior art.


The CIDOC Conceptual Reference Module

AI Magazine

This ease has spurred an increasing interest from professionals, the general public, and consequently politicians to make publicly available the tremendous wealth of information kept in museums, archives, and libraries--the so-called memory organizations. Quite naturally, their development has focused on presentation, such as web sites and interfaces to their local databases. Now with more and more information becoming available, there is an increasing demand for targeted global search, comparative studies, data transfer, and data migration between heterogeneous sources of cultural contents. The reality of semantic interoperability is getting frustrating. In the cultural area alone, dozens of standard and hundreds of proprietary metadata and data structures exist as well as hundreds of terminology systems.


Tennessee Offender Management Information System

AI Magazine

Sentences for the 50,000 offenders vary from community work release and probation to lifelong incarceration. Tennessee was one of 38 states required by court order to improve prison conditions and reduce overcrowding; it is the target of over 300 inmate lawsuits each year. The new $14 million system is the largest and most comprehensive computer system ever developed in the field of corrections. Sentences C and D are consecutive to sentence B, and sentence B is consecutive to sentence A. For sentences A, B, C, and D of an offender, as shown in figure 1, it must be determined which sentence is not consecutive to any others. In this case, A is the sentence that must first be calculated because its dates do not depend on a previous sentence.
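The ordering problem in this excerpt, where C and D are consecutive to B and B is consecutive to A, can be sketched as a topological sort over sentence dependencies. This is only an illustration using Python's standard `graphlib` module. It is not the system's actual method, and the dictionary `consecutive_to` and the function `calculation_order` are hypothetical names.

```python
from graphlib import TopologicalSorter

# Hypothetical encoding: each sentence maps to the set of sentences
# it is consecutive to (those whose dates must be calculated first).
consecutive_to = {
    "A": set(),     # A is not consecutive to any other sentence
    "B": {"A"},
    "C": {"B"},
    "D": {"B"},
}


def calculation_order(deps):
    """Return an order in which every sentence follows the ones it depends on."""
    return list(TopologicalSorter(deps).static_order())


order = calculation_order(consecutive_to)
```

With this input, A comes first and B second, matching the excerpt. C and D follow in either order, since neither depends on the other.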


Searching for Gas Turbine Maintenance Schedules

AI Magazine

Preventive-maintenance schedules occurring in industry are often suboptimal with regard to maintenance coallocation, loss-of-production costs, and availability. We describe the implementation and deployment of a software decision support tool for the maintenance planning of gas turbines, with the goal of reducing the direct maintenance costs and the often costly production losses during maintenance downtime. The optimization problem is formally defined, and we argue that the feasibility version is NP-complete. We outline a heuristic algorithm that can quickly solve the problem for practical purposes and validate the approach on a real-world scenario based on an oil production facility. We also compare the performance of our algorithm with results from using integer programming and discuss the deployment of the application.


Capturing Planned Protests from Open Source Indicators

AI Magazine

Civil unrest events (protests, strikes, and "occupy" events) are common occurrences in both democracies and authoritarian regimes. The study of civil unrest is a key topic for political scientists as it helps capture an important mechanism by which citizens express themselves. In countries where civil unrest is lawful, qualitative analysis has revealed that more than 75 percent of the protests are planned, organized, or announced in advance; therefore detecting references to future planned events in relevant news and social media is a direct way to develop a protest forecasting system. We report on a system for doing that in this article. It uses a combination of key-phrase learning to identify what to look for, probabilistic soft logic to reason about location occurrences in extracted results, and time normalization to resolve future time mentions.


AAAI News

AI Magazine

AI Magazine's New Section: We're pleased to introduce this new section, "AAAI News," as a regular feature in the AI Magazine. The section's purpose is to inform
Net book value of fixed assets: $36,027.17
AAAI sponsored workshops and grants: Revenues $0; Expenses ($25,112.50); Gross Margin ($25,112.50)


Analyzing the relationship of Twitter users towards brands (e. g. Air Berlin)

@machinelearnbot

Social media platforms such as Twitter and Facebook enable everyone to voice their opinions about topics, companies, and products online. These comments are a great source for companies to analyze their customers' opinions about their brand or product. However, with billions of Tweets and posts daily, this can take a lot of time. Unless, of course, you use R. With just a few lines of R code and the help of machine learning, we're able to build mood monitoring tools quickly, so that public opinion about your or anyone's company can be monitored and evaluated. The aim of our script is to analyze the opinions about individual companies over a longer period of time.


la-hm-la-affairs-daniel-sanchez-20171014-story.html

Los Angeles Times

Are you a veteran of L.A.'s current dating scene? Even in the New Los Angeles, with Lyft and Uber giving us cheaper rides and two-thirds of voters passing Measure M, dating without a car is still playing the game with a serious handicap. "I don't mind that you don't drive," a woman I'd been dating for six months told me last year as she drove us to dinner at Broken Spanish for my birthday. L.A. Affairs chronicles the current dating scene in and around Los Angeles.