steward


Leveraging Retrieval Augmented Generative LLMs For Automated Metadata Description Generation to Enhance Data Catalogs

Singh, Mayank, Kumar, Abhijeet, Donaparthi, Sasidhar, Karambelkar, Gayatri

arXiv.org Artificial Intelligence

Data catalogs serve as repositories for organizing and accessing diverse collections of data assets, but their effectiveness hinges on how easily business users can look up relevant content. Unfortunately, many data catalogs within organizations suffer from limited searchability due to inadequate metadata such as asset descriptions. Hence, there is a need for a content-generation solution that can enrich and curate metadata in a scalable way. This paper explores the challenges associated with metadata creation and proposes a unique prompt-enrichment idea: leveraging existing metadata content through a retrieval-based few-shot technique tied to generative large language models (LLMs). The paper also considers fine-tuning an LLM on existing content and studies the behavior of few-shot pretrained LLMs (Llama, GPT-3.5) vis-à-vis a few-shot fine-tuned LLM (Llama2-7b) by evaluating their performance on accuracy, factual grounding, and toxicity. Our preliminary results exhibit more than 80% Rouge-1 F1 for the generated content, which translated to 87-88% of instances being accepted as-is or curated with minor edits by data stewards. By automatically generating accurate descriptions for tables and columns, the research provides an overall framework for enterprises to scale metadata curation and enrich their data catalogs, thereby vastly improving catalog searchability and overall usability.
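The Rouge-1 F1 figure reported in the abstract is a unigram-overlap score between a reference description and a generated one. A minimal sketch of how such a score is computed (whitespace tokenization and lowercasing are simplifying assumptions here, not necessarily the authors' exact setup):

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """Rouge-1 F1: harmonic mean of unigram precision and recall
    between a reference description and a generated candidate."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each unigram counts at most as often as it
    # appears in the reference.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)
```

A score above 0.8, as reported, means the generated table and column descriptions share most of their wording with what a data steward would have written.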


Steward: Natural Language Web Automation

Tang, Brian, Shin, Kang G.

arXiv.org Artificial Intelligence

Recently, large language models (LLMs) have demonstrated exceptional capabilities in serving as the foundation for AI assistants. One emerging application of LLMs, navigating through websites and interacting with UI elements across various web pages, remains somewhat underexplored. We introduce Steward, a novel LLM-powered web automation tool designed to serve as a cost-effective, scalable, end-to-end solution for automating web interactions. Traditional browser automation frameworks like Selenium, Puppeteer, and Playwright are not scalable for extensive web interaction tasks, such as studying recommendation algorithms on platforms like YouTube and Twitter. These frameworks require manual coding of interactions, limiting their utility in large-scale or dynamic contexts. Steward addresses these limitations by integrating LLM capabilities with browser automation, allowing for natural language-driven interaction with websites. Steward operates by receiving natural language instructions and reactively planning and executing a sequence of actions on websites, looping until completion, making it a practical tool for developers and researchers to use. It achieves high efficiency, completing actions in 8.52 to 10.14 seconds at a cost of $0.028 per action or an average of $0.18 per task, which is further reduced to 4.8 seconds and $0.022 through a caching mechanism. It runs tasks on real websites with a 40% completion success rate. We discuss various design and implementation challenges, including state representation, action sequence selection, system responsiveness, detecting task completion, and caching implementation.
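The reactive plan-and-execute loop described in the abstract, together with the caching mechanism that cuts per-action latency and cost, can be sketched as follows. The `plan_next_action` and `execute` callables stand in for Steward's LLM planner and browser layers; they are illustrative assumptions, not the tool's actual API:

```python
from typing import Callable

def run_task(task: str,
             plan_next_action: Callable[[str, str], str],
             execute: Callable[[str], str],
             cache: dict = None,
             max_steps: int = 20) -> list:
    """Reactive loop: ask the planner for the next action given the task
    and current page state, execute it, and repeat until the planner
    signals completion. A cache keyed on (task, state) lets repeated
    runs skip the expensive LLM planning call."""
    cache = {} if cache is None else cache
    state = execute("OPEN")  # load the starting page, get its state
    actions = []
    for _ in range(max_steps):
        key = (task, state)
        action = cache.get(key)
        if action is None:
            action = plan_next_action(task, state)  # LLM call in the real tool
            cache[key] = action
        if action == "DONE":
            break
        actions.append(action)
        state = execute(action)  # browser step, returns new page state
    return actions
```

On a repeat run with the same cache, every `(task, state)` pair hits the cache and no planner calls are made, which mirrors the reported drop from $0.028 to $0.022 per action.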


Data Stewards Have The Worst Seat At The Table

#artificialintelligence

In his seminal 2017 blog post, The Downfall of the Data Engineer, Maxime Beauchemin wrote that the data engineer had the worst seat at the table. Data technology and teams have changed tremendously since that time, and now the Preset CEO and creator of Apache Airflow and Apache Superset has a brighter outlook on the future of the profession. I have also seen what was once a thankless position turn into a strategic driver of company value as data expanded beyond dashboards to machine learning models, customer-facing applications, and systems of record. So, if the data engineer no longer has the worst seat at the table, who on the data team has inherited this unfortunate title? When you apply some of Maxime's original criteria (tedious tasks, low recognition, a lack of authority, and vulnerability to operational creep), the data steward becomes the obvious choice.


What AI's Really Doing to the Enterprise: The Call for Delegated Data Governance

#artificialintelligence

Organizations are becoming more analytically inclined, automation is rampant, and business users are empowered to accomplish more at a greater scale than they previously could. Nonetheless, there is another side to the pervasive deployment of cognitive computing technologies throughout the data ecosystem, particularly in terms of the mounting ease, accessibility, and utility of advanced analytics. The increasing demand for predictive insight, and the data required to facilitate it, has very real repercussions for data privacy and regulatory compliance which, if not properly addressed, can restrict organizations' use of AI. Many firms are attempting to balance the data demands of AI with what Privacera SVP of Marketing Piet Loubser termed the "let's stay out of trouble side of things. As much as we think externally of regulations from on top, the majority of organizations have much more stringent things going on inside their four walls."


Future of Artificial Intelligence (AI) for Business

#artificialintelligence

Artificial intelligence (AI) is continuing its migration out of the research lab and into the world of business. Leading companies across hundreds of industries are harnessing its power: from banks analyzing countless data points in seconds to detect fraud, to call centers deploying chatbots to improve customer interactions. These early uses are still fairly limited, but huge advances in deep learning (a subset of machine learning) are starting to impact AI in ways that will soon help society and business tackle a wider set of more general problems. Such advances will also make it possible to automate more complex physical tasks that require adaptability and agility. At Salesforce, we believe AI has tremendous potential for improving the way organizations operate (and you can learn how AI is built into our entire Salesforce Customer 360 here).


Motional's fully driverless cars are coming to Nevada's roads for testing

Engadget

Motional, a joint venture between Hyundai and Aptiv, plans to start testing fully driverless cars in Nevada. The state is allowing the company to trial autonomous vehicles without having a safety driver behind the wheel. "The coming months will see the completion of a rigorous, self-imposed testing and assessment period, where we have studied the performance and safety of our vehicles across many thousands of miles and scenarios, on both public and private roads, in close partnership with one of the world's most respected safety assessors," Motional president and CEO Karl Iagnemma wrote in a blog post. "This process will include fully-driverless testing, on closed courses, this year." If all goes well with the closed-course tests, Motional plans to put driverless cars on public roads in Nevada in the coming months.


5 Healthcare predictions for 2020

#artificialintelligence

As the year ends, athenaInsight sat down with three healthcare experts to share their predictions for the coming year. A clear trend emerged: in 2020, the tide of value-based care will continue. To that end, the nexus of care will shift, employers and payers will drive innovation, and technology will pave the way for better risk analysis and patient outreach. According to Koustav Chatterjee, digital health industry analyst at Frost and Sullivan, "2020 is going to be a landmark year when, for the very first time, both payers and providers will embrace full-blown value-based care strategies." As regulatory requirements become clearer and more stable, and data is finally showing a tangible ROI, the transition to risk and quality-based programs will continue unabated.


VivaTech : We Must All Commit to Responsible AI and Data Practices - No Web Agency

#artificialintelligence

In a keynote at the VivaTech conference, IBM Chairman, President and CEO Ginni Rometty called on technology companies to adopt principles to protect client data and insights, and ensure the responsible and transparent use of artificial intelligence and other new technologies. "Every organization that develops or uses AI, or hosts or processes data, must do so responsibly and transparently. Companies are being judged not just by how we use data, but by whether we are trusted stewards of other people's data," Rometty said. "Society will decide which companies it trusts." Rometty underscored IBM's Principles for Trust and Transparency, which enumerate the company's decades-long approach to handling its clients' data and insights.


AllAnalytics - Pierre DeBois - How Analytics Has Changed (and Not Changed)

@machinelearnbot

That phrase became the popular song What's the Frequency, Kenneth? (Rather was mugged by a disturbed man who, thinking CBS was sending radio messages to his mind, referred to Rather as "Kenneth" while asking what "the frequency" was.) Some people even consider the phrase exclamatory slang for something insane that happens. Creativity has certainly been applied to data, at least from what I have seen during my analytics career. That part has not changed. But the quality of its information has changed, thanks to creative observations that digitally represent the activity of people, products, and services.


Unlocking the True Value of Finance as a Business Partner – Share Talk

#artificialintelligence

How can finance become a better business partner by utilizing emerging technologies? Here are 7 recommendations for unlocking finance's potential. Over the last couple of years, companies have started to prepare for the 2020s and beyond, constantly responding to their rapidly changing environment. These changes are powered by emerging technologies, macroeconomic trends, consumer expectations, and business models. Until recently, development had been traditional and linear, following an incremental pace.