AITopics | Jackson

Collaborating Authors

Jackson

PUBLICSPEAK: Hearing the Public with a Probabilistic Framework in Local Government

Xu, Tianliang, Brown, Eva Maxfield, Dwyer, Dustin, Tomkins, Sabina

arXiv.org Artificial IntelligenceMar-14-2025

Local governments around the world are making consequential decisions on behalf of their constituents, and these constituents are responding with requests, advice, and assessments of their officials at public meetings. So many small meetings cannot be covered by traditional newsrooms at scale. We propose PUBLICSPEAK, a probabilistic framework which can utilize meeting structure, domain knowledge, and linguistic information to discover public remarks in local government meetings. We then use our approach to inspect the issues raised by constituents in 7 cities across the United States. We evaluate our approach on a novel dataset of local government meetings and find that PUBLICSPEAK improves over state-of-the-art by 10% on average, and by up to 40%.

public comment, public remark, utterance, (14 more...)

arXiv.org Artificial Intelligence

2503.11743

Country:

North America > United States > Washington > King County > Seattle (0.28)
North America > United States > California > Alameda County > Oakland (0.14)
North America > United States > Virginia > Richmond (0.14)
(12 more...)

Genre: Research Report (1.00)

Industry:

Media > News (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Data Science (0.68)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

DeFine: A Decomposed and Fine-Grained Annotated Dataset for Long-form Article Generation

Wang, Ming, Wang, Fang, Hu, Minghao, He, Li, Wang, Haiyang, Zhang, Jun, Yan, Tianwei, Li, Li, Luo, Zhunchen, Luo, Wei, Bai, Xiaoying, Geng, Guotong

arXiv.org Artificial IntelligenceMar-10-2025

Long-form article generation (LFAG) presents challenges such as maintaining logical consistency, comprehensive topic coverage, and narrative coherence across extended articles. Existing datasets often lack both the hierarchical structure and fine-grained annotation needed to effectively decompose tasks, resulting in shallow, disorganized article generation. To address these limitations, we introduce DeFine, a Decomposed and Fine-grained annotated dataset for long-form article generation. DeFine is characterized by its hierarchical decomposition strategy and the integration of domain-specific knowledge with multi-level annotations, ensuring granular control and enhanced depth in article generation. To construct the dataset, a multi-agent collaborative pipeline is proposed, which systematically segments the generation process into four parts: Data Miner, Cite Retreiver, Q&A Annotator and Data Cleaner. To validate the effectiveness of DeFine, we designed and tested three LFAG baselines: the web retrieval, the local retrieval, and the grounded reference. We fine-tuned the Qwen2-7b-Instruct model using the DeFine training dataset. The experimental results showed significant improvements in text quality, specifically in topic coverage, depth of information, and content fidelity. Our dataset publicly available to facilitate future research.

dataset, foo fighter, hawkins, (16 more...)

arXiv.org Artificial Intelligence

2503.0717

Country:

Europe > United Kingdom (0.14)
North America > United States > California > Orange County > Laguna Beach (0.04)
North America > United States > Texas > Tarrant County > Fort Worth (0.04)
(6 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)
Law (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

WavePulse: Real-time Content Analytics of Radio Livestreams

Mittal, Govind, Gupta, Sarthak, Wagle, Shruti, Chopra, Chirag, DeMattee, Anthony J, Memon, Nasir, Ahamad, Mustaque, Hegde, Chinmay

arXiv.org Artificial IntelligenceDec-23-2024

Radio remains a pervasive medium for mass information dissemination, with AM/FM stations reaching more Americans than either smartphone-based social networking or live television. Increasingly, radio broadcasts are also streamed online and accessed over the Internet. We present WavePulse, a framework that records, documents, and analyzes radio content in real-time. While our framework is generally applicable, we showcase the efficacy of WavePulse in a collaborative project with a team of political scientists focusing on the 2024 Presidential Elections. We use WavePulse to monitor livestreams of 396 news radio stations over a period of three months, processing close to 500,000 hours of audio streams. These streams were converted into time-stamped, diarized transcripts and analyzed to track answer key political science questions at both the national and state levels. Our analysis revealed how local issues interacted with national trends, providing insights into information flow. Our results demonstrate WavePulse's efficacy in capturing and analyzing content from radio livestreams sourced from the Web. Code and dataset can be accessed at \url{https://wave-pulse.io}.

large language model, machine learning, real time system, (23 more...)

arXiv.org Artificial Intelligence

2412.17998

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > United States > New York > Kings County > New York City (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(215 more...)

Genre: Research Report > New Finding (0.86)

Industry:

Media > Radio (1.00)
Leisure & Entertainment (1.00)
Government > Voting & Elections (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)
(5 more...)

Add feedback

Pathologist-like explainable AI for interpretable Gleason grading in prostate cancer

Mittmann, Gesa, Laiouar-Pedari, Sara, Mehrtens, Hendrik A., Haggenmüller, Sarah, Bucher, Tabea-Clara, Chanda, Tirtha, Gaisa, Nadine T., Wagner, Mathias, Klamminger, Gilbert Georg, Rau, Tilman T., Neppl, Christina, Compérat, Eva Maria, Gocht, Andreas, Hämmerle, Monika, Rupp, Niels J., Westhoff, Jula, Krücken, Irene, Seidl, Maximillian, Schürch, Christian M., Bauer, Marcus, Solass, Wiebke, Tam, Yu Chun, Weber, Florian, Grobholz, Rainer, Augustyniak, Jaroslaw, Kalinski, Thomas, Hörner, Christian, Mertz, Kirsten D., Döring, Constanze, Erbersdobler, Andreas, Deubler, Gabriele, Bremmer, Felix, Sommer, Ulrich, Brodhun, Michael, Griffin, Jon, Lenon, Maria Sarah L., Trpkov, Kiril, Cheng, Liang, Chen, Fei, Levi, Angelique, Cai, Guoping, Nguyen, Tri Q., Amin, Ali, Cimadamore, Alessia, Shabaik, Ahmed, Manucha, Varsha, Ahmad, Nazeel, Messias, Nidia, Sanguedolce, Francesca, Taheri, Diana, Baraban, Ezra, Jia, Liwei, Shah, Rajal B., Siadat, Farshid, Swarbrick, Nicole, Park, Kyung, Hassan, Oudai, Sakhaie, Siamak, Downes, Michelle R., Miyamoto, Hiroshi, Williamson, Sean R., Holland-Letz, Tim, Schneider, Carolin V., Kather, Jakob Nikolas, Tolkach, Yuri, Brinker, Titus J.

arXiv.org Artificial IntelligenceOct-19-2024

The aggressiveness of prostate cancer, the most common cancer in men worldwide, is primarily assessed based on histopathological data using the Gleason scoring system. While artificial intelligence (AI) has shown promise in accurately predicting Gleason scores, these predictions often lack inherent explainability, potentially leading to distrust in human-machine interactions. To address this issue, we introduce a novel dataset of 1,015 tissue microarray core images, annotated by an international group of 54 pathologists. The annotations provide detailed localized pattern descriptions for Gleason grading in line with international guidelines. Utilizing this dataset, we develop an inherently explainable AI system based on a U-Net architecture that provides predictions leveraging pathologists' terminology. This approach circumvents post-hoc explainability methods while maintaining or exceeding the performance of methods trained directly for Gleason pattern segmentation (Dice score: 0.713 $\pm$ 0.003 trained on explanations vs. 0.691 $\pm$ 0.010 trained on Gleason patterns). By employing soft labels during training, we capture the intrinsic uncertainty in the data, yielding strong results in Gleason pattern segmentation even in the context of high interobserver variability. With the release of this dataset, we aim to encourage further research into segmentation in medical tasks with high levels of subjectivity and to advance the understanding of pathologists' reasoning processes.

explanation, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2410.15012

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Washington > King County > Seattle (0.14)
(46 more...)

Genre: Research Report > Experimental Study (0.68)

Industry:

Health & Medicine > Therapeutic Area > Urology (1.00)
Health & Medicine > Therapeutic Area > Oncology > Prostate Cancer (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.89)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (0.70)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Multimodal Misinformation Detection by Learning from Synthetic Data with Multimodal LLMs

Zeng, Fengzhu, Li, Wenqian, Gao, Wei, Pang, Yan

arXiv.org Artificial IntelligenceSep-29-2024

Detecting multimodal misinformation, especially in the form of image-text pairs, is crucial. Obtaining large-scale, high-quality real-world fact-checking datasets for training detectors is costly, leading researchers to use synthetic datasets generated by AI technologies. However, the generalizability of detectors trained on synthetic data to real-world scenarios remains unclear due to the distribution gap. To address this, we propose learning from synthetic data for detecting real-world multimodal misinformation through two model-agnostic data selection methods that match synthetic and real-world data distributions. Experiments show that our method enhances the performance of a small MLLM (13B) on real-world fact-checking datasets, enabling it to even surpass GPT-4V~\cite{GPT-4V}.

dataset, detection, misinformation, (15 more...)

arXiv.org Artificial Intelligence

2409.19656

Country:

Europe > France > Île-de-France > Paris > Paris (0.14)
North America > United States > New York > New York County > New York City (0.05)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(25 more...)

Genre: Research Report (0.82)

Industry:

Media > News (1.00)
Leisure & Entertainment > Sports > Football (0.67)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Vision (0.94)
(2 more...)

Add feedback

The Emerging AI Divide in the United States

Daepp, Madeleine I. G., Counts, Scott

arXiv.org Artificial IntelligenceApr-30-2024

The digital divide describes disparities in access to and usage of digital tooling between social and economic groups. Emerging generative artificial intelligence tools, which strongly affect productivity, could magnify the impact of these divides. However, the affordability, multi-modality, and multilingual capabilities of these tools could also make them more accessible to diverse users in comparison with previous forms of digital tooling. In this study, we characterize spatial differences in U.S. residents' knowledge of a new generative AI tool, ChatGPT, through an analysis of state- and county-level search query data. In the first six months after the tool's release, we observe the highest rates of users searching for ChatGPT in West Coast states and persistently low rates of search in Appalachian and Gulf states. Counties with the highest rates of search are relatively more urbanized and have proportionally more educated, more economically advantaged, and more Asian residents in comparison with other counties or with the U.S. average. In multilevel models adjusting for socioeconomic and demographic factors as well as industry makeup, education is the strongest positive predictor of rates of search for generative AI tooling. Although generative AI technologies may be novel, early differences in uptake appear to be following familiar paths of digital marginalization.

chatgpt, fraction, highest rate, (17 more...)

arXiv.org Artificial Intelligence

2404.11988

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
(23 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology (0.68)
Banking & Finance > Economy (0.68)
Government > Regional Government (0.68)
Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

Add feedback

Improvement in Semantic Address Matching using Natural Language Processing

Gupta, Vansh, Gupta, Mohit, Garg, Jai, Garg, Nitesh

arXiv.org Artificial IntelligenceApr-17-2024

Address matching is an important task for many businesses especially delivery and take out companies which help them to take out a certain address from their data warehouse. Existing solution uses similarity of strings, and edit distance algorithms to find out the similar addresses from the address database, but these algorithms could not work effectively with redundant, unstructured, or incomplete address data. This paper discuss semantic Address matching technique, by which we can find out a particular address from a list of possible addresses. We have also reviewed existing practices and their shortcoming. Semantic address matching is an essentially NLP task in the field of deep learning. Through this technique We have the ability to triumph the drawbacks of existing methods like redundant or abbreviated data problems. The solution uses the OCR on invoices to extract the address and create the data pool of addresses. Then this data is fed to the algorithm BM-25 for scoring the best matching entries. Then to observe the best result, this will pass through BERT for giving the best possible result from the similar queries. Our investigation exhibits that our methodology enormously improves both accuracy and review of cutting-edge technology existing techniques.

algorithm, belleville, category, (13 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/INCET51464.2021.9456342

2404.11691

Country:

North America > United States > Nebraska > Douglas County > Omaha (0.05)
North America > United States > Ohio > Lake County > Mentor (0.05)
North America > United States > Mississippi > Hinds County > Jackson (0.05)
(13 more...)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models

Shao, Yijia, Jiang, Yucheng, Kanell, Theodore A., Xu, Peter, Khattab, Omar, Lam, Monica S.

arXiv.org Artificial IntelligenceApr-8-2024

We study how to apply large language models to write grounded and organized long-form articles from scratch, with comparable breadth and depth to Wikipedia pages. This underexplored problem poses new challenges at the pre-writing stage, including how to research the topic and prepare an outline prior to writing. We propose STORM, a writing system for the Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking. STORM models the pre-writing stage by (1) discovering diverse perspectives in researching the given topic, (2) simulating conversations where writers carrying different perspectives pose questions to a topic expert grounded on trusted Internet sources, (3) curating the collected information to create an outline. For evaluation, we curate FreshWiki, a dataset of recent high-quality Wikipedia articles, and formulate outline assessments to evaluate the pre-writing stage. We further gather feedback from experienced Wikipedia editors. Compared to articles generated by an outline-driven retrieval-augmented baseline, more of STORM's articles are deemed to be organized (by a 25% absolute increase) and broad in coverage (by 10%). The expert feedback also helps identify new challenges for generating grounded long articles, such as source bias transfer and over-association of unrelated facts.

computational linguistic, hawkins, information, (14 more...)

arXiv.org Artificial Intelligence

2402.14207

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Singapore (0.04)
North America > Canada > Ontario > Toronto (0.04)
(19 more...)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.46)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)
Health & Medicine (1.00)
(2 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack

Russinovich, Mark, Salem, Ahmed, Eldan, Ronen

arXiv.org Artificial IntelligenceApr-2-2024

Large Language Models (LLMs) have risen significantly in popularity and are increasingly being adopted across multiple applications. These LLMs are heavily aligned to resist engaging in illegal or unethical topics as a means to avoid contributing to responsible AI harms. However, a recent line of attacks, known as "jailbreaks", seek to overcome this alignment. Intuitively, jailbreak attacks aim to narrow the gap between what the model can do and what it is willing to do. In this paper, we introduce a novel jailbreak attack called Crescendo. Unlike existing jailbreak methods, Crescendo is a multi-turn jailbreak that interacts with the model in a seemingly benign manner. It begins with a general prompt or question about the task at hand and then gradually escalates the dialogue by referencing the model's replies, progressively leading to a successful jailbreak. We evaluate Crescendo on various public systems, including ChatGPT, Gemini Pro, Gemini-Ultra, LlaMA-2 70b Chat, and Anthropic Chat. Our results demonstrate the strong efficacy of Crescendo, with it achieving high attack success rates across all evaluated models and tasks. Furthermore, we introduce Crescendomation, a tool that automates the Crescendo attack, and our evaluation showcases its effectiveness against state-of-the-art models.

crescendo, crescendomation, jailbreak, (15 more...)

arXiv.org Artificial Intelligence

2404.01833

Country:

North America > United States > Mississippi > Hinds County > Jackson (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Media (0.95)
Government (0.93)
Law Enforcement & Public Safety (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

A Benchmark Dataset for Tornado Detection and Prediction using Full-Resolution Polarimetric Weather Radar Data

Veillette, Mark S., Kurdzo, James M., Stepanian, Phillip M., Cho, John Y. N., Samsi, Siddharth, McDonald, Joseph

arXiv.org Artificial IntelligenceJan-26-2024

Weather radar is the primary tool used by forecasters to detect and warn for tornadoes in near-real time. In order to assist forecasters in warning the public, several algorithms have been developed to automatically detect tornadic signatures in weather radar observations. Recently, Machine Learning (ML) algorithms, which learn directly from large amounts of labeled data, have been shown to be highly effective for this purpose. Since tornadoes are extremely rare events within the corpus of all available radar observations, the selection and design of training datasets for ML applications is critical for the performance, robustness, and ultimate acceptance of ML algorithms. This study introduces a new benchmark dataset, TorNet to support development of ML algorithms in tornado detection and prediction. TorNet contains full-resolution, polarimetric, Level-II WSR-88D data sampled from 10 years of reported storm events. A number of ML baselines for tornado detection are developed and compared, including a novel deep learning (DL) architecture capable of processing raw radar imagery without the need for manual feature extraction required for existing ML algorithms. Despite not benefiting from manual feature engineering or other preprocessing, the DL model shows increased detection performance compared to non-DL and operational baselines. The TorNet dataset, as well as source code and model weights of the DL baseline trained in this work, are made freely available.

algorithm, dataset, tornado, (16 more...)

arXiv.org Artificial Intelligence

2401.16437

Country:

North America > United States > Iowa (0.04)
North America > United States > Mississippi > Hinds County > Jackson (0.04)
North America > United States > Massachusetts > Middlesex County > Lexington (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Government (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback