AITopics | ptc

Collaborating Authors

ptc

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

TuCo: Measuring the Contribution of Fine-Tuning to Individual Responses of LLMs

Nuti, Felipe, Franzmeyer, Tim, Henriques, João

arXiv.org Artificial IntelligenceJul-1-2025

Past work has studied the effects of fine-tuning on large language models' (LLMs) overall performance on certain tasks. However, a quantitative and systematic method for analyzing its effect on individual outputs is still lacking. Here, we propose a new method for measuring the contribution that fine-tuning makes to individual LLM responses, assuming access to the original pre-trained model. Our method tracks the model's intermediate hidden states, providing a more fine-grained insight into the effects of fine-tuning than a simple comparison of final outputs from pre-trained and fine-tuned models. We introduce and theoretically analyze an exact decomposition of any fine-tuned LLM into a pre-training component and a fine-tuning component. Empirically, we find that model behavior and performance can be steered by up- or down-scaling the fine-tuning component during the forward pass. Motivated by this finding and our theoretical analysis, we define the Tuning Contribution (TuCo) as the ratio of the magnitudes of the fine-tuning component to the pre-training component. We observe that three prominent adversarial attacks on LLMs circumvent safety measures in a way that reduces TuCo, and that TuCo is consistently lower on prompts where these attacks succeed compared to those where they do not. This suggests that attenuating the effect of fine-tuning on model outputs plays a role in the success of such attacks. In summary, TuCo enables the quantitative study of how fine-tuning influences model behavior and safety, and vice versa.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2506.23423

Country:

North America > United States (0.14)
South America (0.04)
North America > Canada (0.04)
(6 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Government > Regional Government (0.47)
Government > Immigration & Customs (0.46)
Information Technology > Security & Privacy (0.34)
Government > Military (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

Add feedback

CogLM: Tracking Cognitive Development of Large Language Models

Wang, Xinglin, Yuan, Peiwen, Feng, Shaoxiong, Li, Yiwei, Pan, Boyuan, Wang, Heda, Hu, Yao, Li, Kan

arXiv.org Artificial IntelligenceAug-17-2024

Piaget's Theory of Cognitive Development (PTC) posits that the development of cognitive levels forms the foundation for human learning across various abilities. As Large Language Models (LLMs) have recently shown remarkable abilities across a wide variety of tasks, we are curious about the cognitive levels of current LLMs: to what extent they have developed and how this development has been achieved. To this end, we construct a benchmark CogLM (Cognitive Ability Evaluation for Language Model) based on PTC to assess the cognitive levels of LLMs. CogLM comprises 1,220 questions spanning 10 cognitive abilities crafted by more than 20 human experts, providing a comprehensive testbed for the cognitive levels of LLMs. Through extensive experiments across multiple mainstream LLMs with CogLM, we find that: (1) Human-like cognitive abilities have emerged in advanced LLMs (GPT-4), comparable to those of a 20-year-old human. (2) The parameter size and optimization objective are two key factors affecting the cognitive levels of LLMs. (3) The performance on downstream tasks is positively correlated with the level of cognitive abilities. These findings fill the gap in research on the cognitive abilities of LLMs, tracing the development of LLMs from a cognitive perspective and guiding the future direction of their evolution.

cognitive ability, llm, piaget, (14 more...)

arXiv.org Artificial Intelligence

2408.0915

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York (0.04)
Asia > Singapore (0.04)
(3 more...)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

Add feedback

Multi-level Traffic-Responsive Tilt Camera Surveillance through Predictive Correlated Online Learning

Li, Tao, Bian, Zilin, Lei, Haozhe, Zuo, Fan, Yang, Ya-Ting, Zhu, Quanyan, Li, Zhenning, Ozbay, Kaan

arXiv.org Artificial IntelligenceAug-4-2024

In urban traffic management, the primary challenge of dynamically and efficiently monitoring traffic conditions is compounded by the insufficient utilization of thousands of surveillance cameras along the intelligent transportation system. This paper introduces the multi-level Traffic-responsive Tilt Camera surveillance system (TTC-X), a novel framework designed for dynamic and efficient monitoring and management of traffic in urban networks. By leveraging widely deployed pan-tilt-cameras (PTCs), TTC-X overcomes the limitations of a fixed field of view in traditional surveillance systems by providing mobilized and 360-degree coverage. The innovation of TTC-X lies in the integration of advanced machine learning modules, including a detector-predictor-controller structure, with a novel Predictive Correlated Online Learning (PiCOL) methodology and the Spatial-Temporal Graph Predictor (STGP) for real-time traffic estimation and PTC control. The TTC-X is tested and evaluated under three experimental scenarios (e.g., maximum traffic flow capture, dynamic route planning, traffic state estimation) based on a simulation environment calibrated using real-world traffic data in Brooklyn, New York. The experimental results showed that TTC-X captured over 60\% total number of vehicles at the network level, dynamically adjusted its route recommendation in reaction to unexpected full-lane closure events, and reconstructed link-level traffic states with best MAE less than 1.25 vehicle/hour. Demonstrating scalability, cost-efficiency, and adaptability, TTC-X emerges as a powerful solution for urban traffic management in both cyber-physical and real-world environments.

road segment, traffic state, ttc-x, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.trc.2024.104804

2408.02208

Country:

North America > United States > New York > Kings County > New York City (0.24)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)
Education > Educational Setting > Online (0.63)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(4 more...)

Add feedback

SCATTER: Algorithm-Circuit Co-Sparse Photonic Accelerator with Thermal-Tolerant, Power-Efficient In-situ Light Redistribution

Yin, Ziang, Gangi, Nicholas, Zhang, Meng, Zhang, Jeff, Huang, Rena, Gu, Jiaqi

arXiv.org Artificial IntelligenceJul-7-2024

Photonic computing has emerged as a promising solution for accelerating computation-intensive artificial intelligence (AI) workloads. However, limited reconfigurability, high electrical-optical conversion cost, and thermal sensitivity limit the deployment of current optical analog computing engines to support power-restricted, performance-sensitive AI workloads at scale. Sparsity provides a great opportunity for hardware-efficient AI accelerators. However, current dense photonic accelerators fail to fully exploit the power-saving potential of algorithmic sparsity. It requires sparsity-aware hardware specialization with a fundamental re-design of photonic tensor core topology and cross-layer device-circuit-architecture-algorithm co-optimization aware of hardware non-ideality and power bottleneck. To trim down the redundant power consumption while maximizing robustness to thermal variations, we propose SCATTER, a novel algorithm-circuit co-sparse photonic accelerator featuring dynamically reconfigurable signal path via thermal-tolerant, power-efficient in-situ light redistribution and power gating. A power-optimized, crosstalk-aware dynamic sparse training framework is introduced to explore row-column structured sparsity and ensure marginal accuracy loss and maximum power efficiency. The extensive evaluation shows that our cross-stacked optimized accelerator SCATTER achieves a 511X area reduction and 12.4X power saving with superior crosstalk tolerance that enables unprecedented circuit layout compactness and on-chip power efficiency.

accelerator, crosstalk, sparsity, (15 more...)

arXiv.org Artificial Intelligence

2407.0551

Country: North America > United States > Arizona (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Add feedback

Digital Twin-based Driver Risk-Aware Intelligent Mobility Analytics for Urban Transportation Management

Li, Tao, Bian, Zilin, Lei, Haozhe, Zuo, Fan, Yang, Ya-Ting, Zhu, Quanyan, Li, Zhenning, Chen, Zhibin, Ozbay, Kaan

arXiv.org Artificial IntelligenceJul-2-2024

Traditional mobility management strategies emphasize macro-level mobility oversight from traffic-sensing infrastructures, often overlooking safety risks that directly affect road users. To address this, we propose a Digital Twin-based Driver Risk-Aware Intelligent Mobility Analytics (DT-DIMA) system. The DT-DIMA system integrates real-time traffic information from pan-tilt-cameras (PTCs), synchronizes this data into a digital twin to accurately replicate the physical world, and predicts network-wide mobility and safety risks in real time. The system's innovation lies in its integration of spatial-temporal modeling, simulation, and online control modules. Tested and evaluated under normal traffic conditions and incidental situations (e.g., unexpected accidents, pre-planned work zones) in a simulated testbed in Brooklyn, New York, DT-DIMA demonstrated mean absolute percentage errors (MAPEs) ranging from 8.40% to 15.11% in estimating network-level traffic volume and MAPEs from 0.85% to 12.97% in network-level safety risk prediction. In addition, the highly accurate safety risk prediction enables PTCs to preemptively monitor road segments with high driving risks before incidents take place. Such proactive PTC surveillance creates around a 5-minute lead time in capturing traffic incidents. The DT-DIMA system enables transportation managers to understand mobility not only in terms of traffic patterns but also driver-experienced safety risks, allowing for proactive resource allocation in response to various traffic situations. To the authors' best knowledge, DT-DIMA is the first urban mobility management system that considers both mobility and safety risks based on digital twin architecture.

information, safety risk, traffic state, (15 more...)

arXiv.org Artificial Intelligence

2407.15025

Country:

North America > United States > New York > Kings County > New York City (0.24)
Asia > China > Shanghai > Shanghai (0.04)
North America > United States > Virginia (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science (1.00)
Information Technology > Architecture > Real Time Systems (1.00)
(3 more...)

Add feedback

DOCTOR: Dynamic On-Chip Temporal Variation Remediation Toward Self-Corrected Photonic Tensor Accelerators

Lu, Haotian, Banerjee, Sanmitra, Gu, Jiaqi

arXiv.org Artificial IntelligenceMay-31-2024

Photonic computing has emerged as a promising solution for accelerating computation-intensive artificial intelligence (AI) workloads, offering unparalleled speed and energy efficiency, especially in resource-limited, latency-sensitive edge computing environments. However, the deployment of analog photonic tensor accelerators encounters reliability challenges due to hardware noise and environmental variations. While off-chip noise-aware training and on-chip training have been proposed to enhance the variation tolerance of optical neural accelerators with moderate, static noise, we observe a notable performance degradation over time due to temporally drifting variations, which requires a real-time, in-situ calibration mechanism. To tackle this challenging reliability issues, for the first time, we propose a lightweight dynamic on-chip remediation framework, dubbed DOCTOR, providing adaptive, in-situ accuracy recovery against temporally drifting noise. The DOCTOR framework intelligently monitors the chip status using adaptive probing and performs fast in-situ training-free calibration to restore accuracy when necessary. Recognizing nonuniform spatial variation distributions across devices and tensor cores, we also propose a variation-aware architectural remapping strategy to avoid executing critical tasks on noisy devices. Extensive experiments show that our proposed framework can guarantee sustained performance under drifting variations with 34% higher accuracy and 2-3 orders-of-magnitude lower overhead compared to state-of-the-art on-chip training methods. Our code is open-sourced at https://github.com/ScopeX-ASU/DOCTOR.

accelerator, overhead, variation, (15 more...)

arXiv.org Artificial Intelligence

2403.02688

Country:

North America > United States > Texas > Travis County > Austin (0.14)
North America > United States > North Carolina > Durham County > Durham (0.04)
North America > United States > California > Santa Clara County > Santa Clara (0.04)
(3 more...)

Genre: Research Report (0.83)

Industry: Information Technology > Security & Privacy (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.49)

Add feedback

Use of natural language processing to extract and classify papillary thyroid cancer features from surgical pathology reports

Loor-Torres, Ricardo, Wu, Yuqi, Cabezas, Esteban, Borras, Mariana, Toro-Tobon, David, Duran, Mayra, Zahidy, Misk Al, Chavez, Maria Mateo, Jacome, Cristian Soto, Fan, Jungwei W., Ospina, Naykky M. Singh, Wu, Yonghui, Brito, Juan P.

arXiv.org Artificial IntelligenceMay-22-2024

Background We aim to use Natural Language Processing (NLP) to automate the extraction and classification of thyroid cancer risk factors from pathology reports. Methods We analyzed 1,410 surgical pathology reports from adult papillary thyroid cancer patients at Mayo Clinic, Rochester, MN, from 2010 to 2019. Structured and non-structured reports were used to create a consensus-based ground truth dictionary and categorized them into modified recurrence risk levels. Non-structured reports were narrative, while structured reports followed standardized formats. We then developed ThyroPath, a rule-based NLP pipeline, to extract and classify thyroid cancer features into risk categories. Training involved 225 reports (150 structured, 75 unstructured), with testing on 170 reports (120 structured, 50 unstructured) for evaluation. The pipeline's performance was assessed using both strict and lenient criteria for accuracy, precision, recall, and F1-score. Results In extraction tasks, ThyroPath achieved overall strict F-1 scores of 93% for structured reports and 90 for unstructured reports, covering 18 thyroid cancer pathology features. In classification tasks, ThyroPath-extracted information demonstrated an overall accuracy of 93% in categorizing reports based on their corresponding guideline-based risk of recurrence: 76.9% for high-risk, 86.8% for intermediate risk, and 100% for both low and very low-risk cases. However, ThyroPath achieved 100% accuracy across all thyroid cancer risk categories with human-extracted pathology information. Conclusions ThyroPath shows promise in automating the extraction and risk recurrence classification of thyroid pathology reports at large scale. It offers a solution to laborious manual reviews and advancing virtual registries. However, it requires further validation before implementation.

category, pathology report, pipeline, (15 more...)

arXiv.org Artificial Intelligence

2406.00015

Country:

North America > United States > Minnesota > Olmsted County > Rochester (0.35)
North America > United States > Florida > Alachua County > Gainesville (0.14)

Genre:

Research Report > Experimental Study (0.46)
Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Oncology > Thyroid Cancer (1.00)
Health & Medicine > Therapeutic Area > Endocrinology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Add feedback

TeMPO: Efficient Time-Multiplexed Dynamic Photonic Tensor Core for Edge AI with Compact Slow-Light Electro-Optic Modulator

Zhang, Meng, Yin, Dennis, Gangi, Nicholas, Begović, Amir, Chen, Alexander, Huang, Zhaoran Rena, Gu, Jiaqi

arXiv.org Artificial IntelligenceFeb-11-2024

Electronic-photonic computing systems offer immense potential in energy-efficient artificial intelligence (AI) acceleration tasks due to the superior computing speed and efficiency of optics, especially for real-time, low-energy deep neural network (DNN) inference tasks on resource-restricted edge platforms. However, current optical neural accelerators based on foundry-available devices and conventional system architecture still encounter a performance gap compared to highly customized electronic counterparts. To bridge the performance gap due to lack of domain specialization, we present a time-multiplexed dynamic photonic tensor accelerator, dubbed TeMPO, with cross-layer device/circuit/architecture customization. At the device level, we present foundry-compatible, customized photonic devices, including a slow-light electro-optic modulator with experimental demonstration, optical splitters, and phase shifters that significantly reduce the footprint and power in input encoding and dot-product calculation. At the circuit level, partial products are hierarchically accumulated via parallel photocurrent aggregation, lightweight capacitive temporal integration, and sequential digital summation, considerably relieving the analog-to-digital conversion bottleneck. We also employ a multi-tile, multi-core architecture to maximize hardware sharing for higher efficiency. Across diverse edge AI workloads, TeMPO delivers digital-comparable task accuracy with superior quantization/noise tolerance. We achieve a 368.6 TOPS peak performance, 22.3 TOPS/W energy efficiency, and 1.2 TOPS/mm$^2$ compute density, pushing the Pareto frontier in edge AI hardware. This work signifies the power of cross-layer co-design and domain-specific customization, paving the way for future electronic-photonic accelerators with even greater performance and efficiency.

architecture, modulator, tempo, (16 more...)

arXiv.org Artificial Intelligence

2402.07393

Country:

Europe (0.04)
South America > Chile (0.04)
North America > United States > New York > Rensselaer County > Troy (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry: Information Technology > Hardware (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

M3ICRO: Machine Learning-Enabled Compact Photonic Tensor Core based on PRogrammable Multi-Operand Multimode Interference

Gu, Jiaqi, Zhu, Hanqing, Feng, Chenghao, Jiang, Zixuan, Chen, Ray T., Pan, David Z.

arXiv.org Artificial IntelligenceDec-28-2023

Photonic computing shows promise for transformative advancements in machine learning (ML) acceleration, offering ultra-fast speed, massive parallelism, and high energy efficiency. However, current photonic tensor core (PTC) designs based on standard optical components hinder scalability and compute density due to their large spatial footprint. To address this, we propose an ultra-compact PTC using customized programmable multi-operand multimode interference (MOMMI) devices, named M3ICRO. The programmable MOMMI leverages the intrinsic light propagation principle, providing a single-device programmable matrix unit beyond the conventional computing paradigm of one multiply-accumulate (MAC) operation per device. To overcome the optimization difficulty of customized devices that often requires time-consuming simulation, we apply ML for optics to predict the device behavior and enable a differentiable optimization flow. We thoroughly investigate the reconfigurability and matrix expressivity of our customized PTC, and introduce a novel block unfolding method to fully exploit the computing capabilities of a complex-valued PTC for near-universal real-valued linear transformations. Extensive evaluations demonstrate that M3ICRO achieves a 3.4-9.6x smaller footprint, 1.6-4.4x higher speed, 10.6-42x higher compute density, 3.7-12x higher system throughput, and superior noise robustness compared to state-of-the-art coherent PTC designs, while maintaining close-to-digital task accuracy across various ML benchmarks. Our code is open-sourced at https://github.com/JeremieMelo/M3ICRO-MOMMI.

expressivity, icro, ptc, (17 more...)

arXiv.org Artificial Intelligence

2305.19505

Country:

North America > United States > Texas > Travis County > Austin (0.14)
North America > United States > Arizona > Maricopa County > Tempe (0.04)
Europe > San Marino > Fiorentino > Fiorentino (0.04)
Asia > Japan > Honshū > Chūgoku > Okayama Prefecture > Okayama (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

TENPLEX: Changing Resources of Deep Learning Jobs using Parallelizable Tensor Collections

Wagenländer, Marcel, Li, Guo, Zhao, Bo, Mai, Luo, Pietzuch, Peter

arXiv.org Artificial IntelligenceDec-8-2023

Deep learning (DL) jobs use multi-dimensional parallelism, i.e they combine data, model, and pipeline parallelism, to use large GPU clusters efficiently. This couples jobs tightly to a set of GPU devices, but jobs may experience changes to the device allocation: (i) resource elasticity during training adds or removes devices; (ii) hardware maintenance may require redeployment on different devices; and (iii) device failures force jobs to run with fewer devices. Current DL frameworks lack support for these scenarios, as they cannot change the multi-dimensional parallelism of an already-running job in an efficient and model-independent way. We describe Tenplex, a state management library for DL frameworks that enables jobs to change the GPU allocation and job parallelism at runtime. Tenplex achieves this by externalizing the DL job state during training as a parallelizable tensor collection (PTC). When the GPU allocation for the DL job changes, Tenplex uses the PTC to transform the DL job state: for the dataset state, Tenplex repartitions it under data parallelism and exposes it to workers through a virtual file system; for the model state, Tenplex obtains it as partitioned checkpoints and transforms them to reflect the new parallelization configuration. For efficiency, these PTC transformations are executed in parallel with a minimum amount of data movement between devices and workers. Our experiments show that Tenplex enables DL jobs to support dynamic parallelization with low overhead.

configuration, parallelism, parallelization configuration, (17 more...)

arXiv.org Artificial Intelligence

2312.05181

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Washington > King County > Renton (0.04)
(14 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback