AITopics | Lu, Yichao

Lu, Yichao

Lumos: Empowering Multimodal LLMs with Scene Text Recognition

Shenoy, Ashish, Lu, Yichao, Jayakumar, Srihari, Chatterjee, Debojeet, Moslehpour, Mohsen, Chuang, Pierce, Harpale, Abhay, Bhardwaj, Vikas, Xu, Di, Zhao, Shicong, Zhao, Longfang, Ramchandani, Ankit, Dong, Xin Luna, Kumar, Anuj

arXiv.org Artificial Intelligence, Feb-12-2024

We introduce Lumos, the first end-to-end multimodal question-answering system with text understanding capabilities. At the core of Lumos is a Scene Text Recognition (STR) component that extracts text from first-person point-of-view images, the output of which is used to augment the input to a Multimodal Large Language Model (MM-LLM). While building Lumos, we encountered numerous challenges related to STR quality, overall latency, and model inference. In this paper, we delve into those challenges and discuss the system architecture, design choices, and modeling techniques employed to overcome them. We also provide a comprehensive evaluation of each component, showcasing high quality and efficiency.

large language model, latency, pattern recognition, (19 more...)

arXiv.org Artificial Intelligence

2402.08017

Country:

Europe > Spain (0.16)
North America > United States (0.14)
Europe > United Kingdom (0.14)

Genre: Research Report (0.50)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Text Recognition (0.63)

Traffic4cast at NeurIPS 2022 -- Predict Dynamics along Graph Edges from Sparse Node Data: Whole City Traffic and ETA from Stationary Vehicle Detectors

Neun, Moritz, Eichenberger, Christian, Martin, Henry, Spanring, Markus, Siripurapu, Rahul, Springer, Daniel, Deng, Leyan, Wu, Chenwang, Lian, Defu, Zhou, Min, Lumiste, Martin, Ilie, Andrei, Wu, Xinhua, Lyu, Cheng, Lu, Qing-Long, Mahajan, Vishal, Lu, Yichao, Li, Jiezhang, Li, Junjun, Gong, Yue-Jiao, Grötschla, Florian, Mathys, Joël, Wei, Ye, Haitao, He, Fang, Hui, Malm, Kevin, Tang, Fei, Kopp, Michael, Kreil, David, Hochreiter, Sepp

arXiv.org Artificial Intelligence, Mar-14-2023

The global trends of urbanization and increased personal mobility force us to rethink the way we live and use urban space. The Traffic4cast competition series tackles this problem in a data-driven way, advancing the latest methods in machine learning for modeling complex spatial systems over time. In this edition, our dynamic road graph data combine information from road maps, $10^{12}$ probe data points, and stationary vehicle detectors in three cities over the span of two years. While stationary vehicle detectors are the most accurate way to capture traffic volume, they are available in only a few locations. Traffic4cast 2022 explores models that can generalize loosely related temporal vertex data on just a few nodes to predict dynamic future traffic states on the edges of the entire road graph. In the core challenge, participants are invited to predict the likelihoods of three congestion classes, derived from the speed levels in the GPS data, for the entire road graph in three cities 15 min into the future. We provide only vehicle count data from spatially sparse stationary vehicle detectors in these three cities as model input for this task. The data are aggregated in 15 min time bins for one hour prior to the prediction time. For the extended challenge, participants are tasked with predicting the average travel times on super-segments 15 min into the future; super-segments are longer sequences of road segments in the graph. The competition results provide an important advance in the prediction of complex city-wide traffic states just from publicly available sparse vehicle data and without the need for large amounts of real-time floating vehicle data.

artificial intelligence, competition, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2303.07758

Country:

Europe > Switzerland (0.46)
North America > United States (0.46)
Europe > Austria (0.46)
(2 more...)

Genre:

Overview (0.54)
Research Report (0.41)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)
Government > Regional Government (0.93)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

An Efficient Two-stage Gradient Boosting Framework for Short-term Traffic State Estimation

Lu, Yichao

arXiv.org Artificial Intelligence, Feb-20-2023

Real-time traffic state estimation is essential for intelligent transportation systems. The NeurIPS 2022 Traffic4cast challenge provides an excellent testbed for benchmarking short-term traffic state estimation approaches. This technical report describes our solution to this challenge. In particular, we present an efficient two-stage gradient boosting framework for short-term traffic state estimation. The first stage derives the month, day of the week, and time slot index from the sparse loop counter data, and the second stage predicts the future traffic states from the sparse loop counter data together with the derived month, day of the week, and time slot index. Experimental results demonstrate that our two-stage gradient boosting framework delivers strong empirical performance, placing third in both the core and the extended challenges while remaining highly efficient.

artificial intelligence, machine learning, neural network, (15 more...)

arXiv.org Artificial Intelligence

2302.104

Genre: Research Report (0.70)

Industry: Transportation > Infrastructure & Services (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Predicting the Future of AI with AI: High-quality link prediction in an exponentially growing knowledge network

Krenn, Mario, Buffoni, Lorenzo, Coutinho, Bruno, Eppel, Sagi, Foster, Jacob Gates, Gritsevskiy, Andrew, Lee, Harlin, Lu, Yichao, Moutinho, Joao P., Sanjabi, Nima, Sonthalia, Rishi, Tran, Ngoc Mai, Valente, Francisco, Xie, Yangxinyu, Yu, Rose, Kopp, Michael

arXiv.org Artificial Intelligence, Sep-23-2022

A tool that could suggest new personalized research directions and ideas by taking insights from the scientific literature could significantly accelerate the progress of science. A field that might benefit from such an approach is artificial intelligence (AI) research, where the number of scientific publications has been growing exponentially in recent years, making it challenging for human researchers to keep track of the progress. Here, we use AI techniques to predict the future research directions of AI itself. We develop a new graph-based benchmark based on real-world data, the Science4Cast benchmark, which aims to predict the future state of an evolving semantic network of AI. For that, we use more than 100,000 research papers and build up a knowledge network with more than 64,000 concept nodes. We then present ten diverse methods to tackle this task, ranging from pure statistical to pure learning methods. Surprisingly, the most powerful methods use a carefully curated set of network features, rather than an end-to-end AI approach. This suggests that pure ML approaches without human knowledge still have substantial untapped potential. Ultimately, better predictions of new future research directions will be a crucial component of more advanced research suggestion tools.
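
To make the idea of hand-crafted network features concrete, here is a minimal sketch that scores currently unconnected node pairs by two classic link-prediction features: common-neighbour count and preferential attachment (product of degrees). These two features, the function names, and the toy graph are illustrative assumptions; the abstract does not say which features the benchmark methods actually use.

```python
from itertools import combinations


def build_adjacency(edges):
    """Return a dict mapping each node to the set of its neighbours."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def score_unconnected_pairs(edges):
    """Rank node pairs that share no edge yet by two simple features.

    Each result row is (u, v, common_neighbours, preferential_attachment).
    Rows are sorted so that the most plausible future links come first.
    """
    adjacency = build_adjacency(edges)
    scores = []
    for u, v in combinations(sorted(adjacency), 2):
        if v in adjacency[u]:
            continue  # already linked, nothing to predict
        common = len(adjacency[u] & adjacency[v])
        pref_attach = len(adjacency[u]) * len(adjacency[v])
        scores.append((u, v, common, pref_attach))
    scores.sort(key=lambda row: (row[2], row[3]), reverse=True)
    return scores


# Hypothetical toy network; each letter stands in for a concept node.
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
ranked = score_unconnected_pairs(toy_edges)
```

In this toy graph the pair (a, d) ranks first because it shares two neighbours, while (a, e) ranks last because it shares none. A real system would compute such features on snapshots of the network and evaluate against links that appear later.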

data mining, machine learning, node, (20 more...)

arXiv.org Artificial Intelligence

doi: 10.1038/s42256-023-00735-0

2210.00881

Country:

Europe (1.00)
North America > United States > California (0.28)

Genre: Research Report (1.00)

Industry:

Telecommunications > Networks (0.34)
Information Technology > Networks (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Faster Ridge Regression via the Subsampled Randomized Hadamard Transform

Lu, Yichao, Dhillon, Paramveer, Foster, Dean P., Ungar, Lyle

Neural Information Processing Systems, Feb-14-2020, 14:27:26 GMT

We propose a fast algorithm for ridge regression when the number of features is much larger than the number of observations ($p \gg n$). The standard way to solve ridge regression in this setting works in the dual space and gives a running time of $O(n^2p)$. Our algorithm (SRHT-DRR) runs in time $O(np\log(n))$ and works by preconditioning the design matrix by a Randomized Walsh-Hadamard Transform with a subsequent subsampling of features. We provide risk bounds for our SRHT-DRR algorithm in the fixed design setting and show experimental results on synthetic and real datasets. Papers published at the Neural Information Processing Systems Conference.
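
For reference, the dual-space solution mentioned in the abstract can be written out as below. This is the standard ridge regression identity rather than a derivation taken from the paper; the design matrix $X \in \mathbb{R}^{n \times p}$, the response $y \in \mathbb{R}^n$, and the penalty $\lambda > 0$ are notation assumed here.

```latex
\begin{align*}
\hat{\beta}
  &= (X^\top X + \lambda I_p)^{-1} X^\top y
     && \text{(primal: a } p \times p \text{ system)} \\
  &= X^\top (X X^\top + \lambda I_n)^{-1} y
     && \text{(dual: an } n \times n \text{ system)}
\end{align*}
```

The two forms agree because $(X^\top X + \lambda I_p) X^\top = X^\top (X X^\top + \lambda I_n)$. In the dual form, computing $X X^\top$ costs $O(n^2 p)$ and solving the $n \times n$ system costs $O(n^3)$, so for $p \gg n$ the total is $O(n^2 p)$.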

artificial intelligence, machine learning, ridge regression, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.96)

Finding Linear Structure in Large Datasets with Scalable Canonical Correlation Analysis

Ma, Zhuang, Lu, Yichao, Foster, Dean

arXiv.org Machine Learning, Jun-26-2015

Canonical Correlation Analysis (CCA) is a widely used spectral technique for finding correlation structures in multi-view datasets. In this paper, we tackle the problem of large scale CCA, where classical algorithms, which usually require computing the product of two huge matrices and a huge matrix decomposition, are expensive in both computation and storage. We recast CCA from a novel perspective and propose a scalable and memory efficient Augmented Approximate Gradient (AppGrad) scheme for finding the top $k$ dimensional canonical subspace, which only involves multiplying a large matrix by a thin matrix of width $k$ and decomposing a small matrix of dimension $k\times k$. Further, AppGrad achieves optimal storage complexity $O(k(p_1+p_2))$, compared with classical algorithms, which usually require $O(p_1^2+p_2^2)$ space to store two dense whitening matrices. The proposed scheme naturally generalizes to the stochastic optimization regime and is especially efficient for huge datasets where batch algorithms are prohibitive. The online property of stochastic AppGrad is also well suited to the streaming scenario, where data arrive sequentially. To the best of our knowledge, it is the first stochastic algorithm for CCA. Experiments on four real data sets are provided to show the effectiveness of the proposed methods.
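
As background for the storage figures above, the top-$k$ CCA problem is commonly stated as follows. This is a textbook formulation with notation assumed here (data views $X \in \mathbb{R}^{n \times p_1}$ and $Y \in \mathbb{R}^{n \times p_2}$, loadings $\Phi_x$ and $\Phi_y$), not necessarily the exact form used in the paper.

```latex
\begin{align*}
\max_{\Phi_x \in \mathbb{R}^{p_1 \times k},\ \Phi_y \in \mathbb{R}^{p_2 \times k}}
  \quad & \operatorname{tr}\!\left( \Phi_x^\top X^\top Y \, \Phi_y \right) \\
\text{subject to} \quad
  & \Phi_x^\top X^\top X \, \Phi_x = I_k, \qquad
    \Phi_y^\top Y^\top Y \, \Phi_y = I_k
\end{align*}
```

Storing only $\Phi_x$ and $\Phi_y$ takes $O(k(p_1 + p_2))$ numbers, whereas explicitly forming the whitening matrices $(X^\top X)^{-1/2}$ and $(Y^\top Y)^{-1/2}$ takes $O(p_1^2 + p_2^2)$.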

algorithm, artificial intelligence, machine learning, (14 more...)

arXiv.org Machine Learning

1506.0817

Country:

Europe (0.68)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)

Large scale canonical correlation analysis with iterative least squares

Lu, Yichao, Foster, Dean P.

Neural Information Processing Systems, Dec-31-2014

Canonical Correlation Analysis (CCA) is a widely used statistical tool with both well established theory and favorable performance for a wide range of machine learning problems. However, computing CCA for huge datasets can be very slow since it involves implementing QR decomposition or singular value decomposition of huge matrices. In this paper we introduce L-CCA, an iterative algorithm which can compute CCA fast on huge sparse datasets. Theory on both the asymptotic convergence and finite time accuracy of L-CCA is established. The experiments also show that L-CCA outperforms other fast CCA approximation schemes on two real datasets.

algorithm, artificial intelligence, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.14)

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Large scale canonical correlation analysis with iterative least squares

Lu, Yichao, Foster, Dean P.

arXiv.org Machine Learning, Dec-30-2014

Canonical Correlation Analysis (CCA) is a widely used statistical tool with both well established theory and favorable performance for a wide range of machine learning problems. However, computing CCA for huge datasets can be very slow since it involves implementing QR decomposition or singular value decomposition of huge matrices. In this paper we introduce L-CCA, an iterative algorithm which can compute CCA fast on huge sparse datasets. Theory on both the asymptotic convergence and finite time accuracy of L-CCA is established. The experiments also show that L-CCA outperforms other fast CCA approximation schemes on two real datasets.

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Machine Learning

1407.4508

Country:

Europe (0.46)
North America > United States > California (0.14)

Genre: Research Report (0.40)

Industry: Education (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Fast Ridge Regression with Randomized Principal Component Analysis and Gradient Descent

Lu, Yichao, Foster, Dean P.

arXiv.org Machine Learning, May-15-2014

We propose a new two-stage algorithm, LING, for large scale regression problems. LING has the same risk as the well-known Ridge Regression under the fixed design setting and can be computed much faster. Our experiments show that LING performs well in terms of both prediction accuracy and computational efficiency compared with other large scale regression algorithms, such as Gradient Descent, Stochastic Gradient Descent, and Principal Component Regression, on both simulated and real datasets.

artificial intelligence, machine learning, singular value, (18 more...)

arXiv.org Machine Learning

1405.3952

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.72)

New Subsampling Algorithms for Fast Least Squares Regression

Dhillon, Paramveer, Lu, Yichao, Foster, Dean P., Ungar, Lyle

Neural Information Processing Systems, Dec-31-2013

We address the problem of fast estimation of ordinary least squares (OLS) from large amounts of data ($n \gg p$). We propose three methods which solve the big data problem by subsampling the covariance matrix using either a single or two stage estimation. All three run in the order of the size of the input, i.e. $O(np)$, and our best method, Uluru, gives an error bound of $O(\sqrt{p/n})$ which is independent of the amount of subsampling as long as it is above a threshold. We provide theoretical bounds for our algorithms in the fixed design setting (with Randomized Hadamard preconditioning) as well as the sub-Gaussian random design setting. We also compare the performance of our methods on synthetic and real-world datasets and show that if observations are i.i.d. sub-Gaussian, then one can subsample directly, skipping the expensive Randomized Hadamard preconditioning, without loss of accuracy.
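
The following sketch illustrates the general pattern of Randomized Hadamard preconditioning followed by uniform row subsampling, using a single-feature, no-intercept least squares fit. It is a toy illustration under stated assumptions and not the paper's Uluru estimator: the function names, the toy data, and the reduction to one feature are all hypothetical choices made to keep the example in the standard library.

```python
import math
import random


def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform; length must be a power of two."""
    out = list(values)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


def randomized_hadamard(column, signs):
    """Apply H D / sqrt(n): flip signs, transform, rescale to preserve the norm."""
    n = len(column)
    flipped = [s * v for s, v in zip(signs, column)]
    return [v / math.sqrt(n) for v in fwht(flipped)]


def subsampled_slope(x, y, sample_size, seed=0):
    """Single-feature OLS slope (no intercept) from a uniform row subsample."""
    rng = random.Random(seed)
    signs = [rng.choice((-1, 1)) for _ in x]
    # The same sign flips and transform are applied to both x and y,
    # so the linear relation between them is preserved row by row.
    x_mixed = randomized_hadamard(x, signs)
    y_mixed = randomized_hadamard(y, signs)
    rows = rng.sample(range(len(x)), sample_size)
    numerator = sum(x_mixed[i] * y_mixed[i] for i in rows)
    denominator = sum(x_mixed[i] * x_mixed[i] for i in rows)
    return numerator / denominator


# Noise-free toy data with true slope 2 (hypothetical values).
x = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0]
y = [2.0 * v for v in x]
slope = subsampled_slope(x, y, sample_size=4)
```

Because the toy data are noise-free, any non-degenerate subsample recovers the true slope; with noisy data, the point of the preconditioning step is to spread information evenly across rows so that a uniform subsample is representative.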

algorithm, artificial intelligence, machine learning, (15 more...)

Neural Information Processing Systems

Country: North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.94)
