 square footage


LightReasoner: Can Small Language Models Teach Large Language Models Reasoning?

Wang, Jingyuan, Chen, Yankai, Li, Zhonghang, Huang, Chao

arXiv.org Artificial Intelligence

Large language models (LLMs) have demonstrated remarkable progress in reasoning, often through supervised fine-tuning (SFT). However, SFT is resource-intensive, relying on large curated datasets, rejection-sampled demonstrations, and uniform optimization across all tokens, even though only a fraction carry meaningful learning value. In this work, we explore a counterintuitive idea: can smaller language models (SLMs) teach larger language models (LLMs) by revealing high-value reasoning moments that reflect the latter's unique strength? We propose LightReasoner, a novel framework that leverages the behavioral divergence between a stronger expert model (LLM) and a weaker amateur model (SLM). LightReasoner operates in two stages: (1) a sampling stage that pinpoints critical reasoning moments and constructs supervision examples capturing the expert's advantage through expert-amateur contrast, and (2) a fine-tuning stage that aligns the expert model with these distilled examples, amplifying its reasoning strengths. Across seven mathematical benchmarks, LightReasoner improves accuracy by up to 28.1%, while reducing time consumption by 90%, sampled problems by 80%, and tuned token usage by 99%, all without relying on ground-truth labels. By turning weaker SLMs into effective teaching signals, LightReasoner offers a scalable and resource-efficient approach for advancing LLM reasoning. Code is available at: https://github.com/HKUDS/LightReasoner
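The expert-amateur contrast in the sampling stage can be illustrated with a minimal sketch. The abstract does not say how divergence is measured or how steps are selected, so the KL divergence, the fixed threshold, and the names `kl_divergence` and `select_critical_steps` are assumptions made for illustration, not the authors' implementation.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

def select_critical_steps(expert_dists, amateur_dists, threshold):
    """Return the indices of steps where the expert's next-token distribution
    diverges from the amateur's by more than `threshold` (an assumed criterion)."""
    return [
        i
        for i, (p, q) in enumerate(zip(expert_dists, amateur_dists))
        if kl_divergence(p, q) > threshold
    ]

# Toy next-token distributions over a 3-token vocabulary (made-up numbers).
expert = [[0.90, 0.05, 0.05], [0.34, 0.33, 0.33], [0.80, 0.10, 0.10]]
amateur = [[0.30, 0.40, 0.30], [0.33, 0.34, 0.33], [0.75, 0.15, 0.10]]

critical = select_critical_steps(expert, amateur, threshold=0.1)
print(critical)
```

In this toy setup only the first step is kept, because that is the only position where the two models disagree strongly; steps where both models behave alike would contribute little supervision under this assumed criterion.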


Building trust in AI: Transparent models for better decisions

AIHub

AI is becoming a part of our daily lives, from approving loans to diagnosing diseases. AI model outputs are used to make increasingly important decisions, based on smart algorithms and data. But if we can't understand these decisions, how can we trust them? One approach to making AI decisions more understandable is to use models that are inherently interpretable. These are models that are designed in such a way that consumers of the model outputs can infer the model's behaviour by reading the parameters of the model. Popular inherently interpretable models include Decision Trees and Linear Regression.
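As a small illustration of reading a model's behaviour from its parameters, the sketch below fits a one-feature linear regression by ordinary least squares. The data are hypothetical (square footage against sale price in thousands of dollars) and `fit_line` is an illustrative helper, not something taken from the article.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: square footage and sale price in thousands of dollars.
sqft = [1000, 1500, 2000, 2500]
price = [200, 275, 350, 425]

slope, intercept = fit_line(sqft, price)
# The fitted parameters are the explanation: each extra square foot adds
# `slope` thousand dollars to the predicted price, starting from `intercept`.
print(slope, intercept)
```

The whole model is the two numbers it returns, which is what makes it interpretable in the sense described above.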


Auto-ICL: In-Context Learning without Human Supervision

Yang, Jinghan, Ma, Shuming, Wei, Furu

arXiv.org Artificial Intelligence

In the era of Large Language Models (LLMs), human-computer interaction has evolved towards natural language, offering unprecedented flexibility. Despite this, LLMs are heavily reliant on well-structured prompts to function efficiently within the realm of In-Context Learning. Vanilla In-Context Learning relies on human-provided contexts, such as labeled examples, explicit instructions, or other guiding mechanisms that shape the model's outputs. To address this challenge, our study presents a universal framework named Automatic In-Context Learning. Upon receiving a user's request, we ask the model to independently generate examples, including labels, instructions, or reasoning pathways. The model then leverages this self-produced context to tackle the given problem. Our approach is universally adaptable and can be implemented in any setting where vanilla In-Context Learning is applicable. We demonstrate that our method yields strong performance across a range of tasks, standing up well when compared to existing methods.
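The two-step flow described in the abstract (the model first writes its own context, then uses it to answer) can be sketched as follows. The prompt wording, the function name `auto_icl`, and the `generate` callable are assumptions for illustration; `echo_model` is a stand-in so the sketch runs without a real model.

```python
def auto_icl(request, generate, num_examples=2):
    """Two-step prompting: let the model write its own context, then answer with it."""
    context_prompt = (
        f"Write {num_examples} worked examples, with answers, "
        f"for problems similar to this request:\n{request}"
    )
    # Step 1: the model produces its own demonstrations.
    context = generate(context_prompt)
    # Step 2: the self-produced context is prepended to the original request.
    final_prompt = f"{context}\n\nNow solve:\n{request}"
    return generate(final_prompt)

def echo_model(prompt):
    """Hypothetical stand-in for a real model call."""
    return f"[model output for {len(prompt)} chars]"

answer = auto_icl("Classify the sentiment of: 'great location, tiny kitchen'", echo_model)
print(answer)
```

Passing the model as a plain callable keeps the sketch independent of any particular API; any function that maps a prompt string to a completion string could be substituted.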


Real Estate Property Valuation using Self-Supervised Vision Transformers

Yazdani, Mahdieh, Raissi, Maziar

arXiv.org Artificial Intelligence

The use of Artificial Intelligence (AI) in the real estate market has been growing in recent years. In this paper, we propose a new method for property valuation that utilizes self-supervised vision transformers, a recent breakthrough in computer vision and deep learning. Our proposed algorithm uses a combination of machine learning, computer vision and hedonic pricing models trained on real estate data to estimate the value of a given property. We collected and pre-processed a data set of real estate properties in the city of Boulder, Colorado and used it to train, validate and test our algorithm. Our data set consisted of qualitative images (including house interiors, exteriors, and street views) as well as quantitative features such as the number of bedrooms, bathrooms, square footage, lot square footage, property age, crime rates, and proximity to amenities. We evaluated the performance of our model using metrics such as Root Mean Squared Error (RMSE). Our findings indicate that these techniques are able to accurately predict the value of properties, with a low RMSE. The proposed algorithm outperforms traditional appraisal methods that do not leverage property images and has the potential to be used in real-world applications.


Predicting housing prices and analyzing real estate market in the Chicago suburbs using Machine Learning

Xu, Kevin, Nguyen, Hieu

arXiv.org Artificial Intelligence

The pricing of housing properties is determined by a variety of factors. However, post-pandemic markets in the Chicago suburbs have experienced volatility, which has greatly affected house prices. In this study, the Naperville/Bolingbrook real estate market was analyzed to predict property prices from housing attributes using machine learning models, and to evaluate the effectiveness of such models in a volatile market. Sales data from 2018 through the summer of 2022 were collected from Redfin, a real estate website. Analyzing sales over this period also shows the state of the housing market and reveals trends in price. The models used were linear regression, support vector regression, decision tree regression, random forest regression, and XGBoost regression. Results were compared using the MAE, RMSE, and R-squared values of each model. The XGBoost model performed best in predicting house prices despite the additional volatility caused by post-pandemic conditions. After modeling, Shapley values (SHAP) were used to evaluate the weight of each variable in the models.
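For reference, the three comparison metrics named in the abstract can be computed as in the sketch below. The sale prices are hypothetical values in thousands of dollars, not data from the study, and `regression_metrics` is an illustrative helper.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R-squared) for paired actual and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2

# Hypothetical sale prices (thousands of dollars), not data from the study.
actual = [300.0, 450.0, 500.0, 620.0]
predicted = [310.0, 440.0, 520.0, 600.0]

mae, rmse, r2 = regression_metrics(actual, predicted)
print(mae, rmse, r2)
```

MAE and RMSE are in the same units as the price, with RMSE penalizing large misses more heavily, while R-squared is unitless and measures the share of price variance the model explains.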


Look Out Zillow Here Comes Jestimate!

#artificialintelligence

As someone with expertise in both real estate and data science, I've always been fascinated by Zillow's Zestimate. In the spirit of competition, I've developed Jim's estimate or Jestimate! The following interactive map contains 2018 home sales in San Francisco by neighborhood. Click on the neighborhood and then click on a home in the data table to see the Jestimate results versus the actual sales price. Zestimate uses a proprietary machine learning formula to estimate the current market value of a home.


Unravelling the Labyrinth of AI Myths: AI Does Not Learn by Itself - iManage

#artificialintelligence

Encouraged by media portrayals of AI, a widespread myth is that AI simply learns by itself. For example, a common misconception represents AI as a digital brain that can be plugged and played into a given scenario, learning to solve X, Y, Z challenges on its own. Such representations are based on fiction, not fact. While AI is a robotic brain that can learn, it learns in a different way than a human brain. AI uses mathematics and pre-classified data to learn. Crucially, AI needs a human brain to guide it through the learning process by pre-classifying data into categories that it can examine and categorize.


Unintended Consequences? The Potential Impact Autonomous Vehicles Could Have On Your Home

Forbes - Tech

Much has been made about how autonomous vehicles will impact parking. To allow for alternative uses once autonomous vehicles become more pervasive, new commercial parking facilities are now frequently being designed and constructed with flexibility in mind. Developers are even going so far as to explore ways to repurpose existing parking garages. Far less has been made about how autonomous vehicles could change residential real estate, rendering the two-car garage obsolete and increasing livable square footage and home values. While no one can predict the future, some of the more prevalent thinking in the field suggests that after dropping off passengers, autonomous vehicles have four options.


WeWork's $20 Billion Dream: The Lavishly Funded Startup That Could Disrupt Commercial Real Estate

#artificialintelligence

With over $4B in funding, WeWork is expanding aggressively at home and abroad and pursuing diverse investments that have raised eyebrows. But its real-estate-as-a-service offering and trove of data on optimal office design could make the company's value prop far more than a marketing ploy. WeWork is a real estate company valued like a tech company. At least, that's the rap on WeWork from critics who think it can't support its $20B valuation in private markets. Backed by Japanese tech and telecom giant SoftBank Group, WeWork specializes in rent arbitrage -- leasing and developing properties at one price, then turning around and renting them out at much higher prices. Its recent run-up in funding -- raising some $4B in 2017 alone -- has given the company the firepower to expand quickly without worrying too much about fundamentals. Companies traded in public markets that follow the same business model trade at much lower sales multiples than WeWork. Detractors say WeWork has earned its valuation by putting hipster touches on formerly drab spaces and positioning itself as a startup incubator, then charging sky-high rent. On top of that, critics point to WeWork's investments in seeming distractions -- like its upcoming WeGrow elementary school and a wave pool company -- as more examples of a tech company with overreaching ambitions. But WeWork's recent shift to safer real estate commitments and its emphasis on longer-term renters and enterprise clients suggest the company could have legs. WeWork claims it's amassing a trove of data on ideal office locations and layouts, and using software to determine everything from ideal desk layout to optimal conference room size. The company is leveraging this data not only to improve its own locations, but also to become an outsourced facilities manager, at a time when big enterprises are trying to shed real estate management from their portfolios.