match
How Language Directions Align with Token Geometry in Multilingual LLMs
Multilingual LLMs demonstrate strong performance across diverse languages, yet there has been limited systematic analysis of how language information is structured within their internal representation space and how it emerges across layers. We conduct a comprehensive probing study on six multilingual LLMs, covering all 268 transformer layers, using linear and nonlinear probes together with a new Token--Language Alignment analysis to quantify the layer-wise dynamics and geometric structure of language encoding. Our results show that language information becomes sharply separated in the first transformer block (+76.4$\pm$8.2 percentage points from Layer 0 to 1) and remains almost fully linearly separable throughout model depth. We further find that the alignment between language directions and vocabulary embeddings is strongly tied to the language composition of the training data. Notably, Chinese-inclusive models achieve a ZH Match@Peak of 16.43\%, whereas English-centric models achieve only 3.90\%, revealing a 4.21$\times$ structural imprinting effect. These findings indicate that multilingual LLMs distinguish languages not by surface script features but by latent representational structures shaped by the training corpus. Our analysis provides practical insights for data composition strategies and fairness in multilingual representation learning. All code and analysis scripts are publicly available at: https://github.com/thisiskorea/How-Language-Directions-Align-with-Token-Geometry-in-Multilingual-LLMs.
Knowledge-Aware Reasoning over Multimodal Semi-structured Tables
Mathur, Suyash Vardhan, Bafna, Jainit Sushil, Kartik, Kunal, Khandelwal, Harshita, Shrivastava, Manish, Gupta, Vivek, Bansal, Mohit, Roth, Dan
Existing datasets for tabular question answering typically focus exclusively on text within cells. However, real-world data is inherently multimodal, often blending images such as symbols, faces, icons, patterns, and charts with textual content in tables. With the evolution of AI models capable of multimodal reasoning, it is pertinent to assess their efficacy in handling such structured data. This study investigates whether current AI models can perform knowledge-aware reasoning on multimodal structured data. We explore their ability to reason on tables that integrate both images and text, introducing MMTabQA, a new dataset designed for this purpose. Our experiments highlight substantial challenges for current AI models in effectively integrating and interpreting multiple text and image inputs, understanding visual context, and comparing visual content across images. These findings establish our dataset as a robust benchmark for advancing AI's comprehension and capabilities in analyzing multimodal structured data.
Online Dating Is Great---for Investors. For Customers? It's Complicated.
Dating used to be about the end result. Its shift to an online business has made it about the journey. That might not be great for the longevity of consumers' relationships, but it should continue to benefit investors' love affair with publicly traded companies like Match Group and Bumble. Match's apps had nearly 100 million collective monthly active users as of the end of the first quarter. Meanwhile, the number of people willing to pay for so-called "freemium" dating apps continues to climb.
Does The Future Of Dating Hinge On Facebook?
Facebook CEO Mark Zuckerberg speaks during the annual F8 summit in San Jose, California on May 1, 2018. Facebook CEO, Mark Zuckerberg, recently sent a small shockwave through the online dating industry when he announced Facebook's plans to roll out dating features at Facebook's F8 conference back in May; shares in dating-giant Match Group immediately plummeted by 16%. Fast forward to today, 22nd June, and investors' initial (and perhaps somewhat knee-jerk) concerns that Facebook would swiftly usurp existing dating apps as the go-to app seem to have been allayed, with Match's share price since recovering. In fact, the announcement itself has generally been welcomed by all the major online dating players. So undeterred were Match Group that they decided to buy a controlling stake in Hinge earlier this week, doubling down on their strategy of acquiring competitor dating brands in spite of the announcement.
Applied AI News
The National Aeronautics and Space Administration (NASA) Goddard Space Flight Center (Greenbelt, Md.) has developed the ... The system is designed to capture and maintain key scientific knowledge while reducing common errors made by outside scientists. Johnson Controls (Milwaukee, Wis.), a manufacturer of control products used to monitor buildings, has deployed an intelligent agent-based knowledge-retrieval solution at its help desk to provide fast access to support information. ... Chester, N.Y.) to improve its ability to match reported wage information. The solution will help the agency match contribution information supplied by employers to an employee's Social Security account. RoyScot Trust, the asset finance arm of the Royal Bank of Scotland (Edinburgh, Scotland), has implemented an expert system-based solution to automate the credit-underwriting process.
Articles
Samuel's successes included a victory by his program over a master-level player. In fact, the opponent was not a master, and Samuel himself had no illusions about his program's strength. This single event, a milestone in AI, was magnified out of proportion by the media and helped to create the impression that checkers was a solved game. Nevertheless, his work stands as a major achievement in machine learning and AI. Since 1950, the checkers world has been dominated by Tinsley.
Techniques and Methodology
Editor's Note: AI workers have claimed for some time ... A partial evaluator is an interpreter that, with only partial information about a program's inputs, produces a specialized version of the program that exploits the partial information. A similar example is described in more detail in Kahn (1982b). Programming methodology in AI shares much with general programming methodology but differs in significant ways. An AI researcher does not typically understand the problem being programmed very well. An essential aspect of a very common style of doing AI research is to write programs in order to understand something better.
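The definition of a partial evaluator above can be illustrated with a toy sketch. It is not the system discussed in the article or in Kahn (1982b); the names `power` and `specialize_power` are hypothetical, and the specializer handles only this one program. Given the exponent ahead of time, it unrolls the loop and emits a residual program that takes only the remaining input.

```python
from typing import Callable


def power(base: float, exponent: int) -> float:
    """General program: both inputs are needed at run time."""
    result = 1.0
    for _ in range(exponent):
        result *= base
    return result


def specialize_power(exponent: int) -> Callable[[float], float]:
    """Toy specializer (assumption: only for `power`, not a general tool).

    The exponent is the partial information. The loop over it is
    executed now, leaving a residual program with no loop at all.
    """
    if exponent < 0:
        raise ValueError("exponent must be non-negative")
    # Unroll the loop at specialization time.
    body = " * ".join(["base"] * exponent) if exponent > 0 else "1.0"
    source = f"def power_{exponent}(base):\n    return {body}\n"
    namespace: dict = {}
    exec(source, namespace)  # compile the residual program
    residual = namespace[f"power_{exponent}"]
    residual.source = source  # keep the generated text for inspection
    return residual


cube = specialize_power(3)
print(cube.source)  # the specialized program, free of any loop
print(cube(2.0) == power(2.0, 3))
```

The residual function computes the same result as the general one for the fixed exponent, but the work that depended only on the known input has already been done.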
Semantic-Integration Research in the Database Community
Semantic integration has been a longstanding challenge for the database community. It has received steady attention over the past two decades, and has now become a prominent area of database research. In this article, we first review database applications that require semantic integration and discuss the difficulties underlying the integration process. We then describe recent progress and identify open research issues. We focus in particular on schema matching, a topic that has received much attention in the database community, but also discuss data matching (for example, tuple deduplication) and open issues beyond the match discovery context (for example, reasoning with matches, match verification and repair, and reconciling inconsistent data values).
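As a rough illustration of what schema matching involves, the sketch below pairs column names from two schemas by string similarity. It is a deliberately naive baseline under stated assumptions, not a method from the article: the function names, the example columns, and the 0.6 threshold are all hypothetical, and real matchers also use data instances, types, and constraints.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase a column name and drop common separators."""
    return name.lower().replace("_", "").replace("-", "")


def match_schemas(source_cols, target_cols, threshold=0.6):
    """Map each source column to its most similar target column.

    Similarity is difflib's ratio on normalized names. Columns whose
    best score falls below `threshold` are left unmatched.
    """
    matches = {}
    for s in source_cols:
        best, best_score = None, 0.0
        for t in target_cols:
            score = SequenceMatcher(None, normalize(s), normalize(t)).ratio()
            if score > best_score:
                best, best_score = t, score
        if best is not None and best_score >= threshold:
            matches[s] = (best, round(best_score, 2))
    return matches


# Hypothetical schemas, for illustration only.
source = ["cust_name", "phone_no", "zip"]
target = ["customer_name", "phone_number", "postal_code"]
matches = match_schemas(source, target)
for col, (partner, score) in matches.items():
    print(f"{col} -> {partner} ({score})")
```

Here `zip` finds no partner because name similarity alone cannot link it to `postal_code`. That gap is one reason purely lexical matching is insufficient, and why candidate matches typically need verification and repair.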
Tenth Anniversary of the Plastics Color Formulation Tool
Since 1994, GE Plastics has employed a case-based reasoning (CBR) tool that determines color formulas that match requested colors. This tool, called FormTool, has saved GE millions of dollars in productivity and material (that is, colorant) costs. The technology developed in FormTool has been used to create an online color-selection tool for our customers called ColorXpress Select. A customer innovation center has been developed around the FormTool software. In offices and factories, in hospitals, homes, and schools, on the road and in outer space, products made with GE materials make life simpler, safer, and more comfortable for people every day.
HITECH CHESS REPORT
In response to this need, Shelby Lyman, the host of past Public Broadcasting Service (PBS) series on world chess championship matches, organized the AGS Challenge Match at the New School for Social Research in New York City. Funding for this event was provided by AGS Computers, Inc., a New Jersey-based software firm. The match was held September 22-25, with one game played each day, and was widely covered by the international press. Participating were Hitech, at 2407 then the highest-rated computer in the world, and International Grandmaster Arnold S. Denker, a former U.S. champion. Denker's rating of 2410 was comparable to that of Hitech.
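To make "comparable" concrete, the sketch below computes the expected score per game implied by the two ratings. It assumes the ratings follow the standard Elo logistic model with a 400-point scale; the report does not state the rating system, so this is an illustration only, and `expected_score` is a hypothetical helper name.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the Elo model.

    Assumption: standard logistic curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


hitech_rating = 2407  # from the report
denker_rating = 2410  # from the report

p_hitech = expected_score(hitech_rating, denker_rating)
p_denker = expected_score(denker_rating, hitech_rating)
print(f"Hitech expected score per game: {p_hitech:.4f}")
print(f"Denker expected score per game: {p_denker:.4f}")
```

Under this assumption a 3-point gap leaves both expected scores within about half a percentage point of 0.5, so the ratings alone predict an essentially even match.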