 falcon



Falcon: Fast Spectral Inference on Encrypted Data

Neural Information Processing Systems

Homomorphic Encryption (HE) based secure Neural Network (NN) inference is one of the most promising security solutions for emerging Machine Learning as a Service (MLaaS). In the HE-based MLaaS setting, a client encrypts its sensitive data and uploads the encrypted data to the server, which processes the encrypted data directly without decryption and returns the encrypted result to the client. The client's data privacy is preserved since only the client has the private key. Existing HE-enabled Neural Networks (HENNs), however, suffer from heavy computational overheads. The state-of-the-art HENNs adopt ciphertext packing techniques to reduce homomorphic multiplications by packing multiple messages into one single ciphertext.
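The effect of ciphertext packing on the multiplication count can be illustrated with a toy sketch. Nothing here is encrypted and nothing is taken from the paper: a plain Python list stands in for the slots of one ciphertext, and the hypothetical `MulCounter.mul` stands in for one slot-wise homomorphic multiplication. The names `MulCounter`, `scale_unpacked`, and `scale_packed` are assumptions made for illustration only.

```python
from typing import List


class MulCounter:
    """Counts slot-wise multiplications; a stand-in for an HE evaluator (assumption)."""

    def __init__(self) -> None:
        self.count = 0

    def mul(self, a: List[int], b: List[int]) -> List[int]:
        # One call models ONE homomorphic multiplication, no matter how many
        # slots the two operands carry.
        if len(a) != len(b):
            raise ValueError("operands must have the same number of slots")
        self.count += 1
        return [x * y for x, y in zip(a, b)]


def scale_unpacked(messages: List[int], weight: int, counter: MulCounter) -> List[int]:
    # One single-slot "ciphertext" per message: one multiplication per message.
    return [counter.mul([m], [weight])[0] for m in messages]


def scale_packed(messages: List[int], weight: int, counter: MulCounter) -> List[int]:
    # All messages share one "ciphertext": a single multiplication in total.
    return counter.mul(list(messages), [weight] * len(messages))
```

Scaling four messages this way issues four multiplications in the unpacked case and one in the packed case, while producing the same values. Real HE schemes add costs this toy ignores, such as slot rotations and noise growth.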


Optimizing Medical Question-Answering Systems: A Comparative Study of Fine-Tuned and Zero-Shot Large Language Models with RAG Framework

Hassan, Tasnimul, Karim, Md Faisal, Jeelani, Haziq, Behnam, Elham, Green, Robert, Syed, Fayeq Jeelani

arXiv.org Artificial Intelligence

Medical question-answering (QA) systems can benefit from advances in large language models (LLMs), but directly applying LLMs to the clinical domain poses challenges such as maintaining factual accuracy and avoiding hallucinations. In this paper, we present a retrieval-augmented generation (RAG) based medical QA system that combines domain-specific knowledge retrieval with open-source LLMs to answer medical questions. We fine-tune two state-of-the-art open LLMs (LLaMA 2 and Falcon) using Low-Rank Adaptation (LoRA) for efficient domain specialization. The system retrieves relevant medical literature to ground the LLM's answers, thereby improving factual correctness and reducing hallucinations. We evaluate the approach on benchmark datasets (PubMedQA and MedMCQA) and show that retrieval augmentation yields measurable improvements in answer accuracy compared to using LLMs alone. Our fine-tuned LLaMA 2 model achieves 71.8% accuracy on PubMedQA, substantially improving over the 55.4% zero-shot baseline, while maintaining transparency by providing source references. We also detail the system design and fine-tuning methodology, demonstrating that grounding answers in retrieved evidence reduces unsupported content by approximately 60%. These results highlight the potential of RAG-augmented open-source LLMs for reliable biomedical QA, pointing toward practical clinical informatics applications.


FALCON: Actively Decoupled Visuomotor Policies for Loco-Manipulation with Foundation-Model-Based Coordination

He, Chengyang, Sun, Ge, Bai, Yue, Lu, Junkai, Zhao, Jiadong, Sartoretti, Guillaume

arXiv.org Artificial Intelligence

FALCON actively decouples locomotion and manipulation through two modular diffusion policies, coordinated by a vision-language foundation model (VLM). The VLM encodes global scene context, proprioceptive states, and goal instructions into a shared latent embedding that conditions both subsystems. Abstract: We present FoundAtion-model-guided decoupled LoCO-maNipulation visuomotor policies (FALCON), a framework for loco-manipulation that combines modular diffusion policies with a vision-language foundation model as the coordinator. Our approach explicitly decouples locomotion and manipulation into two specialized visuomotor policies, allowing each subsystem to rely on its own observations. This mitigates the performance degradation that arises when a single policy is forced to fuse heterogeneous, potentially mismatched observations from locomotion and manipulation. Our key innovation lies in restoring coordination between these two independent policies through a vision-language foundation model, which encodes global observations and language instructions into a shared latent embedding conditioning both diffusion policies. On top of this backbone, we introduce a phase-progress head that uses textual descriptions of task stages to infer discrete phase and continuous progress estimates without manual phase labels. To further structure the latent space, we incorporate a coordination-aware contrastive loss that explicitly encodes cross-subsystem compatibility between arm and base actions. Results show that it surpasses centralized and decentralized baselines while exhibiting improved robustness and generalization to out-of-distribution scenarios. Recent progress in robot learning and foundation models has rekindled the longstanding vision of general-purpose robots that can move through unstructured environments and manipulate diverse objects with minimal task-specific engineering.
Large Behavior Models (LBMs) extend the diffusion policy paradigm to multi-task dexterous manipulation [1], training a single policy across broad datasets of real and simulated trajectories. Robotics' Memo platform [8], demonstrate impressive whole-body behaviors that combine locomotion, manipulation, and language grounding in increasingly realistic environments. These developments suggest a future where robot generalist models consume raw sensor streams and language instructions and directly output actions to interact with the physical world. However, loco-manipulation, jointly controlling a mobile base and one or more arms, remains especially challenging on legged platforms [9]-[11], where the same body must simultaneously maintain stability and accomplish precise manipulation under different sensor streams and poses. In this work, we focus on a specific yet representative setting in which an arm-mounted quadruped robot performs long-horizon loco-manipulation tasks using only RGB observations, proprioceptive states, and sparse language instructions.


FALCON: Learning Force-Adaptive Humanoid Loco-Manipulation

Zhang, Yuanhang, Yuan, Yifu, Gurunath, Prajwal, Gupta, Ishita, Omidshafiei, Shayegan, Agha-mohammadi, Ali-akbar, Vazquez-Chanlatte, Marcell, Pedersen, Liam, He, Tairan, Shi, Guanya

arXiv.org Artificial Intelligence

Humanoid loco-manipulation holds transformative potential for daily service and industrial tasks, yet achieving precise, robust whole-body control with 3D end-effector force interaction remains a major challenge. Prior approaches are often limited to lightweight tasks or quadrupedal/wheeled platforms. To overcome these limitations, we propose FALCON, a dual-agent reinforcement-learning-based framework for robust force-adaptive humanoid loco-manipulation. FALCON decomposes whole-body control into two specialized agents: (1) a lower-body agent ensuring stable locomotion under external force disturbances, and (2) an upper-body agent precisely tracking end-effector positions with implicit adaptive force compensation. These two agents are jointly trained in simulation with a force curriculum that progressively escalates the magnitude of external force exerted on the end effector while respecting torque limits. Experiments demonstrate that, compared to the baselines, FALCON achieves 2x more accurate upper-body joint tracking, while maintaining robust locomotion under force disturbances and achieving faster training convergence. Moreover, FALCON enables policy training without embodiment-specific reward or curriculum tuning. Using the same training setup, we obtain policies that are deployed across multiple humanoids, enabling forceful loco-manipulation tasks such as transporting payloads (0-20N force), cart-pulling (0-100N), and door-opening (0-40N) in the real world.
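A force curriculum of the kind described above can be sketched as a schedule that raises the upper bound on the sampled disturbance as training progresses. This is a minimal sketch under stated assumptions, not the authors' implementation: the linear ramp, the `start_fraction` parameter, the uniform sampling, and the function names are all hypothetical, and the torque-limit constraint mentioned in the abstract is not modeled.

```python
import random


def curriculum_force_limit(step: int, total_steps: int, max_force_n: float,
                           start_fraction: float = 0.1) -> float:
    """Upper bound (in newtons) on the disturbance magnitude at a training step.

    Assumption: the bound ramps linearly from start_fraction * max_force_n
    at step 0 to max_force_n at total_steps, then stays there.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(max(step / total_steps, 0.0), 1.0)  # clamp to [0, 1]
    fraction = start_fraction + (1.0 - start_fraction) * progress
    return max_force_n * fraction


def sample_force_magnitude(step: int, total_steps: int, max_force_n: float,
                           rng: random.Random) -> float:
    # Draw a magnitude uniformly below the current curriculum bound.
    limit = curriculum_force_limit(step, total_steps, max_force_n)
    return rng.uniform(0.0, limit)
```

With `max_force_n=100.0`, early steps only see small disturbances, and the full range becomes available once `step` reaches `total_steps`.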


FALCON: False-Negative Aware Learning of Contrastive Negatives in Vision-Language Alignment

Kim, Myunsoo, Shim, Seong-Woong, Lee, Byung-Jun

arXiv.org Artificial Intelligence

False negatives pose a critical challenge in vision-language pretraining (VLP) due to the many-to-many correspondence between images and texts in large-scale datasets. These false negatives introduce conflicting supervision signals that degrade the learned embedding space and diminish the effectiveness of hard negative sampling. In this paper, we propose FALCON (False-negative Aware Learning of COntrastive Negatives), a learning-based mini-batch construction strategy that adaptively balances the trade-off between hard and false negatives during VLP. Rather than relying on fixed heuristics, FALCON employs a negative mining scheduler that dynamically selects negative samples of appropriate hardness for each anchor instance during mini-batch construction, guided by a proxy for cross-modal alignment improvement. Experimental results demonstrate that FALCON significantly improves performance across three vision-language learning frameworks (ALBEF, BLIP-2, SigLIP-2) and a broad range of downstream tasks and evaluation settings, underscoring its effectiveness and robustness in mitigating the impact of false negatives.


Falcon: A Comprehensive Chinese Text-to-SQL Benchmark for Enterprise-Grade Evaluation

Luo, Wenzhen, Guan, Wei, Yao, Yifan, Pan, Yimin, Wang, Feng, Yu, Zhipeng, Wen, Zhe, Chen, Liang, Zhuang, Yihong

arXiv.org Artificial Intelligence

We introduce Falcon, a cross-domain Chinese text-to-SQL benchmark grounded in an enterprise-compatible dialect (MaxCompute/Hive). It contains 600 Chinese questions over 28 databases; 77% require multi-table reasoning and over half touch more than four tables. Each example is annotated along SQL-computation features and Chinese semantics. For evaluation, we release a robust execution comparator and an automated evaluation pipeline, under which all current state-of-the-art large-scale models (including Deepseek) achieve accuracies of at most 50%. Major errors originate from two sources: (1) schema linking in large enterprise landscapes - hundreds of tables, denormalized fields, ambiguous column names, implicit foreign-key relations and domain-specific synonyms that make correct join/column selection difficult; and (2) mapping concise, colloquial Chinese into the exact operators and predicates required for analytics - e.g., choosing the correct aggregation and group-by keys, expressing time windows and granularities, applying unit conversions, handling NULLs and data-quality rules, and formulating nested or windowed subqueries. Falcon therefore targets Chinese-specific semantics and enterprise dialects (abbreviations, business jargon, fuzzy entity references) and provides a reproducible middle ground before full production deployment by using realistic enterprise schemas, query templates, an execution comparator, and an automated evaluation pipeline for end-to-end validation.
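The idea behind an execution comparator can be sketched as follows: run the predicted and the reference query against the same database and compare the returned rows rather than the SQL text. This is an illustrative sketch only, not the released comparator. It uses Python's built-in `sqlite3` instead of MaxCompute/Hive, and the table, the queries, and the function names are hypothetical.

```python
import sqlite3
from collections import Counter


def build_demo_db() -> sqlite3.Connection:
    # Hypothetical toy schema, used only to exercise the comparator.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("east", 10), ("east", 20), ("west", 5)],
    )
    return conn


def same_result(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str,
                ordered: bool = False) -> bool:
    """Return True if both queries yield the same rows.

    Rows are compared as a multiset unless ordered=True, so a prediction
    that differs only in row order still counts as correct.
    """
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a query that fails to execute cannot match
    gold = conn.execute(gold_sql).fetchall()
    if ordered:
        return predicted == gold
    return Counter(predicted) == Counter(gold)
```

Comparing rows as a multiset keeps duplicates significant while ignoring row order, which is usually what is wanted unless the question itself asks for a sorted result.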


From Spatial to Actions: Grounding Vision-Language-Action Model in Spatial Foundation Priors

Zhang, Zhengshen, Li, Hao, Dai, Yalun, Zhu, Zhengbang, Zhou, Lei, Liu, Chenchen, Wang, Dong, Tay, Francis E. H., Chen, Sijin, Liu, Ziwei, Liu, Yuxiao, Li, Xinghang, Zhou, Pan

arXiv.org Artificial Intelligence

Existing vision-language-action (VLA) models act in the 3D real world but are typically built on 2D encoders, leaving a spatial reasoning gap that limits generalization and adaptability. Recent 3D integration techniques for VLAs either require specialized sensors and transfer poorly across modalities, or inject weak cues that lack geometry and degrade vision-language alignment. In this work, we introduce FALCON (From Spatial to Action), a novel paradigm that injects rich 3D spatial tokens into the action head. FALCON leverages spatial foundation models to deliver strong geometric priors from RGB alone, and includes an Embodied Spatial Model that can optionally fuse depth or pose for higher fidelity when available, without retraining or architectural changes. To preserve language reasoning, spatial tokens are consumed by a Spatial-Enhanced Action Head rather than being concatenated into the vision-language backbone. These designs enable FALCON to address limitations in spatial representation, modality transferability, and alignment. In comprehensive evaluations across three simulation benchmarks and eleven real-world tasks, our proposed FALCON achieves state-of-the-art performance, consistently surpasses competitive baselines, and remains robust under clutter, spatial-prompt conditioning, and variations in object scale and height.