toucan
14 toucans found inside a car dashboard complete rehab
The colorful tropical birds are highly prized in the illegal wildlife trade. A male rehabilitated toucan in Bronx Zoo World of Birds. Over a dozen keel-billed toucans that were found stuffed inside of a vehicle's dashboard officially have a clean bill of health.
- Law (1.00)
- Leisure & Entertainment > Zoo & Circus (0.38)
- Media > Photography (0.31)
TOUCAN: Synthesizing 1.5M Tool-Agentic Data from Real-World MCP Environments
Xu, Zhangchen, Soria, Adriana Meza, Tan, Shawn, Roy, Anurag, Agrawal, Ashish Sunil, Poovendran, Radha, Panda, Rameswar
Large Language Model (LLM) agents are rapidly emerging as powerful systems for automating tasks across domains. Yet progress in the open-source community is constrained by the lack of high-quality, permissively licensed tool-agentic training data. Existing datasets are often limited in diversity, realism, and complexity, particularly regarding multi-tool and multi-turn interactions. To address this gap, we introduce Toucan, the largest publicly available tool-agentic dataset to date, containing 1.5 million trajectories synthesized from nearly 500 real-world Model Context Protocols (MCPs). Unlike prior work, Toucan leverages authentic MCP environments to generate diverse, realistic, and challenging tasks with trajectories involving real tool execution. Our pipeline first produces a broad spectrum of tool-use queries using five distinct models, applies model-based quality filtering, and then generates agentic trajectories with three teacher models using two agentic frameworks. Rigorous rule-based and model-based validation ensures high-quality outputs. We also introduce three extension mechanisms to further diversify tasks and simulate multi-turn conversations. Models fine-tuned on Toucan outperform larger closed-source counterparts on the BFCL V3 benchmark and push the Pareto frontier forward on MCP-Universe Bench.
- Workflow (0.69)
- Research Report (0.65)
- Banking & Finance (0.68)
- Media > News (0.46)
- Information Technology > Security & Privacy (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Toucan: Many-to-Many Translation for 150 African Language Pairs
Elmadany, AbdelRahim, Adebara, Ife, Abdul-Mageed, Muhammad
We address a notable gap in Natural Language Processing (NLP) by introducing a collection of resources designed to improve Machine Translation (MT) for low-resource languages, with a specific focus on African languages. First, we introduce two language models (LMs), Cheetah-1.2B and Cheetah-3.7B, with 1.2 billion and 3.7 billion parameters respectively. Next, we finetune these models to create Toucan, an Afrocentric machine translation model designed to support 156 African language pairs. To evaluate Toucan, we carefully develop an extensive machine translation benchmark, dubbed AfroLingu-MT. Toucan significantly outperforms other models, showcasing its remarkable performance on MT for African languages. Finally, we train a new model, spBLEU-1K, to enhance translation evaluation metrics, covering 1K languages, including 614 African languages. This work aims to advance the field of NLP, fostering cross-cultural understanding and knowledge exchange, particularly in regions with limited language resources such as Africa. The GitHub repository for the Toucan project is available at https://github.com/UBC-NLP/Toucan.
Toucan: Token-Aware Character Level Language Modeling
Fleshman, William, Van Durme, Benjamin
Character-level language models obviate the need for separately trained tokenizers, but efficiency suffers from longer sequence lengths. Learning to combine character representations into tokens has made training these models more efficient, but they still require decoding characters individually. We propose Toucan, an augmentation to character-level models to make them "token-aware". Comparing our method to prior work, we demonstrate significant speed-ups in character generation without a loss in language modeling performance. We then explore differences between our learned dynamic tokenization of character sequences and popular fixed-vocabulary solutions such as Byte-Pair Encoding and WordPiece, finding that our approach leads to a greater number of longer sequences tokenized as single items. Our project and code are available at https://nlp.jhu.edu/nuggets/.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > Canada > Ontario > Toronto (0.04)
Duke grads' artificial intelligence startup raises capital, launches on Shopify platform | WRAL TechWire
A fledgling artificial intelligence (AI) startup founded by two Duke University grads has closed on a round of venture funding. Toucan AI – started by North Carolina natives and Duke grads Arjun Devarajan and Vishnu Menon – raised an undisclosed sum from a number of sources. Investors include MTT Ventures, Silicon Road, Seraph Group and a syndicate of angel investors. The startup said it would use the funds to support its growth on the Shopify App Store, an online selling platform that enables entrepreneurs to create a store and host it on the platform's web servers. "We're very excited about the level of interest we have received from Shopify sites, the Shopify Partners Program, our new investors, and our selection as a showcase company for Venture Atlanta, the biggest venture capital conference in the Southeast," said Arjun Devarajan, CEO and co-founder of Toucan AI, in a statement.