SciCode: A Research Coding Benchmark Curated by Scientists

Tian, Minyang, Gao, Luyu, Zhang, Shizhuo Dylan, Chen, Xinan, Fan, Cunwei, Guo, Xuefei, Haas, Roland, Ji, Pan, Krongchon, Kittithat, Li, Yao, Liu, Shengyan, Luo, Di, Ma, Yutao, Tong, Hao, Trinh, Kha, Tian, Chenyu, Wang, Zihan, Wu, Bohao, Xiong, Yanyu, Yin, Shengzhu, Zhu, Minhui, Lieret, Kilian, Lu, Yanxin, Liu, Genglin, Du, Yufeng, Tao, Tianhua, Press, Ofir, Callan, Jamie, Huerta, Eliu, Peng, Hao

arXiv.org Artificial Intelligence

Since language models (LMs) now outperform average humans on many challenging tasks, it has become increasingly difficult to develop challenging, high-quality, and realistic evaluations. We address this issue by examining LMs' capabilities to generate code for solving real scientific research problems. Incorporating input from scientists and AI researchers in 16 diverse natural science sub-fields, including mathematics, physics, chemistry, biology, and materials science, we created a scientist-curated coding benchmark, SciCode. The problems in SciCode naturally factorize into multiple subproblems, each involving knowledge recall, reasoning, and code synthesis. In total, SciCode contains 338 subproblems decomposed from 80 challenging main problems. It offers optional descriptions specifying useful scientific background information and scientist-annotated gold-standard solutions and test cases for evaluation. Claude3.5-Sonnet, the best-performing model among those tested, can solve only 4.6% of the problems in the most realistic setting. We believe that SciCode both demonstrates contemporary LMs' progress towards becoming helpful scientific assistants and sheds light on the development and evaluation of scientific AI in the future.


How we Rendered 3 Years of Video Using JavaScript

#artificialintelligence

A big part of our product at Lumen5 is our video rendering engine (we're a video creation tool, after all). A couple of years ago, we started experimenting with WebGL as a potential new way to render videos. It's totally transformed the way we think about JavaScript development, and opened our eyes to a completely different web paradigm. When we first started, it wasn't clear that WebGL was the way to go for us. It took a lot of research into the different alternatives before we broke ground on our existing render engine.


Machine-learning project takes aim at disinformation

#artificialintelligence

What is new is how quickly malicious actors can spread disinformation when the world is tightly connected across social networks and internet news sites. We can give up on the problem and rely on the platforms themselves to fact-check stories or posts and screen out disinformation--or we can build new tools to help people identify disinformation as soon as it crosses their screens.

Preslav Nakov is a computer scientist at the Qatar Computing Research Institute in Doha specializing in speech and language processing. He leads a project using machine learning to assess the reliability of media sources. That allows his team to gather news articles alongside signals about their trustworthiness and political biases, all in a Google News-like format. "You cannot possibly fact-check every single claim in the world," Nakov explains. Instead, focus on the source. "I like to say that you can fact-check the fake news before it was even written." His team's tool, called the Tanbih News Aggregator, is available in Arabic and English and gathers articles in areas such as business, politics, sports, science and technology, and COVID-19.

Business Lab is hosted by Laurel Ruma, editorial director of Insights, the custom publishing division of MIT Technology Review. The show is a production of MIT Technology Review, with production help from Collective Next. This podcast was produced in partnership with the Qatar Foundation.

"Even the best AI for spotting fake news is still terrible," MIT Technology Review, October 3, 2018

Laurel Ruma: From MIT Technology Review, I'm Laurel Ruma, and this is Business Lab, the show that helps business leaders make sense of new technologies coming out of the lab and into the marketplace.


Beyond NoSQL: The case for distributed SQL

#artificialintelligence

In the beginning, there were files. Later there were navigational databases based on structured files. Then there were IMS and CODASYL, and around 40 years ago we had some of the first relational databases. Throughout much of the 1980s and 1990s "database" strictly meant "relational database." Then with the growing popularity of object-oriented programming languages, some thought the solution to the "impedance mismatch" of object-oriented languages and relational databases was to map objects in the database. Thus we ended up with "object-oriented databases."


Use Case: Robots-As-A-Service: The Future of IoT and Blockchain NewsBTC

#artificialintelligence

We had the opportunity to sit down with Raullen Chai, the CEO of IoTeX, to discuss the state of the collaboration between IoT and blockchain. IoTeX tackles the key roadblocks of IoT: privacy, scalability and decentralization. IoTeX walks us through their novel consensus algorithm, randomized-DPoS (Delegated Proof of Stake), which takes traditional DPoS and combines it with random functions popularized by projects like Dfinity and Algorand. I want to start by asking you to paint a picture of the future of IoT. What will my home look like in ten years? In the future, all devices will be connected to the Internet.


How Canada can Dominate the Computing Revolution • Coinsquare Discover

#artificialintelligence

Due to the exponential growth and development of technology in the last couple of decades, the computing world can be broken up into three core technologies: cognitive (AI), connected (IoT), and distributed (Blockchain/Digital-Currency) computing. The 'trinity' works together because each type of computing tackles the others' weaknesses and drawbacks. For instance, the main problems of cognitive and connected computing regarding ethics, security, and privacy can be addressed by distributed computing. The main problems of limited hardware and detailed data for distributed and cognitive computing, respectively, can be addressed by connected computing. The main problems of customer protection and automation limitations for distributed and connected computing, respectively, can be addressed by cognitive computing.


How artificial intelligence will transform IT operations and devops

#artificialintelligence

To state that DevOps and IT operations teams will face new challenges in the coming years sounds a bit redundant, as their core responsibility is to solve problems and overcome challenges. However, given the dramatic pace at which the current landscape of processes, technologies, and tools is changing, keeping up has become quite difficult. Moreover, the pressure business users have been putting on DevOps and IT operations teams is staggering, demanding that everything be solved with a tap on an app. However, at the backend, handling issues is a different ball game; the users can't even imagine how difficult it is to find a problem and solve it. One of the biggest challenges IT operations and DevOps teams face nowadays is being able to pinpoint the small yet potentially harmful issues in large streams of Big Data being logged in their environment.


Chatbot offers free legal aid to the homeless

#artificialintelligence

A Stanford undergrad's AI-based chatbot has already helped us with our parking tickets and various legal issues, but now his DoNotPay bot is taking on an even bigger, trickier issue. After receiving acclaim for the bot, which challenged over 160,000 tickets, Joshua Browder taught the program how to help homeless people in the UK claim their right to public housing. The user simply asks for help, and the bot will ask them a series of questions to determine how best to help them. Usually this will involve crafting a claim letter, which the bot fills in with the information that's been provided. In the end, people can save hundreds of dollars in legal advising fees when they need that money the most.