
 freelancer


GraphMatch: Fusing Language and Graph Representations in a Dynamic Two-Sided Work Marketplace

Sacha, Mikołaj, Jafri, Hammad, Terzolo, Mattie, Sinha, Ayan, Rabinovich, Andrew

arXiv.org Artificial Intelligence

Recommending matches in a text-rich, dynamic two-sided marketplace presents unique challenges due to evolving content and interaction graphs. We introduce GraphMatch, a new large-scale recommendation framework that fuses pre-trained language models with graph neural networks to overcome these challenges. Unlike prior approaches centered on standalone models, GraphMatch is a comprehensive recipe built on powerful text encoders and GNNs working in tandem. It employs adversarial negative sampling alongside point-in-time subgraph training to learn representations that capture both the fine-grained semantics of evolving text and the time-sensitive structure of the graph. We evaluated extensively on interaction data from Upwork, a leading labor marketplace, at large scale, and discuss our approach towards low-latency inference suitable for real-time use. In our experiments, GraphMatch outperforms language-only and graph-only baselines on matching tasks while being efficient at runtime. These results demonstrate that unifying language and graph representations yields a highly effective solution to text-rich, dynamic two-sided recommendations, bridging the gap between powerful pretrained LMs and large-scale graphs in practice.


Remote Labor Index: Measuring AI Automation of Remote Work

Mazeika, Mantas, Gatti, Alice, Menghini, Cristina, Sehwag, Udari Madhushani, Singhal, Shivam, Orlovskiy, Yury, Basart, Steven, Sharma, Manasi, Peskoff, Denis, Lau, Elaine, Lim, Jaehyuk, Carroll, Lachlan, Blair, Alice, Sivakumar, Vinaya, Basu, Sumana, Kenstler, Brad, Ma, Yuntao, Michael, Julian, Li, Xiaoke, Ingebretsen, Oliver, Mehta, Aditya, Mottola, Jean, Teichmann, John, Yu, Kevin, Shaik, Zaina, Khoja, Adam, Ren, Richard, Hausenloy, Jason, Phan, Long, Htet, Ye, Aich, Ankit, Rabbani, Tahseen, Shah, Vivswan, Novykov, Andriy, Binder, Felix, Chugunov, Kirill, Ramirez, Luis, Geralnik, Matias, Mesura, Hernán, Lee, Dean, Cardona, Ed-Yeremai Hernandez, Diamond, Annette, Yue, Summer, Wang, Alexandr, Liu, Bing, Hernandez, Ernesto, Hendrycks, Dan

arXiv.org Artificial Intelligence

AIs have made rapid progress on research-oriented benchmarks of knowledge and reasoning, but it remains unclear how these gains translate into economic value and automation. To measure this, we introduce the Remote Labor Index (RLI), a broadly multi-sector benchmark comprising real-world, economically valuable projects designed to evaluate end-to-end agent performance in practical settings. AI agents perform near the floor on RLI, with the highest-performing agent achieving an automation rate of 2.5%. These results help ground discussions of AI automation in empirical evidence, setting a common basis for tracking AI impacts and enabling stakeholders to proactively navigate AI-driven labor automation.


Wired and Business Insider remove articles by AI-generated 'freelancer'

The Guardian

Multiple news organisations have taken down articles written by an alleged freelance journalist that now appear to have been generated by AI. On Thursday, Press Gazette reported that at least six publications, including Wired and Business Insider, have removed articles from their websites in recent months after it was discovered that the stories – written under the name of Margaux Blanchard – were AI-generated. Wired published a story titled "They Fell in Love Playing Minecraft. A few weeks later, the outlet took down the story, stating in an editor's note: "After an additional review of the article … Wired editorial leadership has determined this article does not meet our editorial standards." The story cited a "Jessica Hu", an alleged 34-year-old "ordained officiant based in Chicago" who reportedly "made a name for herself as a 'digital celebrant', specialising in ceremonies across Twitch, Discord and VRChat", according to Press Gazette, which reviewed the Wired article. Neither Press Gazette nor the Guardian was able to verify Hu's identity. Press Gazette further reported that in April, Business Insider published two essays by Blanchard titled: "Remote work has been the best thing for me as a parent but the worst as a person" and "I had my first kid at 45.


The Transformative Power of Inspiration

Communications of the ACM

Growing up as a teenager in the '80s, I witnessed the birth and rise of personal computers firsthand. The Commodore 64 was the first computer to enter our home, and apart from the myriad games we played endlessly, it also made me experiment with BASIC (and basic) programming. Despite my early engagement with computing, at school I was more interested in languages and media (I also wasn't strong enough in maths). So, when it was time to go to university, I chose to study communication sciences at the Faculty of Social Sciences at KU Leuven, Belgium. During my studies, my interest in computers never faded, especially as it coincided with the rise of the Internet and the start of the World Wide Web--an evolution I eagerly followed.


Can AI Freelancers Compete? Benchmarking Earnings, Reliability, and Task Success at Scale

Noever, David, McKee, Forrest

arXiv.org Artificial Intelligence

This study explores Large Language Models (LLMs) as autonomous agents for real-world tasks, including freelance software development. This work presents a new benchmark that evaluates LLMs on freelance programming and data analysis tasks derived from economic data. We construct the benchmark using synthetic tasks created from a Kaggle Freelancer dataset of job postings, with all job prices standardized to USD (median fixed-project price around $250, and an average of $306). Each task is accompanied by structured input-output test cases and an estimated price tag, enabling automated correctness checking and a monetary performance valuation. This approach is inspired by OpenAI's recent SWE-Lancer benchmark (1,400 real Upwork tasks worth $1M total). Still, our framework simplifies evaluation using programmatically testable tasks and predicted price values, making it highly scalable and repeatable. On this benchmark, we evaluate four modern LLMs - Claude 3.5 Haiku, GPT-4o-mini, Qwen 2.5, and Mistral. We report each model's accuracy (task success rate and test-case pass rate) and the total "freelance earnings" it achieves (sum of prices of solved tasks). Our results show that Claude 3.5 Haiku performs best, earning approximately $1.52 million USD, followed closely by GPT-4o-mini at $1.49 million, then Qwen 2.5 ($1.33M) and Mistral ($0.70M). We analyze the distribution of errors per task and observe that the strongest models solve the most tasks and rarely fail completely on any project. We discuss the implications of these results for the feasibility of AI as a freelance developer, the advantages and limitations of our automated benchmark approach, and the gap between performance on structured tasks versus the true complexity of real-world freelance jobs.
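The three reported metrics (task success rate, test-case pass rate, and "freelance earnings" as the sum of prices of solved tasks) can be illustrated with a minimal sketch. The `TaskResult` type, the `score` function, the sample numbers, and the rule that a task counts as solved only when every test case passes are assumptions made for illustration; the abstract does not spell out these details.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    price_usd: float   # estimated price tag attached to the task
    tests_passed: int  # test cases the model's output passed
    tests_total: int   # test cases defined for the task

    @property
    def solved(self) -> bool:
        # Assumption: "solved" means every test case passed.
        return self.tests_total > 0 and self.tests_passed == self.tests_total


def score(results):
    """Aggregate per-task results into the three headline metrics."""
    total_tests = sum(r.tests_total for r in results)
    return {
        "task_success_rate": sum(r.solved for r in results) / len(results),
        "test_pass_rate": sum(r.tests_passed for r in results) / total_tests,
        # Earnings: sum of the prices of solved tasks only.
        "earnings_usd": sum(r.price_usd for r in results if r.solved),
    }


# Hypothetical results for three tasks (not data from the paper).
results = [
    TaskResult(price_usd=250.0, tests_passed=5, tests_total=5),
    TaskResult(price_usd=306.0, tests_passed=3, tests_total=5),
    TaskResult(price_usd=100.0, tests_passed=4, tests_total=4),
]
summary = score(results)
print(summary)
```

Under this all-or-nothing rule, a partially correct task raises the test-case pass rate but contributes nothing to earnings, which is why the two accuracy figures can diverge.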


Skill matching at scale: freelancer-project alignment for efficient multilingual candidate retrieval

Jouanneau, Warren, Palyart, Marc, Jouffroy, Emma

arXiv.org Artificial Intelligence

Finding the perfect match between a job proposal and a set of freelancers is not an easy task to perform at scale, especially in multiple languages. In this paper, we propose a novel neural retriever architecture that tackles this problem in a multilingual setting. Our method encodes project descriptions and freelancer profiles by leveraging pre-trained multilingual language models. The latter are used as backbone for a custom transformer architecture that aims to keep the structure of the profiles and project. This model is trained with a contrastive loss on historical data. Thanks to several experiments, we show that this approach effectively captures skill matching similarity and facilitates efficient matching, outperforming traditional methods.
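As a rough illustration of training with a contrastive loss, the sketch below computes an in-batch InfoNCE-style loss over paired project and freelancer embeddings. The abstract does not name the exact loss, so the InfoNCE form, the cosine similarity, the `temperature` value, and the toy two-dimensional vectors are all assumptions; in practice the embeddings would come from the multilingual encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(project_vecs, freelancer_vecs, temperature=0.1):
    """Mean in-batch contrastive loss.

    Row i of each list is a matched (project, freelancer) pair; every
    other freelancer in the batch serves as a negative for project i.
    """
    losses = []
    for i, p in enumerate(project_vecs):
        logits = [cosine(p, f) / temperature for f in freelancer_vecs]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of picking the true match.
        losses.append(log_sum - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings (hypothetical, not from the paper).
projects = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # matches point the same way
swapped = [[0.1, 0.9], [0.9, 0.1]]   # matches point the wrong way

loss_aligned = info_nce(projects, aligned)
loss_swapped = info_nce(projects, swapped)
print(loss_aligned, loss_swapped)
```

The loss is small when each project is closest to its own freelancer and large when it is closer to someone else's, which is the signal that pulls matched pairs together during training.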


Websites accuse AI startup Anthropic of bypassing their anti-scraping rules and protocol

Engadget

Freelancer has accused Anthropic, the AI startup behind the Claude large language models, of ignoring its "do not crawl" robots.txt. Meanwhile, iFixit CEO Kyle Wiens said Anthropic has ignored the website's policy prohibiting the use of its content for AI model training. Matt Barrie, the chief executive of Freelancer, told The Information that Anthropic's ClaudeBot is "the most aggressive scraper by far." His website allegedly got 3.5 million visits from the company's crawler within a span of four hours, which is "probably about five times the volume of the number two" AI crawler. Similarly, Wiens posted on X/Twitter that Anthropic's bot hit iFixit's servers a million times in 24 hours.


"Generate" the Future of Work through AI: Empirical Evidence from Online Labor Markets

Liu, Jin, Xu, Xingchen, Li, Yongjun, Tan, Yong

arXiv.org Artificial Intelligence

With the advent of general-purpose Generative AI, the interest in discerning its impact on the labor market escalates. In an attempt to bridge the extant empirical void, we interpret the launch of ChatGPT as an exogenous shock, and implement a Difference-in-Differences (DID) approach to quantify its influence on text-related jobs and freelancers within an online labor marketplace. Our results reveal a significant decrease in transaction volume for gigs and freelancers directly exposed to ChatGPT. Additionally, this decline is particularly marked in units of relatively higher past transaction volume or lower quality standards. Yet, the negative effect is not universally experienced among service providers. Subsequent analyses illustrate that freelancers proficiently adapting to novel advancements and offering services that augment AI technologies can yield substantial benefits amidst this transformative period. Consequently, even though the advent of ChatGPT could conceivably substitute existing occupations, it also unfolds immense opportunities and carries the potential to reconfigure the future of work. This research contributes to the limited empirical repository exploring the profound influence of LLM-based generative AI on the labor market, furnishing invaluable insights for workers, job intermediaries, and regulatory bodies navigating this evolving landscape.
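The core idea of a Difference-in-Differences estimate can be shown in its simplest two-group, two-period form: the change in the exposed group minus the change in the unexposed group. This is only a sketch; the study itself presumably uses a regression specification with controls, and the function name and all numbers below are hypothetical.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-by-two DID: (treated change) minus (control change)."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical monthly transaction counts per freelancer,
# before and after the ChatGPT launch (not data from the paper).
treated_pre = [10.0, 12.0, 11.0]   # text-related gigs, before
treated_post = [7.0, 8.0, 9.0]     # text-related gigs, after
control_pre = [9.0, 10.0, 11.0]    # unexposed gigs, before
control_post = [9.0, 9.0, 9.0]     # unexposed gigs, after

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
print(effect)
```

Subtracting the control group's change removes marketplace-wide trends, so the remainder is attributed to the shock, provided the two groups would otherwise have moved in parallel.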


Gig Workers Behind AI Face 'Unfair Working Conditions,' Oxford Report Finds

TIME - Tech

And with it, so are the digital labor platforms used by many AI companies to employ human gig workers. Those people perform the vital but often unseen labor of generating or labeling the masses of data that AI systems heavily rely on--often as part of efforts to make AIs more reliable and less biased. Even as these workers take on the vital task of making modern AI safer, the companies that employ them are uniformly failing to meet even a basic threshold of labor rights standards, according to a new report from the University of Oxford's Internet Institute, shared exclusively with TIME. Researchers assessed 15 digital work platforms--among them Amazon Mechanical Turk, Scale AI and Appen--and found that all of them were "still far from safeguarding basic standards of fair work," according to the report. "While the run for AI deployments gets public hype and momentum, workers behind the design, building and testing of these technological solutions, unfortunately, still face enormous challenges and experience unfair working conditions," the report says.


Turbo-charging productivity in Asia: the economic benefits of generative AI

MIT Technology Review

This year, Microsoft commissioned global tech advisory firm Access Partnership, working alongside local partners including the Analytics Association of the Philippines, the Federation of Indian Chambers of Commerce & Industry (FICCI), and the Center for Global Communications (GLOCOM) in Japan, to conduct country-level research on the potential economic impact of generative AI across Asia. The research estimates a potential boost to productive capacity of US$621 billion in India, US$1.1 trillion in Japan, and US$79.3 billion in the Philippines alone, with studies ongoing in Malaysia, Indonesia and South Korea. These country findings are consistent with other global studies--for instance, a recent report by McKinsey estimates generative AI could add up to US$4.4 trillion a year to the global economy. The potential economic growth is so large because generative AI has implications for most types of work: its impact can be thought of as comparable to that of digitalization in general, rather than that of a specific product. In particular, this huge injection of productivity will arise from three channels--generative AI's potential to unleash creativity, accelerate discovery, and enhance efficiency.