algorithmic model


Leo Breiman, the Rashomon Effect, and the Occam Dilemma

arXiv.org Machine Learning

In the famous Two Cultures paper, Leo Breiman provided a visionary perspective on the cultures of "data models" (modeling with consideration of data generation) versus "algorithmic models" (vanilla machine learning models). I provide a modern perspective on these approaches. One of Breiman's key arguments against data models is the "Rashomon Effect," which is the existence of many different-but-equally-good models. The Rashomon Effect implies that data modelers would not be able to determine which model generated the data. Conversely, one of the core advantages he attributed to data models is simplicity, as he claimed there exists an "Occam Dilemma," i.e., an accuracy-simplicity tradeoff. After 25 years of powerful computers, it has become clear that this claim is not generally true, in that algorithmic models do not need to be complex to be accurate; however, there are nuances that help explain Breiman's logic, specifically, that by "simple," he appears to consider only linear models or unoptimized decision trees. Interestingly, the Rashomon Effect is a key tool in proving the nullification of the Occam Dilemma. To his credit, though, Breiman did not have the benefit of modern computers, with which my observations are much easier to make. Breiman's goal for interpretability was somewhat intertwined with causality: simpler models can help reveal which variables have a causal relationship with the outcome. However, I argue that causality can be investigated without the use of single models, whether or not they are simple. Interpretability is useful in its own right, and I think Breiman knew that too. Technically, my modern perspective does not belong to either of Breiman's Two Cultures, but it shares the goals of both (causality, simplicity, accuracy) and shows that these goals can be accomplished in other ways, without the limitations Breiman was concerned about.
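As a rough illustration of the Rashomon Effect, the sketch below enumerates a tiny hypothesis class and collects every model whose accuracy is within a tolerance `epsilon` of the best one (a "Rashomon set"). The toy dataset, the restriction to one-feature decision stumps, and the helper names `stump_predict`, `accuracy`, and `rashomon_set` are all hypothetical choices made for illustration; none of them come from the paper.

```python
from itertools import product

# Hypothetical toy data (not from the paper): two features, binary labels.
X = [(1, 2), (2, 1), (3, 4), (4, 3)]
y = [0, 0, 1, 1]


def stump_predict(x, feature, threshold, polarity):
    """Predict 1 if x[feature] > threshold (polarity 1), or if it is <= threshold (polarity -1)."""
    above = x[feature] > threshold
    return int(above) if polarity == 1 else int(not above)


def accuracy(model, X, y):
    """Fraction of examples that a (feature, threshold, polarity) stump labels correctly."""
    feature, threshold, polarity = model
    hits = sum(stump_predict(x, feature, threshold, polarity) == label
               for x, label in zip(X, y))
    return hits / len(y)


def rashomon_set(X, y, epsilon):
    """Return the best accuracy and every stump whose accuracy is within epsilon of it."""
    candidates = []
    for feature in range(len(X[0])):
        # Use each observed feature value as a candidate split point.
        thresholds = sorted({x[feature] for x in X})
        for threshold, polarity in product(thresholds, (1, -1)):
            candidates.append((feature, threshold, polarity))
    scores = {model: accuracy(model, X, y) for model in candidates}
    best = max(scores.values())
    return best, [model for model, s in scores.items() if s >= best - epsilon]


best, good = rashomon_set(X, y, epsilon=0.0)
print(best)          # 1.0
print(sorted(good))  # [(0, 2, 1), (1, 2, 1)]
```

On this toy data, stumps on two different features are equally accurate, so accuracy alone cannot say which feature "generated" the labels; widening `epsilon` to 0.25 grows the set to six stumps.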


Technical Perspective: Tapping the Link between Algorithmic Model Counting and Streaming

Communications of the ACM

It is rare and rewarding to connect two vastly different areas of computer science. Fast randomized algorithms in model counting were discovered in the early 1980s, while the area of streaming algorithms did not take off in the theory community until the late 1990s. Only recently were these disparate areas connected in the accompanying paper, where it was observed that the algorithmic techniques developed in the two areas were strikingly similar. This connection has given us exciting streaming algorithms used in database design and in network monitoring, as well as a unified perspective on existing algorithms. What exactly is model counting?
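Model counting (often written #SAT) asks how many truth assignments satisfy a Boolean formula. The minimal sketch below computes that number exactly by enumerating all 2^n assignments, so it is exponential in the number of variables and is not one of the fast randomized algorithms the article refers to. The DIMACS-style clause encoding and the function name `count_models` are assumptions for illustration.

```python
from itertools import product


def count_models(clauses, n_vars):
    """Exact model counting by brute-force enumeration of all 2**n_vars assignments.

    Each clause is a list of nonzero ints in DIMACS style: literal k means
    "variable k is true" and -k means "variable k is false". A formula is a
    conjunction (AND) of clauses; each clause is a disjunction (OR) of literals.
    """
    count = 0
    for bits in product((False, True), repeat=n_vars):
        # bits[k - 1] holds the truth value of variable k.
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause) for clause in clauses):
            count += 1
    return count


# (x1 OR x2) AND (NOT x1 OR x3) over three variables
print(count_models([[1, 2], [-1, 3]], n_vars=3))  # 4
```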


On the equivalence of Occam algorithms

arXiv.org Artificial Intelligence

Many of these analyses have focused on the implications and uses of complexity-based algorithms defined by Blumer et al. in two seminal papers [4, 5]. Their algorithms were defined such that they achieved zero training error on a sample and output a hypothesis whose complexity (VC dimension for continuous alphabets; description length for discrete ones) was at most a polynomial in the target concept complexity, multiplied by a sub-linear factor in the sample size. These "Occam algorithms" are weak approximations of the minimum-consistent-hypothesis problem [6]. In this paper, we focus on the continuous-alphabet Occam algorithms. In 1989, Blumer et al. [5] showed that if a concept was learnable by their Occam algorithm, then it was polynomially learnable; they left open the question of whether the converse of this theorem was true.
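The continuous-alphabet Occam condition described above can be written compactly as follows. The notation is ours, not necessarily Blumer et al.'s: $n$ is the complexity of the target concept, $m$ the sample size, $(x_i, y_i)$ the labeled sample, $H_{n,m}$ the class of hypotheses the algorithm may output, $p$ a polynomial, and $\alpha$ the sub-linear exponent.

```latex
\hat{h} \in H_{n,m}, \qquad \hat{h}(x_i) = y_i \ \text{for all } i = 1, \dots, m,
\qquad
\mathrm{VCdim}(H_{n,m}) \le p(n)\, m^{\alpha}, \quad 0 \le \alpha < 1.
```

Roughly speaking, because $m^{\alpha}$ grows more slowly than $m$, the complexity of the output hypothesis falls further behind the sample size as $m$ increases, which is what lets zero training error on a large enough sample translate into low generalization error.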


AP and OpenAI enter into two-year partnership to help train algorithmic models

Engadget

The Associated Press (AP) and ChatGPT parent company OpenAI have reached a news-sharing agreement, but not for the reasons you may think. It doesn't involve AI chatbots quickly churning out content, but rather a way for OpenAI to train its algorithmic models, as reported by Axios. The two-year deal gives OpenAI access to select news content and technology from the AP archives, dating back to 1985. All of this sweet, sweet data will be used to improve the efficacy of future iterations of ChatGPT and related tools. This is one of the first high-profile partnerships between a major news organization and an artificial intelligence company.


Can Algorithms be Racist?

#artificialintelligence

As artificial intelligence (A.I.) continues to integrate rapidly into everyday life, several ethical dilemmas have arisen alongside it, and their impact on use cases has become the subject of much debate (Kilbertus et al., 2017; Hardt et al., 2016; Pazzanese, 2020). One such predicament, on which this paper hinges, concerns inclusivity and marginalization (Bender et al., 2021). How are notions of participation affected by training data that reinforce hegemonic power in the formation of algorithmic models? Accordingly, this article seeks to spotlight ethical challenges within A.I. from a grounded interpretivist viewpoint, gained by qualitatively investigating the literature, in order to discuss bias amplification. As outlined by Bender et al. (2021), there are several juristic and social dilemmas regarding the growth and utilization of language models.


Debate continues over the pros and cons of regulating artificial intelligence

#artificialintelligence

What are the issues of most concern for businesses in the EU Commission's recently published AI Act proposals? Our virtual gathering included representatives from the UK, Netherlands and USA, stretching across the automotive, energy, education, professional services and tech sectors. As with our first AI roundtable, the discussion ranged far and wide. A notable difficulty with the Commission's draft regulation on AI (as proposed, its "AI Act") is that it assumes that an end-to-end "provider" of an AI system can be identified and fixed with liability. The AI Act defines such service providers as the person or organisation that developed the system or had it developed.


Breiman's two cultures: You don't have to choose sides

arXiv.org Machine Learning

Breiman's classic paper casts data analysis as a choice between two cultures: data modelers and algorithmic modelers. Stated broadly, data modelers use simple, interpretable models with well-understood theoretical properties to analyze data. Algorithmic modelers prioritize predictive accuracy and use more flexible function approximations to analyze data. This dichotomy overlooks a third set of models: mechanistic models derived from scientific theories (e.g., ODE/SDE simulators). Mechanistic models encode application-specific scientific knowledge about the data. And while these categories represent extreme points in model space, modern computational and algorithmic tools enable us to interpolate between these points, producing flexible, interpretable, and scientifically-informed hybrids that can make accurate and robust predictions and resolve issues with data analysis that Breiman describes, such as the Rashomon effect and Occam's dilemma. Challenges still remain in finding an appropriate point in model space, with many choices about how to compose model components and the degree to which each component informs inferences.
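One way to picture such a hybrid is a "grey-box" model, in which a mechanistic ODE term supplies the known science and a small data-driven term absorbs what the theory misses. The sketch below is a minimal illustration under hypothetical assumptions and is not the method of the paper: the decay law dy/dt = -k·y with a known `K` is the mechanistic part, a linear correction a + b·y fitted by ordinary least squares is the data-driven part, and the "observed" data come from a made-up system. The helper names `simulate` and `fit_linear_correction` are likewise hypothetical, and a more flexible approximator (for example, a neural network) could stand in for the linear correction.

```python
def simulate(y0, dt, steps, rhs):
    """Forward-Euler integration of dy/dt = rhs(y), returning steps + 1 states."""
    ys = [y0]
    for _ in range(steps):
        ys.append(ys[-1] + dt * rhs(ys[-1]))
    return ys


def fit_linear_correction(ys, dt, k):
    """Fit g(y) = a + b*y by ordinary least squares so that dy/dt is approximately -k*y + g(y).

    Forward differences serve as derivative estimates; the residual each one
    leaves after removing the mechanistic term -k*y is regressed on y.
    """
    xs = ys[:-1]
    rs = [(ys[i + 1] - ys[i]) / dt + k * ys[i] for i in range(len(ys) - 1)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_r = sum(rs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxr = sum((x - mean_x) * (r - mean_r) for x, r in zip(xs, rs))
    b = sxr / sxx
    a = mean_r - b * mean_x
    return a, b


K = 0.5             # decay rate assumed known from theory (hypothetical)
DT, STEPS = 0.1, 50

# Hypothetical "true" system, unknown to the modeler: dy/dt = -0.5*y + 1.0 - 0.2*y.
observed = simulate(0.0, DT, STEPS, lambda y: -K * y + 1.0 - 0.2 * y)

a, b = fit_linear_correction(observed, DT, K)
mechanistic = simulate(0.0, DT, STEPS, lambda y: -K * y)         # theory alone
hybrid = simulate(0.0, DT, STEPS, lambda y: -K * y + a + b * y)  # theory + learned term
print(round(a, 6), round(b, 6))  # 1.0 -0.2
```

Here the purely mechanistic model stays at zero, while the hybrid tracks the observed trajectory and remains interpretable: `a` reads as a constant inflow and `b` as an extra decay rate that the theory omitted.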


The Algorithmic Colonization of Africa -- Real Life

#artificialintelligence

The second annual CyFyAfrica 2019 (the Conference on Technology, Innovation, and Society) took place in Tangier, Morocco, in June. It was a vibrant, diverse and dynamic gathering attended by various policymakers, UN delegates, ministers, governments, diplomats, media, tech company representatives, and academics from over 65 nations, mostly African and Asian countries. The conference's central aim, stated unapologetically, was to bring forth the continent's voices in the global discourse. The president of Observer Research Foundation (one of the co-hosts of the conference), in their opening message, emphasized that the voices of Africa's youth need to be put front and center as the continent increasingly comes to rely on technology to address its social, educational, health, economic, and financial issues. The conference was intended in part to provide a platform for those young people, and they were afforded that opportunity, along with many Western scholars from various universities and tech developers from industrial and commercial sectors.


Less self-assured AI are unlikely to override human orders

Daily Mail - Science & tech

In the Terminator film franchise, hyper-intelligent robots learn to operate without their human masters, leading to a machine uprising that wipes out most of mankind. Researchers have now recommended that humans design the intelligent robots of the future with less self-assurance to stop them breaking away from human control. The team suggest that over-confident artificial intelligence can cause an array of problems: their research found that an AI that is too self-assured will override the wishes of its human supervisor.


Interacting with ML Models

#artificialintelligence

The main difference between data analysis today, compared with a decade or two ago, is the way that we interact with it. Previously, the role of statistics was primarily to extend our mental models by discovering new correlations and causal rules. Today, we increasingly delegate parts of our reasoning processes to algorithmic models that live outside our mental models. In my next few posts, I plan to explore some of the issues that arise from this delegation and how ideas such as model interpretability can potentially address them. Throughout this series of posts, I will argue that while current research has barely scratched the surface of understanding the interaction between algorithmic and mental models, these issues will be much more important to the future of data analysis than the technical performance of the models themselves.