open source python library
LLM Interactive Optimization of Open Source Python Libraries -- Case Studies and Generalization
Florath, Andreas, Kiraly, Franz
With the advent of large language models (LLMs) like GPT-3, a natural question is the extent to which these models can be utilized for source code optimization. This paper presents methodologically stringent case studies applied to the well-known open source Python libraries pillow and numpy. We find that the contemporary LLM ChatGPT-4 (as of September and October 2023) is surprisingly adept at optimizing energy and compute efficiency. However, this is only the case in interactive use, with a human expert in the loop. Aware of experimenter bias, we document our qualitative approach in detail, and provide transcript and source code. We start by providing a detailed description of our approach in conversing with the LLM to optimize the _getextrema function in the pillow library, and a quantitative evaluation of the performance improvement. To demonstrate qualitative replicability, we report further attempts on another locus in the pillow library, and one code locus in the numpy library, to demonstrate generalization within and beyond a library. In all attempts, the performance improvement is significant (factor up to 38). We have also not omitted reporting of failed attempts (there were none). We conclude that LLMs are a promising tool for code optimization in open source libraries, but that the human expert in the loop is essential for success. Nonetheless, we were surprised by how few iterations were required to achieve substantial performance improvements that were not obvious to the expert in the loop. We would like to bring attention to the qualitative nature of this study: more robust quantitative studies would need to introduce a layer of selecting experts in a representative sample. We invite the community to collaborate.
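As a rough illustration of what a "quantitative evaluation of the performance improvement" can look like, the sketch below times two ways of finding the lowest and highest non-empty bin of a 256-bin histogram. Both functions, their names, and the sample histogram are hypothetical and written for this example only; they are not taken from the paper, its transcripts, or the pillow source, and the measured factor will vary by machine.

```python
import timeit


def minmax_full_scan(histogram):
    """Baseline (hypothetical): visit all 256 bins, updating min and max."""
    lo, hi = 255, 0
    for i in range(256):
        if histogram[i]:
            lo = min(lo, i)
            hi = max(hi, i)
    return lo, hi


def minmax_two_ended(histogram):
    """Variant (hypothetical): stop at the first non-empty bin from each end."""
    lo = next((i for i in range(256) if histogram[i]), 255)
    hi = next((i for i in range(255, -1, -1) if histogram[i]), 0)
    return lo, hi


# Assumed sample data: a histogram with exactly two non-empty bins.
hist = [0] * 256
hist[10] = 5
hist[200] = 7

# Correctness first: both variants must agree before timing means anything.
assert minmax_full_scan(hist) == minmax_two_ended(hist)

# Take the best of several repeats to reduce noise from other processes.
t_base = min(timeit.repeat(lambda: minmax_full_scan(hist), number=2000, repeat=3))
t_fast = min(timeit.repeat(lambda: minmax_two_ended(hist), number=2000, repeat=3))
speedup = t_base / t_fast
print(f"speedup factor: {speedup:.1f}x")
```

Checking that both variants return the same result before comparing timings keeps a speedup figure from hiding a behavioral change.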
Torchhd: An Open Source Python Library to Support Research on Hyperdimensional Computing and Vector Symbolic Architectures
Heddes, Mike, Nunes, Igor, Vergés, Pere, Kleyko, Denis, Abraham, Danny, Givargis, Tony, Nicolau, Alexandru, Veidenbaum, Alexander
Hyperdimensional computing (HD), also known as vector symbolic architectures (VSA), is a framework for computing with distributed representations by exploiting properties of random high-dimensional vector spaces. The commitment of the scientific community to aggregate and disseminate research in this particularly multidisciplinary area has been fundamental for its advancement. Joining these efforts, we present Torchhd, a high-performance open source Python library for HD/VSA. Torchhd seeks to make HD/VSA more accessible and serves as an efficient foundation for further research and application development. The easy-to-use library builds on top of PyTorch and features state-of-the-art HD/VSA functionality, clear documentation, and implementation examples from well-known publications. Comparing publicly available code with their corresponding Torchhd implementation shows that experiments can run up to 100x faster. Torchhd is available at: https://github.com/hyperdimensional-computing/torchhd.
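To make "computing with distributed representations" in random high-dimensional vector spaces more concrete, here is a minimal sketch in plain Python. It does not use Torchhd or reproduce its API; the dimensionality, the bipolar (+1/-1) encoding, and all function names are assumptions chosen for illustration.

```python
import random

DIM = 10_000  # assumed dimensionality; HD/VSA typically uses thousands of dimensions


def random_hv(rng, dim=DIM):
    """Draw a random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(dim)]


def bind(a, b):
    """Bind two hypervectors by elementwise multiplication (self-inverse)."""
    return [x * y for x, y in zip(a, b)]


def bundle(*vectors):
    """Bundle by elementwise majority vote; use an odd count to avoid ties."""
    return [1 if sum(column) > 0 else -1 for column in zip(*vectors)]


def similarity(a, b):
    """Normalized dot product: 1.0 for identical vectors, near 0 for unrelated ones."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


rng = random.Random(0)  # fixed seed so the run is repeatable
a, b, c = (random_hv(rng) for _ in range(3))

bound = bind(a, b)          # dissimilar to both a and b
recovered = bind(bound, b)  # binding with b again undoes the first binding
memory = bundle(a, b, c)    # stays similar to each of its inputs

print("a vs b:        ", round(similarity(a, b), 3))
print("memory vs a:   ", round(similarity(memory, a), 3))
print("recovered == a:", recovered == a)
```

The sketch relies on the property the abstract alludes to: independently drawn high-dimensional random vectors are almost orthogonal, so a bundled vector remains recognizably similar to its inputs while a bound pair looks unrelated to either.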
Data Science Content Intern (Remote)
The opportunity to be a part of the exciting early stages of a well-funded, European-based Open Source start-up that has massive growth and venture potential. Fully remote working environment. 50€/month development budget to learn Data Science, Causal ML, Bayesian Inference or anything you like that applies to your role.
Data Science Content Intern
The opportunity to be a part of the exciting early stages of a well-funded, European-based Open Source start-up that has massive growth and venture potential. Fully remote working environment. 50€/month development budget to learn Data Science, Causal ML, Bayesian Inference or anything you like that applies to your role.
Awesome Python Data Science Libraries And Frameworks For Free You Should Definitely Use - Fly Spaceships With Your Mind
Creating complex data and analysis pipelines has never been easier. You'll be inundated with tutorials online, and you can learn the language at every turn. Keeping track of it all is not so easy. Learning the programming basics is easy, but an overview of the technological possibilities only comes with experience. We present free Python data science libraries and frameworks that you should know.
TensorFlow Vs Theano - The Choice Of Tool Should Never Depend On One's Own Preferences - Fly Spaceships With Your Mind
TensorFlow vs Theano: TensorFlow, along with PyTorch, is currently the best known and most widely used machine learning framework. However, the choice of tool should never depend on one's own preferences, but should be adapted to the data to be examined. Especially in the big data area, this can prevent a decisive loss of performance. It is therefore also worthwhile to look off the beaten track and to look at other frameworks and libraries in addition to the top dogs. Theano is one such open source Python library.
Announcing Solaris: an open source Python library for analyzing overhead imagery with machine learning
Performing machine learning (ML) and analyzing geospatial data are both hard problems requiring a lot of domain expertise. These limitations have historically meant that one needs to be an expert in both to perform even the most basic analyses, making advances in AI for overhead imagery difficult to achieve. We at CosmiQ Works have asked ourselves: is there anything we can do to reduce this barrier to entry, making it easier to apply machine learning methods to overhead imagery data? Enter Solaris, a new Python library for ML analysis of geospatial data from CosmiQ Works. Solaris builds upon SpaceNet's previous tool suite, SpaceNetUtilities, along with several other CosmiQ projects like BASISS to provide an end-to-end pipeline for geospatial AI. Would you prefer a basic command line interface so you can run a pre-trained model without learning Python?
Scikit-Learn Cheat Sheet: Python Machine Learning
Most of you who are learning data science with Python will have already heard about scikit-learn, the open source Python library that implements a wide variety of machine learning, preprocessing, cross-validation and visualization algorithms with the help of a unified interface. If you're still quite new to the field, you should be aware that machine learning, and thus also this Python library, belongs to the must-knows for every aspiring data scientist. That's why DataCamp has created a scikit-learn cheat sheet for those of you who have already started learning about the Python package, but who still want a handy reference sheet. Or, if you still have no idea how scikit-learn works, this machine learning cheat sheet might come in handy to get a quick first idea of the basics that you need to know to get started. Either way, we're sure that you're going to find it useful when you're tackling machine learning problems!