

Baseball Pitch Prediction


The data I used can be found on Kaggle. The overall dataset contains eight comma-separated values (CSV) files covering the 2015–2018 MLB seasons. However, I focused on two of the files, pitches and at-bats. The pitches CSV file contained 40 data columns, and the at-bats file had 11. Both had data values that I would need, so I decided to merge the files.
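As a rough illustration of the merge step, the sketch below joins a pitches table to an at-bats table on a shared key using only the Python standard library. The column names (`ab_id`, `pitch_type`, `start_speed`, `batter_id`, `pitcher_id`) and the sample rows are assumptions made for the example; the snippet above does not say which columns the author actually joined on.

```python
import csv
import io

# Hypothetical miniature stand-ins for the two files; real column names may differ.
PITCHES_CSV = """ab_id,pitch_type,start_speed
1,FF,92.9
1,SL,84.1
2,CU,78.4
"""

ATBATS_CSV = """ab_id,batter_id,pitcher_id
1,572761,452657
2,518792,452657
"""


def merge_on_key(left_text, right_text, key):
    """Inner-join two CSV texts on a shared key column.

    Every row of the left table is paired with the single right-table row
    that has the same key value (many pitches map to one at-bat).
    """
    right_rows = {row[key]: row for row in csv.DictReader(io.StringIO(right_text))}
    merged = []
    for row in csv.DictReader(io.StringIO(left_text)):
        match = right_rows.get(row[key])
        if match is not None:
            # Left-table values win if both tables share a column name.
            merged.append({**match, **row})
    return merged


merged_rows = merge_on_key(PITCHES_CSV, ATBATS_CSV, "ab_id")
for merged_row in merged_rows:
    print(merged_row)
```

Each pitch row ends up carrying the batter and pitcher identifiers of its at-bat, which is the shape a per-pitch prediction model would typically need.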

Machine Learning Model Measures MLB Players' Performances


A team of researchers at the Penn State College of Information Sciences and Technology has developed a machine learning model that can better measure baseball players' and teams' short- and long-term performance. The new method was measured against existing statistical analysis methods called sabermetrics.

Baseball teams are using AI to judge and predict the future of players


They can only dream of what it's like to burst onto the field in The Big Show on Opening Day, but Purdue University outfielders Cam Thompson and Curtis Washington Jr. are among thousands of college baseball players with access to more data-juiced tech than ever to use in the hopes of getting to the majors. One of the tools their team has tested tracks and visualizes every joint in their bodies to measure and analyze their dynamic movements, helping them become a split-second faster on the base paths or gain an edge on runners when they throw home. "I was the slowest on the team," said Thompson in a video describing Purdue's use of 3D Athlete Tracking, or 3DAT, technology developed by Intel, which captures video footage and applies computer vision and deep learning to digitize an individual player's skeletal data and calculate biomechanics. The data and analytical insights gave Thompson and his coaches information revealing that he was bent over just slightly when launching himself from a base. "To the eye, you might not see this, but those first four or five steps were actually slowing him down," said John Madia, director of Baseball Player Development at Purdue.

The Morning After: OpenAI's DALL·E 2 is imagination meets AI image generation


The OpenAI consortium has unveiled the next iteration of DALL·E, a multimodal AI that could generate rudimentary, low-res images from a text-based prompt. This time around, the system is capable of generating images at higher resolution and with lower latency than the original. DALL·E 2 uses OpenAI's CLIP image recognition system and adds the ability for users to edit the results. They can now select and edit areas of existing images, add or remove elements, mash together two images into a single collage and generate further variations of an existing image. What's more, the output images are 1,024 x 1,024-pixel squares, up from the 256 x 256-pixel canvases generated by the original version.

Robots as umpires and referees?


Artificial intelligence is about to take on one of the most iconic jobs in all of sports: the baseball umpire. This summer, teams in the AAA league – just one level below Major League Baseball itself – will use sophisticated technology to call balls and strikes on batters. Human umpires will still crouch behind the plate, but a voice in their earpieces will tell them to shout out "ball" or "strike" – as determined by a system known as Automated Ball-Strike. ABS will even calculate different-sized strike zones for tall or short players, just as human umpires must do. Human umps will be able to override the ABS if they feel it made a mistake.

Ozzie Guillen rips idea of 'robot umpires' in MLB

FOX News

Ozzie Guillen made clear Sunday he was no fan of robot umpires or automated strike zones coming to Major League Baseball. Robot umpiring was first tested in the independent Atlantic League of Professional Baseball and in the Low-A minor leagues. Triple-A minor league baseball is trying out automated strike zones for the 2022 season.

Learning Temporal Rules from Noisy Timeseries Data Artificial Intelligence

Events across a timeline are a common data representation, seen in different temporal modalities. Individual atomic events can occur in a certain temporal ordering to compose higher level composite events. Examples of a composite event are a patient's medical symptom or a baseball player hitting a home run, caused by distinct temporal orderings of patient vitals and player movements respectively. Such salient composite events are provided as labels in temporal datasets and most works optimize models to predict these composite event labels directly. We focus on uncovering the underlying atomic events and their relations that lead to the composite events within a noisy temporal data setting. We propose Neural Temporal Logic Programming (Neural TLP) which first learns implicit temporal relations between atomic events and then lifts logic rules for composite events, given only the composite event labels for supervision. This is done through efficiently searching through the combinatorial space of all temporal logic rules in an end-to-end differentiable manner. We evaluate our method on video and healthcare datasets where it outperforms the baseline methods for rule discovery.
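To make the idea of a composite event defined by temporal ordering concrete, here is a minimal sketch that checks a hand-written "before" rule against a timeline of atomic events. It is not Neural TLP, which learns such rules from composite labels alone; the event names, the timestamps, and the home-run rule itself are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# An atomic event is a (name, timestamp-in-seconds) pair.
Event = Tuple[str, float]


def occurs_before(timeline: List[Event], first: str, second: str) -> bool:
    """True if some `first` event strictly precedes some `second` event."""
    first_times = [t for name, t in timeline if name == first]
    second_times = [t for name, t in timeline if name == second]
    if not first_times or not second_times:
        return False
    return min(first_times) < max(second_times)


def is_home_run(timeline: List[Event]) -> bool:
    """Hand-written illustrative rule (an assumption, not a learned one):
    swing before contact, and contact before the ball leaves the field."""
    return occurs_before(timeline, "swing", "contact") and occurs_before(
        timeline, "contact", "ball_leaves_field"
    )


home_run_timeline = [("swing", 0.0), ("contact", 0.2), ("ball_leaves_field", 4.5)]
out_of_order_timeline = [("contact", 0.0), ("swing", 0.2), ("ball_leaves_field", 4.5)]

print(is_home_run(home_run_timeline))
print(is_home_run(out_of_order_timeline))
```

A rule-discovery method would search over many candidate rules of this form instead of having one written by hand.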

Data Literacy Education Framework – Part 2 -


What do we need to do to increase the data literacy of our organization? In a world where your personal data, and the preferences and biases buried in that data, are being used to influence your behaviors, beliefs, and decisions, data literacy becomes a fundamental skill. And it's not just corporations that need this training. Data Literacy should be taught in universities, in high schools, in middle schools and even in adult education and nursing homes. In the first blog of this two-part series on the Data Literacy Education Framework, I introduced the 4 stages of the Data Literacy Educational Framework, a framework which organizations, universities, high schools, and even adult education programs can use to create a more holistic data literacy training. Now, I want to complete the Data Literacy Education Framework by discussing the third (AI / ML Literacy) and fourth stages (Prediction and Statistical Literacy) of the Data Literacy Education Framework.

Towards a Theoretical Understanding of Word and Relation Representation Machine Learning

Representing words by vectors, or embeddings, enables computational reasoning and is foundational to automating natural language tasks. For example, if word embeddings of similar words contain similar values, word similarity can be readily assessed, whereas judging that from their spelling is often impossible (e.g. cat/feline) and to predetermine and store similarities between all words is prohibitively time-consuming, memory intensive and subjective. We focus on word embeddings learned from text corpora and knowledge graphs. Several well-known algorithms learn word embeddings from text on an unsupervised basis by learning to predict those words that occur around each word, e.g. word2vec and GloVe. Parameters of such word embeddings are known to reflect word co-occurrence statistics, but how they capture semantic meaning has been unclear. Knowledge graph representation models learn representations both of entities (words, people, places, etc.) and relations between them, typically by training a model to predict known facts in a supervised manner. Despite steady improvements in fact prediction accuracy, little is understood of the latent structure that enables this. The limited understanding of how latent semantic structure is encoded in the geometry of word embeddings and knowledge graph representations makes a principled means of improving their performance, reliability or interpretability unclear. To address this: 1. we theoretically justify the empirical observation that particular geometric relationships between word embeddings learned by algorithms such as word2vec and GloVe correspond to semantic relations between words; and 2. we extend this correspondence between semantics and geometry to the entities and relations of knowledge graphs, providing a model for the latent structure of knowledge graph representation linked to that of word embeddings.
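The claim that similar words have similar embedding values can be illustrated with cosine similarity, a common way to compare embedding vectors. The three-dimensional vectors below are made-up toy values, not embeddings from word2vec, GloVe, or any trained model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy embeddings invented for this example (an assumption, not real model output).
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "feline": [0.85, 0.75, 0.2],
    "car": [0.1, 0.2, 0.9],
}

cat_feline = cosine_similarity(embeddings["cat"], embeddings["feline"])
cat_car = cosine_similarity(embeddings["cat"], embeddings["car"])

# "cat" and "feline" share almost no letters, yet their vectors point the same way.
print(f"cat vs feline: {cat_feline:.3f}")
print(f"cat vs car:    {cat_car:.3f}")
```

With these toy values, "cat" scores far closer to "feline" than to "car", even though "cat" and "car" are nearly identical in spelling.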

Learning Representations of Entities and Relations Artificial Intelligence

Encoding facts as representations of entities and binary relationships between them, as learned by knowledge graph representation models, is useful for various tasks, including predicting new facts, question answering, fact checking and information retrieval. The focus of this thesis is on (i) improving knowledge graph representation with the aim of tackling the link prediction task; and (ii) devising a theory on how semantics can be captured in the geometry of relation representations. Most knowledge graphs are very incomplete and manually adding new information is costly, which drives the development of methods which can automatically infer missing facts. The first contribution of this thesis is HypER, a convolutional model which simplifies and improves upon the link prediction performance of the existing convolutional state-of-the-art model ConvE and can be mathematically explained in terms of constrained tensor factorisation. The second contribution is TuckER, a relatively straightforward linear model, which, at the time of its introduction, obtained state-of-the-art link prediction performance across standard datasets. The third contribution is MuRP, the first multi-relational graph representation model embedded in hyperbolic space. MuRP outperforms all existing models and its Euclidean counterpart MuRE in link prediction on hierarchical knowledge graph relations whilst requiring far fewer dimensions. Despite the development of a large number of knowledge graph representation models with gradually increasing predictive performance, relatively little is known of the latent structure they learn. We generalise recent theoretical understanding of how semantic relations of similarity, paraphrase and analogy are encoded in the geometric interactions of word embeddings to how more general relations, as found in knowledge graphs, can be encoded in their representations.
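As a sketch of what a linear link prediction model of this kind computes, the snippet below scores a (subject, relation, object) triple by contracting a core tensor with the three embedding vectors, in the style of a Tucker decomposition. The abstract does not spell out the scoring function, so treat this form as an assumption; the tensor and vectors are tiny made-up values, and a trained model would learn them and typically pass the score through a sigmoid.

```python
def tucker_score(core, subject, relation, obj):
    """Sum over i, j, k of core[i][j][k] * subject[i] * relation[j] * obj[k].

    `core` is a nested list of shape (len(subject), len(relation), len(obj)).
    """
    return sum(
        core[i][j][k] * subject[i] * relation[j] * obj[k]
        for i in range(len(subject))
        for j in range(len(relation))
        for k in range(len(obj))
    )


# Toy 2 x 2 x 2 core tensor and embeddings, invented for illustration only.
core_tensor = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
]
subject_embedding = [1, 0]
relation_embedding = [0, 1]
object_embedding = [1, 1]

score = tucker_score(core_tensor, subject_embedding, relation_embedding, object_embedding)
print(score)  # core[0][1][0] + core[0][1][1] = 3 + 4 = 7
```

Because the score is linear in each of the three vectors, swapping the relation vector selects a different slice of the core tensor for the same pair of entities.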