Composable Text Controls in Latent Space with ODEs
Liu, Guangyi, Feng, Zeyu, Gao, Yuan, Yang, Zichao, Liang, Xiaodan, Bao, Junwei, He, Xiaodong, Cui, Shuguang, Li, Zhen, Hu, Zhiting
Real-world text applications often involve composing a wide range of text control operations, such as editing the text w.r.t. an attribute, manipulating keywords and structure, and generating new text of desired properties. Prior work typically learns/finetunes a language model (LM) to perform individual or specific subsets of operations. Recent research has studied combining operations in a plug-and-play manner, often with costly search or optimization in the complex sequence space. This paper proposes a new efficient approach for composable text operations in the compact latent space of text. The low-dimensionality and differentiability of the text latent vector allow us to develop an efficient sampler based on ordinary differential equations (ODEs) given arbitrary plug-in operators (e.g., attribute classifiers). By connecting pretrained LMs (e.g., GPT2) to the latent space through efficient adaptation, we then decode the sampled vectors into desired text sequences. The flexible approach permits diverse control operators (sentiment, tense, formality, keywords, etc.) acquired using any relevant data from different domains. Experiments show that composing those operators within our approach manages to generate or edit high-quality text, substantially improving over previous methods in terms of generation quality and efficiency.
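The abstract does not spell out the sampler's equations. As a rough illustration only, the toy sketch below composes hypothetical logistic "operators" defined on a latent vector into a single energy (plus a standard-Gaussian prior term) and Euler-integrates the gradient-flow ODE dz/dt = -∇E(z). The operator weights, the step count, and the names `composed_energy_grad` and `ode_sample` are assumptions; this is not the authors' sampler, and decoding z back into text (e.g., with an adapted GPT-2) is omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def composed_energy_grad(z, operators):
    """Gradient of E(z) = 0.5*||z||^2 + sum_i lam_i * (-log p_i(target_i | z)).

    Each operator is a hypothetical logistic attribute classifier (w, b, target, lam)
    defined directly on the latent vector z.
    """
    grad = z.copy()  # gradient of the standard-Gaussian prior term 0.5*||z||^2
    for w, b, target, lam in operators:
        s = z @ w + b
        # d/dz of -log p(target | z) for a logistic classifier is (sigmoid(s) - target) * w
        grad += lam * (sigmoid(s) - target) * w
    return grad

def ode_sample(z0, operators, t_end=5.0, n_steps=200):
    """Forward-Euler integration of the gradient-flow ODE dz/dt = -grad E(z)."""
    z = np.array(z0, dtype=float)
    dt = t_end / n_steps
    for _ in range(n_steps):
        z = z - dt * composed_energy_grad(z, operators)
    return z

# Usage: compose two toy operators on an 8-dimensional latent space.
rng = np.random.default_rng(0)
dim = 8
eye = np.eye(dim)
positive_sentiment = (2.0 * eye[0], 0.0, 1, 5.0)  # want label 1
past_tense = (2.0 * eye[1], 0.0, 0, 5.0)           # want label 0 (hypothetical encoding)
z = ode_sample(rng.normal(size=dim), [positive_sentiment, past_tense])
```

Adding another operator only adds one more term to the energy, which is what makes the operators composable in this toy setting; the latent dimensions no operator touches simply relax toward the prior.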
Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT)
Kavehzadeh, Parsa, Valipour, Mojtaba, Tahaei, Marzieh, Ghodsi, Ali, Chen, Boxing, Rezagholizadeh, Mehdi
The rapid advancement of large language models (LLMs) has revolutionized natural language processing (NLP). While these models excel at understanding and generating human-like text, their widespread deployment can be prohibitively expensive. SortedNet is a recent training technique for enabling dynamic inference for deep neural networks. It leverages network modularity to create sub-models with varying computational loads, sorting them based on computation/accuracy characteristics in a nested manner. We extend SortedNet to generative NLP tasks, making large language models dynamic without any pretraining and by only replacing standard Supervised Fine-Tuning (SFT) with Sorted Fine-Tuning (SoFT) at the same cost. Our approach boosts model efficiency, eliminating the need for multiple models for various scenarios during inference. We show that using this approach, we are able to unlock the potential of intermediate layers of transformers in generating the target output. Our sub-models remain integral components of the original model, minimizing storage requirements and transition costs between different computational/latency budgets. By applying this approach on LLaMA 2 13B for tuning on the Stanford Alpaca dataset and comparing it to normal tuning and early exit via PandaLM benchmark, we show that Sorted Fine-Tuning can deliver models twice as fast as the original model while maintaining or exceeding performance.
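The abstract only summarizes the SoFT recipe. Assuming each nested sub-model consists of the first k layers followed by the model's shared output head and receives its own next-token loss, a toy NumPy sketch of that objective, and of running a smaller sub-model under a tighter budget, might look like this. The tanh layers, the exit set {4, 6, 8}, the equal weighting of the per-exit losses, and all function names are illustrative assumptions; a real implementation would backpropagate the summed loss with an autodiff framework, which is omitted here.

```python
import numpy as np

def log_softmax(x):
    """Numerically stable log-softmax over the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

def submodel_logits(h0, layers, head, k):
    """Dynamic inference: run only the first k layers, then the shared output head."""
    h = h0
    for W in layers[:k]:
        h = np.tanh(h @ W)  # hypothetical stand-in for a transformer block
    return h @ head

def sorted_losses(h0, layers, head, targets, exits):
    """Cross-entropy of each nested sub-model (first k layers + shared head), k in exits.

    One forward pass suffices because sub-model k is a prefix of sub-model k+1.
    """
    losses = {}
    h = h0
    rows = np.arange(len(targets))
    for k, W in enumerate(layers, start=1):
        h = np.tanh(h @ W)
        if k in exits:
            logp = log_softmax(h @ head)
            losses[k] = float(-logp[rows, targets].mean())
    return losses

# Usage: an 8-layer toy model with exits after layers 4, 6 and 8.
rng = np.random.default_rng(0)
d, vocab, n = 16, 10, 4
layers = [rng.normal(scale=0.3, size=(d, d)) for _ in range(8)]
head = rng.normal(size=(d, vocab))
h0 = rng.normal(size=(n, d))
targets = rng.integers(0, vocab, size=n)

losses = sorted_losses(h0, layers, head, targets, exits={4, 6, 8})
soft_objective = sum(losses.values())  # assumed equal weighting of the exits
```

Because every sub-model is a prefix of the full network, serving a smaller budget only means calling `submodel_logits` with a smaller `k`; no separate model has to be stored.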
Lead ETL Data Engineer at Verisk - Newark, NJ, United States
We help the world see new possibilities and inspire change for better tomorrows. Our analytic solutions bridge content, data, and analytics to help business, people, and society become stronger, more resilient, and sustainable. The Data Engineering and Analytics Lab (DEAL) is a team of technical actuaries responsible for the design and implementation of our core statistical data-systems including data ingestion, data integration, data transformation, data analysis, and analytic dataset construction. We're an innovation group that is charged with visualizing the future of our organization's operations and leveraging our expertise in data, technology, P&C insurance, and process optimization to provide a first-class analytics environment to our data-collection, data-management, actuarial, and data-analytics colleagues. The DEAL team is looking to hire an experienced Lead ETL Data Engineer, ideally having a good combination of an analytical/innovative mindset, technical aptitude, business acumen, communication skills, and a passion for mentoring.
New Mexico Is a Great Place for Sci-Fi
Melinda Snodgrass is the novelist and screenwriter best known for her classic Star Trek: The Next Generation script "The Measure of a Man." Her latest novel, Lucifer's War, pits an unlikely band of heroes against a horde of Lovecraftian monsters that have been spreading fear and ignorance throughout human history. "It's unbelievable now, the kind of nonsense people are accepting, that's being pushed on them by social media," Snodgrass says in Episode 529 of the Geek's Guide to the Galaxy podcast. "I really wanted to make a stand for science and rationality, as opposed to magic and superstition." The book is set in Snodgrass' home state of New Mexico, a place where science and superstition clash in a particularly striking way. "It's a very weird place, where you have Los Alamos laboratory, Sandia laboratories, high-tech, high-energy centers," Snodgrass says. "Some of the finest scientific minds in the world come here to lecture and study and commune with each other, and then on the other side you have people who will balance your aura and sell you a crystal to deal with your cancer."
Data Analyst
Wood Mackenzie is the global leader in data, analysis and consulting across the energy, chemicals, metals, mining, power and renewables sectors. Founded in 1973, our success has always been underpinned by the simple principle of providing trusted research and advice that makes a difference to our customers. Today we have over 2,000 customers ranging from the largest global energy companies and financial institutions to governments as well as smaller market specialists. Our teams are located around the world. This enables us to stay closely connected with customers and the markets and sectors we cover.
E2E Refined Dataset
Toyama, Keisuke, Sudoh, Katsuhito, Nakamura, Satoshi
Although the well-known MR-to-text E2E dataset has been used by many researchers, its MR-text pairs include many deletion/insertion/substitution errors. Since such errors affect the quality of MR-to-text systems, they must be fixed as much as possible. Therefore, we developed a refined dataset and some Python programs that convert the original E2E dataset into a refined dataset.
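The refinement programs themselves are not described here. As a hypothetical illustration of what a deletion error looks like, this sketch parses an E2E-style MR (comma-separated `slot[value]` pairs) and flags slots whose value string never appears in the paired text. The function names and the verbatim-match heuristic are assumptions, not the authors' method: it will flag legitimate paraphrases (e.g., "rated 3 stars" for `customer rating[3 out of 5]`) and cannot detect insertions or substitutions.

```python
import re

# One "slot[value]" pair; slot names such as "customer rating" may contain spaces.
SLOT_PATTERN = re.compile(r"([^,\[\]]+)\[([^\]]*)\]")

def parse_mr(mr):
    """Parse an E2E-style MR string into an ordered {slot: value} dict."""
    return {slot.strip(): value.strip() for slot, value in SLOT_PATTERN.findall(mr)}

def possibly_deleted_slots(mr, text, skip=("familyFriendly",)):
    """Slots whose value never appears verbatim (case-insensitively) in the text.

    Naive heuristic: boolean slots such as familyFriendly are skipped because
    their values ("yes"/"no") are almost never realized literally.
    """
    lowered = text.lower()
    return [slot for slot, value in parse_mr(mr).items()
            if slot not in skip and value.lower() not in lowered]

mr = "name[The Eagle], eatType[coffee shop], food[French], area[riverside], near[Burger King]"
text = "The Eagle is a coffee shop near Burger King."
print(possibly_deleted_slots(mr, text))  # ['food', 'area']
```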
Incident Response Engineer
Cybereason's mission is to 'protect it all' – delivering unparalleled prevention, detection, investigation, and response for all endpoints: workstations, laptops, mobile devices and more. Our cyber-defence solutions combine machine learning and AI to analyze threats, connecting huge volumes of data to reveal cyber-attacks and shut them down, as well as block intrusion of known and unknown threats. Since entering the Japan market in 2016, we have seen tremendous growth, now holding #1 market share. We are constantly evolving and hope to expand our team with daring individuals who never give up! Starting this year, we are focusing on reversing the adversary's advantage by establishing a new team.
What is Training Data and Why Is It Important for AI and Computer Vision? Find Out Here.
Simply put, training data is a dataset that is used to train a machine learning model. The purpose of training data is to provide the model with examples of how it should behave in different situations. Without training data, it would be very difficult for machines to learn how to perform specific tasks. In this article, we will discuss why training data is important for AI and computer vision, and we will provide some tips on where you can find high-quality training datasets. Training data is important for AI and computer vision because it allows machines to learn from examples.
Towards the global vision of engagement of Generation Z at the workplace: Mathematical modeling
Kycia, Radosław A., Niemczynowicz, Agnieszka, Nieżurawska-Zając, Joanna
The engagement of employees at the workplace is one of the main ingredients of company growth. Therefore, motivational systems that encourage staff engagement can significantly boost the realization of development aims. Born between the late 1990s and the 2010s, members of Generation Z have started, or will soon start, their first jobs in companies. High productivity of employees from this generation can be achieved by crafting a proper motivation system. Such a system must also be designed to tie employees to the company; otherwise, their experience will be lost through staff turnover.
'Video games are a great place for politics': meet India's modern magical realists
In Gujarat, a tiny independent studio is drawing on India's rich literary history to create surreal games that flow like visual poems, evoking decades of colonial literature and folk theatre to draw attention to the politics of today. Through fantastical environments where buildings and oversized monuments are made of rubber sandals and toothpaste tubes, Studio Oleomingus – made up of writer/artist Dhruv Jani and programmer Sushant Chakraborty, with help from another programmer, Vivek Savsaiya – crafts interactive stories that cast a playful light on India's complicated past and present. "We find video games to be excellent spaces for political discourse," Jani tells me over Skype. "The government is hardly bothered about something as 'trivial' as video games, and they also give you a lot of room to think and ponder complex ideas." The studio's short, experimental games, drenched in vibrant colours and otherworldly imagery, pay homage to the magical realist, nonsense literature that defined many Indian childhoods.