AITopics | Lynn

MegaWika: Millions of reports and their sources across 50 diverse languages

Barham, Samuel, Weller, Orion, Yuan, Michelle, Murray, Kenton, Yarmohammadi, Mahsa, Jiang, Zhengping, Vashishtha, Siddharth, Martin, Alexander, Liu, Anqi, White, Aaron Steven, Boyd-Graber, Jordan, Van Durme, Benjamin

arXiv.org Artificial Intelligence, Jul-13-2023

To foster the development of new models for collaborative AI-assisted report generation, we introduce MegaWika, consisting of 13 million Wikipedia articles in 50 diverse languages, along with their 71 million referenced source materials. We process this dataset for a myriad of applications, going beyond the initial Wikipedia citation extraction and web scraping of content, including translating non-English articles for cross-lingual applications and providing FrameNet parses for automated semantic analysis. MegaWika is the largest resource for sentence-level report generation and the only report generation dataset that is multilingual. We manually analyze the quality of this resource through a semantically stratified sample. Finally, we provide baseline results and trained models for crucial steps in automated report generation: cross-lingual question answering and citation retrieval.

computational linguistic, machine learning, question answering, (20 more...)

arXiv:2307.07049

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Philippines > Mindanao > Bangsamoro > Province of Maguindanao del Norte > City of Cotabato (0.05)
Africa > Togo > Maritime Region > Lome (0.04)
(8 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.94)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.68)

Ask Me Anything: A simple strategy for prompting language models

Arora, Simran, Narayan, Avanika, Chen, Mayee F., Orr, Laurel, Guha, Neel, Bhatia, Kush, Chami, Ines, Sala, Frederic, Ré, Christopher

arXiv.org Artificial Intelligence, Nov-19-2022

Large language models (LLMs) transfer well to new tasks out-of-the-box simply given a natural language prompt that demonstrates how to perform the task and no additional training. Prompting is a brittle process wherein small modifications to the prompt can cause large variations in the model predictions, and therefore significant effort is dedicated towards designing a painstakingly "perfect prompt" for a task. To mitigate the high degree of effort involved in prompt-design, we instead ask whether producing multiple effective, yet imperfect, prompts and aggregating them can lead to a high quality prompting strategy. Our observations motivate our proposed prompting method, ASK ME ANYTHING (AMA). We first develop an understanding of the effective prompt formats, finding that question-answering (QA) prompts, which encourage open-ended generation ("Who went to the park?") tend to outperform those that restrict the model outputs ("John went to the park. Output True or False."). Our approach recursively uses the LLM itself to transform task inputs to the effective QA format. We apply the collected prompts to obtain several noisy votes for the input's true label. We find that the prompts can have very different accuracies and complex dependencies and thus propose to use weak supervision, a procedure for combining the noisy predictions, to produce the final predictions for the inputs. We evaluate AMA across open-source model families (e.g., EleutherAI, BLOOM, OPT, and T0) and model sizes (125M-175B parameters), demonstrating an average performance lift of 10.2% over the few-shot baseline. This simple strategy enables the open-source GPT-J-6B model to match and exceed the performance of few-shot GPT3-175B on 15 of 20 popular benchmarks. Averaged across these tasks, the GPT-J-6B model outperforms few-shot GPT3-175B. We release our code here: https://github.com/HazyResearch/ama_prompting
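The aggregation step described in this abstract can be illustrated with a minimal sketch. AMA itself combines votes with weak supervision; the code below swaps in two simpler stand-ins, a plain majority vote and an accuracy-weighted vote, and is not the authors' method. The function names, the example votes, and the per-prompt accuracies are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def majority_vote(votes):
    """Return the label chosen by the most prompts (ties go to the first seen)."""
    return Counter(votes).most_common(1)[0][0]


def weighted_vote(votes, accuracies):
    """Combine votes, weighting each prompt by the log-odds of its accuracy.

    `accuracies` are assumed (hypothetical) estimates of how often each
    prompt is right; a more reliable prompt gets a larger say.
    """
    scores = defaultdict(float)
    for label, acc in zip(votes, accuracies):
        # Clamp so that the log-odds stay finite for accuracies of 0 or 1.
        acc = min(max(acc, 1e-6), 1 - 1e-6)
        scores[label] += math.log(acc / (1 - acc))
    return max(scores, key=scores.get)


# Three imperfect prompts vote on one input (made-up values).
votes = ["yes", "no", "no"]
accuracies = [0.95, 0.60, 0.60]

simple = majority_vote(votes)
weighted = weighted_vote(votes, accuracies)
print(simple, weighted)
```

In this toy case the two rules disagree: the majority picks "no", while the weighted vote trusts the single high-accuracy prompt and picks "yes". This shows why prompts with very different accuracies call for something beyond simple counting.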

large language model, machine learning, question generation, (21 more...)

arXiv:2210.02441

Country:

North America > United States > New Jersey (0.14)
Africa > Middle East > Libya (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.14)
(83 more...)

Genre:

Research Report (1.00)
Personal (0.92)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground (1.00)
Transportation > Air (1.00)
(20 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Top Gear or Black Mirror: Inferring Political Leaning From Non-Political Content

Kurnaz, Ahmet, Hale, Scott A.

arXiv.org Artificial Intelligence, Aug-11-2022

Polarization and echo chambers are often studied in the context of explicitly political events such as elections, and little scholarship has examined the mixing of political groups in non-political contexts. A major obstacle to studying political polarization in non-political contexts is that political leaning (i.e., left vs right orientation) is often unknown. Nonetheless, political leaning is known to correlate (sometimes quite strongly) with many lifestyle choices, leading to stereotypes such as the "latte-drinking liberal." We develop a machine learning classifier to infer political leaning from non-political text and, optionally, the accounts a user follows on social media. We use Voter Advice Application results shared on Twitter as our ground truth and train and test our classifier on a Twitter dataset comprising the 3,200 most recent tweets of each user after removing any tweets with political text. We correctly classify the political leaning of most users (F1 scores range from 0.70 to 0.85 depending on coverage). We find no relationship between the level of political activity and our classification results. We apply our classifier to a case study of news sharing in the UK and discover that, in general, the sharing of political news exhibits a distinctive left-right divide while sports news does not.
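The reported F1 range "depending on coverage" can be made concrete with a small sketch. It assumes, as one possible reading not confirmed by the abstract, that coverage is the share of users the classifier labels once low-confidence predictions are dropped. The function names, labels, and confidence values are hypothetical, and the data is made up for illustration.

```python
def f1_score(y_true, y_pred, positive):
    """F1 for one class: the harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_at_threshold(y_true, y_pred, confidence, threshold, positive):
    """Keep only predictions with confidence >= threshold.

    Returns (coverage, f1), where coverage is the fraction of users kept.
    """
    kept = [(t, p) for t, p, c in zip(y_true, y_pred, confidence) if c >= threshold]
    if not kept:
        return 0.0, 0.0
    coverage = len(kept) / len(y_true)
    kept_true, kept_pred = zip(*kept)
    return coverage, f1_score(kept_true, kept_pred, positive)


# Made-up labels and confidences for five users.
y_true = ["left", "left", "right", "right", "left"]
y_pred = ["left", "right", "right", "left", "left"]
confidence = [0.90, 0.55, 0.80, 0.60, 0.70]

full = f1_at_threshold(y_true, y_pred, confidence, 0.0, "left")
strict = f1_at_threshold(y_true, y_pred, confidence, 0.65, "left")
print(full, strict)
```

Under this reading, raising the confidence threshold lowers coverage and raises F1 on the users that remain.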

classifier, dataset, tweet, (17 more...)

arXiv:2208.05662

Country:

Asia > Russia (0.14)
Europe > Ireland (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
(36 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Media > News (1.00)
Information Technology > Services (1.00)
Government > Voting & Elections (1.00)
(4 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.46)

Persuading the Body to Regenerate Its Limbs

The New Yorker, May-3-2021, 10:00:00 GMT

Each year, researchers from around the world gather at Neural Information Processing Systems, an artificial-intelligence conference, to discuss automated translation software, self-driving cars, and abstract mathematical questions. It was odd, therefore, when Michael Levin, a developmental biologist at Tufts University, gave a presentation at the 2018 conference, which was held in Montreal. Fifty-one, with light-green eyes and a dark beard that lend him a mischievous air, Levin studies how bodies grow, heal, and, in some cases, regenerate. He waited onstage while one of Facebook's A.I. researchers introduced him, to a packed exhibition hall, as a specialist in "computation in the medium of living systems." Levin began his talk, and a drawing of a worm appeared on the screen behind him.

electricity, levin, planarian, (16 more...)

Country:

North America > Canada > Quebec > Montreal (0.25)
North America > United States > Virginia (0.05)
North America > United States > Massachusetts > Essex County > Lynn (0.05)
(3 more...)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Information Technology (0.89)

Technology:

Information Technology > Communications > Social Media (0.90)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.55)
