Trained models for Multilingual Joint Fine-tuning of Transformer models for identifying Trolling, Aggression and Cyberbullying at TRAC 2020

Sudhanshu Mishra, Shivangi Prasad & Shubhanshu Mishra
Models and predictions for our submission to TRAC 2020, the Second Workshop on Trolling, Aggression and Cyberbullying. Our approach is described in our paper: Mishra, Sudhanshu, Shivangi Prasad, and Shubhanshu Mishra. 2020. “Multilingual Joint Fine-Tuning of Transformer Models for Identifying Trolling, Aggression and Cyberbullying at TRAC 2020.” In Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying (TRAC-2020). The source code for training this model and more details can be found on our code...

World Values Survey and World Bank Data for measuring perceptions of expertise in developing nations

Katherine Copas
This dataset includes data pulled from the World Bank (2009), the World Values Survey (wave 6), and Transparency International (2009). The data were used to measure perceptions of expertise among individuals in nations that are recipients of development aid, as measured by the World Bank.

SaltProc output for TAP MSR and MSBR online reprocessing depletion simulations

Andrei Rykhlevskii & Kathryn D. Huff

Tweet IDs annotated for enthusiasm and support towards social causes: CTE, cyberbullying, and LGBT

Shubhanshu Mishra, Sneha Agarwal, Jinlong Guo, Kirstin Phelps, Johna Picco & Jana Diesner
This dataset contains tweets collected for the paper: Shubhanshu Mishra, Sneha Agarwal, Jinlong Guo, Kirstin Phelps, Johna Picco, and Jana Diesner. 2014. Enthusiasm and support: alternative sentiment classification for social movements on social media. In Proceedings of the 2014 ACM conference on Web science (WebSci '14). ACM, New York, NY, USA, 261-262. DOI: https://doi.org/10.1145/2615569.2615667 The data contain only the tweet IDs and the corresponding enthusiasm and support labels assigned by two different annotators.

Data used to construct Table 1 and Figs. 2 and 4 in Ainsworth & Long (2020), “30 Years of Free Air Carbon Dioxide Enrichment (FACE): What Have We Learned About Future Crop Productivity and the Potential for Adaptation?” Global Change Biology

Stephen Long
Data extracted from the text, tables, and figures of publications to summarize crop responses to Free-Air CO2 Elevation (FACE).

RAD-seq genotypes for a Miscanthus sacchariflorus diversity panel

Lindsay Clark, Joyce Njuguna, Xiaoli Jin, Karen Petersen, Kossanou G. Anzoua, Larissa Bagmet, Pavel Chebukin, Martin Deuter, Elena Dzyubenko, Nicolay Dzyubenko, Kweon Heo, Douglas A. Johnson, Uffe Jørgensen, Jens B. Kjeldsen, Hironori Nagano, Junhua Peng, Andrey Sabitov, Toshihiko Yamada, Ji Hye Yoo, Chang Yeon Yu, Stephen P. Long & Erik Sacks
Restriction site-associated DNA sequencing (RAD-seq) data from 643 Miscanthus accessions from a diversity panel, including 613 Miscanthus sacchariflorus, three M. sinensis, and 27 M. ×giganteus. DNA was digested with PstI and MspI, and single-end Illumina sequencing was performed adjacent to the PstI site. Variant and genotype calling was performed with TASSEL-GBSv2, using the Miscanthus sinensis v7.1 reference genome from Phytozome 12 (https://phytozome.jgi.doe.gov). Additional ploidy-aware genotype calling was performed by polyRAD v1.1.

Potential Impacts of Supersonic Aircraft on Stratospheric Ozone and Climate

Jun Zhang, Donald Wuebbles, Douglas Kinnison & Steven Baughcum
These datasets provide the basis of our analysis in the paper “Potential Impacts of Supersonic Aircraft on Stratospheric Ozone and Climate.” All datasets here fall into two categories: emission data and model output data (WACCM). All model simulations (background and perturbation) were run to steady state, and only the datasets used in the analysis are archived here.

Bank Elevation Dataset

Bruce Rhoads & Evan Lindroth
Data on bank elevations determined from lidar data for the Upper Sangamon River, Illinois; the Mission River, Texas; and the White River, Indiana.

Second-generation citation context analysis (2010-2019) to retracted paper Matsuyama 2005

Jodi Schneider & Di Ye
Citation context annotation. This is part of the supplemental data for Jodi Schneider, Di Ye, Alison Hill, and Ashley Whitehorn. "Continued Citation of a Fraudulent Clinical Trial Report, Eleven Years after it was retracted for Falsifying Data" [R&R under review with Scientometrics]. Publications were selected by examining all citations to the retracted paper Matsuyama 2005, and selecting the 35 citing papers, published 2010 to 2019, which do not mention the retraction, but which mention the...

Citation context annotation for new and newly found citations (2006-2019) to retracted paper Matsuyama 2005

Di Ye, Alison Hill, Ashley Whitehorn (Fulton) & Jodi Schneider
Citation context annotation for papers citing retracted paper Matsuyama 2005 (RETRACTED: Matsuyama W, Mitsuyama H, Watanabe M, Oonakahara KI, Higashimoto I, Osame M, Arimura K. Effects of omega-3 polyunsaturated fatty acids on inflammatory markers in COPD. Chest. 2005 Dec 1;128(6):3817-27.), retracted in 2008 (Retraction in: Chest (2008) 134:4 (893) https://doi.org/10.1016/S0012-3692(08)60339-6 ). This is part of the supplemental data for Jodi Schneider, Di Ye, Alison Hill, and Ashley Whitehorn. "Continued Citation of a Fraudulent Clinical Trial...

A feasibility study to test a novel approach to dietary weight loss with a focus on assisting informed decision making in food selection

Mindy Lee, Catherine Applegate, Annabelle Shaffer, Abrar Emamaddin, John Erdman & Manabu Nakamura
This small dataset contains raw anthropometric and dietary intake data.

Demographic characteristics, site and phylogenetic distribution of dogs with appendicular osteosarcoma: 744 dogs (2000-2015)

Laura Selmic, Marejka Shaevitz, Joanne Tuohy, Laura Garrett & Audrey Ruple
Objective: To report demographic characteristics of a contemporary population of dogs with appendicular osteosarcoma and assess the relationship between demographic characteristics, site distribution, and phylogenetic breed clusters. Design: Retrospective case series. Methods: A search of the Veterinary Medical Database was performed for dogs with appendicular osteosarcoma as a new diagnosis. Entries were reviewed for the sex, neuter status, age at diagnosis, breed, affected limb, and tumor location. The reported breed for purebred dogs was used...

The choices we make and the impacts they have: Machine learning and species delimitation in North American box turtles (Terrapene spp.)

Bradley T. Martin, Tyler K. Chafin, Marlis R. Douglas, John S. Placyk, Roger D. Birkhead, Christopher A. Phillips & Michael E. Douglas
Model-based approaches that attempt to delimit species are hampered by computational limitations as well as the unfortunate tendency by users to disregard algorithmic assumptions. Alternatives are clearly needed, and machine-learning (M-L) is attractive in this regard as it functions without the need to explicitly define a species concept. Unfortunately, its performance will vary according to which (of several) bioinformatic parameters are invoked. Herein, we gauge the effectiveness of M-L-based species-delimitation algorithms by parsing 64 variably-filtered...

Re-evaluating deep neural networks for phylogeny estimation: the issue of taxon sampling

Martin Grosshauser, Paul Zaharias & Tandy Warnow
Deep neural networks (DNNs) are powerful machine learning models that are widely used for classification problems, and have been recently proposed for quartet tree phylogeny estimation (Suvorov et al. Systematic Biology 2020 and Zou et al. Molecular Biology and Evolution 2020). Here we present a study evaluating recently trained DNNs (from Zou et al., MBE 2020) in comparison to a collection of standard phylogeny estimation methods, including UPGMA, neighbor joining, maximum parsimony, and maximum likelihood,...

Phylogeny estimation given sequence length heterogeneity

Vladimir Smirnov & Tandy Warnow
Phylogeny estimation is a major step in many biological studies, and has many well-known challenges. With the dropping cost of sequencing technologies, biologists now have increasingly large datasets available for use in phylogeny estimation. Here we address the challenge of estimating a tree given large datasets with a combination of full-length sequences and fragmentary sequences, which can arise due to a variety of reasons, including sample collection, sequencing technologies, and analytical pipelines. We...

Data from: Habitat suitability and connectivity modeling reveal priority areas for Indiana bat (Myotis sodalis) conservation in a complex habitat mosaic

Ashleigh Cable, Joy O'Keefe, Jill Deppe, Tara Hohoff, Steven Taylor & Mark Davis
Context: Conservation for the Indiana bat (Myotis sodalis), a federally endangered species in the United States of America, is typically focused on local maternity sites; however, the species is a regional migrant, interacting with the environment at multiple spatial scales. Hierarchical levels of management may be necessary, but we have limited knowledge of landscape-level ecology, distribution, and connectivity of suitable areas in complex landscapes. Objectives: We sought to 1) identify factors influencing M. sodalis maternity...

Data from: Genomic evidence of prevalent hybridization throughout the evolutionary history of the fig-wasp pollination mutualism

Gang Wang, Xingtan Zhang, Edward Herre, Charles Cannon, Doyle McKey, Carlos Machado, Wen-Bin Yu, Michael Arnold, Rodrigo Pereira, Ray Ming, Yi-Fei Liu, Yibin Wang, Dongna Ma & Jin Chen
Ficus (figs) and their agaonid wasp pollinators present an ecologically important mutualism that also provides a rich comparative system for studying functional co-diversification throughout its coevolutionary history (~75 million years). We obtained entire nuclear, mitochondrial, and chloroplast genomes for 15 species representing all major clades of Ficus. Multiple analyses of these genomic data suggest that hybridization events have occurred throughout Ficus evolutionary history. Furthermore, cophylogenetic reconciliation analyses detect significant incongruence among all nuclear, chloroplast, and...

Polyethylene upcycling to long-chain alkylaromatics by tandem hydrogenolysis/aromatization

Fan Zhang, Manhao Zeng, Ryan Yappert, Jiakai Sun, Yu-Hsuan Lee, Anne LaPointe, Baron Peters, Mahdi Abu-Omar & Susannah Scott
The current scale of plastics production and the accompanying waste disposal problems represent a largely untapped opportunity for chemical upcycling. Tandem catalytic conversion by Pt/γ-Al2O3 converts various polyethylene grades in high yields (up to 80 wt%) to low molecular-weight liquid/wax products, in the absence of added solvent or H2, with little production of light gases. The major components are valuable long-chain alkylaromatics and alkylnaphthenes (average ca. C30, Ð = 1.1). Coupling exothermic hydrogenolysis with endothermic...

Fire and drought effects on soils invaded by Microstegium vimineum in southern Illinois

Jennifer Fraterrigo & Mara Rembelski
We measured the effects of fire or drought treatment on plant, microbial and biogeochemical responses in temperate deciduous forests invaded by the annual grass Microstegium vimineum with a history of either frequent fire or fire exclusion. Please note: on the Documentation tab, under Experimental or Sampling Design, “15 (XVI)” should read “16 (XVI)”.

Reproduction and hybridization in Celastrus scandens and C. orbiculatus at the Indiana Dunes National Park

David N. Zaya, Stacey A. Leicht-Young, Noel B. Pavlovic & Mary V. Ashley
These data are from an observational study and small experiment investigating reproductive biology and hybridization between two plants, Celastrus scandens L. and Celastrus orbiculatus Thunb. (Celastraceae). These data were collected during the 2008 growing season from the Indiana Dunes National Park (formerly Indiana Dunes National Lakeshore), just east of the municipality of Ogden Dunes, Indiana, USA. The five data files provide information on floral output of the two species, fertilization rate, fruit set rate, hybridization...

Multi-model urban climate projections data from: Global multi-model projections of local urban climates

Lei Zhao, Keith Oleson, Elie Bou-Zeid, Eric Scott Krayenhoff, Andrew Bray, Qing Zhu, Zhonghua Zheng, Chen Chen & Michael Oppenheimer
This dataset contains the emulated global multi-model urban climate projections under RCP 8.5 and RCP 4.5 used in the article "Global multi-model projections of local urban climates" (https://www.nature.com/articles/s41558-020-00958-8). Details about this dataset and the local urban climate emulator are described in the article. This dataset documents the monthly mean projections of urban temperatures and urban relative humidity of 26 CMIP5 Earth system models (ESMs) from 2006 to 2100 across the globe. This dataset may be...

Marsh bird occupancy of wetlands managed for waterfowl in the Midwestern USA - Analysis Inputs

Therin M. Bradshaw, Abigail G. Blake-Bradshaw, Auriel M.V. Fournier, Joseph D. Lancaster, John O'Connell, Christopher N. Jacques, Michael W. Eicholtz & Heath M Hagy
Data inputs and scripts for the analysis detailed in Bradshaw et al., published in PLOS ONE in 2020.

Data from: Long-term persistence of wildlife populations in a pastoral area

Christian Kiffner, John Kioko, Jack Baylis, Camille Beckwith, Craig Brunner, Christine Burns, Vasco Chavez-Molina, Sara Cotton, Laura Glazik, Ellen Loftis, Megan Moran, Caitlin O’Neill, Ole Theisinger & Bernard Kissui
Facilitating coexistence between people and wildlife is a major conservation challenge in East Africa. Some conservation models aim to balance the needs of people and wildlife, but the effectiveness of these models is rarely assessed. Using a case-study approach, we assessed the ecological performance of a pastoral area in northern Tanzania (Manyara Ranch) and established a long-term wildlife population monitoring programme (carried out intermittently from 2003-2008 and regularly from 2011-2019) embedded in a distance sampling...

Cline Center Historical Phoenix Event Data. Cline Center for Advanced Social Research. v1.3.0. May 4

Scott Althaus, Joseph Bajjalieh, John Carter, Buddy Peyton & Dan Shalmon
The Cline Center Historical Phoenix Event Data covers the period 1945-2019 and includes 8.2 million events extracted from 21.2 million news stories. This data was produced using the state-of-the-art PETRARCH-2 software to analyze content from the New York Times (1945-2018), the BBC Monitoring's Summary of World Broadcasts (1979-2019), the Wall Street Journal (1945-2005), and the Central Intelligence Agency’s Foreign Broadcast Information Service (1995-2004). It documents the agents, locations, and issues at stake in a wide...

Grey Literature and Development: The Non-Governmental Organization in Action

Lynne Rudasill
Traditionally, the non-governmental organization working in the area of development has been viewed as a trusted source for research and information on specific topics and populations. With the advent of the World Wide Web, many of these organizations are working to make their expertise available to a large number of users. This preliminary study surveys non-governmental organizations working in several areas of health-related activity to ascertain what types of information they are making available on...

