Regularization for Cox's proportional hazards model with NP-dimensionality

arXiv.org Machine Learning

High-throughput genetic sequencing arrays with thousands of measurements per sample, together with a large amount of related censored clinical data, have increased the demand for better measurement-specific model selection. In this paper we establish strong oracle properties of nonconcave penalized methods for nonpolynomial (NP) dimensional data with censoring in the framework of Cox's proportional hazards model. A class of folded-concave penalties is employed, and both LASSO and SCAD are discussed specifically. We address the question of under which dimensionality and correlation restrictions an oracle estimator can be constructed and grasped. It is demonstrated that nonconcave penalties lead to a significant reduction of the "irrepresentable condition" needed for LASSO model selection consistency. A large deviation result for martingales, of interest in its own right, is developed for characterizing the strong oracle property. Moreover, the nonconcave regularized estimator is shown to achieve asymptotically the information bound of the oracle estimator. A coordinate-wise algorithm is developed for finding the grid of solution paths for penalized hazard regression problems, and its performance is evaluated on simulated and gene association study examples.
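
For readers unfamiliar with the two penalties named above, the following is a minimal sketch comparing the LASSO penalty with the SCAD penalty in its standard textbook form. The abstract itself gives no formulas, so the function names, the default `a = 3.7` (the conventional choice for SCAD), and the sample values are assumptions for illustration; this is not the paper's coordinate-wise algorithm.

```python
def lasso_penalty(t, lam):
    """LASSO (L1) penalty: grows linearly in |t| without bound."""
    return lam * abs(t)


def scad_penalty(t, lam, a=3.7):
    """SCAD penalty in its standard three-piece form (assumes a > 2, lam > 0).

    - |t| <= lam:         linear, identical to LASSO
    - lam < |t| <= a*lam: quadratic transition
    - |t| > a*lam:        constant, so large coefficients are not shrunk further
    """
    x = abs(t)
    if x <= lam:
        return lam * x
    if x <= a * lam:
        return (2 * a * lam * x - x * x - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


if __name__ == "__main__":
    # Illustrative values only: the two penalties agree near zero and diverge
    # once |t| exceeds lam, where SCAD flattens out.
    for t in (0.5, 1.0, 2.0, 3.7, 10.0):
        print(t, lasso_penalty(t, 1.0), round(scad_penalty(t, 1.0), 4))
```

The flat tail is what makes the penalty folded-concave: beyond `a * lam` the penalty adds no further shrinkage, whereas the LASSO penalty keeps growing with the coefficient.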


The Devil Is in the Details: New Directions in Deception Analysis

AAAI Conferences

In this study, we use the computational textual analysis tool, the Gramulator, to identify and examine the distinctive linguistic features of deceptive and truthful discourse. The theme of the study is abortion rights and the deceptive texts are derived from a Devil’s Advocate approach, conducted to suppress personal beliefs and values. Our study takes the form of a contrastive corpus analysis, and reveals systematic differences between truthful and deceptive personal accounts. Results suggest that deceivers employ a distancing strategy that is often associated with deceptive linguistic behavior. Ultimately, these deceivers struggle to adopt a truth perspective. Perhaps of most importance, our results indicate issues of concern with current deception detection theory and methodology. From a theoretical standpoint, our results question whether deceivers are deceiving at all or whether they are merely poorly expressing a rhetorical position, caused by being forced to speculate on a perceived prototypical position. From a methodological standpoint, our results cause us to question the validity of deception corpora. Consequently, we propose new rigorous standards so as to better understand the subject matter of the deception field. Finally, we question the prevailing approach of abstract data measurement and call for future assessment to consider contextual lexical features. We conclude by suggesting a prudent approach to future research for fear that our eagerness to analyze and theorize may cause us to misidentify deception. After all, successful deception, which is the kind we seek to detect, is likely to be an elusive and fickle prey.


The Discrete Infinite Logistic Normal Distribution

arXiv.org Machine Learning

We present the discrete infinite logistic normal distribution (DILN), a Bayesian nonparametric prior for mixed membership models. DILN is a generalization of the hierarchical Dirichlet process (HDP) that models correlation structure between the weights of the atoms at the group level. We derive a representation of DILN as a normalized collection of gamma-distributed random variables, and study its statistical properties. We consider applications to topic modeling and derive a variational inference algorithm for approximate posterior inference. We study the empirical performance of the DILN topic model on four corpora, comparing performance with the HDP and the correlated topic model (CTM). To deal with large-scale data sets, we also develop an online inference algorithm for DILN and compare it with online HDP and online LDA on the Nature magazine corpus, which contains approximately 350,000 articles.
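
The idea of building mixture weights from a normalized collection of gamma random variables can be illustrated with a small sketch. It shows only the generic construction (independent gamma draws divided by their sum, which yields a Dirichlet sample when all scales are equal); the optional per-atom `scales` argument is a hypothetical stand-in for atom-specific rescaling and does not reproduce the DILN construction itself, which the abstract does not spell out.

```python
import random


def normalized_gamma_weights(shapes, scales=None, rng=None):
    """Draw one gamma variable per atom and normalize the draws to sum to 1.

    shapes: positive shape parameters, one per atom.
    scales: optional positive scale parameters (hypothetical per-atom
            rescaling); defaults to 1.0 for every atom, in which case the
            result is a Dirichlet(shapes) sample.
    rng:    optional random.Random instance for reproducibility.
    """
    rng = rng or random.Random()
    if scales is None:
        scales = [1.0] * len(shapes)
    # random.gammavariate(alpha, beta) takes a shape and a scale parameter.
    draws = [rng.gammavariate(a, s) for a, s in zip(shapes, scales)]
    total = sum(draws)
    return [d / total for d in draws]


if __name__ == "__main__":
    rng = random.Random(0)
    # Equal scales: an ordinary Dirichlet draw over three atoms.
    print(normalized_gamma_weights([2.0, 1.0, 0.5], rng=rng))
    # Unequal scales: atoms with a larger scale tend to receive more weight.
    print(normalized_gamma_weights([2.0, 1.0, 0.5], scales=[1.0, 5.0, 1.0], rng=rng))
```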


An existing, ecologically-successful genus of collectively intelligent artificial creatures

arXiv.org Artificial Intelligence

ABSTRACT

People sometimes worry about the Singularity (Vinge 1993, Kurzweil 2005), or about the world being taken over by artificially intelligent robots. I believe the risks of these are very small. However, few people recognize that we already share our world with artificial creatures that participate as intelligent agents in our society: corporations. Our planet is inhabited by two distinct kinds of intelligent beings -- individual humans and corporate entities -- whose natures and interests are intimately linked. To coexist well, we need to find ways to define the rights and responsibilities of both individual humans and corporate entities, and to find ways to ensure that corporate entities behave as responsible members of society.

CORPORATIONS ARE INTELLIGENT AGENTS

A corporation is an artificial legal entity, created by the state through a particular kind of legal agreement. A corporation can own property, can sign contracts, can sue and be sued in court, and can be prosecuted and punished for crimes. It can act as an economic agent on its own behalf in our society. A corporation can have goals, can make plans to achieve those goals, and can use its resources to act to carry out those plans. It solves problems and makes decisions about how best to achieve its goals, so it can be considered as an intelligent agent, as defined by a leading text in Artificial Intelligence (Russell & Norvig 2010, p. 34):

An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators.... A human agent has eyes, ears, and other organs for sensors and hands, legs, vocal tract, and so on for actuators.


Using Crowdsourcing to Improve Profanity Detection

AAAI Conferences

Profanity detection is often thought to be an easy task. However, past work has shown that current list-based systems perform poorly. First, they fail to adapt to evolving profane slang and fail to identify profane terms that have been disguised or only partially censored (e.g., @ss, f$#%) or intentionally or unintentionally misspelled (e.g., biatch, shiiiit). For these reasons, they are easy to circumvent and have very poor recall. Second, they are a one-size-fits-all solution, assuming that the definition, use, and perception of profane or inappropriate language hold across all contexts. In this article, we present work that attempts to move beyond list-based profanity detection systems by identifying the context in which profanity occurs. The proposed system uses a set of comments from a social news site labeled by Amazon Mechanical Turk workers for the presence of profanity. This system far surpasses the performance of list-based profanity detection techniques. The use of crowdsourcing in this task suggests an opportunity to build profanity detection systems tailored to sites and communities.
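
The recall problem described above can be made concrete with a minimal sketch of a list-based detector. The blocklist, the function name, and the sample comments are all hypothetical (mild placeholder words are used instead of real profanity), and this illustrates only the baseline being criticized, not the crowdsourced system the paper proposes.

```python
import re

# Hypothetical blocklist of mild placeholder terms, for illustration only.
BLOCKLIST = {"darn", "heck"}


def list_based_flag(comment, blocklist=BLOCKLIST):
    """Flag a comment if any whitespace-separated token, lowercased and
    stripped of common edge punctuation, exactly matches a blocklisted term."""
    tokens = re.findall(r"\S+", comment.lower())
    return any(tok.strip(".,!?;:") in blocklist for tok in tokens)


if __name__ == "__main__":
    samples = [
        "Well, darn!",       # exact match: caught
        "d@rn this thread",  # disguised with a symbol: missed
        "heeeck no",         # elongated misspelling: missed
    ]
    for s in samples:
        print(repr(s), list_based_flag(s))
```

Because matching is exact, every disguised or misspelled variant needs its own list entry, which is why such systems are easy to circumvent.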


Challenges in Patrolling to Maximize Pristine Forest Area (Position Paper)

AAAI Conferences

Illegal extraction of forest resources is fought, in many developing countries, by patrols through the forest that seek to deter such activity by decreasing its profitability. With limited resources for performing such patrols, a patrol strategy will seek to distribute the patrols throughout the forest, in space and time, in order to minimize the resulting amount of extraction that occurs or maximize the degree of forest protection, according to one of several potential metrics. We pose this problem as a Stackelberg game. We adopt and extend the simple, geometrically elegant model of (Albers 2010). First, we study optimal allocations of patrol density under generalizations of this model, relaxing several of its assumptions. Second, we pose the problem of generating actual schedules whose site visit frequencies are consistent with the analytically computed optimal patrol densities.


The Mathematics of Aggregation, Interdependence, Organizations and Systems of Nash Equilibria: A Replacement for Game Theory

AAAI Conferences

Traditional social science research has been unable to satisfactorily aggregate individual level data to group, organization and systems levels, making such aggregation one of social science’s biggest challenges (Giles, 2011). For game and social theory, we believe that the fault can be attributed to the lack of valid distance measures (e.g., the arbitrary ordering of cooperation and competition precludes a Hilbert space distance metric for the ordering of and gradations between these social behaviors, making game theory normative). Alternatively, we offer a theory of social interdependence with countable mathematics based on bistable or multi-stable perspectives and linear algebra. The evidence that is available is supportive. It indicates that meaning is a one-sided, stable, classical interpretation, not only making the correspondence between beliefs and objective reality in social settings incomplete but also raising questions about static theories from earlier eras (e.g., Axelrod’s evolution of cooperation; Simon’s bounded rationality). The results indicate that in open systems (democracies) interpretations evolve naturally to become orthogonal (Nash equilibria) and that orthogonal interpretations generate the information to drive social evolution, but that in closed systems (dictatorships), dependent on the enforcement of social cooperation and the suppression of opposing points of view, evolution slows or stops (e.g., China, Iran or Cuba), causing capital and energy to be wasted, misdirected or misallocated as leaders suppress the interpretations that they alone have the authority to label as unethical, immoral, or irreligious. We conclude that a mathematics based on NE is feasible.



Modeling Polarizing Topics: When Do Different Political Communities Respond Differently to the Same News?

AAAI Conferences

Political discourse in the United States is becoming increasingly polarized. This polarization frequently causes different communities to react very differently to the same news events. Political blogs, as a form of social media, provide a unique insight into this phenomenon. We present a multitarget, semisupervised latent variable model, MCR-LDA, to model this process by jointly analyzing political blog posts and their comment sections from different political communities to predict the degree of polarization that news topics cause. Inspecting the model after inference reveals topics and the degree to which each triggers polarization. In this approach, community responses to news topics are observed using sentiment polarity and comment volume, which serves as a proxy for the level of interest in the topic. In this context, we also present computational methods to assign sentiment polarity to the comments, which serve as targets for latent variable models that predict the polarity based on the topics in the blog content. Our results show that the joint modeling of communities with different political beliefs using MCR-LDA does not sacrifice accuracy in sentiment polarity prediction when compared to approaches that are tailored to specific communities, and additionally provides a view of the polarization in responses from the different communities.


Modelling Time and Reliability in Structured Argumentation Frameworks

AAAI Conferences

Argumentation is a human-like reasoning mechanism contributing to the formalization of commonsense reasoning. In the last decade, several argument-based formalisms have emerged, with applications in many areas, such as legal reasoning, autonomous agents and multi-agent systems; many are based on Dung’s seminal work characterizing Abstract Argumentation Frameworks (AF). Recent research in the area has led to Temporal Argumentation Frameworks (TAF) that extend Dung’s framework by considering the temporal availability of arguments. In this work we introduce a novel framework, called Extended Temporal Argumentation Framework (E-TAF), extending TAF with the capability of modeling the availability of attacks among arguments, which makes it possible, for instance, to model the reliability of arguments varying over time. We show how E-TAF can be enriched by considering Structured Abstract Argumentation, adding compositional elements to the abstract arguments involved based on a simplified version of the recently introduced Dynamic Argumentation Frameworks.