6,610 Works

Parallelization of an Axially Symmetric Steady Flow Program

Norm Beekwilder & Andrew Grimshaw
Description of the modifications made to an axially symmetric steady flow program to convert it from a sequential code to a parallel code. Some of the difficulties encountered and performance results are discussed.

Elastic Localization

Pascal Vicaire & John Stankovic
Numerous wireless sensor network algorithms assume that individual sensors possess location information. However, state of the art localization algorithms often achieve acceptable performance only under restrictive as- sumptions. For instance, some algorithms necessitate regu- lar sensor deployment or centralized computations. Other algorithms require a high proportion of position aware nodes or the ability to accurately infer emission distance or emission direction of received radio signals. We propose the Elastic Localization Algorithm (ELA), a distributed, scalable,...

.NET Security: Lessons Learned and Missed from JAVA

Nathanael Paul & Dave Evans
Many systems execute untrusted programs in virtual machines (VMs) to limit their access to system resources. Sun introduced the Java VM in 1995, primarily intended as a lightweight platform for execution of untrusted code inside web pages. More recently, Microsoft developed the .NET platform with similar goals. Both platforms share many design and implementation properties, but there are key differences between Java and .NET that have an impact on their security. This paper examines how...

GPU_Accelerated Render Cache

Tenghui Zhu, Rui Wang & David Luebke
The Render Cache can assist a high quality renderer to achieve interactive frame rate by reusing cached 3D samples. These samples can be reprojected to the new frames when viewpoint changes. However, the original Render Cache imposes high computation overheads on the CPU. With highly parallelized vector computation units, the GPU is a good candidate to address the computational requirements of the Render Cache. In this paper, we present a system that implements the Render...

Leveraging Memory Level Parallelism Using Dynamic Warp Subdivision

Jiayuan Meng, David Tarjan & Kevin Skadron
SIMD organizations have shown to allow high throughput for data-parallel applications. They can operate on multiple datapaths under the same instruction sequencer, with its set of operations happening in lockstep sometimes referred to as warps and a single lane referred to as a thread. However, ability of SIMD to gather from disparate addresses instead of aligned vectors means that a single long latency mem- ory access will suspend the entire warp until it completes. This...

Parallel Scan for Stream Architectures

Duane Merrill & Andrew Grimshaw
One of the most useful algorithmic primitives for parallel processing is scan (also known as prefix scan, prefix sum, prefix reduction, etc.). Because the computational granularity of concurrent scan tasks is so small, the memory bandwidth between physical processing elements and globally-visible storage banks is the limiting hardware resource. Current implementations of parallel scan for GPGPU stream architectures do not maximize memory bandwidth: they either make inefficient use of device memory accesses, are computationally-bound due...

Fractal: A Software Toolchain for Mapping Applications to Diverse, Heterogeneous Architectures

Jeremy Sheaffer & Kevin Skadron
We present Fractal, and API for programming parallel workloads on multithreaded and multicore architectures. Fractal schedules threads on processors with cache configuration in mind in order to minimize cache misses. Fractal achieves speedups of nearly 2x on Sun NIAGRA, while processors with less hyperthreading see less impressive performance improvements.

Assigning Real-Time Tasks to Homogeneous Multiprocessor Systems

Almut Burchard, Jorg Liebeherr, Yingfeng Oh & Sang Son
Optimal scheduling of real-time tasks on multiprocessor systems is known to be computationally intractable for large task sets. Any practical scheduling algorithm for assigning real-time tasks to a multiprocessor system presents a trade-off between its computational complexity and its performance. The performance of a scheduling algorithm is measured in terms of the additional number of processors required to arrive at a schedule without deadline violations as compared to an optimal algorithm. In this study, new...

Managing Contention and Timing Constraints in a Real-Time Database System

Matthew Lehr & Sang Son
This technical report discusses how current real-time technology has been applied to a database management system to support firm real-time transactions. The report reviews priority-based CPU- and resource scheduling concepts and shows how they are used to avoid the problem of priority inversion in transaction service order, transaction progress, and memory allocation. Next, the appropriateness of optimistic concurrency control to real-time data management is examined, and the implementation of previously proposed methods WAIT-X(S) and Precise...

Consistency Maintenance using UNIFY

Anand Natrajan, Jr Reynolds & Sudhir Srinivasan
Distributed simulations comprised of aggregated entities (AEs) and disaggregated entities (DEs) pose critical consistency issues when AEs and DEs interact. Usually, meaningful interaction cannot take place without one of the two representing itself at a level of resolution compatible with the level of the other. Any approach that employs dynamic transitions between aggregated and disaggregated resolution levels suffers from not only potential consistency problems, but also chain disaggregation, network flooding, transition latency, and mapping problems...

The Core Legion Object Model

Michael Lewis & Andrew Grimshaw
This document describes the core Legion object model. The model specifies the composition and functionality of Legion's core objects - those objects that cooperate to create, locate, manage, and remove objects from the Legion system. The model reflects the underlying philosophy and objectives of the Legion project. In particular, the object model facilitates a flexible extensible implementation, provides a single persistent name space, grants site autonomy to participating organizations, and scales to millions of sites...

Architectural Support for Extensibility and Autonomy in Wide-Area Distributed Object Systems

Andrew Grimshaw, Michael Lewis, Adam Ferrari & John Karpovich
The Legion system defines a software architecture designed to support metacomputing, the use of large collections of heterogeneous computing resources distributed across local- and wide-area networks as a single, seamless virtual machine. Metasystems software must be extensible because no single system can meet all of the diverse, often conflicting, requirements of the entire present and future user community, nor can a system constructed today take best advantage of unanticipated future hardware advances. Metasystems software must...

The Case for Feedback Control Real-Time Scheduling

John Stankovic, Chenyang Lu & Sang Son
Despite the significant body of results in real-time scheduling, many real world problems are not easily supported. While algorithms such as Earliest Deadline First, Rate Monotonic, and the Spring scheduling algorithm can support sophisticated task set characteristics (such as deadlines, precedence constraints, shared resources, jitter, etc.), they are all "open loop" scheduling algorithms. Open loop refers to the fact that once schedules are created they are not "adjusted" based on continuous feedback. While open-loop scheduling...

Monitoring Temperature in FPGA Based SoCs

Siva Velusamy, Wei Huang, John Lach & Kevin Skadron
FPGA logic densities continue to increase at a tremen- dous rate. This has had the undesired consequence of in- creased power density, which manifests itself as higher on- die temperatures and local hotspots. Sophisticated packag- ing techniques have become essential to maintain the health of the chip. In addition to static techniques to reduce the temperature, dynamic thermal management techniques are essential. Such techniques rely on accurate on-chip tem- perature information. In this paper, we...

On-Demand Solution to Minimize I-Cache Leakage Energy with Maintaining Performance

Sung Chung & Kevin Skadron
This paper describes a new on-demand wakeup prediction policy for instruction cache leakage control that achieves better leakage savings than prior policies, and avoids the performance overheads of prior policies. The proposed policy reduces leakage energy by more than 92% with only less than 0.3% performance overhead on average, whereas prior policies were either prone to severe performance overhead or failed to reduce the leakage energy as much. The key to this new on-demand policy...

Granularity of Microprocessor Thermal Management: A Technical Report

Karthik Sankaranarayanan, Wei Huang, Mircea Stan, Hossein HajHariri, Robert Ribando & Kevin Skadron
Process technology scaling, poor supply voltage scaling and the resultant exponential increase in power density have made temperature a first-class design constraint in today�s microprocessors. An interesting question in the context of thermal management and multi-core architectures is about the correct size granularity of thermal management. It is known that the silicon substrate acts as a spatial low-pass filter for temperature. This means that if blocks with very high power density are small enough (for...

Genetic Algorithms in Autonomous Embedded Systems

Chris Gregg
The performance and usefulness of autonomous embedded systems (AES) can be enhanced by providing them with artificial intelligence (AI). Because embedded systems are generally constrained by mul- tiple factors (e.g., power consumption, processing speed, memory, etc.), fully-fledged AI implementations are not feasible for most AES designs. However, microprocessors targeted at embedded systems have improved to the point where it is possible to include certain AI methods in embedded designs. Genetic algorithms offer a modicum of...

A Genetic Programming Approach to Shader Simplification

Pitchaya SitthiAmorn, Nicholas Modly, Wesley Wiemer & Jason Lawrence
The programmability of modern graphics hardware has led to an enormous increase in pixel shader complexity and a commensu- rate increase in the difficulty and effort needed to optimize them. We present a framework based on Genetic Programming (GP) for automatically simplifying pixel shaders. Our approach computes a series of increasingly simplified shaders that expose the inher- ent trade-off between speed and accuracy. Compared to existing automatic methods for shader simplification [Olano et al. 2003;...

Increasing Memory Bandwidth for Vector Computations

Sally McKee, Steven Moyer, Wm Wulf & Charles Hitchcock
Memory bandwidth is rapidly becoming the performance bottleneck in the application of high performance micro- processors to vector-like algorithms, including the "Grand Challenge" scientific problems. Caching is not the sole solution for these applications due to the poor temporal and spatial locality of their data accesses. Moreover, the nature of memories themselves has changed. Achieving greater bandwidth requires exploiting the characteristics of memory components "on the other side of the cache" - they should not...

Super-criticality revisited.

Sudhir Srinivasan & Jr Reynolds
Critical path analysis has been suggested as a technique for establishing a lower bound on the completion times of all parallel discrete event simulations (PDES). A protocol is super-critical if there is at least one simulation that can complete in less than the critical path time using that protocol. Previous studies have shown that several practical protocols are super-critical while others are not. We present a sufficient condition to demonstrate that a protocol is super-critical....

New Graph Arborescence and Steiner Constructions for High-Performance FPGA Routing

Michael Alexander & Gabriel Robins
The flexibility and reusability of field-programmable gate arrays (FPGAs) enable significant speed and cost savings in the VLSI design/validation/simulation cycle. However, this is achieved at a substantial performance penalty due to signal delays through the programmable interconnect. This motivates a critical-net routing objective which seeks to minimize source-sink signal propagation delays; we formulate this objective as a graph Steiner arborescence (i.e., shortest-path tree with minimum wirelength) problem and propose an effective heuristic which produces routing...

SOR as a Preconditioner

M DeLong & J Ortega
We show by experimental results on some convection-diffusion type equations that the SOR iteration may be a promising preconditioner in conjunction with the GMRES method. Our results indicate that it is critical to take several Gauss-Seidel or SOR iterations, rather than just one, and that at least a factor of two improvement over Gauss-Seidel can be expected with a reasonable approximation to the optimal omega. This approximation must be on the low side of the...

Moving-Target TSP and Related Problems

C Helvig, Gabriel Robins & Alex Zelikovsky
Previous literature on the Traveling Salesman Problem (TSP) implicitly assumed that the sites to be visited are stationary. Motivated by practical applications, we introduce a time-dependent generalization of TSP which we call Moving-Target TSP, where a pursuer must inter- cept a set of targets which move with constant velocities in a minimum amount of time. We propose approximate and exact algorithms for sev- eral natural variants of Moving-Target TSP. Our implementation of the exact algorithm...

Comparing the Performance of Database Selection Algorithms

James French, Allison Powell, Jamie Callan, Charles Viles, Travis Emmitt & Kevin Prey
We compare the performance of two database selection algorithms reported in the literature. Their performance is compared using a common testbed designed specifically for database selection techniques. The testbed is a decomposition of the TREC/TIPSTER data into 236 subcollections. We present results of a recent investigation of the performance of the CORI algorithm and compare the performance with earlier work that examined the performance of gGlOSS. The databases from our testbed were ranked using both...

Using Reflection for Flexibility and Extensibility in a Metacomputing Environment

Anh NguyenTuong, Steve Chapin, Andrew Grimshaw & Charlie Viles
We present system developers with a reflective model, the Reflective Graph and Event model (RGE), for building metacomputing applications, incorporating our design goals of flexibility, extensibility, reusability, and composability. The model uses graphs and events to specify computations and enables first-class program graphs as event handlers. We demonstrate the RGE model in several areas of interest to metacomputing using Legion as our experimental testbed. We unify the concepts of exceptions and events; by making exceptions...

Registration Year

  • 2017

Resource Types

  • Report


  • UNSW Sydney
  • University of Virginia
  • Bonn-Rhein-Sieg University of Applied Sciences
  • University of Bonn