
Refereed Publications

Saman Ashkiani, Nina Amenta, and John D. Owens. Parallel Approaches to the String Matching Problem on the GPU. In Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2016, July 2016. [ bib ]

Leyuan Wang, Yangzihao Wang, Carl Yang, and John D. Owens. A Comparative Study on Exact Triangle Counting Algorithms on the GPU. In Proceedings of the 1st High Performance Graph Processing Workshop, HPGP '16, May 2016. [ bib | DOI | http ]

Saman Ashkiani, Andrew A. Davidson, Ulrich Meyer, and John D. Owens. GPU Multisplit. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2016, pages 12:1–12:13, March 2016. [ bib | DOI | http ]

Pınar Muyan-Özçelik and John D. Owens. Multitasking Real-time Embedded GPU Computing Tasks. In Proceedings of the 7th International Workshop on Programming Models and Applications for Multicores and Manycores, PMAM 2016, pages 78–87, March 2016. [ bib | DOI | http ]

Yangzihao Wang, Andrew Davidson, Yuechao Pan, Yuduo Wu, Andy Riffel, and John D. Owens. Gunrock: A High-Performance Graph Processing Library on the GPU. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2016, pages 11:1–11:12, March 2016. Distinguished Paper. [ bib | DOI | http ]

Mikhail M. Shashkov, Jason Mak, Shawn Recker, Connie Nguyen, John Owens, and Kenneth I. Joy. Efficient Dense Reconstruction Using Geometry and Image Consistency Constraints. In Proceedings of the IEEE Applied Imagery Pattern Recognition Workshop, October 2015. [ bib | http ]

Yuduo Wu, Yangzihao Wang, Yuechao Pan, Carl Yang, and John D. Owens. Performance Characterization of High-Level Programming Models for GPU Graph Analytics. In IEEE International Symposium on Workload Characterization, IISWC-2015, pages 66–75, October 2015. Best Paper finalist. [ bib | DOI | http ]

Anjul Patney, Stanley Tzeng, Kerry A. Seitz, Jr., and John D. Owens. Piko: A Framework for Authoring Programmable Graphics Pipelines. ACM Transactions on Graphics, 34(4):147:1–147:13, August 2015. [ bib | DOI | ACM DL | http ]

Leyuan Wang, Sean Baxter, and John D. Owens. Fast Suffix Array on the GPU. In Euro-Par 2015: Proceedings of the 21st International European Conference on Parallel and Distributed Computing, Lecture Notes in Computer Science, pages 573–587. Springer, August 2015. Distinguished Paper. [ bib | DOI | http ]

Carl Yang, Yangzihao Wang, and John D. Owens. Fast Sparse Matrix and Sparse Vector Multiplication Algorithm on the GPU. In Graph Algorithms Building Blocks, GABB 2015, pages 841–847, May 2015. [ bib | DOI | http ]

Thomas Weber, Michael Wimmer, and John D. Owens. Parallel Reyes-style Adaptive Subdivision with Bounded Memory Usage. In Proceedings of the ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games, i3D 2015, pages 39–45, February/March 2015. [ bib | DOI | ACM DL | http ]

Jonathan Kemal, Roger L. Davis, and John D. Owens. Multidisciplinary Simulation Acceleration using Multiple Shared-Memory Graphical Processing Units. In AIAA Infotech @ Aerospace, AIAA Science and Technology Forum, January 2015. [ bib | DOI | http ]

Mohamed Ebeida, Scott Mitchell, Anjul Patney, Andrew Davidson, Stanley Tzeng, Muhammad Awad, Ahmed Mahmoud, and John D. Owens. Exercises in High-Dimensional Sampling: Maximal Poisson-disk Sampling and k-d Darts. In Janine Bennett, Fabien Vivodtzev, and Valerio Pascucci, editors, Topological and Statistical Methods for Complex Data – Tackling Large-Scale, High-Dimensional, and Multivariate Data Sets, pages 221–238. Springer, November 2014. [ bib | DOI | http ]

Jason Mak, Mauricio Hess-Flores, Shawn Recker, John D. Owens, and Kenneth I. Joy. A Comparative Study of Recent GPU-Accelerated Multi-View Sequential Reconstruction Triangulation Methods for Large-Scale Scenes. In C. V. Jawahar and Shiguang Shan, editors, Big Data in 3D Computer Vision (Computer Vision—ACCV 2014 Workshops), volume 9008 of Lecture Notes in Computer Science, pages 254–269. Springer International Publishing, November 2014. [ bib | DOI | http ]

Afton Geil, Yangzihao Wang, and John D. Owens. WTF, GPU! Computing Twitter's Who-To-Follow on the GPU. In Proceedings of the Second ACM Conference on Online Social Networks, COSN '14, pages 63–68, October 2014. [ bib | DOI | ACM DL | http ]

Andrew Davidson, Sean Baxter, Michael Garland, and John D. Owens. Work-Efficient Parallel GPU Methods for Single Source Shortest Paths. In Proceedings of the 28th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2014, pages 349–359, May 2014. [ bib | DOI | http ]

Jason Mak, Mauricio Hess-Flores, Shawn Recker, John D. Owens, and Kenneth I. Joy. GPU-Accelerated and Efficient Multi-View Triangulation for Scene Reconstruction. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision, WACV '14, pages 61–68, March 2014. [ bib | DOI | http ]

Mohamed S. Ebeida, Anjul Patney, Scott A. Mitchell, Keith R. Dalbey, Andrew A. Davidson, and John D. Owens. k-d Darts: Sampling by k-Dimensional Flat Searches. ACM Transactions on Graphics, 33(1):3:1–3:16, January 2014. [ bib | DOI | ACM DL | http ]

Mohamed S. Ebeida, Ahmed H. Mahmoud, Muhammad A. Awad, Mohammed A. Mohammed, Scott A. Mitchell, Alex Rand, and John D. Owens. Sifted Disks. Computer Graphics Forum, 32(2):509–518, May 2013. [ bib | DOI | .pdf ]

Stanley Tzeng, Brandon Lloyd, and John D. Owens. A GPU Task-Parallel Model with Dependency Resolution. IEEE Computer, 45(8):34–41, August 2012. [ bib | DOI | http ]

Shengren Li, Lance Simons, Jagadeesh Bhaskar Pakaravoor, Fatemeh Abbasinejad, John D. Owens, and Nina Amenta. kANN on the GPU with Shifted Sorting. In Proceedings of High Performance Graphics, HPG '12, pages 39–47, June 2012. [ bib | DOI | http ]

Stanley Tzeng, Anjul Patney, Andrew Davidson, Mohamed S. Ebeida, Scott A. Mitchell, and John D. Owens. High-Quality Parallel Depth-of-Field Using Line Samples. In Proceedings of High Performance Graphics, HPG '12, pages 23–31, June 2012. [ bib | DOI | http ]

Andrew Davidson, David Tarjan, Michael Garland, and John D. Owens. Efficient Parallel Merge Sort for Fixed and Variable Length Keys. In Proceedings of Innovative Parallel Computing, InPar '12, May 2012. [ bib | DOI | http ]

Mohamed S. Ebeida, Scott A. Mitchell, Anjul Patney, Andrew A. Davidson, and John D. Owens. A Simple Algorithm for Maximal Poisson-Disk Sampling in High Dimensions. Computer Graphics Forum, 31(2):785–794, May 2012. [ bib | DOI | http ]

Kshitij Gupta, Jeff Stuart, and John D. Owens. A Study of Persistent Threads Style GPU Programming for GPGPU Workloads. In Proceedings of Innovative Parallel Computing, InPar '12, May 2012. [ bib | DOI | http ]

Ritesh A. Patel, Yao Zhang, Jason Mak, and John D. Owens. Parallel Lossless Data Compression on the GPU. In Proceedings of Innovative Parallel Computing, InPar '12, May 2012. [ bib | DOI | http ]

Andrew Davidson and John Owens. Toward Techniques for Auto-tuning GPU Algorithms. In Kristján Jónasson, editor, Applied Parallel and Scientific Computing, volume 7134 of Lecture Notes in Computer Science, pages 110–119. Springer Berlin / Heidelberg, February 2012. [ bib | DOI ]

Yao Zhang, John Ludd Recker, Robert Ulichney, Ingeborg Tastl, and John D. Owens. Plane-dependent Error Diffusion on a GPU. In Proceedings of SPIE: IS&T/SPIE Electronic Imaging 2012 / Parallel Processing for Imaging Applications II, volume 8295B, pages 8295B–59:1–10, January 2012. [ bib | DOI | http ]

Mohamed S. Ebeida, Anjul Patney, John D. Owens, and Eric Mestreau. Isotropic conforming refinement of quadrilateral and hexahedral meshes using two-refinement templates. International Journal for Numerical Methods in Engineering, 88(10):974–985, 9 December 2011. [ bib | DOI | http ]

Kshitij Gupta and John D. Owens. Compute & Memory Optimizations for High-Quality Speech Recognition on Low-End GPU Processors. In Proceedings of the 2011 International Conference on High Performance Computing, HiPC 2011, December 2011. [ bib | DOI | http ]

Dan A. Alcantara, Vasily Volkov, Shubhabrata Sengupta, Michael Mitzenmacher, John D. Owens, and Nina Amenta. Building an Efficient Hash Table on the GPU. In Wen-mei W. Hwu, editor, GPU Computing Gems, volume 2, chapter 4, pages 39–53. Morgan Kaufmann, October 2011. [ bib | DOI ]

Mohamed S. Ebeida, Scott A. Mitchell, Andrew A. Davidson, Anjul Patney, Patrick M. Knupp, and John D. Owens. Efficient and Good Delaunay Meshes From Random Points. In Proceedings of the SIAM Conference on Geometric and Physical Modeling, GD/SPM11, pages 1506–1515, October 2011. [ bib | DOI | http ]

Mark Silberstein, Assaf Schuster, and John D. Owens. Applying Software-Managed Caching and CPU/GPU Task Scheduling for Accelerating Dynamic Workloads. In Wen-mei W. Hwu, editor, GPU Computing Gems, volume 2, chapter 36, pages 501–517. Morgan Kaufmann, October 2011. [ bib | DOI ]

Jeff A. Stuart, Pavan Balaji, and John D. Owens. Extending MPI to Accelerators. In Proceedings of the First Workshop on Architectures and Systems for Big Data, ASBD 2011, pages 19–23, October 2011. [ bib | DOI | ACM DL | http ]

Yao Zhang, Jonathan Cohen, Andrew A. Davidson, and John D. Owens. A Hybrid Method for Solving Tridiagonal Systems on the GPU. In Wen-mei W. Hwu, editor, GPU Computing Gems, volume 2, chapter 11, pages 117–132. Morgan Kaufmann, October 2011. [ bib | DOI | http ]

John Jenkins, Isha Arkatkar, John D. Owens, Alok Choudhary, and Nagiza F. Samatova. Lessons Learned from Exploring the Backtracking Paradigm on the GPU. In Euro-Par 2011: Proceedings of the 17th International European Conference on Parallel and Distributed Computing, volume 6853 of Lecture Notes in Computer Science, pages 425–437. Springer, August/September 2011. [ bib | DOI | http ]

Everett H. Phillips, Yao Zhang, Roger L. Davis, and John D. Owens. Acceleration of 2-D Compressible Flow Solvers with Graphics Processing Unit Clusters. Journal of Aerospace Computing, Information, and Communication, 8(8):237–249, August 2011. [ bib | DOI | http ]

Mohamed S. Ebeida, Anjul Patney, Scott A. Mitchell, Andrew Davidson, Patrick M. Knupp, and John D. Owens. Efficient Maximal Poisson-Disk Sampling. ACM Transactions on Graphics, 30(4):49:1–49:12, July 2011. [ bib | DOI | ACM DL | http ]

Jeff A. Stuart, Michael Cox, and John D. Owens. GPU-to-CPU Callbacks. In Euro-Par 2010 Workshops: Proceedings of the Third Workshop on UnConventional High Performance Computing (UCHPC 2010), volume 6586 of Lecture Notes in Computer Science, pages 365–372. Springer, July 2011. [ bib | DOI | http ]

Vladimir Glavtchev, Pınar Muyan-Özçelik, Jeffrey M. Ota, and John D. Owens. Feature-Based Speed Limit Sign Detection Using a Graphics Processing Unit. In Proceedings of the 2011 IEEE Intelligent Vehicles Symposium, IV '11, pages 195–200, June 2011. [ bib | DOI | http ]

Christopher P. Stone, Earl P. N. Duque, Yao Zhang, David Car, John D. Owens, and Roger L. Davis. GPGPU parallel algorithms for structured-grid CFD codes. In Proceedings of the 20th AIAA Computational Fluid Dynamics Conference, number 2011-3221, June 2011. [ bib | DOI | http ]

Andrew Davidson, Yao Zhang, and John D. Owens. An Auto-tuned Method for Solving Large Tridiagonal Systems on the GPU. In Proceedings of the 25th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2011, pages 956–965, May 2011. [ bib | DOI | http ]

Jeff A. Stuart and John D. Owens. Multi-GPU MapReduce on GPU Clusters. In Proceedings of the 25th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2011, pages 1068–1079, May 2011. [ bib | DOI | http ]

Andrew Davidson and John D. Owens. Register Packing for Cyclic Reduction: A Case Study. In Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units, GPGPU-4, pages 4:1–4:6, March 2011. [ bib | DOI | ACM DL | http ]

Pınar Muyan-Özçelik, Vladimir Glavtchev, Jeffrey M. Ota, and John D. Owens. Real-Time Speed-Limit-Sign Recognition on an Embedded System Using a GPU. In Wen-mei W. Hwu, editor, GPU Computing Gems, volume 1, chapter 32, pages 497–516. Morgan Kaufmann, February 2011. [ bib | DOI | http ]

Yao Zhang and John D. Owens. A Quantitative Performance Analysis Model for GPU Architectures. In Proceedings of the 17th IEEE International Symposium on High-Performance Computer Architecture, HPCA-17, pages 382–393, February 2011. [ bib | DOI | http ]

Shubhabrata Sengupta, Mark Harris, Michael Garland, and John D. Owens. Efficient Parallel Scan Algorithms for many-core GPUs. In Jakub Kurzak, David A. Bader, and Jack Dongarra, editors, Scientific Computing with Multicore and Accelerators, Chapman & Hall/CRC Computational Science, chapter 19, pages 413–442. Taylor & Francis, January 2011. [ bib | DOI | http ]

Yao Zhang, John Ludd Recker, Robert Ulichney, Giordano B. Beretta, Ingeborg Tastl, I-Jong Lin, and John D. Owens. A Parallel Error Diffusion Implementation on a GPU. In Proceedings of SPIE: IS&T/SPIE Electronic Imaging 2011 / Parallel Processing for Imaging Applications, volume 7872, pages 78720K:1–9, January 2011. [ bib | DOI | http ]

Pınar Muyan-Özçelik, Vladimir Glavtchev, Jeffrey M. Ota, and John D. Owens. A Template-Based Approach for Real-Time Speed-Limit-Sign Recognition on an Embedded System using GPU Computing. In Michael Goesele, Stefan Roth, Arjan Kuijper, Bernt Schiele, and Konrad Schindler, editors, DAGM 2010: Proceedings of the 32nd Annual Symposium of the German Association for Pattern Recognition, volume 6376 of Lecture Notes in Computer Science, pages 162–171. Springer, September 2010. [ bib | DOI | http ]

Andrew Davidson and John D. Owens. Toward Techniques for Auto-Tuning GPU Algorithms. In State of the Art in Scientific and Parallel Computing, Para 2010, June 2010. [ bib | http ]

Anjul Patney, Stanley Tzeng, and John D. Owens. Fragment-Parallel Composite and Filter. Computer Graphics Forum (Proceedings of the Eurographics Symposium on Rendering), 29(4):1251–1258, June 2010. [ bib | DOI | http ]

Everett H. Phillips, Roger L. Davis, and John D. Owens. Unsteady Turbulent Simulations on a Cluster of Graphics Processors. In Proceedings of the 40th AIAA Fluid Dynamics Conference, number AIAA 2010-5036, June 2010. [ bib | DOI | http ]

Jeff A. Stuart, Cheng-Kai Chen, Kwan-Liu Ma, and John D. Owens. Multi-GPU Volume Rendering using MapReduce. In Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing / The First International Workshop on MapReduce and its Applications, HPDC '10 / MAPREDUCE '10, pages 841–848, June 2010. [ bib | DOI | ACM DL | http ]

Stanley Tzeng, Anjul Patney, and John D. Owens. Task Management for Irregular-Parallel Workloads on the GPU. In Proceedings of High Performance Graphics, HPG '10, pages 29–37, June 2010. [ bib | DOI | http ]

Yao Zhang, Jonathan Cohen, and John D. Owens. Fast Tridiagonal Solvers on the GPU. In Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2010, pages 127–136, January 2010. [ bib | DOI | ACM DL | http ]

Dan A. Alcantara, Andrei Sharf, Fatemeh Abbasinejad, Shubhabrata Sengupta, Michael Mitzenmacher, John D. Owens, and Nina Amenta. Real-Time Parallel Hashing on the GPU. ACM Transactions on Graphics, 28(5):154:1–154:9, December 2009. [ bib | DOI | ACM DL | http ]

Kshitij Gupta and John D. Owens. Three-Layer Optimizations for Fast GMM Computations on GPU-like Parallel Processors. In Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, ASRU 2009, pages 146–151, December 2009. [ bib | DOI | http ]

Anjul Patney, Mohamed S. Ebeida, and John D. Owens. Parallel View-Dependent Tessellation of Catmull-Clark Subdivision Surfaces. In Proceedings of High Performance Graphics, HPG '09, pages 99–108, August 2009. [ bib | DOI | ACM DL | http ]

Luke J. Gosink, Kesheng Wu, E. Wes Bethel, John D. Owens, and Kenneth I. Joy. Data Parallel Bin-Based Indexing for Answering Queries on Multi-Core Architectures. In Proceedings of the 21st International Conference on Scientific a
