High-Performance Clustering of Distributed, High-Dimensional Data Sets
|Thursday, May 26, 2016
|Raffel’s – 10160 Reading Road (see below for directions)
|5:30 p.m. to 6:00 p.m. – Social Time
|6:00 p.m. to 7:00 p.m. – Dinner
|7:00 p.m. to 8:45 p.m. – Presentation
|$10-$15 (see information in Reservations)
ABOUT THE MEETING: The volume of data captured from numerous sources now challenges our ability to understand the meaning within it. This has motivated studies of mechanized data analysis, and in particular of the clustering, or partitioning, of data into related groups. In fact, data sets have grown to the point where it is now often necessary to stream the data through the system for online, high-speed analysis. This talk explores the application of approximate methods to the stream clustering of high-dimensional data (feature vectors containing hundreds to thousands of measures). In particular, we develop an algorithm, called RPHash, that combines Random Projection and Locality Sensitive Hashing (LSH) to implement a high-performance method for the parallel and distributed clustering of streaming data in a map-reduce framework. RPHash is able to perform clustering at a rate much faster than traditional clustering algorithms, such as K-Means. Furthermore, the experimental results show that RPHash has a near-linear speedup relative to the number of CPU cores. This speedup efficiency is possible because the approximate methods used in RPHash allow independent and largely unsynchronized analyses to be performed on each streamed data vector until a log-based reduction step is required to provide clustering results.
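For readers unfamiliar with the general idea, the following is a minimal sketch of how random projection and a sign-based locality sensitive hash can be combined to count dense buckets in a stream. It is not the RPHash implementation, whose details are not given in this abstract. All function names and parameters are hypothetical, and the simple sequential merge shown here only stands in for the reduction step mentioned above.

```python
import random
from collections import Counter

def make_projection(dim_in, dim_out, seed=0):
    """Build a random Gaussian projection matrix (dim_out rows, dim_in columns)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim_in)] for _ in range(dim_out)]

def lsh_signature(vector, projection):
    """Project the vector and keep one sign bit per projected coordinate.

    Nearby vectors tend to share sign patterns, so they tend to land in
    the same bucket (an assumption of this sketch, not a guarantee).
    """
    bits = 0
    for row in projection:
        dot = sum(w * x for w, x in zip(row, vector))
        bits = (bits << 1) | (1 if dot >= 0.0 else 0)
    return bits

def count_buckets(stream, projection):
    """Map step: hash each streamed vector independently and count buckets."""
    counts = Counter()
    for vector in stream:
        counts[lsh_signature(vector, projection)] += 1
    return counts

def merge_counts(partials):
    """Reduce step: combine per-worker bucket counts into one table."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

def top_buckets(counts, k):
    """Return the k most populated bucket signatures as candidate clusters."""
    return [signature for signature, _ in counts.most_common(k)]
```

Because each vector is hashed without reference to any other, separate workers could call `count_buckets` on their own portions of the stream and share only the small count tables with `merge_counts`, which illustrates why little synchronization is needed before the reduction.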
ABOUT THE PRESENTER: Philip A. Wilsey is a Professor of Computer Engineering at the University of Cincinnati. His primary research is in high-performance computing, parallel and distributed simulation, high-dimensional data clustering, embedded systems, and point-of-care medical devices. He has worked for more than 25 years in the field of high-performance computing and has recently initiated studies into the high-performance clustering of high-dimensional data. His work in high-performance, high-dimensional data clustering focuses on a combination of approximate methods that support high-performance as well as distributed data clustering.
MENU SELECTIONS: Buffet Menu: Homemade Potato and Chicken Noodle Soups, Salad and Baked Potato Bar, Toppings: Shredded Cheddar, Chopped Bacon, Sliced Hardboiled Eggs, Sauteed Mushrooms & Garlic, Diced Broccoli, Diced Turkey & Salad Dressings, Breadsticks, Chef’s Choice Dessert
LOCATION: Raffel’s is located at 10160 Reading Road, south of Glendale-Milford Road on the east side of Reading. Take I-75 to the Glendale-Milford Rd. Exit, go east on Glendale-Milford Road approximately ¾ of a mile to Reading Rd. and turn right on Reading.
RESERVATIONS: https://ieeecincinnati.org/meetings/. Please click on the appropriate link and complete the reservation. (Note: Meeting list on webpage is slow to load on some browsers)
Reservations close at midnight on Sunday, May 22, 2016.
DINNER RESERVATION CANCELLATION POLICY: An email to Reservations@ieeecincinnati.org prior to the close of reservations is required to properly cancel your reservation.
WALK-INS (those without reservations): You are welcome to attend this meeting and/or enjoy the dinner even if you did not register in advance. Walk-ins pay a higher $15 dinner fee. Raffel’s determines our cost based on the number of plates used, so if you choose to have dinner, please pay the fee even if you arrive late or did not pre-register.
PE CREDITS: Depending on the subject matter, attendance at IEEE Cincinnati Section Meetings now qualifies the attendee for Professional Development Hours towards renewal of Professional Engineers Licenses. Required documentation will be available following the meeting if qualified! The Section Meetings also provide a great opportunity to network with fellow engineers in the area.