Chen@Disney,Paris

Welcome to Chen Zhang's Home Page

I have recently completed my PhD and graduated from the School of Computer Science, University of Waterloo. I will be working as a postdoctoral fellow at McGill University. My research interests center on high-performance and high-throughput computing with applications in fields such as bioinformatics, health informatics, easy-to-use cloud solutions, large-scale database management, and scientific visualization. My PhD thesis, "Enhancing Data Processing on Clouds with Hadoop/HBase", focuses on cloud computational workflow systems and cloud database solutions built on top of column-oriented data stores.

 

Before coming to UW, I did my master's studies at Vrije Universiteit Amsterdam on parallel and distributed computer systems. Prior to that, I was an undergraduate student at Zhejiang University, Hangzhou, China. I learned a lot from my previous experiences and I'm now looking forward to more challenges. One more thing to mention is that both Hangzhou and Amsterdam are wonderful places to visit. I like them very much!


Education

PhD        2006 - 2011        Computer Science, University of Waterloo

MSc        2004 - 2006        Computer Science, Vrije Universiteit Amsterdam

BEng       2000 - 2004        Chu Kechen College, Zhejiang University


Research

Current Projects

Sep 2008 - Spring 2011

·        Cloud Database Solution Using HBase

Developed tools and techniques to support multi-row distributed transactions (ACID) with global strong snapshot isolation (SI) using HBase. The resulting HBaseSI system is the first SI system on bare-bones HBase. It is non-intrusive to existing HBase system configuration and user data, and it uses novel techniques to handle distributed transactions efficiently without relying on consensus-based protocols, explicit atomic broadcast, or data locks.

·        CloudWF

Designed a scalable and lightweight scientific workflow system for managing workflows composed of both MapReduce and legacy applications on Hadoop clouds. It is the first workflow management system targeted to take advantage of the Hadoop/HBase architecture for scalability, fault tolerance and ease of use.

·        CloudBATCH

Designed a system to use Hadoop/HBase as a cluster management system in lieu of traditional batch job queuing systems to accept both MapReduce and legacy batch job submissions, removing the complexity and overhead of making the two kinds of systems compatible.

·        Space Weather Simulation

Developed a prototype of an automated Space Weather Forecast simulation tool on clouds powered by Eucalyptus. We use Python scripts to automate the process of reading inputs from external data sources, running simulations, performing visualization, and staging output files back to a web portal for users to view.

·        Live Cell Image Processing

Designed a system to execute legacy applications (MATLAB) with non-standard Hadoop input formats (i.e. image files instead of textual inputs) through the Hadoop MapReduce framework. It is one of the earliest efforts to apply Hadoop to large-scale scientific data processing instead of server-side computations such as web indexing. The project goal is to study the complex molecular interactions that regulate biological systems.

Past Projects

Sep 2004 – Aug 2008

·        GridBASE

Refined a previously developed database-driven lightweight distributed job execution system for running task-farmable applications on clusters/grids. The core concept behind GridBASE is similar to cloud computing: computing power is used on demand, like electricity, and every node is treated as homogeneous.

·        Parallel MPEG Encoder

Designed and implemented a parallel MPEG encoder to make video compression fast and easy. The application employed a self-developed library based on JavaGAT@GridLab. It split the original video into chunks to be processed independently on grid compute nodes and merged the intermediate result files into a final video output.

·        Bioinformatics Workflow Framework

Designed and implemented a grid workflow system as a web portal backend, tailored specifically to bioinformatics programs offered by IBIVU (http://ibivu.cs.vu.nl). The system was based on the OpenSymphony workflow engine and submitted jobs to the DAS-2 cluster.

·        Standalone Globule Redirector

Designed and implemented a redirection configuration fetcher which helped improve performance of Globule (http://www.globule.org) on top of Apache Portable Runtime.

·        FFPF and SCAMPI project

Ported pcap to FFPF (Fairly Fast Packet Filter) for network monitoring of a terabyte backbone.

·        RFID Guardian

A project focused on providing security and privacy in Radio Frequency Identification (RFID) systems. As one of the earliest participants in the project, I contributed to the design and analysis of the software system architecture and to the development of various usage scenarios and project design documents.


Publications

·        Book Chapter

[1] Hans De Sterck, Alex Papo, Chen Zhang, Micah Hamady, Rob Knight. 'Database-driven Grid Computing and Distributed Web Applications: A Comparison', in the book “Grids for Bioinformatics and Computational Biology”. Wiley, December 2007. ISBN: 978-0-471-78409-8. preprint

·        Paper

[1] Chen Zhang, Hans De Sterck. 'HBaseSI: A Solution for Multi-row Distributed Transactions with Global Strong Snapshot Isolation on Clouds', Scalable Computing: Practice and Experience, 12, July 2011. preprint

[2] Chen Zhang, Hans De Sterck. 'Supporting Multi-row Distributed Transactions with Global Snapshot Isolation Using Bare-bones HBase', The 11th ACM/IEEE International Conference on Grid Computing (Grid 2010), Oct 25-29, 2010, Brussels, Belgium. preprint

[3] Chen Zhang, Hans De Sterck. 'CloudBATCH: A Batch Job Queuing System on Clouds with Hadoop and HBase', The Second IEEE International Conference on Cloud Computing Technology and Science (CloudCom 2010), Nov 30 - Dec 3, 2010, Indianapolis, Indiana, USA. preprint

[4] Ryan Kennedy, Manuel E. Lladser, Zhiyuan Wu, Chen Zhang, Michael Yarus, Hans De Sterck, and Rob Knight. 'Natural and Artificial RNAs Occupy the Same Restricted Region of Sequence Space', RNA, 16:280-289, 2010. preprint

[5] Chen Zhang, Hans De Sterck. 'CloudWF: A Computational Workflow System for Clouds Based on Hadoop', The First International Conference on Cloud Computing, December 1-4, 2009, Beijing, China. preprint extended version

[6] Chen Zhang, Ashraf Aboulnaga, Hans De Sterck, Haig Djambazian, Rob Sladek. 'Case Study of Scientific Data Processing on a Cloud Using Hadoop', High Performance Computing Symposium 2009, June 14-17, 2009, Kingston, Ontario, Canada. preprint

[7] Hans De Sterck, Chen Zhang, Aleks Papo. 'Database-driven grid computing with GridBASE', The 2007 IEEE International Symposium on Bioinformatics and Life Science Computing, May 21-23, 2007, Niagara Falls, Ontario, Canada. preprint

[8] Stevens le Blond, Ana-Maria Oprescu, Chen Zhang. 'Early Application Experience with the Grid Application Toolkit (GAT)', The Fourteenth Global Grid Forum (GGF14), June 26-30, 2005, Chicago, IL, USA. preprint

·        Thesis

[1] A Workflow Engine for Running Bioinformatics Codes On DAS-2 (Dutch Grid). Aug 2006. Vrije Universiteit Amsterdam, The Netherlands.

[2] Distributed and Parallel Oriented Layered Architecture Model for NLP. Aug 2004. Zhejiang University, China.

·        Book

         [1] Chen Zhang, Fu Bing, Zhao Jun. 'Java2 programming 150 examples and explanations', China Publishing House of Electronics Industry (PHEI), 2003-09. Second print, 2004-02.

 


Courses

 

Fall 2006

 

CS848 Advanced Topics in Databases: Management of Information Systems

CS882 Protein Folding

CS666 Advanced Algorithms

 

Winter 2007

 

CS787 Computational Vision

CS846 Analysis of Reactive Programs

 

Fall 2008

 

CS848 Advanced Topics in Databases: Databases in Cloud Computing Environments

 


Teaching

·        TA

CS134        Principles of Computer Science

CS135        Designing Functional Programs

CS338        Computer Applications in Business: Databases

CS348        Introduction to Database Management

CS442/642 Principles of Programming Languages

CS454        Distributed Systems


Contact

Chen Zhang
(Currently off campus. New mailing address available upon request.)
Email: chen@chenzhang.info


Last updated: Dec. 2011

Copyright@Chen Zhang