Fast & Efficient Serverless Data Analytics

Wukong is a Serverless DAG Engine built atop AWS Lambda and Dask.


Cheap. Fast. Easy to Use.

Check out what you can do with Wukong


Cost Effective

Wukong's pay-per-use pricing keeps costs low.

Easy to Use

Wukong deploys in seconds, on a home computer or in the cloud.

Open Source

Wukong is open source. Start contributing today.

Highly Scalable

Wukong uses serverless computing to scale to thousands of executors in seconds.

Applications

Wukong handles workloads ranging from machine learning and linear algebra to data analytics.

Performant

Wukong delivers best-in-class end-to-end performance for a variety of workloads.

Stop waiting. Start building.

Wukong is open source. Head over to GitHub to get started today.

Sample Code

            import dask.array as da

            # Generate random input data.
            X = da.random.random((128000, 100), chunks=(10000, 100))

            # Prepare SVD computation.
            u, s, v = da.linalg.svd(X)

            # Begin execution. Each factor gets its own name so no result is overwritten.
            res_u = u.compute()
            res_s = s.compute()
            res_v = v.compute()
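
The `chunks` argument in these samples sets the block size along each axis, which in turn determines how many blocks the array is split into. As a rough illustration only, the standard-library sketch below works out the block grid for a uniformly chunked array. The helper names `chunk_grid` and `block_sizes` are hypothetical and are not part of Dask or Wukong.

```python
import math


def chunk_grid(shape, chunks):
    """Number of blocks along each axis for a uniformly chunked array."""
    return tuple(math.ceil(dim / size) for dim, size in zip(shape, chunks))


def block_sizes(dim, size):
    """Block lengths along one axis; the final block may be smaller."""
    full, rest = divmod(dim, size)
    return (size,) * full + ((rest,) if rest else ())


# Shape and chunk sizes taken from the first SVD sample.
shape, chunks = (128000, 100), (10000, 100)
grid = chunk_grid(shape, chunks)

print(grid)                                   # blocks per axis
print(math.prod(grid))                        # total number of blocks
print(block_sizes(shape[0], chunks[0])[-1])   # length of the final row block
```

Smaller chunks produce more blocks and therefore more tasks that can run in parallel, at the cost of more per-task overhead.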
          
            import dask.array as da

            # Generate random input data.
            X = da.random.random((10000, 10000), chunks=(2000, 2000))

            # Prepare rank-5 approximate (compressed) SVD computation.
            u, s, v = da.linalg.svd_compressed(X, k=5)

            # Begin execution. Each factor gets its own name so no result is overwritten.
            res_u = u.compute()
            res_s = s.compute()
            res_v = v.compute()
          
            import dask.array as da

            # Generate random input data.
            X = da.random.random((10000, 10000), chunks=(1000, 1000))

            # Prepare GEMM computation.
            XX = da.matmul(X, X)

            # Begin execution.
            result = XX.compute()
          
            import dask.array as da

            # Generate random input data.
            X = da.random.random((32768, 128), chunks=(8192, 128))

            # Prepare tall-and-skinny QR (TSQR) computation.
            q, r = da.linalg.tsqr(X)

            # Begin execution.
            res_q = q.compute()
            res_r = r.compute()
          
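            import dask_ml.datasets
            import sklearn.datasets
            from dask_ml.wrappers import ParallelPostFit
            from sklearn.svm import SVC

            # Prepare classifier: train on a small in-memory dataset.
            X, y = sklearn.datasets.make_classification(n_samples=1000)
            clf = ParallelPostFit(SVC(gamma='scale'))
            clf.fit(X, y)

            # Prepare the larger, chunked workload and begin execution.
            X, y = dask_ml.datasets.make_classification(n_samples=1024000, random_state=1024000, chunks=10240)

            result = clf.predict(X).compute()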
          
            import dask.dataframe as dd
            import pandas as pd

            # Read dataframe from file.
            df = pd.read_csv('employees.csv')

            # Create Dask dataframe.
            dask_df = dd.from_pandas(df, npartitions=64)

            # Prepare and perform operation.
            filtered_df = dask_df[dask_df['salary'] > 50000]
            result = filtered_df.compute()
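
Each sample above only builds a lazy task graph (a DAG). Nothing runs until `compute()` is called, at which point the graph is handed to the execution engine. To make that idea concrete, here is a minimal, serial sketch of executing a Dask-style graph, assuming the classic dictionary form in which a task is a tuple whose first element is a callable. It is not Wukong's scheduler; the function `execute` and the graph keys are hypothetical and chosen purely for illustration.

```python
from graphlib import TopologicalSorter
from operator import add, mul


def _is_task(value):
    """A task is a non-empty tuple whose first element is callable."""
    return isinstance(value, tuple) and bool(value) and callable(value[0])


def execute(graph, target):
    """Run every task in `graph` in dependency order and return `target`'s value.

    `graph` maps a key to either a literal value or a task tuple
    (func, arg1, arg2, ...). String arguments that match a key in the
    graph are treated as references to that key's result.
    """
    def is_ref(arg):
        return isinstance(arg, str) and arg in graph

    # Map each key to the set of keys it depends on.
    deps = {
        key: {a for a in value[1:] if is_ref(a)} if _is_task(value) else set()
        for key, value in graph.items()
    }

    results = {}
    # static_order() yields each key only after all of its dependencies.
    for key in TopologicalSorter(deps).static_order():
        value = graph[key]
        if _is_task(value):
            func, *args = value
            results[key] = func(*(results[a] if is_ref(a) else a for a in args))
        else:
            results[key] = value
    return results[target]


# A tiny graph computing (x + y) * 10.
graph = {
    "x": 2,
    "y": 3,
    "sum": (add, "x", "y"),
    "out": (mul, "sum", 10),
}
print(execute(graph, "out"))
```

A serverless engine would dispatch independent tasks to separate executors instead of looping over them in one process, but the dependency ordering shown here is the same constraint it has to respect.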
          

Publications

  • B. Carver, J. Zhang, A. Wang and Y. Cheng, "In Search of a Fast and Efficient Serverless DAG Engine," 2019 IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW), 2019
    Paper Slides
  • B. Carver, J. Zhang, A. Wang, A. Anwar, P. Wu and Y. Cheng, "Wukong: A Scalable and Locality-Enhanced Framework for Serverless Parallel Computing," 2020 ACM Symposium on Cloud Computing (SoCC), 2020
    Paper Video Talk

Join the Community