Wukong - Serverless Data Analytics

Cheap. Fast. Easy to Use.

Check out what you can do with Wukong

Cost Effective

Wukong's pay-per-use pricing keeps costs low.

Easy to Use

Wukong can be easily deployed in seconds on home computers or in the cloud.

Open Source

Wukong is open-source. Start contributing today.

Highly Scalable

Wukong uses serverless computing to scale to thousands of executors in seconds.

Applications

From machine learning & linear algebra to data analytics, Wukong can do it all.

Performant

Wukong delivers best-in-class end-to-end performance for a variety of workloads.

Sample Code

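            # Tall-and-skinny SVD.
            import dask.array as da

            # Generate random input data.
            X = da.random.random((128000, 100), chunks=(10000, 100))

            # Prepare SVD computation (lazy; nothing runs yet).
            u, s, v = da.linalg.svd(X)

            # Begin execution, keeping each factor in its own variable.
            u_result = u.compute()
            s_result = s.compute()
            v_result = v.compute()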

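            # Compressed (approximate, rank-5) SVD.
            import dask.array as da

            # Generate random input data.
            X = da.random.random((10000, 10000), chunks=(2000, 2000))

            # Prepare compressed SVD computation (lazy; nothing runs yet).
            u, s, v = da.linalg.svd_compressed(X, k=5)

            # Begin execution, keeping each factor in its own variable.
            u_result = u.compute()
            s_result = s.compute()
            v_result = v.compute()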

            # General matrix multiplication (GEMM).
            import dask.array as da

            # Generate random input data.
            X = da.random.random((10000, 10000), chunks=(1000, 1000))

            # Prepare GEMM computation (lazy; nothing runs yet).
            XX = da.matmul(X, X)

            # Begin execution.
            result = XX.compute()

            # Tall-and-skinny QR factorization (TSQR).
            import dask.array as da

            # Generate random input data.
            X = da.random.random((32768, 128), chunks=(8192, 128))

            # Prepare TSQR computation (lazy; nothing runs yet).
            q, r = da.linalg.tsqr(X)

            # Begin execution.
            res_q = q.compute()
            res_r = r.compute()

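            import sklearn.datasets
            import dask_ml.datasets
            from dask_ml.wrappers import ParallelPostFit
            from sklearn.svm import SVC

            # Train the classifier on a small in-memory dataset.
            X, y = sklearn.datasets.make_classification(n_samples=1000)
            clf = ParallelPostFit(SVC(gamma='scale'))
            clf.fit(X, y)

            # Generate a much larger chunked dataset and predict on it in parallel.
            X, y = dask_ml.datasets.make_classification(n_samples=1024000, random_state=1024000, chunks=10240)

            result = clf.predict(X).compute()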

            import pandas as pd
            import dask.dataframe as dd

            # Read dataframe from file.
            df = pd.read_csv('employees.csv')

            # Create Dask dataframe.
            dask_df = dd.from_pandas(df, npartitions=64)

            # Prepare the filter (lazy), then execute it.
            filtered_df = dask_df[dask_df['salary'] > 50000]
            result = filtered_df.compute()
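
The `chunks` argument in the array samples above splits each matrix into blocks, so one large operation becomes many small tasks that can run on separate executors. The sketch below illustrates that idea for matrix multiplication in pure Python. It is not how Dask or Wukong implement GEMM; the helper names (`split_blocks`, `blocked_matmul`) are hypothetical and chosen only for this example.

```python
def split_blocks(matrix, block):
    """Split a square matrix (list of lists) into a grid of block x block tiles."""
    n = len(matrix)
    return [[[row[j:j + block] for row in matrix[i:i + block]]
             for j in range(0, n, block)]
            for i in range(0, n, block)]


def matmul(a, b):
    """Plain dense product of two small matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_mats(a, b):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def blocked_matmul(x, y, block):
    """Compute x @ y tile by tile; each output tile is an independent task."""
    xb, yb = split_blocks(x, block), split_blocks(y, block)
    grid = len(xb)
    out_tiles = []
    for i in range(grid):
        row_tiles = []
        for j in range(grid):
            # Output tile (i, j) is the sum over k of xb[i][k] @ yb[k][j].
            tile = matmul(xb[i][0], yb[0][j])
            for k in range(1, grid):
                tile = add_mats(tile, matmul(xb[i][k], yb[k][j]))
            row_tiles.append(tile)
        out_tiles.append(row_tiles)
    # Stitch the tiles back together into one matrix.
    return [sum((tile[r] for tile in row_tiles), [])
            for row_tiles in out_tiles
            for r in range(len(row_tiles[0]))]


x = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(blocked_matmul(x, x, 2) == matmul(x, x))
```

Because each output tile depends only on one row of input tiles and one column of input tiles, the tiles can be computed in parallel. Smaller chunks give more parallelism at the price of more tasks to schedule.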

Publications

B. Carver, J. Zhang, A. Wang, and Y. Cheng, "In Search of a Fast and Efficient Serverless DAG Engine," 2019 IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW), 2019.

B. Carver, J. Zhang, A. Wang, A. Anwar, P. Wu, and Y. Cheng, "Wukong: A Scalable and Locality-Enhanced Framework for Serverless Parallel Computing," 2020 ACM Symposium on Cloud Computing (SoCC), 2020.

Fast & Efficient Serverless Data Analytics

Wukong is a Serverless DAG Engine built atop AWS Lambda and Dask.
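
To make "DAG engine" concrete: Dask describes a computation as a task graph, a dictionary that maps each key to either a literal value or a tuple of the form `(function, arg1, arg2, ...)`, where arguments may name other keys. The sketch below runs such a graph sequentially using only the Python standard library (3.9 or newer for `graphlib`). It is an illustration of the concept, not Wukong's scheduler, and `run_graph` is a hypothetical name.

```python
from graphlib import TopologicalSorter
from operator import add, mul


def run_graph(graph, target):
    """Execute a Dask-style task graph and return the value stored at `target`."""

    def is_task(value):
        # A task is a non-empty tuple whose first element is callable.
        return isinstance(value, tuple) and bool(value) and callable(value[0])

    def is_key(arg):
        # For simplicity, this sketch only treats string arguments as graph keys.
        return isinstance(arg, str) and arg in graph

    # Map every key to the keys it depends on, then order dependencies first.
    deps = {key: [a for a in task[1:] if is_key(a)] if is_task(task) else []
            for key, task in graph.items()}
    results = {}
    for key in TopologicalSorter(deps).static_order():
        task = graph[key]
        if is_task(task):
            func, *args = task
            results[key] = func(*[results[a] if is_key(a) else a for a in args])
        else:
            results[key] = task  # Literal input value.
    return results[target]


graph = {
    "x": 1,
    "y": 2,
    "sum": (add, "x", "y"),
    "scaled": (mul, "sum", 10),
}
print(run_graph(graph, "scaled"))  # 30
```

A real engine would run independent tasks concurrently rather than one at a time; the topological order only guarantees that every task runs after the tasks it depends on.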

Stop waiting. Start building.
