S2: a Next-Generation Data Science Toolbox
Advantages
- S2 is orders of magnitude faster than R.
- S2 is orders of magnitude faster than Python. S2 addresses many of Python's problems, such as performance and deployment.
- S2 supports data analysis on terabytes or even petabytes of data distributed across clusters of data servers and machines.
- S2 comes with many polished data sets and proprietary algorithms that users can use out of the box.
R Alternative
S2 supports a few dozen types of graphs, charts, and plots.
Surface plot
Random number generation
Distributions
Solvers are the foundation of future mathematics and the core of AI and big data analytics. We need a solver for any problem that does not have a closed-form solution, which is the case for most modern problems. S2 supports a full suite of standard optimization algorithms.
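For reference, the three problem classes listed in this section are usually written in the following textbook forms. The symbols (x for the decision vector; c, A, b, Q, f, A_i, b_i, c_i, d_i for the problem data) are generic notation assumed here for illustration, not S2 syntax.

```latex
% Linear programming (LP)
\min_{x} \; c^{\top} x
\quad \text{s.t.} \quad A x \le b, \; x \ge 0

% Quadratic programming (QP), convex when Q is positive semidefinite
\min_{x} \; \tfrac{1}{2} x^{\top} Q x + c^{\top} x
\quad \text{s.t.} \quad A x \le b

% Second-order cone programming (SOCP)
\min_{x} \; f^{\top} x
\quad \text{s.t.} \quad \lVert A_i x + b_i \rVert_2 \le c_i^{\top} x + d_i,
\quad i = 1, \dots, m
```

Every LP is a QP with Q = 0, and every convex QP can be rewritten as an SOCP, so the three classes form a nested hierarchy.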
Linear programming
Quadratic programming
Second-order cone programming
Python Replacement
Python does two things well: (1) scripting, as a glue that puts many components together to do data analysis, and (2) array/tensor programming. S2 does both, but better.
The problems with Python are: (1) a scripting or interpreted language is slow, and (2) it is very difficult to deploy code to other devices because of the many versions of dependencies and an assortment of libraries written in FORTRAN, C, C++, etc. S2 is fast and runs on the 15 billion devices that have a JVM; deployment is a matter of copying JAR files.
First, Python is slow, very slow. Second, Python scripts run only in a Python environment. You cannot easily port them to your phone, watch, router, automobile, or rocket. Worst of all, Python deployment is well known to be a nightmare: a script runs fine on your machine, but it takes tremendous effort to make it run on another person’s machine.
S2 scripts are compiled to Java bytecode and run orders of magnitude faster than Python scripts. They run on any device, embedded ones included, that runs a JVM, hence no deployment problem.
S2 scripting acts as the glue that puts many components together, but with much better performance. The following case shows a scheduling system we built for a steel manufacturing plant. The steps are:
- Read the job data
- Read the machine data
- Schedule the jobs on the machines to maximize utilization
- Plot the job-shop schedule
All these steps are done in an S2 script of 12 lines! The same code can be deployed on S2, in a stand-alone application, or in the cloud via REST.
The output schedule as a Gantt chart.
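The scheduling step can be illustrated with a minimal plain-Java sketch. It is not the S2 script or the scheduler built for the plant: it assumes a simple longest-processing-time-first heuristic, hard-codes made-up job data instead of reading files, and prints a crude text chart instead of plotting. The names `JobShopSketch`, `Job`, `Slot`, `schedule`, and `makespan` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class JobShopSketch {

    /** A job with a name and a processing time in hours (hypothetical data model). */
    static final class Job {
        final String name;
        final int hours;
        Job(String name, int hours) { this.name = name; this.hours = hours; }
    }

    /** One bar of the chart: a job placed on a machine over [start, end). */
    static final class Slot {
        final String job;
        final int machine;
        final int start;
        final int end;
        Slot(String job, int machine, int start, int end) {
            this.job = job; this.machine = machine; this.start = start; this.end = end;
        }
    }

    /**
     * Longest-processing-time-first heuristic (an assumption, not the plant's algorithm):
     * sort jobs by decreasing duration, then give each job to the machine that frees up first.
     */
    static List<Slot> schedule(List<Job> jobs, int machines) {
        List<Job> sorted = new ArrayList<>(jobs);
        sorted.sort(Comparator.comparingInt((Job j) -> j.hours).reversed());

        int[] freeAt = new int[machines]; // time at which each machine becomes idle
        List<Slot> slots = new ArrayList<>();
        for (Job job : sorted) {
            int best = 0;
            for (int m = 1; m < machines; m++) {
                if (freeAt[m] < freeAt[best]) best = m;
            }
            slots.add(new Slot(job.name, best, freeAt[best], freeAt[best] + job.hours));
            freeAt[best] += job.hours;
        }
        return slots;
    }

    /** Makespan: the time at which the last job finishes. */
    static int makespan(List<Slot> slots) {
        return slots.stream().mapToInt(s -> s.end).max().orElse(0);
    }

    public static void main(String[] args) {
        List<Job> jobs = List.of(new Job("A", 4), new Job("B", 3), new Job("C", 3),
                new Job("D", 2), new Job("E", 2));
        List<Slot> slots = schedule(jobs, 2);
        // One text row per machine, one character per hour of work.
        for (int m = 0; m < 2; m++) {
            StringBuilder row = new StringBuilder("machine " + m + ": ");
            for (Slot s : slots) {
                if (s.machine == m) row.append(s.job.repeat(s.end - s.start));
            }
            System.out.println(row);
        }
        System.out.println("makespan = " + makespan(slots));
    }
}
```

The heuristic is fast but not optimal in general; a production scheduler would typically hand the same data to an optimization solver instead.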
The power of Python comes from three libraries: SciPy, NumPy, and pandas. Together, they let users put data in a high-dimensional array (a tensor) and then slice, dice, cut, and sample it however they want. More importantly, array-style code lends itself to parallel execution for high performance: split a pandas DataFrame into several chunks, run a thread on each chunk, and combine the results. (Yet most Python programmers don’t know how to do this, or just don’t do it, which is one reason many Python scripts are slow.)
S2 supports exactly this kind of array/tensor programming for parallelization. Here is a parallel version of the Black-Scholes formula example. The formula is applied concurrently to every row of the stock price/option price table, in the same fashion as with pandas in Python.
S2 also supports all kinds of slicing, dicing, cutting, sampling, and reshaping of data frames using ND4J.
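The row-wise parallel pattern can be sketched in plain Java with parallel streams. This is an illustration under stated assumptions, not the S2 example itself: the class and method names are hypothetical, the option is a European call with no dividends, and the normal CDF uses the Abramowitz-Stegun polynomial approximation (formula 26.2.17).

```java
import java.util.stream.IntStream;

class BlackScholesParallel {

    /** Standard normal CDF, Abramowitz-Stegun 26.2.17 (absolute error below 7.5e-8). */
    static double normCdf(double x) {
        if (x < 0) {
            return 1.0 - normCdf(-x);
        }
        double t = 1.0 / (1.0 + 0.2316419 * x);
        double poly = t * (0.319381530 + t * (-0.356563782
                + t * (1.781477937 + t * (-1.821255978 + t * 1.330274429))));
        double pdf = Math.exp(-0.5 * x * x) / Math.sqrt(2.0 * Math.PI);
        return 1.0 - pdf * poly;
    }

    /** Black-Scholes price of a European call on a non-dividend-paying stock. */
    static double callPrice(double spot, double strike, double rate, double vol, double years) {
        double sqrtT = Math.sqrt(years);
        double d1 = (Math.log(spot / strike) + (rate + 0.5 * vol * vol) * years) / (vol * sqrtT);
        double d2 = d1 - vol * sqrtT;
        return spot * normCdf(d1) - strike * Math.exp(-rate * years) * normCdf(d2);
    }

    /** Prices every row concurrently; the output order matches the input order. */
    static double[] priceAll(double[] spot, double[] strike, double rate,
                             double[] vol, double[] years) {
        return IntStream.range(0, spot.length)
                .parallel() // rows are independent, so they can be split across threads
                .mapToDouble(i -> callPrice(spot[i], strike[i], rate, vol[i], years[i]))
                .toArray();
    }

    public static void main(String[] args) {
        // Made-up inputs for illustration only.
        double[] spot = {100.0, 105.0, 95.0};
        double[] strike = {100.0, 100.0, 100.0};
        double[] vol = {0.20, 0.25, 0.30};
        double[] years = {1.0, 0.5, 2.0};
        double[] prices = priceAll(spot, strike, 0.05, vol, years);
        for (int i = 0; i < prices.length; i++) {
            System.out.printf("row %d: call = %.4f%n", i, prices[i]);
        }
    }
}
```

Because each row depends only on its own inputs, no locking is needed; the runtime splits the index range into chunks, prices each chunk on a worker thread, and joins the results.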
Big Data Handling
S2 can process terabytes or even petabytes of data spread across many machines, using map-reduce programming in a simple S2 script.
Demo: a word count over very large documents distributed across machines.
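The map-reduce shape of a word count can be sketched in plain Java. This is a single-JVM illustration, not the S2 demo: threads of a parallel stream stand in for machines, the documents are tiny in-memory strings, and the name `WordCountSketch` is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class WordCountSketch {

    /** Counts how often each word occurs across all documents, ignoring case. */
    static Map<String, Long> wordCount(List<String> documents) {
        return documents.parallelStream()
                // Map phase: split each document into lower-case words.
                .flatMap(doc -> Arrays.stream(doc.toLowerCase().split("\\W+")))
                .filter(word -> !word.isEmpty())
                // Reduce phase: group identical words and count each group.
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> documents = List.of("the cat sat", "The cat ran");
        System.out.println(wordCount(documents));
    }
}
```

In a distributed setting the same two phases apply, except that the partial counts are computed on the machines that hold the data and only the small per-word totals travel over the network.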
Industrial Partners
S2 partners with many third-party vendors to make their analytics, algorithms, and data available on S2, making it a one-stop shop for algorithms and data.
AlgoQuant is a large library of financial analytics with hundreds of functions. It also comes with well-cleaned, professionally maintained data for equities (US and China). AlgoQuant has many templates and frameworks for users to do research in portfolio management.
For example, suppose a user wants to study how a simple moving average crossover works for a particular stock. S/he needs only to write the strategy code in a few lines.
The script can be plugged into the AlgoQuant framework for backtesting.
AlgoQuant has a suite of analysis and reporting tools.
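The moving average crossover rule mentioned above can be sketched in plain Java. This does not use the AlgoQuant API; the class and method names are hypothetical, and the rule assumed here is the common one: buy when the fast average crosses above the slow average, sell when it crosses below.

```java
class SmaCrossoverSketch {

    /** Simple moving average of the `window` prices ending at index `end` (inclusive). */
    static double sma(double[] prices, int end, int window) {
        double sum = 0.0;
        for (int i = end - window + 1; i <= end; i++) {
            sum += prices[i];
        }
        return sum / window;
    }

    /**
     * Returns one signal per price: +1 = buy, -1 = sell, 0 = hold.
     * Requires fast < slow; the first `slow` entries are always 0 because
     * both averages must exist on the current and the previous bar.
     */
    static int[] signals(double[] prices, int fast, int slow) {
        int[] signals = new int[prices.length];
        for (int t = slow; t < prices.length; t++) {
            double prevDiff = sma(prices, t - 1, fast) - sma(prices, t - 1, slow);
            double currDiff = sma(prices, t, fast) - sma(prices, t, slow);
            if (prevDiff <= 0 && currDiff > 0) {
                signals[t] = 1;  // fast average crossed above the slow one
            } else if (prevDiff >= 0 && currDiff < 0) {
                signals[t] = -1; // fast average crossed below the slow one
            }
        }
        return signals;
    }

    public static void main(String[] args) {
        // Made-up price path: falls, recovers, then falls again.
        double[] prices = {5, 4, 3, 2, 1, 2, 3, 4, 5, 4, 3, 2};
        int[] s = signals(prices, 2, 3);
        for (int t = 0; t < s.length; t++) {
            if (s[t] != 0) {
                System.out.println("t=" + t + (s[t] > 0 ? " buy" : " sell"));
            }
        }
    }
}
```

A backtest would then replay these signals against historical prices and account for transaction costs, which this sketch leaves out.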
SuperCurve is a fixed income data firm in China. They sell high quality bond data and analytics.
A user can retrieve China bond data in S2 using the SuperCurve API. Here is how s/he can fit a zero-coupon yield curve to those bond data in S2 with only two lines of code.
With a yield curve, s/he can price any fixed income instrument on that date in S2.
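Pricing from a zero-coupon curve amounts to discounting each cash flow at the zero rate for its payment date. The plain-Java sketch below illustrates the idea and does not use the S2 or SuperCurve API. It assumes continuously compounded zero rates, linear interpolation between curve points, and flat extrapolation outside them; the curve values and all names are made up.

```java
class ZeroCurvePricingSketch {

    /** Zero rate at time t (in years), linearly interpolated; flat beyond the end points. */
    static double zeroRate(double[] tenors, double[] rates, double t) {
        int last = tenors.length - 1;
        if (t <= tenors[0]) return rates[0];
        if (t >= tenors[last]) return rates[last];
        int i = 1;
        while (tenors[i] < t) {
            i++;
        }
        double w = (t - tenors[i - 1]) / (tenors[i] - tenors[i - 1]);
        return rates[i - 1] + w * (rates[i] - rates[i - 1]);
    }

    /** Discount factor exp(-r(t) * t) under continuous compounding. */
    static double discountFactor(double[] tenors, double[] rates, double t) {
        return Math.exp(-zeroRate(tenors, rates, t) * t);
    }

    /** Present value of cash flows paid at the given times (in years). */
    static double price(double[] tenors, double[] rates, double[] times, double[] cashFlows) {
        double pv = 0.0;
        for (int k = 0; k < times.length; k++) {
            pv += cashFlows[k] * discountFactor(tenors, rates, times[k]);
        }
        return pv;
    }

    public static void main(String[] args) {
        // Made-up curve: 2% at 1 year rising to 3.5% at 10 years.
        double[] tenors = {1.0, 3.0, 5.0, 10.0};
        double[] rates = {0.020, 0.025, 0.030, 0.035};
        // A 3-year bond, face value 100, paying a 4% annual coupon.
        double[] times = {1.0, 2.0, 3.0};
        double[] cashFlows = {4.0, 4.0, 104.0};
        System.out.printf("bond price = %.4f%n", price(tenors, rates, times, cashFlows));
    }
}
```

Other compounding conventions or interpolation schemes change only `discountFactor` and `zeroRate`; the pricing loop stays the same.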
Licensing
Community Edition
S2 Community Edition is free to use. Please tell us what you think of S2, and post bug reports and feature requests, in our forum. You can try it out without registering an account, but your work won't be saved. Registration is free!
Enterprise Edition
If you are looking to:
- increase S2's computational power
- have a private and secure server (cluster)
- co-develop and customize S2
please contact sales.
Third Party Vendors
If you would like to host your data and/or algorithms/analytics on S2, please contact us.
Collaboration
If you would like to work together or contribute to S2, please contact us.