S2 IDE

An integrated development environment for advanced analytics and data

Tutorials

Here is a list of tutorials.

More coming…

Advantages

R Alternative

S2 can do much of what R does, and does it much faster. Moreover, R code runs only inside the R environment; it is very difficult to deploy it anywhere else, such as on embedded devices like microwaves, automobiles and space rockets. S2 code runs in any JVM environment, and there are now 15 billion devices that run a JVM!

Numeric computing environment

S2 is an IDE for coding numerical algorithms. Let’s start with 1+1.

With 1+1 working, we can do pretty much anything in numerical programming, which is just a more complicated series of many 1+1 operations. Integration is one example.
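To make the integration example concrete, here is a minimal sketch in plain Java of composite Simpson's rule. It does not use the S2 API; the class and method names (`SimpsonIntegration`, `integrate`) are made up for illustration, and S2's own integrators may work differently.

```java
import java.util.function.DoubleUnaryOperator;

class SimpsonIntegration {

    /**
     * Approximates the integral of f over [a, b] with composite Simpson's rule.
     * n is the number of subintervals and must be a positive even number.
     */
    static double integrate(DoubleUnaryOperator f, double a, double b, int n) {
        if (n <= 0 || n % 2 != 0) {
            throw new IllegalArgumentException("n must be a positive even number");
        }
        double h = (b - a) / n;
        // End points carry weight 1.
        double sum = f.applyAsDouble(a) + f.applyAsDouble(b);
        // Interior points alternate between weight 4 (odd index) and 2 (even index).
        for (int i = 1; i < n; i++) {
            double x = a + i * h;
            sum += (i % 2 == 1 ? 4.0 : 2.0) * f.applyAsDouble(x);
        }
        return sum * h / 3.0;
    }

    public static void main(String[] args) {
        // The exact integral of sin(x) over [0, pi] is 2.
        System.out.println(integrate(Math::sin, 0.0, Math.PI, 1000));
    }
}
```

The whole computation is indeed nothing more than additions and multiplications in a loop, which is the point made above.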

S2 has what is probably the fastest linear algebra package in the Java world.

Plotting

S2 supports a few dozen types of graphs, charts and plots.

Box plot

Density plot

Scatter plot

Histogram

Bar chart

Surface plot

and many more…

Statistics

S2 has a very comprehensive statistics package.

S2 supports almost all types of linear regression and their statistics, including OLS, GLM and logistic regression.
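As an illustration of the simplest case, the sketch below fits a one-variable OLS line y = a + b·x using the closed-form estimates (slope = covariance / variance). It is plain Java rather than S2 code; `SimpleOls` and `fit` are hypothetical names, not part of the S2 statistics package.

```java
class SimpleOls {

    /** Returns {intercept, slope} of the least-squares line through the points (x[i], y[i]). */
    static double[] fit(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two (x, y) pairs of equal length");
        }
        int n = x.length;
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0.0, sxx = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            sxy += dx * (y[i] - meanY);
            sxx += dx * dx;
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("all x values are identical");
        }
        double slope = sxy / sxx;
        return new double[] {meanY - slope * meanX, slope};
    }

    public static void main(String[] args) {
        // Points that lie exactly on y = 1 + 2x.
        double[] coef = fit(new double[] {1, 2, 3, 4}, new double[] {3, 5, 7, 9});
        System.out.println("intercept=" + coef[0] + ", slope=" + coef[1]);
    }
}
```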

Time series analysis

Random number generation

Distributions

and many more…

Optimization

Solvers are the foundation of future mathematics. They are at the core of AI and big data analytics. We need a solver for any problem that does not have a closed-form solution, which is pretty much every modern problem. S2 supports a full suite of the standard optimization algorithms.

Linear programming

Quadratic programming

Second-order cone programming

and many more…

Machine Learning

It is simple to create and train a Neural Network (NN) in S2.

A simple script trains an NN to learn the Black-Scholes formula from a data set of stock prices and option prices.

It converges in a few hundred epochs.

Python Replacement

Python does two things well: (1) scripting, as the glue that puts many components together for data analysis, and (2) array/tensor programming. S2 does both as well, but better.

The problems with Python are: (1) as an interpreted scripting language it is slow, and (2) it is very difficult to deploy code to other devices because of the many versions of its dependencies and an assortment of libraries written in FORTRAN, C, C++, etc. S2 is fast and runs on the 15 billion devices with a JVM simply by copying jars.

Scripting

First, Python is slow, very slow. Second, Python scripts run only in the Python environment. You cannot port them to your phone, watch, router, automobile or rocket. Worst of all, Python deployment is well known to be a nightmare. A script runs fine on your machine, but it takes tremendous effort to make it run on another person’s machine.

S2 scripting, compiled to Java bytecode, is orders of magnitude faster than Python’s. It runs on any (embedded) device that runs a JVM, hence no deployment problem.

S2 scripting acts as the glue to put many components together, but with much better performance. The following case shows a scheduling system we built for a steel manufacturing plant. The steps are:

Read the job data
Read the machine data
Schedule the jobs to the machines to maximize utilization
Plot the job-shop schedule

All these steps are done in an S2 script in 12 lines! The same code can be deployed on S2, in a stand-alone application or in the cloud using REST.

The output schedule as a Gantt chart.
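The scheduling step can be sketched in plain Java with a simple greedy heuristic: take the jobs longest first and give each one to the machine that becomes free earliest. This is an assumption made for illustration only; it is not the algorithm used in the steel plant system, it ignores job-shop precedence constraints, and `GreedyScheduler`, `assign` and `makespan` are hypothetical names.

```java
import java.util.Arrays;

class GreedyScheduler {

    /** Returns, for each job index, the index of the machine the job is assigned to. */
    static int[] assign(int[] durations, int machineCount) {
        // Job indices ordered by duration, longest first.
        Integer[] order = new Integer[durations.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Integer.compare(durations[b], durations[a]));

        int[] freeAt = new int[machineCount];      // time at which each machine becomes idle
        int[] machineOf = new int[durations.length];
        for (int job : order) {
            int best = 0;
            for (int m = 1; m < machineCount; m++) {
                if (freeAt[m] < freeAt[best]) {
                    best = m;
                }
            }
            machineOf[job] = best;
            freeAt[best] += durations[job];
        }
        return machineOf;
    }

    /** Completion time of the busiest machine under the given assignment. */
    static int makespan(int[] durations, int[] machineOf, int machineCount) {
        int[] load = new int[machineCount];
        for (int i = 0; i < durations.length; i++) {
            load[machineOf[i]] += durations[i];
        }
        return Arrays.stream(load).max().orElse(0);
    }

    public static void main(String[] args) {
        int[] durations = {7, 3, 5, 2, 4};          // made-up job lengths
        int[] machineOf = assign(durations, 2);
        System.out.println(Arrays.toString(machineOf) + " makespan=" + makespan(durations, machineOf, 2));
    }
}
```

A shorter makespan means less idle time on the machines, which is the utilization goal named in the steps above.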

Array/Tensor Programming

The power of Python comes from three libraries: scipy, numpy and pandas. Together, they allow users to put data in a high-dimensional array (aka tensor) so that they can dice, slice, cut and sample the tensor in whatever way they want. More importantly, they let a script be restructured for parallel execution and high performance: a pandas DataFrame is split into several chunks, a worker operates on each chunk, and the results are combined back together. (Yet most Python programmers don’t know how to do this, or just don’t do it. This is one reason why a lot of Python scripts are slow.)

S2 supports exactly this kind of array/tensor programming for parallelization. Here is a parallelized version of the Black-Scholes formula example. The formula is applied concurrently to all the rows in the table of stock prices and option prices, in the same fashion as is done in Python with pandas.
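The idea of applying the formula to every row concurrently can be sketched in plain Java with a parallel stream, which splits the index range into chunks, prices each chunk on a worker thread and joins the results in order. This is not the S2 script; the class and method names are hypothetical, and the normal CDF uses the Abramowitz-Stegun polynomial approximation because the Java standard library has no error function.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

class ParallelBlackScholes {

    /** Standard normal CDF, Abramowitz-Stegun approximation (absolute error below 1e-7). */
    static double normCdf(double x) {
        if (x < 0) {
            return 1.0 - normCdf(-x);
        }
        double t = 1.0 / (1.0 + 0.2316419 * x);
        double poly = t * (0.319381530 + t * (-0.356563782
                + t * (1.781477937 + t * (-1.821255978 + t * 1.330274429))));
        return 1.0 - Math.exp(-0.5 * x * x) / Math.sqrt(2.0 * Math.PI) * poly;
    }

    /** Black-Scholes price of a European call on a non-dividend-paying stock. */
    static double callPrice(double spot, double strike, double rate, double vol, double years) {
        double sqrtT = Math.sqrt(years);
        double d1 = (Math.log(spot / strike) + (rate + 0.5 * vol * vol) * years) / (vol * sqrtT);
        double d2 = d1 - vol * sqrtT;
        return spot * normCdf(d1) - strike * Math.exp(-rate * years) * normCdf(d2);
    }

    /** Prices every row concurrently; element i of the result depends only on spots[i]. */
    static double[] priceAll(double[] spots, double strike, double rate, double vol, double years) {
        return IntStream.range(0, spots.length)
                .parallel()
                .mapToDouble(i -> callPrice(spots[i], strike, rate, vol, years))
                .toArray();
    }

    public static void main(String[] args) {
        double[] spots = {90.0, 100.0, 110.0};     // made-up stock prices
        System.out.println(Arrays.toString(priceAll(spots, 100.0, 0.05, 0.2, 1.0)));
    }
}
```

Because each row is priced independently, no locking is needed and the parallel result is identical to the sequential one.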

S2 also supports all kinds of dissecting, slicing, cutting, dicing, sampling and massaging of data frames using ND4J.

Big Data Handling

S2 can handle terabytes or even petabytes of data across clusters of machines efficiently, using map-reduce programming in a simple S2 script.

Demo: a word count over very large documents spread across machines.
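The map-reduce shape of a word count can be sketched on a single JVM with Java streams: the map phase turns each document into words, and the reduce phase groups equal words and counts them. This is only a stand-in for the distributed S2 demo, which is not shown here; `WordCount` and `count` are hypothetical names, and a real cluster job would also have to partition the documents and shuffle intermediate results between machines.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class WordCount {

    /** Counts case-insensitive word occurrences across all documents. */
    static Map<String, Long> count(List<String> documents) {
        return documents.parallelStream()
                // Map: split each document into lower-case words.
                .flatMap(doc -> Arrays.stream(doc.toLowerCase().split("\\W+")))
                .filter(word -> !word.isEmpty())
                // Reduce: group identical words and count each group.
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("the cat sat", "The dog sat")));
        // prints {cat=1, dog=1, sat=2, the=2}
    }
}
```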

Industrial Partners

S2 partners with many third-party vendors to make their analytics, algorithms and data available on S2, making it a one-stop shop for algorithms and data.

AlgoQuant

AlgoQuant is a large library of financial analytics with hundreds of functions. It also comes with well-cleaned, professionally maintained data for equities (US and China). AlgoQuant has many templates and frameworks for users to do research in portfolio management.

For example, suppose a user wants to study how a simple moving average crossover works for a particular stock. They need only write the strategy code, which takes a few lines.

The script can be plugged into the AlgoQuant framework for backtesting.
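The crossover rule itself is simple enough to sketch in plain Java, independently of AlgoQuant: go long when the fast moving average is above the slow one and short when it is below. The AlgoQuant strategy interface is not reproduced here, so `SmaCrossover`, `sma` and `signals` are hypothetical names and the prices are made up.

```java
import java.util.Arrays;

class SmaCrossover {

    /** Simple moving average of the `window` prices ending at index `end` (inclusive). */
    static double sma(double[] prices, int end, int window) {
        double sum = 0.0;
        for (int i = end - window + 1; i <= end; i++) {
            sum += prices[i];
        }
        return sum / window;
    }

    /**
     * Returns one signal per price: +1 (long) when the fast SMA is above the slow SMA,
     * -1 (short) when it is below, and 0 when they are equal or history is too short.
     */
    static int[] signals(double[] prices, int fast, int slow) {
        if (fast <= 0 || slow <= fast) {
            throw new IllegalArgumentException("require 0 < fast < slow");
        }
        int[] signal = new int[prices.length];
        for (int t = slow - 1; t < prices.length; t++) {
            double diff = sma(prices, t, fast) - sma(prices, t, slow);
            signal[t] = diff > 0 ? 1 : (diff < 0 ? -1 : 0);
        }
        return signal;
    }

    public static void main(String[] args) {
        double[] prices = {1, 2, 3, 4, 5, 4, 3, 2, 1};   // rises, then falls
        System.out.println(Arrays.toString(signals(prices, 2, 3)));
        // prints [0, 0, 1, 1, 1, 1, -1, -1, -1]
    }
}
```

A backtest would then step through the signals and accumulate the profit and loss of holding the indicated position from one bar to the next.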

AlgoQuant has a suite of analysis and reporting tools.

SuperCurve

SuperCurve is a fixed-income data firm in China. They sell high-quality bond data and analytics.

A user can retrieve China bond data in S2 using the SuperCurve API. Here is how they can fit a zero-coupon yield curve to that bond data in just two lines of S2 code.