Book Name: Applied Data Science
Category: Machine Learning
Free Download: Available
Applied Data Science
Notes for the Applied Data Science course at Columbia University. They focus more on the statistics side, while also teaching readers some basic programming skills.
About Book
The explosion of available data, coinciding with the continuous evolution of statistical and computational methods, has led to a new generation of specialists. These data scientists use rigorous statistical methods to find meaning in data. Minimizing a loss function is not enough: business and social decisions depend on how the results are interpreted. The world of scientific computing is also changing rapidly. Quick and dirty scripts aren't enough; a maintainable code base and a collaborative development environment allow projects to go into production and scale. A data scientist has to wear many hats; here we present two of them.
Maintainable coding techniques using test-driven development, version control and collaboration will be taught. The code will be at the level found in the scikit-learn and statsmodels packages. Students leave the class having built a library on GitHub and with an understanding of several basic statistical / machine learning algorithms.
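As a rough illustration of the test-driven style mentioned above, the sketch below pairs a small loss function with the tests that pin down its behavior. It is not taken from the course notes: the function name `mean_squared_error` and the test names are hypothetical, chosen only for this example.

```python
def mean_squared_error(y_true, y_pred):
    """Return the average squared difference between two equal-length sequences.

    Hypothetical helper written for this illustration only.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# In a test-driven workflow these tests would be written first,
# and the function above would be filled in until they pass.
def test_perfect_prediction_has_zero_error():
    assert mean_squared_error([1.0, 2.0], [1.0, 2.0]) == 0.0


def test_known_value():
    # Errors are 1 and 3, so the mean of the squares is (1 + 9) / 2 = 5.
    assert mean_squared_error([0.0, 0.0], [1.0, 3.0]) == 5.0


def test_length_mismatch_raises():
    try:
        mean_squared_error([1.0], [1.0, 2.0])
    except ValueError:
        return
    raise AssertionError("expected ValueError for mismatched lengths")
```

A test runner such as pytest would collect the `test_` functions automatically; kept under version control, they let collaborators change the implementation and check that its behavior is unchanged.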
The case studies offer students the opportunity to use their software on real-world datasets. Here they develop the intuition to extract meaning from the data.
Students finish the class with a website/blog/portfolio, and experience with the translation:
Real world -> data -> scientist -> collaborators/coworkers -> policy-decision/data-product
Author Details
Ian Langmore is a software engineer at Google and an applied mathematician who works as a data scientist. His specialties are Monte Carlo simulation, machine learning, statistics, partial differential equations, and scientific computing.
Daniel Krasner is the founder and CEO of Merriam Tech, a company whose products combine archival research techniques and a focus on meaning and context with statistical processing of language to bring insightful and intuitive interaction to vast collections of electronic text. Previously, he was a mathematician (PhD from Columbia University) and worked at the intersection of low-dimensional topology, representation theory, and homological algebra.