How to Use the Data Scientist Cheat Sheet

A helpful 5-page data science cheatsheet to assist with exam reviews, interview prep, and anything in-between. It covers over a semester of introductory machine learning, and is based on MIT's Machine Learning courses 6.867 and 15.072. The reader should have at least a basic understanding of statistics and linear algebra, though beginners may find this resource helpful as well.

Inspired by Maverick's Data Science Cheatsheet (hence the 2.0 in the name), located here.

Topics covered:

  • Linear and Logistic Regression
  • Decision Trees and Random Forest
  • SVM
  • K-Nearest Neighbors
  • Clustering
  • Boosting
  • Dimension Reduction (PCA, LDA, Factor Analysis)
  • Natural Language Processing
  • Neural Networks
  • Recommender Systems
  • Reinforcement Learning
  • Anomaly Detection
  • Time Series
  • A/B Testing

This cheatsheet will be occasionally updated with new/improved info, so consider a follow or star to stay up to date.

Future additions (ideas welcome):

  • Time Series Added!
  • Statistics and Probability Added!
  • Data Imputation
  • Generative Adversarial Networks
  • Graph Neural Networks

Links

  • Data Science Cheatsheet 2.0 PDF

Screenshots

Here are screenshots of a couple pages - the link to the full cheatsheet is above!


Why is Python/SQL not covered in this cheatsheet?

I planned for this resource to cover mainly algorithms, models, and concepts, as these rarely change and are common throughout industries. Technical languages and data structures often vary by job function, and refreshing these skills may make more sense on keyboard than on paper.

License

Feel free to share this resource in classes, review sessions, or to anyone who might find it helpful :)

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License

Partial functions allow us to fix a certain number of arguments of a function and generate a new function.

Example:

from functools import partial

# A normal function
def f(a, b, c, x):
    return 1000*a + 100*b + 10*c + x

# A partial function that calls f with
# a as 3, b as 1 and c as 4.
g = partial(f, 3, 1, 4)

# Calling g()
print(g(5))

Output:

3145

In this example, we pre-fill f with constant values for a, b and c, so g() takes only a single argument: the variable x.
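To make the mechanism concrete, here is a simplified sketch of what partial does, modeled loosely on the "roughly equivalent" Python code shown in the functools documentation. The name my_partial is hypothetical, and the real functools.partial returns a dedicated partial object rather than a plain nested function, so treat this as an illustration of the idea only.

```python
def my_partial(func, /, *args, **keywords):
    """Hypothetical, simplified stand-in for functools.partial."""
    def newfunc(*fargs, **fkeywords):
        # Call-time keywords override the ones frozen at creation time
        newkeywords = {**keywords, **fkeywords}
        # Frozen positional arguments come first, then the call-time ones
        return func(*args, *fargs, **newkeywords)
    # Mirror the attributes that real partial objects expose
    newfunc.func = func
    newfunc.args = args
    newfunc.keywords = keywords
    return newfunc


def f(a, b, c, x):
    return 1000*a + 100*b + 10*c + x


g = my_partial(f, 3, 1, 4)
print(g(5))  # 3145, same as the functools.partial version
```

Because the frozen positional arguments are placed before the call-time ones, g(5) becomes f(3, 1, 4, 5).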

Another Example:

from functools import partial

# A normal function
def add(a, b, c):
    return 100 * a + 10 * b + c

# A partial function with b = 1 and c = 2
add_part = partial(add, c=2, b=1)

# Calling partial function
print(add_part(3))

Output:

312
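When arguments are frozen by keyword, as in the second example, they can still be overridden by keyword at call time, but an extra positional argument will collide with them. The sketch below reuses the add function from the example above; the override value b=5 is just an illustrative choice.

```python
from functools import partial


def add(a, b, c):
    return 100 * a + 10 * b + c


# b and c are frozen as keyword arguments
add_part = partial(add, c=2, b=1)

print(add_part(3))       # 312: add(3, c=2, b=1)
print(add_part(3, b=5))  # 352: the call-time b=5 overrides the frozen b=1

# A second positional argument would also bind to b, which is already
# supplied as a keyword, so Python raises TypeError
try:
    add_part(3, 4)
except TypeError as exc:
    print("TypeError:", exc)
```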
  • Partial functions can be used to derive specialized functions from general functions, which helps us reuse code.
  • This feature is similar to std::bind in C++.
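As a small illustration of deriving specialized functions from general ones, the sketch below specializes the built-in int() and print(), following the int(base=2) example used in the Python documentation. One difference from C++ std::bind worth noting: partial has no placeholders, so bound positional arguments always fill the leftmost parameters, and anything else must be bound by keyword.

```python
from functools import partial

# Specialize the general int() constructor into a binary-string parser
basetwo = partial(int, base=2)
print(basetwo("10010"))  # 18

# Specialize print() to write comma-separated values
print_csv = partial(print, sep=",")
print_csv("a", "b", "c")  # a,b,c

# The partial object records the wrapped function and the bound arguments
print(basetwo.func is int, basetwo.args, basetwo.keywords)  # True () {'base': 2}
```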

This article is contributed by Mayank Rawat.

What steps does a Data Scientist follow?

The Data Science Process:
1. Obtain. The first step in starting a data science project is Obtain: getting or collecting the data. ...
2. Scrub. Once the data has been collected, the next step in the data science process is scrubbing (cleaning) the data. ...
3. Explore. ...
4. Model. ...
5. Interpret. ...
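The five steps above can be sketched end to end with only the Python standard library. This is a toy illustration under loudly stated assumptions: the inline CSV text, the column names hours and score, and the use of a one-variable least-squares line as the "model" are all hypothetical choices, not part of the process description itself.

```python
import csv
import io
import statistics

# 1. Obtain: the "data source" here is an inline CSV string (hypothetical data)
RAW = """hours,score
1,52
2,55
,60
3,61
4,64
5,70
"""
rows = list(csv.DictReader(io.StringIO(RAW)))

# 2. Scrub: drop rows with missing values and convert text to numbers
clean = [(float(r["hours"]), float(r["score"]))
         for r in rows if r["hours"] and r["score"]]

# 3. Explore: simple summary statistics
xs = [x for x, _ in clean]
ys = [y for _, y in clean]
print("n =", len(clean), "mean score =", statistics.mean(ys))

# 4. Model: ordinary least-squares fit of score = slope * hours + intercept
mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in clean)
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

# 5. Interpret: turn the fitted parameters back into plain language
print(f"Each extra hour is associated with about {slope:.2f} more points")
```

In a real project each step would be far larger (and would likely use pandas or scikit-learn), but the order of the stages stays the same.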

What tools does a Data Scientist use?

Let's take a look at them below:
1. SQL. SQL (Structured Query Language) is a query language used to build, access, modify, and manipulate relational data. ...
2. Python. Another programming language popular among Data Scientists is Python. ...
3. SAS. ...
4. MATLAB. ...
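To illustrate the SQL item (building, accessing, modifying, and manipulating relational data), here is a small sketch that runs SQL from Python through the built-in sqlite3 module with an in-memory database. The employees table, its columns, and its rows are hypothetical.

```python
import sqlite3

# In-memory relational database (nothing is written to disk)
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Build: define a table
cur.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER)")

# Modify: insert rows with a parameterized query, then update some of them
cur.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ana", "DS", 100), ("Budi", "DS", 120), ("Citra", "BI", 90)],
)
cur.execute("UPDATE employees SET salary = salary + 10 WHERE dept = 'BI'")

# Access and manipulate: average salary per department
cur.execute(
    "SELECT dept, AVG(salary) FROM employees GROUP BY dept ORDER BY dept"
)
result = cur.fetchall()
print(result)  # [('BI', 100.0), ('DS', 110.0)]
conn.close()
```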

What does Data Science use?

Next, we also need to understand the tools used in data science in general. They are, quite simply, Big Data, Machine Learning, Data Mining, Deep Learning, and Artificial Intelligence.

What is Sklearn in Python?

Scikit-learn (Sklearn) is a Python library for building machine learning models. It provides many learning algorithms for regression, clustering, and classification.
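A minimal sketch of the three task types mentioned above, assuming scikit-learn is installed. The toy data and the specific estimators (LinearRegression, DecisionTreeClassifier, KMeans) are illustrative choices; the point is that every estimator follows the same construct-then-fit pattern.

```python
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier

# Toy one-feature data; scikit-learn expects X shaped (n_samples, n_features)
X = [[0.0], [1.0], [2.0], [3.0], [10.0], [11.0], [12.0], [13.0]]

# Regression: targets follow y = 2x + 1 exactly
y_reg = [2 * row[0] + 1 for row in X]
reg = LinearRegression().fit(X, y_reg)
print(reg.coef_, reg.intercept_)  # approximately [2.] and 1.0

# Classification: label 0 for small x, 1 for large x
y_cls = [0, 0, 0, 0, 1, 1, 1, 1]
clf = DecisionTreeClassifier(random_state=0).fit(X, y_cls)
print(clf.predict([[1.5], [11.5]]))  # [0 1]

# Clustering: no labels given; KMeans should separate the two groups
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # cluster ids are arbitrary, e.g. [0 0 0 0 1 1 1 1]
```

Because KMeans assigns cluster ids arbitrarily, only the grouping (first four points together, last four together) is meaningful, not which group is called 0.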