Data Science  
 


  ,      ,     .    ,    .     .     Python   .   ,   , Data Science,  .





Data Science  



 



 , 2023



ISBN 978-5-0060-2886-9

     Ridero







      1)  2)   Data Science.

    ,     Visual Studio Code.

    , ,  .    .           ,     Data Science.




 


     ,  .



1.,     ,      . , ? ,    .     .       ,    .    .   ,       .   DS,   ࠖ   .  .      ,     .      ,   ,    ,  .         .

2.       :  . , 堖 , 堖    ,  . ?   . Ƞ  ,           .  ,    ,     .       .        ,      .      ,    ,       .   ,       .    ,   . ,    .

3.      .   .     ,   , .     ,      ,   .   !

4.  .  , , .    ,    .    , .  ࠖ , .     .    .        . , . ,    , Data Science   .     .     ,  .

5.       . ,     ,     .   .   ,     .

6.  ,    .    - ,        (   - ).      ,  , DS, .   , ,    ,      .    .    .

7.   .        90%,  .          ,   .      .   , , , .

8.   .  -  ,    . 堖    .  ,     .  - ?   , ,    ?

9.  .   , . ,   .   ,   ,        .  ,      .          .

10.      .    .     .   ?  堖     .     ,  .         :    .    .      ,     .          (   !).

11.       DS,    ,        . , ,     ?       : ,  .. ,  ,    .   .   ,   ,    ,    .      .   .    ,   . Ƞ, ,       !      ..,   , .    !

12.  .   ?    .       ,   ..     ,  .         . ,     ,      .

13.  . ,      . ,  ,   ,      . ,     ,        .  ,      .  . ?   .     .    .    ,      .  ,      DS    .  !

14.   .   ?      .       , .      . Ƞ     .    !

15.  .     ,      .

16.     .  ,  .   !      .     ,    .

17. ,    .  .

18.  .       .   .  !     ,  .   ,        .




 1.  







Data Science    :

1)   ;

2)   ;

3)  .

  :

1)    (,  ..);

2)       ( ..).

 ,     ,    ,   .   .        ,  .   ? -,     ,   , ,  . -,      ,   ,   . , ,      ,     .    ,  ,  ,      .  ,  ࠖ   ,  ,   .     .

    ,          .  ,     ,  堖  .

     .   ,   ,  ,  ..,       ,    (    ).        .

  , ,  :

1)    ,   (). ,   ,      ;

2)   . ,         ;

3)  ;

4)     . ,   ,    ;

5)  ,    . ,       ..;

6)   . ,        ;

7)    ;

  

    https://www.kaggle.com/jealousleopard/goodreadsbooks.   ,      .    ,         .           ,      .  ,  Goodreads  2020  API  .

        ?    ,  ,   ,   .         ,  ,   .     , ,   ,        ,      .      .













,      :

1)   (, , isbn, ,  ,   );

2)    ( ,  ,  ).

        ,   ,   , ,    .      ,        . , ,   ,      ?       ?

         .

   :  ,  .

  頖 ,   ⠖   ( ). , ,       .       0n (    ).      ,  -   .













,   12 (  0 11) 11123 ().   (   ).    RangeIndex.     . isbn   , isbn13  int64.  , publication_date   ,    .   .








,  publication_date    .      ,    .








    .   db    ,   publication_date   NaT.   ~  .  isin     .
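A minimal sketch of the date clean-up this passage refers to, assuming the frame is called `db` (as in the text) and that unparseable dates are coerced to `NaT`. The sample rows, the `bookID` column and the helper names `bad_ids` and `clean` are illustrative assumptions, not taken from the book.

```python
import pandas as pd

# Tiny illustrative frame; the values are made up for the example.
db = pd.DataFrame({
    "bookID": [1, 2, 3, 4],
    "publication_date": ["9/16/2006", "11/31/2000", "5/1/2004", "6/31/1982"],
})

# errors="coerce" turns impossible dates (e.g. November 31) into NaT
# instead of raising an exception.
db["publication_date"] = pd.to_datetime(
    db["publication_date"], format="%m/%d/%Y", errors="coerce"
)

# Identifiers of the rows whose date could not be parsed.
bad_ids = db.loc[db["publication_date"].isna(), "bookID"]

# ~ negates the boolean mask produced by isin, so only rows that are
# NOT in bad_ids are kept.
clean = db[~db["bookID"].isin(bad_ids)].reset_index(drop=True)
print(clean)
```

Filtering through `isin` rather than dropping rows in place keeps the list of rejected rows available for later inspection.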

  ,      ,  ,    ,     .  ,     db,        db,  .








     .       (      ),    .    ,  isbn      .       ,          .








 isbn13 . -     .








   ,         isbn. , ,  ,    . Ѡ   .   ,     .








  76.   ?  .








,    ,   .  .    ,    .      ,    .



   .   ?   db.      ,    0. ,    ,   publisher.    value_counts    head    .
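The `value_counts` plus `head` combination mentioned here can be sketched as follows. The frame name `db` and the `publisher` column come from the text; the sample values and the variable `top` are assumptions for illustration.

```python
import pandas as pd

# Illustrative data only; not the real publisher counts.
db = pd.DataFrame({
    "publisher": [
        "Vintage", "Penguin Books", "Vintage",
        "Mariner Books", "Vintage", "Penguin Books",
    ]
})

# value_counts sorts by frequency in descending order,
# head(2) keeps the two most frequent publishers.
top = db["publisher"].value_counts().head(2)
print(top)
```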








,    ,   ,  1162!  .



* *

        .   ,          ,  str ..








 , ,  Amazon,    . Ƞ  !  ,     ,    .   , ,  .   ꠖ   .  ?        . ,     ࠖ    .   ,     .








,   ,       ,   .   ,   .  ,   ,      ,   .   , , t-.      -  ,   .



* *

  np.where?      ,   ,   audio,   audio,   ,    paper.
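A hedged sketch of how `np.where` could label rows as `audio` or `paper`. The rule used here (the publisher name contains the word "audio") and the column name `format` are assumptions; only the two labels and the use of `np.where` are visible in the text.

```python
import numpy as np
import pandas as pd

# Illustrative publishers; not a sample from the real dataset.
db = pd.DataFrame({
    "publisher": ["Listening Library", "Random House Audio", "Vintage"]
})

# Assumed rule: a publisher whose name mentions "audio" sells audiobooks.
is_audio = db["publisher"].str.contains("audio", case=False)

# np.where(condition, value_if_true, value_if_false) works element-wise.
db["format"] = np.where(is_audio, "audio", "paper")
print(db)
```

A name-based rule like this one misses audio publishers whose name does not contain the word, such as Listening Library in the sample above.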








   ,     10.  .








   ,   ,     Listening Library  ,    .  ,     10  . .








    :

1) ,   ,

2) ,   .

   ,   .     ,      .   ,         .    ?       ,   ..  , ,       .

    :

1)    ,

2)   ,             .

    ,    ,     .         ,    ,   .








   authors. ,    ,      /.   ?



* *

   ,    10 * (db.loc[:, 'publication_date'].dt.year // 10). ?        10,     . ,   2001,   200.    10  ,   .
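The decade expression can be checked on a few dates. This is a small sketch assuming `publication_date` already has a datetime dtype; the sample dates are made up, and the column name `decade` follows its later use in the text.

```python
import pandas as pd

db = pd.DataFrame({
    "publication_date": pd.to_datetime(
        ["2001-05-01", "1999-12-31", "1940-01-15"]
    )
})

# Integer division by 10 drops the last digit of the year (2001 -> 200),
# multiplying by 10 restores the scale (200 -> 2000).
db["decade"] = 10 * (db.loc[:, "publication_date"].dt.year // 10)
print(db)
```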








 , ,  ,    .    .        .        .      .   .   tra_co (  )  1 ,   , 0.








  ,      .   .








 ,     1000   . ,     ,    ,   0, .  ?        .     ,     .      :

1)   #,

2)   BoxedSet.

 ,      books, vol., volume, series.

.     https://developers.google.com/edu/python/regular-expressions.
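A possible regular-expression version of the markers listed above (`#`, `BoxedSet`, `books`, `vol.`, `volume`, `series`). The exact pattern, the `title` column and the sample titles are assumptions; the flag column name `multivolume` follows its later use in the text.

```python
import pandas as pd

# Illustrative titles only.
db = pd.DataFrame({
    "title": [
        "Harry Potter and the Half-Blood Prince (Harry Potter #6)",
        "The Lord of the Rings Boxed Set",
        "Collected Works Vol. 2",
        "The Iliad",
    ]
})

# One alternation per marker; \b keeps "books" or "series" from matching
# inside longer words, and \s* accepts both "BoxedSet" and "Boxed Set".
pattern = r"#|boxed\s*set|\bbooks\b|\bvol\.|\bvolume\b|\bseries\b"

# 1 if any marker is found in the title, otherwise 0.
db["multivolume"] = (
    db["title"].str.contains(pattern, case=False, regex=True).astype(int)
)
print(db)
```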













   0.     .  0  ,     ,   .   ,     . ,   0,   0,    . .













  .         .








    .    :    **decade**,    ,   20  1940.

   ?       ,       ,     .









 2.  

















        蠖     ,  ,    Exploratory Data Analysis (EDA).

    .

   :  =>  =>  =>  => .    EDA   :  =>  =>  =>  => .

 ,      ,  , EDA   ,   .  ,       ,    EDA   ,    .

 , EDA   ,  ,   ..    ,  . ,  ANOVA, t-tests, chi-squared tests, F-tests.

       .        :

1)  ( ,   );

2)  (,  ).

  ,      :

1)    ?

2)     ?

3)      ?

4)    ?

5)   ?

  ,     ,     .    ,   ,    .

  ,    .  ( )  .   ?  , 堖    .     .   .








    ?    ,   ?  , ,    .   ?     ?        ?     .     ,    ?   ? Ƞ ,   ? Ƞ   .  Goodreads,    ,      ,  .   ,    ,      .  ,           ( ,     ).



    . Ƞ       . ,  :

1.      ();

2.    (     );

3.   ,       ().

     , ,  ,      . ,  Goodreads      .       .

             (堖 ).     ,    (. 187):

1.    ;    ;   ,   ;          .;

2.          ,   .



    



              (, .76).


     .      ( ,    )       (, . 78).    ,  .

    - ,             (, . 87).        .



 ()              (, .88).


 :








    ,   X    ,  3, 8, 19..     ,     X  . ,   3, 8, 19..

         [7.4.4. What are variance components?] (https://www.itl.nist.gov/div898/handbook/prc/section4/)







  ,  ,       ,    , . ,  X   ,         ,    (, .94).


     ,     .      ,      .








    :








 ,  ,   M (X)   X   ,   .      (   )   x.    X,     x.

,     .








 .    .    , .    ?








    ?      . ?        0,01100,    0,01   10000.          ,      (, 98).     , ,   ,     ,   .

   k     ,   (k,     k=1).    k            .

[      .        (   ).          . Ӡ    .          (    ,    ).]

      9.      .

      (, . 188).    .      (https://www.itl.nist.gov/div898/handbook/ppc/section3/ppc333.htm),  [7.2.4.2. Sample sizes required] (https://www.itl.nist.gov/div898/handbook/prc/section2/prc242.htm).

    ,    ().

  



          ,  堖          (, . 192).


 ,     Goodreads,   堖  ,          Goodreads.

   .



      . ,     ,     .     ,    . ,   ,       ,    ( )     ,         ;    ,   , ,  ,     ,     (, . 197).


,     Goodreads. ,   ,   .     Goodreads      .

,   ,     .      .      .     ,     .      ,      ,       ().     .



   ,           ( . 198).


,     ,    .

      ,  , - (堖 ),    .



   ,       ,          ,           .             .


,     , -   ,    ,   .       .



   ,  (    n)     (, . 199).


 ,      ,    .



   ,   [   ]    .


   .  1416. 16.



        :

1.   頖 . 4.        (https://www.itl.nist.gov/div898/handbook/eda/section3/eda366.htm),      .        [8.1.6. What are the basic lifetime distribution models used for non-repairable populations?] (https://www.itl.nist.gov/div898/handbook/apr/section1/apr16.htm).      ?       ,   .       .

   .

    :   (a)   .    ,       ,    :

1)    a( )    ,       Ox: ,  a, ,  a (, . 131).

2) Ѡ         ,     ,     Ox;  蠖          Oy ().

3)      1.

[     . ,     ,   ,       .         .      ,    .]

        ( ).   :



   X          ,       ,  X  ,   (, . 135).


 ,     .   ,  ,       .      .       .        0,       , .         .

2.  蠖 . 2    ,  ,  .         :



Ѡ        ,          ,    ,   ,         (. 24 ).


     (MSE)  .      :








     .



    [ ]    ( )          , ,     .   ,      . ,  ,               ;      ,    .


3.Z-  ,    .    :








  : x     ;    ; ࠖ  .
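The formula itself did not survive in this copy, so the sketch below assumes the standard z-score definition: the value minus the mean, divided by the standard deviation. Whether the book uses the population or the sample standard deviation is not visible; NumPy's default (population, `ddof=0`) is used here as an assumption, and the sample values are made up.

```python
import numpy as np

# Illustrative ratings.
ratings = np.array([3.5, 4.0, 4.5])

# z = (x - mean) / standard deviation, computed element-wise.
# np.std uses ddof=0 (population standard deviation) by default.
z = (ratings - ratings.mean()) / ratings.std()
print(z)
```

After standardization the values have mean 0 and standard deviation 1, which makes variables measured on different scales comparable.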

4.  , .  [1.3.5.2. Confidence Limits for the Mean] (https://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm)

5. , , .   ,  :

1)           ;

2)        .   ,   ;

3)       .  ,     .

  [Engineering statistics handbook] (https://www.itl.nist.gov/div898/handbook/eda/eda.htm)     :

1.   :

a) ;

b)  .       .  6078%       .  9098%       .  99%       ;

c)      ;

d)  , , ,    ,  Wilk-Shapiro test.

2.     .

3.           (   ).

4.    .

5.  Anderson-Darling   .

6.     .

      [1.4.2.1.3. Quantitative Output and Interpretation] (https://www.itl.nist.gov/div898/handbook/eda/section4/eda4213.htm)




  












   ,     .      ,    .     ?    ,      .  ,     ,   . ,      ,   .        .

  ,             . Ѡ   .

    :

*      .  describe;

*   .  describe;

*       ,    ? ( ).



    :

*    ?  ;

*     ?  ;

*    ?  ;

*        ?  ;

*      ,    ?   ;

*     ?  ,  ;

*       ?  NLTK.

   .

        ,  ,      .








       .   ,        .       ,   .

       : average_rating, num_pages, ratings_count, text_reviews_count.       ,      .    .   ,   text_reviews_count,    (): count, mean ..       .    ?   








  24,8   ,     (8,1252)    .  , ,       ,     .

    ? Ӡ   .    - . ? -,    ,    . ,      ,     3.9. -,     . ,   ,     .     ,       .      ,   ,      ,   std.   ,    .     ,      1   5.    : 25%, 50%, 75%?    : 25%    3.77, 50%    3.96..     :  ,     3.7725%.
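The summary statistics discussed here (count, mean, std and the 25%, 50%, 75% rows) are what `describe` returns for numeric columns. A minimal sketch with made-up values; only the column names are taken from the text.

```python
import pandas as pd

# Illustrative values; the last page count is an extreme value on purpose.
db = pd.DataFrame({
    "average_rating": [3.5, 3.8, 4.0, 4.2, 4.5],
    "num_pages": [120, 250, 300, 400, 6576],
})

# describe() returns count, mean, std, min, quartiles and max per column.
summary = db.describe()
print(summary)

# A single extreme value pulls the mean far above the median (50%).
print(summary.loc[["mean", "50%"], "num_pages"])
```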

      . ,  num_pages, ratings_count, text_reviews_count  . , ,  num_pages  344,   6576.  ,    344,  ,  6576.     .     ,     .   ,      ,     ,       .        papanda.













,   10149   10838.

   The Iliad, 8.   ,     ,   8. ,  .  The Iliad   .

26 ,   eng, 8669.

  13,    2000,   7332.

 4,   3,   2851.

 ,   :     ,   .    ,   6492.

,         multivolume  1,        0. ,  ,  ,   ( 8147).

-,    ? -,      ,  .      ?  ? Ѡ ?      ,   ,  . -,      ,     ,      ,  .     . ,     ,      . ,   -    ,     ,        .  ,       .

      () .














 


    ,   -  .     .   ,           . ,        ,      , 頖  , 頖  .      ,    .

    :

1)    ࠖ  ;

2)    ;

3)    ;

4)    ,   .

   .

        :

1.   ;

2.  ;

3. ;

4.   .

       (https://www.itl.nist.gov/div898/handbook/eda/section3/4plot.htm).        :








        .








  ,    .          .    .   :



    F (x),   ,    X    ,  x (, . 111).


    :



F (x)   ,     ,     ,    x ().


    .    ?     x .       y  ,          x. ,   4. ,       0,50,6, 55%.
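To make the reading of a distribution function concrete, here is a sketch of its empirical counterpart: the share of observations strictly below a given x. The strict inequality follows the definition F(x) = P(X < x) quoted above; the function name `ecdf` and the sample are assumptions.

```python
import numpy as np


def ecdf(sample, x):
    """Share of observations in `sample` that are strictly less than x."""
    sample = np.asarray(sample)
    return float(np.mean(sample < x))


# Illustrative ratings.
ratings = np.array([1.0, 2.0, 3.0, 4.0])

# Two of the four values are below 3, so the empirical F(3) is 0.5.
print(ecdf(ratings, 3.0))
```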

     ,    .       . .   ( ).



      X   f (x)     F (x): f (x) = F'(x) (, . 116).


   .








      . ,    , ,  .     .

  PDF, DF, PPF   [1.3.6.2. Related Distributions] (https://www.itl.nist.gov/div898/handbook/eda/section3/eda362.htm)













      .       . -,         . -,    ,       .  ,    .    ,           , .    :   ?  ,     .

 ,      . ,   2.53.04.85.0   .      .  ,       ,     .          .

    .        , ࠖ    .   ,      .  .    (https://seaborn.pydata.org/generated/seaborn.histplot.html).

      [  ] (https://www.itl.nist.gov/div898/handbook/eda/section3/histogra.htm).       ( ,  , , ),           .  :

1)   ;

2)  ;

3) ;

4)  ;

5)    .

     6.5.2. What to do when data are non-normal (https://www.itl.nist.gov/div898/handbook/pmc/section5/pmc52.htm)







       :



   ,    ,     .


     ,    ,    .    ,      ,   .        ,      ,       .    ,       ,    .

      :

1. ʠ    .    .       ,      .    ,    .

2.     ,   :

1)   ,       ;

2)  Grubbs Test      .

  :

1) Grubbs Test    ;

2) Tietjen-Moore Test ,      .      .

3) Generalized Extreme Studentized Deviate (ESD) Test ,     .        . ,     .

 ,     ,    .   ,      .























  .   -   .     .   ࠖ  25%-,  頖 75%.    .  ⠖ .

   ꠖ     ,  .      ?   ,    .  , .   .

  .



 0,5 .  = 0,25, 0,50,75   , = 0,2, 0,4, 0,6, 0,8  .
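The quantile levels listed above (0.25, 0.5, 0.75 for quartiles and 0.2, 0.4, 0.6, 0.8 for quintiles) can be computed with `np.quantile`. The sample 1..10 is an assumption chosen for illustration, and NumPy's default linear interpolation is used.

```python
import numpy as np

# Illustrative sample: the integers 1 through 10.
values = np.arange(1, 11)

# Quartiles split the sorted data into four parts, quintiles into five.
quartiles = np.quantile(values, [0.25, 0.5, 0.75])
quintiles = np.quantile(values, [0.2, 0.4, 0.6, 0.8])

print(quartiles)  # the middle value is the median
print(quintiles)
```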





























  ?    ,   7, 8, 9, 1051, 53, 54, 100.   .   ࠖ   5.5.       . ?







