Umer Arif (@udadabhoy) 's Twitter Profile
Umer Arif

@udadabhoy

Programmer. Financial and technical knowledge.
Python,
Django,
SQL,
Bootstrap,
HTML,
JavaScript,
Pandas.

ID: 3273151746

Joined: 09-07-2015 16:29:19

136 Tweets

210 Followers

3.3K Following

Umer Arif (@udadabhoy) 's Twitter Profile Photo

#python #pandas If the data file is too large to read at once, use "chunksize": pd.read_csv('file_path', chunksize=5000) #note too large a #chunksize can still exhaust memory
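A minimal sketch of the tip above, using a small in-memory CSV in place of a genuinely large file (the column names and values are made up for illustration):

```python
import io
import pandas as pd

# A small in-memory CSV standing in for a file too large to load at once
# (with a real file you would pass its path to pd.read_csv).
csv_data = io.StringIO("a,b\n1,2\n3,4\n5,6\n7,8\n")

# chunksize=2 yields DataFrames of at most 2 rows each, instead of
# loading everything into memory in one go.
total_rows = 0
for chunk in pd.read_csv(csv_data, chunksize=2):
    total_rows += len(chunk)

print(total_rows)  # all rows are still read, just in pieces
```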

#oneline to #find only the #columns that #startwith a #specific prefix: [col for col in dataframe if col.startswith('prefix')] #pandas #python
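The one-liner above works because iterating over a DataFrame yields its column labels. A small sketch with invented column names:

```python
import pandas as pd

df = pd.DataFrame(columns=["sales_q1", "sales_q2", "cost_q1", "region"])

# Iterating over a DataFrame yields its column labels, so this keeps
# only the columns whose name starts with the given prefix.
sales_cols = [col for col in df if col.startswith("sales")]
print(sales_cols)
```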

working with #bigdata, my 2 cents: 1. Don't try to #read or #load the entire #data at once; instead use #chunksize: pd.read_csv(path, chunksize=n). You will not be able to read it directly; it's a #TextFileReader object.
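To see the point about the return type: with chunksize set, read_csv returns an iterator over DataFrames rather than a DataFrame. A sketch with a tiny in-memory CSV:

```python
import io
import pandas as pd

csv_data = io.StringIO("x,y\n1,2\n3,4\n")

# With chunksize set, read_csv returns a TextFileReader (an iterator
# over DataFrames), not a DataFrame, so DataFrame methods like .head()
# are not available on it directly.
reader = pd.read_csv(csv_data, chunksize=1)
print(type(reader).__name__)
```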

#bigdata my 2 cents: 2. Use a #loop to #read from it; applying a #counter and #break lets you run #dataanalysis on a #sample, which helps you learn the #dataPatterns, #trend and #shape in a short amount of #time: for c_no, chunk in enumerate(data): ...some code; if c_no > 11: break
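The counter-and-break pattern above can be sketched like this (the column name and chunk sizes are invented for illustration):

```python
import io
import pandas as pd

# 100 rows of fake data standing in for a large file.
csv_data = io.StringIO("v\n" + "\n".join(str(i) for i in range(100)))

# Peek at the first few chunks only: enumerate provides a chunk
# counter, and break stops reading once a sample has been collected.
sample_parts = []
for c_no, chunk in enumerate(pd.read_csv(csv_data, chunksize=10)):
    sample_parts.append(chunk)
    if c_no >= 2:          # stop after 3 chunks (a 30-row sample)
        break

sample = pd.concat(sample_parts, ignore_index=True)
print(len(sample))  # 30
```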

#bigdata my 2 cents: 3. #clean the #data; sometimes #interpolation is better than #dropping #nan #values: dataframe.interpolate(), dataframe.fillna(some_value_or_function), dataframe.dropna()
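A small sketch of the three NaN-handling options above, on an invented column:

```python
import pandas as pd

df = pd.DataFrame({"temp": [20.0, None, 24.0, None, 28.0]})

# Three ways to handle NaN; interpolation fills gaps from neighbouring
# values instead of discarding rows.
interpolated = df["temp"].interpolate()        # fill gaps linearly
filled = df["temp"].fillna(df["temp"].mean())  # NaNs -> column mean
dropped = df["temp"].dropna()                  # rows with NaN removed

print(interpolated.tolist())  # [20.0, 22.0, 24.0, 26.0, 28.0]
```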

#BigData my 2 cents: 4. #encode #categorical #values (#sklearn #LabelEncoder for the #target). Use .replace(val1, with_val1x) only if it is straightforward, like a two-class column mapped to 0/1; #otherwise it will introduce #error: mapping C to 1 and D to 2 implies an order, when neither is higher or lower than the other.
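A sketch of the "only if binary" case, with an invented yes/no target column:

```python
import pandas as pd

df = pd.DataFrame({"target": ["yes", "no", "yes"]})

# Direct replacement is safe only for a two-valued (binary) column;
# mapping C->1, D->2 for a multi-class column would invent an ordering
# that does not exist in the data.
df["target"] = df["target"].replace({"yes": 1, "no": 0})
print(df["target"].tolist())  # [1, 0, 1]
```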

4 (cont.). #use sklearn.preprocessing.OrdinalEncoder() to encode each value inside a column, or #use pd.get_dummies(dataframe_name, columns=list_of_categorical_columns): a new column for each value, #no_error and a #cleaner #feature explanation, but it causes #dataset #expansion, so it is not always ideal.
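A sketch of the one-hot alternative with pd.get_dummies, on an invented two-category column (the dataset-expansion cost is visible: one category column becomes two):

```python
import pandas as pd

df = pd.DataFrame({"city": ["A", "B", "A"], "price": [10, 20, 30]})

# One-hot encoding: one new column per category value, so no false
# ordering, at the cost of a wider dataset.
encoded = pd.get_dummies(df, columns=["city"])
print(sorted(encoded.columns))  # ['city_A', 'city_B', 'price']
```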

#BigData my 2 cents: 5. #memory is very important; find the #correlation of each #feature with the #target and #drop #insignificant #features (#correlation < n). #pandas dataframe.corr() is a handy method to #calculate #correlations; this will reduce the data.
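A sketch of this feature-pruning idea; the column names, values, and the 0.5 threshold are invented for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "feature_a": [1, 2, 3, 4],
    "feature_b": [4, 1, 3, 2],
    "target":    [2, 4, 6, 8],
})

# Absolute correlation of each column with the target; features below
# a chosen threshold are then dropped to shrink the data.
corr_with_target = df.corr()["target"].abs()
weak = [c for c in corr_with_target.index
        if c != "target" and corr_with_target[c] < 0.5]
reduced = df.drop(columns=weak)
print(list(reduced.columns))  # ['feature_a', 'target']
```

Note that plain correlation only captures linear relationships, so this is a coarse first cut, not a full feature-selection method.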

#BigData my 2 cents: 6. #memory is very important; #clean up #data #variables that are no longer required, or stay forever stuck #vertically #scaling your #hardware: import gc; del dataframe_name; gc.collect(). If #model_training, #drop the already-fed vars too.
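The del-plus-gc pattern above, sketched on a throwaway DataFrame:

```python
import gc
import pandas as pd

df = pd.DataFrame({"x": range(100_000)})
# ... work with df ...

# Once the DataFrame is no longer needed, drop the reference and ask
# the garbage collector to reclaim the memory now rather than later.
del df
gc.collect()
print("freed")
```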

#BigData my 2 cents: 7. Hopefully this is now fast enough; for even faster #processing you can use import modin.pandas as pd, which runs pandas operations in parallel. Be #careful: it can slow many tasks down as well, and #some #code does not play well with it.

#BigData my 2 cents: #python #DataScience 8. #balance: get the #ratio of each #class by #target and #sample the majority as per the #lean #class. With n = ratio of the less frequently occurring class: df[df['target'] == higher_occurring_class].sample(frac=n). This will #balance the #examples in the #training #data.
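The downsampling step above can be sketched like this; the class labels and the 8:2 split are invented for illustration:

```python
import pandas as pd

df = pd.DataFrame({"target": ["pos"] * 8 + ["neg"] * 2, "x": range(10)})

# Downsample the majority class to roughly the size of the minority
# class: n is the share of the rarer class.
counts = df["target"].value_counts()
minority, majority = counts.idxmin(), counts.idxmax()
ratio = counts.min() / counts.max()

balanced = pd.concat([
    df[df["target"] == majority].sample(frac=ratio, random_state=0),
    df[df["target"] == minority],
])
print(balanced["target"].value_counts().tolist())  # [2, 2]
```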

#BigData my #2cents: 9. To read a small sample of the data, loops can be a way, but #never use them to #read all the #chunks; instead use pd.concat(text_file_reader_object, ignore_index=True, axis=0). #THIS IS #Fast, #Efficient and #memory friendly.
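This works because pd.concat accepts any iterable of DataFrames, and a TextFileReader is exactly that. A sketch with a tiny in-memory CSV:

```python
import io
import pandas as pd

csv_data = io.StringIO("a\n1\n2\n3\n4\n5\n6\n")

reader = pd.read_csv(csv_data, chunksize=2)

# Rebuild one DataFrame from all chunks in a single concat call,
# instead of appending chunk by chunk inside a Python loop.
df = pd.concat(reader, ignore_index=True, axis=0)
print(len(df))  # 6
```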

#recursion A recursive function always reaches the last call first, then evaluates and finalizes from the last call back to the first, because a function call stops the caller from continuing to execute its code and instead makes the program execute inside the callee. Read more: stackoverflow.com/questions/4602…
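A tiny sketch that makes the evaluation order visible (the function and list names are invented):

```python
# Each recursive call pauses its caller; work after the recursive call
# therefore completes from the deepest call back up to the first.
order = []

def countdown(n):
    if n > 0:
        countdown(n - 1)   # caller waits here until the callee returns
    order.append(n)        # runs only after all deeper calls finish

countdown(3)
print(order)  # [0, 1, 2, 3] -- the deepest call (0) finishes first
```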

Writing #SQL is an integral part of the job, whether you are developing a #webapp or working in #DataScience. sql-practice.com provides a really practical way to learn from the basics, or to keep your skills sharp between projects.