Predicting house prices using the Ames Housing Dataset.

Muhammad: Predict house prices.

What is the best regression model for predicting house prices on the Ames Housing Dataset?

We will use RMSE as our main scoring metric and linear regression as our model. Success will be evaluated using RMSE.

1.0 Import Libraries

import numpy as np
import scipy.stats as stats
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn as sk
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
 
from sklearn.linear_model import LinearRegression, Lasso, LassoCV, RidgeCV
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler, PolynomialFeatures

Suppress scientific notation in pandas output for readability.

pd.options.display.float_format = '{:.4f}'.format # no sci notation
pd.set_option('display.max_columns', None)

1.1 Read Data

df = pd.read_csv('./datasets/train.csv')
df_test = pd.read_csv('./datasets/test.csv')
df_test.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 878 entries, 0 to 877
Data columns (total 80 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Id               878 non-null    int64  
 1   PID              878 non-null    int64  
 2   MS SubClass      878 non-null    int64  
 3   MS Zoning        878 non-null    object 
 4   Lot Frontage     718 non-null    float64
 5   Lot Area         878 non-null    int64  
 6   Street           878 non-null    object 
 7   Alley            58 non-null     object 
 8   Lot Shape        878 non-null    object 
 9   Land Contour     878 non-null    object 
 10  Utilities        878 non-null    object 
 11  Lot Config       878 non-null    object 
 12  Land Slope       878 non-null    object 
 13  Neighborhood     878 non-null    object 
 14  Condition 1      878 non-null    object 
 15  Condition 2      878 non-null    object 
 16  Bldg Type        878 non-null    object 
 17  House Style      878 non-null    object 
 18  Overall Qual     878 non-null    int64  
 19  Overall Cond     878 non-null    int64  
 20  Year Built       878 non-null    int64  
 21  Year Remod/Add   878 non-null    int64  
 22  Roof Style       878 non-null    object 
 23  Roof Matl        878 non-null    object 
 24  Exterior 1st     878 non-null    object 
 25  Exterior 2nd     878 non-null    object 
 26  Mas Vnr Type     877 non-null    object 
 27  Mas Vnr Area     877 non-null    float64
 28  Exter Qual       878 non-null    object 
 29  Exter Cond       878 non-null    object 
 30  Foundation       878 non-null    object 
 31  Bsmt Qual        853 non-null    object 
 32  Bsmt Cond        853 non-null    object 
 33  Bsmt Exposure    853 non-null    object 
 34  BsmtFin Type 1   853 non-null    object 
 35  BsmtFin SF 1     878 non-null    int64  
 36  BsmtFin Type 2   853 non-null    object 
 37  BsmtFin SF 2     878 non-null    int64  
 38  Bsmt Unf SF      878 non-null    int64  
 39  Total Bsmt SF    878 non-null    int64  
 40  Heating          878 non-null    object 
 41  Heating QC       878 non-null    object 
 42  Central Air      878 non-null    object 
 43  Electrical       877 non-null    object 
 44  1st Flr SF       878 non-null    int64  
 45  2nd Flr SF       878 non-null    int64  
 46  Low Qual Fin SF  878 non-null    int64  
 47  Gr Liv Area      878 non-null    int64  
 48  Bsmt Full Bath   878 non-null    int64  
 49  Bsmt Half Bath   878 non-null    int64  
 50  Full Bath        878 non-null    int64  
 51  Half Bath        878 non-null    int64  
 52  Bedroom AbvGr    878 non-null    int64  
 53  Kitchen AbvGr    878 non-null    int64  
 54  Kitchen Qual     878 non-null    object 
 55  TotRms AbvGrd    878 non-null    int64  
 56  Functional       878 non-null    object 
 57  Fireplaces       878 non-null    int64  
 58  Fireplace Qu     456 non-null    object 
 59  Garage Type      834 non-null    object 
 60  Garage Yr Blt    833 non-null    float64
 61  Garage Finish    833 non-null    object 
 62  Garage Cars      878 non-null    int64  
 63  Garage Area      878 non-null    int64  
 64  Garage Qual      833 non-null    object 
 65  Garage Cond      833 non-null    object 
 66  Paved Drive      878 non-null    object 
 67  Wood Deck SF     878 non-null    int64  
 68  Open Porch SF    878 non-null    int64  
 69  Enclosed Porch   878 non-null    int64  
 70  3Ssn Porch       878 non-null    int64  
 71  Screen Porch     878 non-null    int64  
 72  Pool Area        878 non-null    int64  
 73  Pool QC          4 non-null      object 
 74  Fence            172 non-null    object 
 75  Misc Feature     41 non-null     object 
 76  Misc Val         878 non-null    int64  
 77  Mo Sold          878 non-null    int64  
 78  Yr Sold          878 non-null    int64  
 79  Sale Type        878 non-null    object 
dtypes: float64(3), int64(35), object(42)
memory usage: 548.9+ KB

1.2 Check for Null Values

df.describe()
[df.describe() output omitted: count, mean, std, min, quartiles, and max for the 39 numeric columns over 2,051 rows. Notable values: mean SalePrice ≈ 181,470 (min 12,789, max 611,657), and a Garage Yr Blt max of 2207, an apparent data-entry error.]
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2051 entries, 0 to 2050
Data columns (total 81 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Id               2051 non-null   int64  
 1   PID              2051 non-null   int64  
 2   MS SubClass      2051 non-null   int64  
 3   MS Zoning        2051 non-null   object 
 4   Lot Frontage     1721 non-null   float64
 5   Lot Area         2051 non-null   int64  
 6   Street           2051 non-null   object 
 7   Alley            140 non-null    object 
 8   Lot Shape        2051 non-null   object 
 9   Land Contour     2051 non-null   object 
 10  Utilities        2051 non-null   object 
 11  Lot Config       2051 non-null   object 
 12  Land Slope       2051 non-null   object 
 13  Neighborhood     2051 non-null   object 
 14  Condition 1      2051 non-null   object 
 15  Condition 2      2051 non-null   object 
 16  Bldg Type        2051 non-null   object 
 17  House Style      2051 non-null   object 
 18  Overall Qual     2051 non-null   int64  
 19  Overall Cond     2051 non-null   int64  
 20  Year Built       2051 non-null   int64  
 21  Year Remod/Add   2051 non-null   int64  
 22  Roof Style       2051 non-null   object 
 23  Roof Matl        2051 non-null   object 
 24  Exterior 1st     2051 non-null   object 
 25  Exterior 2nd     2051 non-null   object 
 26  Mas Vnr Type     2029 non-null   object 
 27  Mas Vnr Area     2029 non-null   float64
 28  Exter Qual       2051 non-null   object 
 29  Exter Cond       2051 non-null   object 
 30  Foundation       2051 non-null   object 
 31  Bsmt Qual        1996 non-null   object 
 32  Bsmt Cond        1996 non-null   object 
 33  Bsmt Exposure    1993 non-null   object 
 34  BsmtFin Type 1   1996 non-null   object 
 35  BsmtFin SF 1     2050 non-null   float64
 36  BsmtFin Type 2   1995 non-null   object 
 37  BsmtFin SF 2     2050 non-null   float64
 38  Bsmt Unf SF      2050 non-null   float64
 39  Total Bsmt SF    2050 non-null   float64
 40  Heating          2051 non-null   object 
 41  Heating QC       2051 non-null   object 
 42  Central Air      2051 non-null   object 
 43  Electrical       2051 non-null   object 
 44  1st Flr SF       2051 non-null   int64  
 45  2nd Flr SF       2051 non-null   int64  
 46  Low Qual Fin SF  2051 non-null   int64  
 47  Gr Liv Area      2051 non-null   int64  
 48  Bsmt Full Bath   2049 non-null   float64
 49  Bsmt Half Bath   2049 non-null   float64
 50  Full Bath        2051 non-null   int64  
 51  Half Bath        2051 non-null   int64  
 52  Bedroom AbvGr    2051 non-null   int64  
 53  Kitchen AbvGr    2051 non-null   int64  
 54  Kitchen Qual     2051 non-null   object 
 55  TotRms AbvGrd    2051 non-null   int64  
 56  Functional       2051 non-null   object 
 57  Fireplaces       2051 non-null   int64  
 58  Fireplace Qu     1051 non-null   object 
 59  Garage Type      1938 non-null   object 
 60  Garage Yr Blt    1937 non-null   float64
 61  Garage Finish    1937 non-null   object 
 62  Garage Cars      2050 non-null   float64
 63  Garage Area      2050 non-null   float64
 64  Garage Qual      1937 non-null   object 
 65  Garage Cond      1937 non-null   object 
 66  Paved Drive      2051 non-null   object 
 67  Wood Deck SF     2051 non-null   int64  
 68  Open Porch SF    2051 non-null   int64  
 69  Enclosed Porch   2051 non-null   int64  
 70  3Ssn Porch       2051 non-null   int64  
 71  Screen Porch     2051 non-null   int64  
 72  Pool Area        2051 non-null   int64  
 73  Pool QC          9 non-null      object 
 74  Fence            400 non-null    object 
 75  Misc Feature     65 non-null     object 
 76  Misc Val         2051 non-null   int64  
 77  Mo Sold          2051 non-null   int64  
 78  Yr Sold          2051 non-null   int64  
 79  Sale Type        2051 non-null   object 
 80  SalePrice        2051 non-null   int64  
dtypes: float64(11), int64(28), object(42)
memory usage: 1.3+ MB

Data Cleaning Steps

1. Handling Missing Values: remove the rows/columns with missing values, or impute them. For categorical data, we often fill missing values with the mode (the most frequent category) of the column, or use a placeholder like "Unknown". For numerical data, we can use measures like the mean or median, or more sophisticated methods like model-based imputation.

2. Converting Data Types: some object columns may need to be converted to categorical if they represent categories. Some numerical columns that encode categories may also need conversion.

3. Handling Duplicates: identify and remove duplicate rows, if any.

Data Cleaning and EDA

Showing missing values and their percentages.

missing_val = df.isnull().sum()
missing_val = missing_val[missing_val > 0].sort_values(ascending=False)
missing_val_per = (missing_val / len(df)) * 100
                                
missing_df = pd.DataFrame({'Missing Values': missing_val, 'Percentage': missing_val_per})  # DataFrame of missing values and their percentages
missing_df
                Missing Values  Percentage
Pool QC                   2042     99.5612
Misc Feature              1986     96.8308
Alley                     1911     93.1741
Fence                     1651     80.4973
Fireplace Qu              1000     48.7567
Lot Frontage               330     16.0897
Garage Yr Blt              114      5.5583
Garage Cond                114      5.5583
Garage Qual                114      5.5583
Garage Finish              114      5.5583
Garage Type                113      5.5095
Bsmt Exposure               58      2.8279
BsmtFin Type 2              56      2.7304
Bsmt Cond                   55      2.6816
Bsmt Qual                   55      2.6816
BsmtFin Type 1              55      2.6816
Mas Vnr Area                22      1.0726
Mas Vnr Type                22      1.0726
Bsmt Half Bath               2      0.0975
Bsmt Full Bath               2      0.0975
Total Bsmt SF                1      0.0488
Bsmt Unf SF                  1      0.0488
BsmtFin SF 2                 1      0.0488
Garage Cars                  1      0.0488
Garage Area                  1      0.0488
BsmtFin SF 1                 1      0.0488
print(f"Shape: {df.shape}")
df.head()
Shape: (2051, 81)
[df.head() output omitted: first five rows across all 81 columns, too wide to reproduce legibly.]

Before dropping columns and values, make a copy of the original DataFrame.

df2 = df.copy()

Dropping columns with more than 80% NaN values.

df2.drop(["Pool QC", "Misc Feature", "Alley", "Fence"], axis=1, inplace=True)

Ordinal mapping from the data description

#Taken from data description
# FireplaceQu: Fireplace quality
# Ex Excellent - Exceptional Masonry Fireplace
# Gd Good - Masonry Fireplace in main level
# TA Average - Prefabricated Fireplace in main living area or Masonry Fireplace in basement
# Fa Fair - Prefabricated Fireplace in basement
# Po Poor - Ben Franklin Stove
# NA No Fireplace
ordinal_mapping = {'NA': 0, 'Po': 1, 'Fa': 2, 'TA': 3, 'Gd': 4, 'Ex': 5}
df2['Fireplace Qu'].isnull().sum()/len(df2['Fireplace Qu']) * 100
48.75670404680644
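The mapping above is defined but not yet applied; a minimal sketch of how it could encode the quality column once the NaNs are filled with 'NA' (shown on a small illustrative Series, but df2['Fireplace Qu'] would work the same way):

```python
import pandas as pd

ordinal_mapping = {'NA': 0, 'Po': 1, 'Fa': 2, 'TA': 3, 'Gd': 4, 'Ex': 5}

# Illustrative stand-in for df2['Fireplace Qu']
fireplace_qu = pd.Series(['Gd', 'TA', None, 'Ex']).fillna('NA')

# Map the ordered quality labels onto integers
encoded = fireplace_qu.map(ordinal_mapping)
print(encoded.tolist())  # [4, 3, 0, 5]
```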

Features with a moderate share of NaN values should be imputed.

#Waiting to see what Gd, TA and NaN mean here
#df2['Fireplace Qu'].fillna(df2['Fireplace Qu'].median(), inplace=True)
df2['Fireplace Qu'].fillna('NA', inplace=True)

For the column "Lot Frontage", a median or a mode could be used. However, lots on the same street or in the same area tend to have approximately the same frontage. The next few lines of code will help decide which value is appropriate to fill in place of the null values.

df2['Lot Frontage'].median()
68.0
df2['Lot Frontage'].mean()
69.05520046484602
 
plt.figure(figsize=(10, 6))
sns.histplot(df2['Lot Frontage'], kde=True, bins=30, color="black", line_kws={"color": "gray"})
plt.title('Distribution of Lot Frontage')
plt.xlabel('Lot Frontage (Linear feet of street connected to property)')
plt.ylabel('Frequency')
plt.show()

Lot Frontage is approximately normally distributed, so the mean is an appropriate fill value, especially since only 16.09% of the column's values are null.

df2['Lot Frontage'].fillna(df2['Lot Frontage'].mean(), inplace=True)
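As noted above, lots in the same area tend to share similar frontage, so a group-wise median is a plausible alternative to the global mean. A sketch, assuming the Neighborhood column is used as the grouping key (toy data for illustration):

```python
import pandas as pd

toy = pd.DataFrame({
    'Neighborhood': ['NAmes', 'NAmes', 'NAmes', 'Sawyer', 'Sawyer'],
    'Lot Frontage': [70.0, None, 74.0, 43.0, None],
})

# Fill each missing frontage with the median of its own neighborhood
toy['Lot Frontage'] = (
    toy.groupby('Neighborhood')['Lot Frontage']
       .transform(lambda s: s.fillna(s.median()))
)
print(toy['Lot Frontage'].tolist())  # [70.0, 72.0, 74.0, 43.0, 43.0]
```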

For ‘Garage Cond’

# GarageCond: Garage condition
# Ex Excellent
# Gd Good
# TA Typical/Average
# Fa Fair
# Po Poor
# NA No Garage
 
df2['Garage Cond'].unique()
array(['TA', 'Fa', nan, 'Po', 'Gd', 'Ex'], dtype=object)

The data description indicates 'NA' for No Garage. However, when .unique() is called on the column, there is no 'NA' value, so the NaN values can be substituted with 'NA' to indicate no garage (assuming residents of such homes use lot parking instead).

df2['Garage Cond'].fillna('NA', inplace=True)
df2['Garage Cond'].describe()
count     2051
unique       6
top         TA
freq      1868
Name: Garage Cond, dtype: object
df2['Garage Qual'].unique()
array(['TA', 'Fa', nan, 'Gd', 'Ex', 'Po'], dtype=object)
df2['Garage Qual'].fillna('NA', inplace=True)
df2['Garage Finish'].unique()
array(['RFn', 'Unf', 'Fin', nan], dtype=object)
df2['Garage Finish'].fillna('NA', inplace=True)
df2['Garage Yr Blt'].unique()
array([1976., 1997., 1953., 2007., 1957., 1966., 2005., 1959., 1952.,
       1969., 1971., 1900., 2000., 2004., 1916., 1963., 1977., 2009.,
       1968., 1992., 1955., 1961., 1973., 1937.,   nan, 2003., 1981.,
       1931., 1995., 1958., 1965., 2006., 1978., 1954., 1935., 1951.,
       1996., 1999., 1920., 1930., 1924., 1960., 1949., 1986., 1956.,
       1994., 1979., 1964., 2001., 1972., 1939., 1962., 1927., 1948.,
       1967., 1993., 2010., 1915., 1987., 1970., 1988., 1982., 1941.,
       1984., 1942., 1950., 2002., 1975., 2008., 1974., 1998., 1918.,
       1938., 1985., 1923., 1980., 1991., 1946., 1940., 1990., 1896.,
       1983., 1914., 1945., 1921., 1925., 1926., 1936., 1932., 1947.,
       1929., 1910., 1917., 1922., 1934., 1989., 1928., 2207., 1933.,
       1895., 1919.])
#df2['Garage Yr Blt'] = pd.to_numeric(df2['Garage Yr Blt'], errors='coerce') # Convert to numeric
#df2['Garage Yr Blt'].astype(int)
df2['Garage Yr Blt'].fillna(int(df2['Garage Yr Blt'].mean()), inplace=True) # fill with the (rounded) mean year rather than 0, since these values are all years
df2['Garage Yr Blt'].unique()
array([1976., 1997., 1953., 2007., 1957., 1966., 2005., 1959., 1952.,
       1969., 1971., 1900., 2000., 2004., 1916., 1963., 1977., 2009.,
       1968., 1992., 1955., 1961., 1973., 1937., 1978., 2003., 1981.,
       1931., 1995., 1958., 1965., 2006., 1954., 1935., 1951., 1996.,
       1999., 1920., 1930., 1924., 1960., 1949., 1986., 1956., 1994.,
       1979., 1964., 2001., 1972., 1939., 1962., 1927., 1948., 1967.,
       1993., 2010., 1915., 1987., 1970., 1988., 1982., 1941., 1984.,
       1942., 1950., 2002., 1975., 2008., 1974., 1998., 1918., 1938.,
       1985., 1923., 1980., 1991., 1946., 1940., 1990., 1896., 1983.,
       1914., 1945., 1921., 1925., 1926., 1936., 1932., 1947., 1929.,
       1910., 1917., 1922., 1934., 1989., 1928., 2207., 1933., 1895.,
       1919.])
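Note the 2207.0 in the array above: no garage is built in the future, so this is almost certainly a data-entry error. A sketch of how such impossible years could be detected and capped, assuming the sale year (Yr Sold) as the upper bound (toy Series for illustration):

```python
import pandas as pd

# Illustrative stand-ins for df2['Garage Yr Blt'] and df2['Yr Sold']
garage_yr = pd.Series([1976.0, 1997.0, 2207.0, 1953.0])
yr_sold = pd.Series([2010, 2009, 2007, 2010])

# A garage cannot be built after the house is sold: cap such years at Yr Sold
impossible = garage_yr > yr_sold
garage_yr = garage_yr.mask(impossible, yr_sold)
print(garage_yr.tolist())  # [1976.0, 1997.0, 2007.0, 1953.0]
```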
df2['Garage Type'].unique()
array(['Attchd', 'Detchd', 'BuiltIn', 'Basment', nan, '2Types', 'CarPort'],
      dtype=object)
df2['Garage Type'].fillna('NA', inplace=True)

All other null values can be dropped, as they account for less than 3% of their respective columns.

Verify that no missing values remain

df2.dropna(inplace= True) 
missing_val = df2.isnull().sum()
missing_val = missing_val[missing_val > 0].sort_values(ascending=False)
missing_val_per = (missing_val / len(df2)) * 100
                                
missing_df = pd.DataFrame({'Missing Values': missing_val, 'Percentage': missing_val_per})  # DataFrame of missing values and their percentages
missing_df
Empty DataFrame
Columns: [Missing Values, Percentage]
Index: []

All null values have been successfully removed from the dataset.

For EDA, I will start with descriptive statistics.

Setting Id as the index

df2.set_index("Id", inplace=True)
df2.sort_index(ascending=True, inplace=True)
df2.head()
[df2.head() output omitted: first five rows of the cleaned DataFrame (76 columns), indexed by Id.]

Using .describe(). This provides the count, mean, standard deviation, minimum, 25th percentile, median, 75th percentile, and maximum for each numerical column.

df2.describe()
[df2.describe() output omitted: summary statistics for the 38 numeric columns over the remaining 1,969 rows; mean SalePrice ≈ 182,893.]
df2.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1969 entries, 1 to 2930
Data columns (total 76 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   PID              1969 non-null   int64  
 1   MS SubClass      1969 non-null   int64  
 2   MS Zoning        1969 non-null   object 
 3   Lot Frontage     1969 non-null   float64
 4   Lot Area         1969 non-null   int64  
 5   Street           1969 non-null   object 
 6   Lot Shape        1969 non-null   object 
 7   Land Contour     1969 non-null   object 
 8   Utilities        1969 non-null   object 
 9   Lot Config       1969 non-null   object 
 10  Land Slope       1969 non-null   object 
 11  Neighborhood     1969 non-null   object 
 12  Condition 1      1969 non-null   object 
 13  Condition 2      1969 non-null   object 
 14  Bldg Type        1969 non-null   object 
 15  House Style      1969 non-null   object 
 16  Overall Qual     1969 non-null   int64  
 17  Overall Cond     1969 non-null   int64  
 18  Year Built       1969 non-null   int64  
 19  Year Remod/Add   1969 non-null   int64  
 20  Roof Style       1969 non-null   object 
 21  Roof Matl        1969 non-null   object 
 22  Exterior 1st     1969 non-null   object 
 23  Exterior 2nd     1969 non-null   object 
 24  Mas Vnr Type     1969 non-null   object 
 25  Mas Vnr Area     1969 non-null   float64
 26  Exter Qual       1969 non-null   object 
 27  Exter Cond       1969 non-null   object 
 28  Foundation       1969 non-null   object 
 29  Bsmt Qual        1969 non-null   object 
 30  Bsmt Cond        1969 non-null   object 
 31  Bsmt Exposure    1969 non-null   object 
 32  BsmtFin Type 1   1969 non-null   object 
 33  BsmtFin SF 1     1969 non-null   float64
 34  BsmtFin Type 2   1969 non-null   object 
 35  BsmtFin SF 2     1969 non-null   float64
 36  Bsmt Unf SF      1969 non-null   float64
 37  Total Bsmt SF    1969 non-null   float64
 38  Heating          1969 non-null   object 
 39  Heating QC       1969 non-null   object 
 40  Central Air      1969 non-null   object 
 41  Electrical       1969 non-null   object 
 42  1st Flr SF       1969 non-null   int64  
 43  2nd Flr SF       1969 non-null   int64  
 44  Low Qual Fin SF  1969 non-null   int64  
 45  Gr Liv Area      1969 non-null   int64  
 46  Bsmt Full Bath   1969 non-null   float64
 47  Bsmt Half Bath   1969 non-null   float64
 48  Full Bath        1969 non-null   int64  
 49  Half Bath        1969 non-null   int64  
 50  Bedroom AbvGr    1969 non-null   int64  
 51  Kitchen AbvGr    1969 non-null   int64  
 52  Kitchen Qual     1969 non-null   object 
 53  TotRms AbvGrd    1969 non-null   int64  
 54  Functional       1969 non-null   object 
 55  Fireplaces       1969 non-null   int64  
 56  Fireplace Qu     1969 non-null   object 
 57  Garage Type      1969 non-null   object 
 58  Garage Yr Blt    1969 non-null   float64
 59  Garage Finish    1969 non-null   object 
 60  Garage Cars      1969 non-null   float64
 61  Garage Area      1969 non-null   float64
 62  Garage Qual      1969 non-null   object 
 63  Garage Cond      1969 non-null   object 
 64  Paved Drive      1969 non-null   object 
 65  Wood Deck SF     1969 non-null   int64  
 66  Open Porch SF    1969 non-null   int64  
 67  Enclosed Porch   1969 non-null   int64  
 68  3Ssn Porch       1969 non-null   int64  
 69  Screen Porch     1969 non-null   int64  
 70  Pool Area        1969 non-null   int64  
 71  Misc Val         1969 non-null   int64  
 72  Mo Sold          1969 non-null   int64  
 73  Yr Sold          1969 non-null   int64  
 74  Sale Type        1969 non-null   object 
 75  SalePrice        1969 non-null   int64  
dtypes: float64(11), int64(27), object(38)
memory usage: 1.2+ MB
non_numerical_columns = df2.select_dtypes(include=['object']).columns
non_numerical_columns
Index(['MS Zoning', 'Street', 'Lot Shape', 'Land Contour', 'Utilities',
       'Lot Config', 'Land Slope', 'Neighborhood', 'Condition 1',
       'Condition 2', 'Bldg Type', 'House Style', 'Roof Style', 'Roof Matl',
       'Exterior 1st', 'Exterior 2nd', 'Mas Vnr Type', 'Exter Qual',
       'Exter Cond', 'Foundation', 'Bsmt Qual', 'Bsmt Cond', 'Bsmt Exposure',
       'BsmtFin Type 1', 'BsmtFin Type 2', 'Heating', 'Heating QC',
       'Central Air', 'Electrical', 'Kitchen Qual', 'Functional',
       'Fireplace Qu', 'Garage Type', 'Garage Finish', 'Garage Qual',
       'Garage Cond', 'Paved Drive', 'Sale Type'],
      dtype='object')

Baseline Correlation

correlation_matrix = df2[['Lot Area', 'Lot Frontage', 'SalePrice']].corr()
 
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap="binary", square=True, linewidths=0.5)
plt.title('Correlation Heatmap')
plt.show()

EDA Continued

Plotting the correlation of non-categorical features with sale price

numeric_column_corr = df2.select_dtypes(include=['int64', 'float64']).corr()[['SalePrice']].sort_values('SalePrice', ascending=False)
numeric_column_corr
                 SalePrice
SalePrice           1.0000
Overall Qual        0.7976
Gr Liv Area         0.7009
Garage Cars         0.6515
Garage Area         0.6510
1st Flr SF          0.6301
Total Bsmt SF       0.6293
Year Built          0.5678
Full Bath           0.5480
Year Remod/Add      0.5415
Garage Yr Blt       0.5215
TotRms AbvGrd       0.5135
Mas Vnr Area        0.5104
Fireplaces          0.4636
BsmtFin SF 1        0.4110
Lot Frontage        0.3371
Wood Deck SF        0.3229
Open Porch SF       0.3208
Lot Area            0.3085
Half Bath           0.2722
Bsmt Full Bath      0.2698
2nd Flr SF          0.2454
Bsmt Unf SF         0.1623
Screen Porch        0.1389
Bedroom AbvGr       0.1298
3Ssn Porch          0.0481
Pool Area           0.0291
Mo Sold             0.0237
BsmtFin SF 2        0.0104
Misc Val           -0.0035
Yr Sold            -0.0094
Low Qual Fin SF    -0.0418
Bsmt Half Bath     -0.0511
MS SubClass        -0.0874
Kitchen AbvGr      -0.0917
Overall Cond       -0.1145
Enclosed Porch     -0.1385
PID                -0.2487
plt.figure(figsize=(7, 11))
sns.heatmap(numeric_column_corr, cmap="binary")  
plt.title('Correlation Heatmap of Sales Price')
plt.show()

A baseline model using the most highly correlated features

We choose the 10 numeric (non-categorical) features most strongly correlated with SalePrice to train a baseline model.

 
X = df2[numeric_column_corr.index.values[1:11]]
y = df2['SalePrice']
print(f"X shape {X.shape}")
print(f"y shape {y.shape}")
 
# Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
 
# Instantiate
model = LinearRegression()
model.fit(X_train, y_train)
 
y_pred = model.predict(X_test)
 
# metrics
rmse = mean_squared_error(y_test, y_pred, squared=False)
r2 = r2_score(y_test, y_pred)
 
f"rmse = {rmse}", f"r2 = {r2}"
X shape (1969, 10)
y shape (1969,)

('rmse = 40412.86100369427', 'r2 = 0.7626541968542401')

With Cross-Validation

kf = KFold(n_splits=5, shuffle=True, random_state=42)
model = LinearRegression()
rmse_scores = []
r2_scores = []
for train_index, test_index in kf.split(X):
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    rmse_scores.append(mean_squared_error(y_test, y_pred, squared=False))  # RMSE, not MSE
    r2_scores.append(r2_score(y_test, y_pred))
 
f"mean rmse = {np.mean(rmse_scores)}", f" mean r2 = {np.mean(r2_scores)}"
('mean rmse = 36949.87866445592', ' mean r2 = 0.779848655366105')
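The manual KFold loop above can also be condensed with scikit-learn's `cross_val_score`. A minimal sketch on synthetic data (here `make_regression` stands in for the Ames features; the numbers are illustrative, not the results above):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the Ames features/target (not the real df2)
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=42)

kf = KFold(n_splits=5, shuffle=True, random_state=42)
# scikit-learn negates error scores so that "higher is better"; flip the sign back
neg_rmse = cross_val_score(LinearRegression(), X, y,
                           scoring='neg_root_mean_squared_error', cv=kf)
mean_rmse = -neg_rmse.mean()
print(f"mean rmse = {mean_rmse:.2f}")
```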

What about adding regularization?

 
# Base line predictions
X = df2[numeric_column_corr.index.values[1:11]]
y = df2['SalePrice']
print(f"X shape {X.shape}")
print(f"y shape {y.shape}")
 
# Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
 
# Instantiate
model = LassoCV(cv=5)  # 5-fold cross-validation
model.fit(X_train, y_train)
 
y_pred = model.predict(X_test)
 
# metrics
rmse = mean_squared_error(y_test, y_pred, squared=False)
r2 = r2_score(y_test, y_pred)
 
f"rmse = {rmse}", f"r2 = {r2}"
X shape (1969, 10)
y shape (1969,)

('rmse = 44359.00408473574', 'r2 = 0.7140395731173013')

A baseline model with all the numeric features?

X = df2.select_dtypes(include=['int64', 'float64']).drop('SalePrice', axis=1)
y = df2['SalePrice']
print(f"X shape {X.shape}")
print(f"y shape {y.shape}")
 
# Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
 
# Instantiate
model = LinearRegression()
model.fit(X_train, y_train)
 
y_pred = model.predict(X_test)
 
# metrics
rmse = mean_squared_error(y_test, y_pred, squared=False)
r2 = r2_score(y_test, y_pred)
 
f"rmse = {rmse}", f"r2 = {r2}"
 
X shape (1969, 37)
y shape (1969,)

('rmse = 38337.59165478559', 'r2 = 0.7864045402331978')

With Cross-Validation

kf = KFold(n_splits=5, shuffle=True, random_state=42)
model = LinearRegression()
rmse_scores = []
r2_scores = []
for train_index, test_index in kf.split(X):
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    rmse_scores.append(mean_squared_error(y_test, y_pred, squared=False))  # RMSE, not MSE
    r2_scores.append(r2_score(y_test, y_pred))
 
f"mean rmse = {np.mean(rmse_scores)}", f" mean r2 = {np.mean(r2_scores)}"
('mean rmse = 35642.85184537606', ' mean r2 = 0.791738446278189')

And with regularization

X = df2.select_dtypes(include=['int64', 'float64']).drop('SalePrice', axis=1)
y = df2['SalePrice']
 
# Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
 
# Instantiate
model = LassoCV(cv=5)  # 5-fold cross-validation
model.fit(X_train, y_train)
 
y_pred = model.predict(X_test)
 
# metrics
rmse = mean_squared_error(y_test, y_pred, squared=False)
r2 = r2_score(y_test, y_pred)
 
f"rmse = {rmse}", f"r2 = {r2}"
('rmse = 80240.8060669871', 'r2 = 0.06430797487379125')

A discovery: LassoCV on the raw numeric features gives us the worst RMSE and R2 so far.

model.coef_
array([-0.00010052, -0.        ,  0.        ,  0.        ,  0.        ,
       -0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
       -0.        ,  0.        ,  0.        , -0.        ,  0.        ,
        0.        ,  0.        , -0.        ,  0.        ,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
       -0.        ,  0.        ,  0.        ,  0.        , -0.        ,
        0.        , -0.        ])

This happens because Lasso applies an L1 penalty, which can force coefficients to exactly zero. With unscaled features, the penalty strength LassoCV selects is dominated by the largest-magnitude columns (here PID, on the order of 10^8), so nearly every other coefficient is shrunk to zero.

Solution: Scaling the features before modeling.
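An equivalent, leakage-safe way to package this fix is to put the scaler and the model in a single `Pipeline`, so the scaler is fit only on training data. A minimal sketch on synthetic data with deliberately mismatched column scales (a hypothetical stand-in, not the Ames frame):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data whose columns span several orders of magnitude
X, y = make_regression(n_samples=500, n_features=8, noise=5.0, random_state=0)
X = X * np.logspace(0, 7, 8)  # rescale each column; the relation stays linear

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Bundling the scaler with LassoCV means the scaler's statistics come from
# the training split only, and every CV fold inside LassoCV sees scaled inputs
pipe = make_pipeline(StandardScaler(), LassoCV(cv=5))
pipe.fit(X_train, y_train)
r2 = pipe.score(X_test, y_test)
print(f"r2 = {r2:.4f}")
```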

 
 
X = df2.select_dtypes(include=['int64', 'float64']).drop('SalePrice', axis=1)
y = df2['SalePrice']
 
# Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
 
# Scaling the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
 
# Instantiate
model = LassoCV(cv=5)  # 5-fold cross-validation
model.fit(X_train_scaled, y_train)
 
y_pred = model.predict(X_test_scaled)
 
# metrics
rmse = mean_squared_error(y_test, y_pred, squared=False)
r2 = r2_score(y_test, y_pred)
 
f"rmse = {rmse}", f"r2 = {r2}"
('rmse = 37934.24414094182', 'r2 = 0.7908753474821981')

A baseline model with all the numeric and dummied features?

set(df2.dtypes.values)
{dtype('int64'), dtype('float64'), dtype('O')}
categorical_df = df2.select_dtypes(include=['object'])
categorical_df
[categorical_df preview: 38 object-type columns (MS Zoning, Street, Lot Shape, ..., Sale Type), indexed by Id; fused row values from the notebook export omitted]

1969 rows × 38 columns

# Map the ordinal quality ratings to integers, then one-hot encode the rest
ordinal_mapping = {'NA': 0, 'Po': 1, 'Fa': 2, 'TA': 3, 'Gd': 4, 'Ex': 5}
 
for col in categorical_df:
    categorical_df[col] = categorical_df[col].replace(ordinal_mapping)
 
categorical_df = pd.get_dummies(categorical_df, drop_first=True)
categorical_df
[categorical_df preview after encoding: ordinal quality columns (Exter Qual, Exter Cond, Bsmt Qual, Heating QC, Kitchen Qual, ...) followed by one-hot dummy columns (MS Zoning_FV, ..., Sale Type_WD), indexed by Id; fused row values from the notebook export omitted]

1969 rows × 172 columns

numeric_and_dummied = pd.concat([X,categorical_df], axis=1)
numeric_and_dummied
[numeric_and_dummied preview: the 37 numeric columns (PID, MS SubClass, Lot Frontage, Lot Area, ...) concatenated with the 172 encoded categorical columns, indexed by Id; fused row values from the notebook export omitted]

1969 rows × 209 columns

 
X = numeric_and_dummied
y = df2['SalePrice']
print(f"X shape {X.shape}")
print(f"y shape {y.shape}")
 
# Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
 
# Instantiate
model = LinearRegression()
model.fit(X_train, y_train)
 
y_pred = model.predict(X_test)
 
# metrics
rmse = mean_squared_error(y_test, y_pred, squared=False)
r2 = r2_score(y_test, y_pred)
 
f"rmse = {rmse}", f"r2 = {r2}"
X shape (1969, 209)
y shape (1969,)

('rmse = 34243.128952623614', 'r2 = 0.8295922881232749')

With Cross-Validation

kf = KFold(n_splits=5, shuffle=True, random_state=42)
model = LinearRegression()
rmse_scores = []
r2_scores = []
for train_index, test_index in kf.split(X):
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    rmse_scores.append(mean_squared_error(y_test, y_pred, squared=False))  # RMSE, not MSE
    r2_scores.append(r2_score(y_test, y_pred))
 
f"mean rmse = {np.mean(rmse_scores)}", f" mean r2 = {np.mean(r2_scores)}"
('mean rmse = 34290.61380036194', ' mean r2 = 0.8002326002217905')

With Scaling and Regularization (numeric features only)

X = df2.select_dtypes(include=['int64', 'float64']).drop('SalePrice', axis=1)
y = df2['SalePrice']
 
# Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
 
# Scaling the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
 
# Instantiate
model = LassoCV(cv=5)  # 5-fold cross-validation
model.fit(X_train_scaled, y_train)
 
y_pred = model.predict(X_test_scaled)
 
# metrics
rmse = mean_squared_error(y_test, y_pred, squared=False)
r2 = r2_score(y_test, y_pred)
 
f"rmse = {rmse}", f"r2 = {r2}"
('rmse = 37934.24414094182', 'r2 = 0.7908753474821981')

We tried the features most highly correlated with SalePrice. We also tried all numeric features, which gave a decent R2, and numeric + dummied features, which gave the highest R2 so far. However, adding more variables does not necessarily mean we have a better model.

Question: What subset of features does the best model use?

Our search space is exponential: with $p$ candidate features there are $2^p$ possible subsets, so we cannot enumerate them all to find the subset $bestfeatures$ that minimizes $RMSE(bestfeatures)$.

To answer the question we will take a greedy heuristic approach.
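To make the exponential claim concrete, a quick back-of-envelope check of the subset count (using $p = 209$, the column count of the numeric + dummied matrix above):

```python
# Number of non-empty feature subsets of a 209-column design matrix
p = 209
n_subsets = 2 ** p - 1
print(f"{n_subsets:.3e}")  # on the order of 1e63 candidate subsets
```

Exhaustive search is clearly hopeless at this size, which is why a greedy heuristic is used.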

First, we experiment with one feature at a time (numeric only)

 
 
X = df2.select_dtypes(include=['int64', 'float64']).drop('SalePrice', axis=1)
y = df2['SalePrice']
# Running linear regression for each feature individually and storing the RMSE
 
feature_rmse = {}
 
for feature in X.columns:
    X_temp = X[[feature]]
    
    # Splitting data
    X_train_temp, X_test_temp, y_train_temp, y_test_temp = train_test_split(X_temp, y, test_size=0.3, random_state=42)
    
    # Training the model
    model_temp = LinearRegression()
    model_temp.fit(X_train_temp, y_train_temp)
    
    # Predicting and computing RMSE
    y_pred_temp = model_temp.predict(X_test_temp)
    mse_temp = mean_squared_error(y_test_temp, y_pred_temp)
    rmse_temp = np.sqrt(mse_temp)
    
    feature_rmse[feature] = rmse_temp
 
# Sorting features based on RMSE
sorted_features = sorted(feature_rmse, key=feature_rmse.get)
 
#sorted_features, [feature_rmse[feature] for feature in sorted_features]
pd.DataFrame({"features" : sorted_features, "RMSE": [feature_rmse[feature] for feature in sorted_features]})
          features        RMSE
0     Overall Qual  50822.1991
1      Gr Liv Area  60335.1803
2      Garage Area  63418.2640
3      Garage Cars  64173.5222
4    Total Bsmt SF  64422.0079
5       1st Flr SF  66338.5624
6     Mas Vnr Area  68124.1100
7   Year Remod/Add  68920.4481
8       Year Built  69428.7070
9        Full Bath  69554.1249
10   Garage Yr Blt  71998.3722
11   TotRms AbvGrd  72149.7047
12      Fireplaces  73040.4772
13    BsmtFin SF 1  77673.4398
14    Lot Frontage  77903.1847
15   Open Porch SF  78826.7180
16       Half Bath  79723.0870
17    Wood Deck SF  80073.6566
18             PID  80240.2988
19      2nd Flr SF  80411.1312
20  Bsmt Full Bath  80822.0652
21     Bsmt Unf SF  81203.1821
22        Lot Area  81525.4903
23    Screen Porch  81748.9756
24  Enclosed Porch  82379.8870
25   Bedroom AbvGr  82481.7493
26   Kitchen AbvGr  82502.0684
27     MS SubClass  82527.6362
28    Overall Cond  82783.4673
29      3Ssn Porch  82836.5438
30 Low Qual Fin SF  82840.5059
31  Bsmt Half Bath  82862.3486
32        Misc Val  82955.9613
33    BsmtFin SF 2  82978.7429
34       Pool Area  82984.8451
35         Yr Sold  83028.2278
36         Mo Sold  83379.5520

Using a greedy heuristic approach

Iteratively adding the next best feature and checking if the RMSE improves

This algorithm is known as forward selection: starting with an empty model, it adds features one at a time, at each step adding the feature that gives the greatest additional improvement to the fit. (Our implementation below approximates this by adding features in order of their single-feature RMSE and keeping whichever prefix achieves the best test RMSE.)

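As a cross-check on the hand-rolled loop below, scikit-learn (0.24+) ships `SequentialFeatureSelector`, which implements the same greedy search (forward here; pass `direction='backward'` for backward elimination) with cross-validation at every step. A sketch on synthetic data, not the Ames frame:

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# Synthetic stand-in: 20 candidate features, only 5 carry signal
X, y = make_regression(n_samples=300, n_features=20, n_informative=5,
                       noise=5.0, random_state=42)

sfs = SequentialFeatureSelector(
    LinearRegression(),
    n_features_to_select=5,
    direction='forward',                    # 'backward' gives backward elimination
    scoring='neg_root_mean_squared_error',  # select on (negated) RMSE
    cv=5,
)
sfs.fit(X, y)
selected = sfs.get_support(indices=True)
print("selected feature indices:", selected)
```

Unlike the simple loop below, this re-evaluates every remaining candidate at each step, at the cost of many more model fits.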

With numeric features only

best_rmse = float('inf')   # positive infinity, so the first model always improves on it
best_r2 = float('-inf')    # overwritten alongside best_rmse
best_features = []         # best feature subset found so far
current_features = []      # features added so far
 
for feature in sorted_features: # looping through sorted_features
    current_features.append(feature)
    
    # Splitting data
    X_train_temp, X_test_temp, y_train_temp, y_test_temp = train_test_split(X[current_features], y, test_size=0.3, random_state=42)
    
    # Training the model
    model_temp = LinearRegression()
    model_temp.fit(X_train_temp, y_train_temp)
    
    # Predicting and computing RMSE
    y_pred_temp = model_temp.predict(X_test_temp)
    rmse_temp = mean_squared_error(y_test_temp, y_pred_temp, squared=False)
    r2_temp = r2_score(y_test_temp, y_pred_temp)
 
    
    # Checking if RMSE improved
    if rmse_temp < best_rmse:
        best_rmse = rmse_temp
        best_r2 = r2_temp
        best_features = current_features.copy()
print(f"Best RMSE with forward selection and numeric features only = {best_rmse}")
print(f"Number of features out of 37 selected by forward selection = {len(best_features)}")
f"rmse = {best_rmse}", f"r2 = {best_r2}"
Best RMSE with forward selection and numeric features only = 37753.02406324579
Number of features out of 37 selected by forward selection = 33

('rmse = 37753.02406324579', 'r2 = 0.7928686421507606')

With numeric + dummied

best_rmse = float('inf')   # positive infinity, so the first model always improves on it
best_r2 = float('-inf')    # overwritten alongside best_rmse
best_features = []         # best feature subset found so far
current_features = []      # features added so far
X = numeric_and_dummied
y = df2['SalePrice']
 
for feature in X.columns.values: # looping through all numeric and dummied features
    current_features.append(feature)
    
    # Splitting data
    X_train_temp, X_test_temp, y_train_temp, y_test_temp = train_test_split(X[current_features], y, test_size=0.3, random_state=42)
    
    # Training the model
    model_temp = LinearRegression()
    model_temp.fit(X_train_temp, y_train_temp)
    
    # Predicting and computing RMSE
    y_pred_temp = model_temp.predict(X_test_temp)
    rmse_temp = mean_squared_error(y_test_temp, y_pred_temp, squared=False)
    r2_temp = r2_score(y_test_temp, y_pred_temp)
 
    
    # Checking if RMSE improved
    if rmse_temp < best_rmse:
        best_rmse = rmse_temp
        best_r2 = r2_temp
        best_features = current_features.copy()
print(f"Best RMSE with forward selection with numeric and dummied features = {best_rmse}")
print(f"Number of features out of 209 selected by forward selection = {len(best_features)}")
f"rmse = {best_rmse}", f"r2 = {best_r2}"
Best RMSE with forward selection with numeric and dummied features = 33840.37573779651
Number of features out of 209 selected by forward selection = 120

('rmse = 33840.37573779651', 'r2 = 0.8335772418404599')

We also have a backward selection algorithm: starting from the full model, it removes features one at a time, at each step dropping the feature whose removal hurts the fit the least.


With numeric features only

X = df2.select_dtypes(include=['int64', 'float64']).drop('SalePrice', axis=1)
y = df2['SalePrice']
best_rmse = float('inf')
best_r2 = float('inf')
best_features = sorted_features.copy()  # Start with all features
current_features = sorted_features.copy()
 
while current_features:
    worst_rmse = -float('inf')
    worst_r2 = -float('inf')
    worst_feature = None
 
    for feature in current_features:
        # Try removing the feature and evaluate performance
        features_without_current = [f for f in current_features if f != feature]
        
        if not features_without_current:  # Skip if no features remain
            continue
        
        # Splitting data
        X_train_temp, X_test_temp, y_train_temp, y_test_temp = train_test_split(X[features_without_current], y, test_size=0.3, random_state=42)
        
        # Training the model
        model_temp = LinearRegression()
        model_temp.fit(X_train_temp, y_train_temp)
        
        # Predicting and computing RMSE
        y_pred_temp = model_temp.predict(X_test_temp)
        rmse_temp = mean_squared_error(y_test_temp, y_pred_temp, squared=False)
        r2_temp = r2_score(y_test_temp, y_pred_temp)
        
        # Checking if this feature's removal worsens the RMSE the least
        if rmse_temp > worst_rmse:
            worst_rmse = rmse_temp
            worst_r2  = r2_temp
            worst_feature = feature
    if not worst_feature:
        break
    # Remove the feature that worsens the RMSE the least
    current_features.remove(worst_feature)
    
    # If the new RMSE is better than the best known, update best RMSE and best features
    if len(current_features) == len(sorted_features) or worst_rmse <= best_rmse:
        best_rmse = worst_rmse
        best_r2 = worst_r2
        best_features = current_features.copy()
 
print(f"Best RMSE with backward selection = {best_rmse}")
print(f"Number of features out of 37 selected by backward selection = {len(best_features)}")
f"rmse = {best_rmse}", f"r2 = {best_r2}"
Best RMSE with backward selection = 41751.280747375204
Number of features out of 37 selected by backward selection = 36

('rmse = 41751.280747375204', 'r2 = 0.7466727159535904')

With numeric + dummied

X = numeric_and_dummied
y = df2['SalePrice']
best_rmse = float('inf')
best_r2 = float('-inf')
best_features = X.columns.values.tolist()   # start with all features
current_features = X.columns.values.tolist()
total_features = len(current_features)
 
while current_features:
    worst_rmse = -float('inf')
    worst_r2 = -float('inf')
    worst_feature = None
 
    for feature in current_features:
        # Try removing the feature and evaluate performance
        features_without_current = [f for f in current_features if f != feature]
        
        if not features_without_current:  # Skip if no features remain
            continue
        
        # Splitting data
        X_train_temp, X_test_temp, y_train_temp, y_test_temp = train_test_split(X[features_without_current], y, test_size=0.3, random_state=42)
        
        # Training the model
        model_temp = LinearRegression()
        model_temp.fit(X_train_temp, y_train_temp)
        
        # Predicting and computing RMSE
        y_pred_temp = model_temp.predict(X_test_temp)
        rmse_temp = mean_squared_error(y_test_temp, y_pred_temp, squared=False)
        r2_temp = r2_score(y_test_temp, y_pred_temp)
        
        # Track the feature whose removal produces the worst (highest) RMSE
        if rmse_temp > worst_rmse:
            worst_rmse = rmse_temp
            worst_r2 = r2_temp
            worst_feature = feature
    if worst_feature is None:
        break
    # Remove the tracked feature
    current_features.remove(worst_feature)
    
    # If the new RMSE is better than the best known, update best RMSE and best features
    if len(current_features) == total_features or worst_rmse <= best_rmse:
        best_rmse = worst_rmse
        best_r2 = worst_r2
        best_features = current_features.copy()
 
print(f"Best RMSE with backward selection With numeric + dummied = {best_rmse}")
print(f"Number of features out of 209 selected by backward selection = {len(best_features)}")
f"rmse = {best_rmse}", f"r2 = {best_r2}"
Best RMSE with backward selection With numeric + dummied = 35439.69000658719
Number of features out of 209 selected by backward selection = 208

('rmse = 35439.69000658719', 'r2 = 0.8174750696856952')

Summary so far:

| Model Description | RMSE | R² | Initial # Features | Final # Features |
|---|---|---|---|---|
| Model of highest correlated features | 40412.86100369427 | 0.7626541968542401 | 10 | - |
| Model of highest correlated features with cross validation | mean 36949.87866445592 | mean 0.779848655366105 | 10 | - |
| Model of highest correlated features with Lasso | 44359.00408473574 | 0.7140395731173013 | 10 | - |
| Model of all numeric features | 38337.59165478559 | 0.7864045402331978 | 37 | - |
| Model of all numeric features with cross validation | mean 35642.85184537606 | mean 0.791738446278189 | 37 | - |
| Model of all numeric features with Lasso | 37934.24414094182 | 0.7908753474821981 | 37 | - |
| Model of all features with dummied preprocessing | 34243.128952623614 | 0.8295922881232749 | 209 | - |
| Model of all features with dummied preprocessing and CV | mean 34290.61380036194 | mean 0.8002326002217905 | 209 | - |
| Model of all features with dummied preprocessing and Lasso | 37934.24414094182 | 0.7908753474821981 | 209 | - |
| Forward selection algorithm with numeric features only | 37753.02406324579 | 0.7928686421507606 | 37 | 33 |
| Forward selection algorithm with numeric and dummied features | 33840.37573779651 | 0.8335772418404599 | 209 | 120 |
| Backward selection algorithm with numeric features only | 41751.280747375204 | 0.7466727159535904 | 209 | 36 |
| Backward selection algorithm with numeric and dummied features | 35439.69000658719 | 0.8174750696856952 | 209 | 208 |

Trying out polynomial features for regression

Experimenting with polynomial features:

len(df2.select_dtypes(include=['int64', 'float64']).drop('SalePrice', axis=1).columns.values)
37
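With 37 numeric input columns, degree-2 PolynomialFeatures expands to (37 + 1)(37 + 2) / 2 = 741 columns: 1 bias term, 37 linear terms, 37 squares, and C(37, 2) = 666 pairwise interactions. A quick sanity check of that count on a dummy array:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

n = 37
X_dummy = np.zeros((1, n))  # only the shape matters for counting columns
n_out = PolynomialFeatures(degree=2).fit_transform(X_dummy).shape[1]
print(n_out)  # (n + 1)(n + 2) / 2 = 741
```

That 20x blow-up in column count is what makes regularization (Lasso below) essential here.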

Expanding all numeric features into polynomial features and grid-searching for the best alpha

 
X = df2.select_dtypes(include=['int64', 'float64']).drop('SalePrice', axis=1)
y = df2['SalePrice']
 
X_train_best, X_test_best, y_train_best, y_test_best = train_test_split(X, y, test_size=0.3, random_state=42)
 
poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train_best[sorted_features])
X_test_poly = poly.transform(X_test_best[sorted_features])
scaler_poly = StandardScaler()
X_train_poly_scaled = scaler_poly.fit_transform(X_train_poly)
X_test_poly_scaled = scaler_poly.transform(X_test_poly)
 
alphas = np.logspace(-4, 4, 10) # alpha in log space
 
# Split into 5 shuffled folds for cross-validation
kf = KFold(n_splits=5, shuffle=True, random_state=42)
avg_rmse_list = []
 
# CV
for alpha in alphas:
    rmse_list = []
    for train_index, val_index in kf.split(X_train_poly_scaled):
        X_train_cv, X_val_cv = X_train_poly_scaled[train_index], X_train_poly_scaled[val_index]
        y_train_cv, y_val_cv = y_train_best.iloc[train_index], y_train_best.iloc[val_index]
        
        model = Lasso(alpha=alpha, max_iter=500)
        model.fit(X_train_cv, y_train_cv)
        y_pred_val = model.predict(X_val_cv)
        
        rmse = mean_squared_error(y_val_cv, y_pred_val, squared=False)
        rmse_list.append(rmse)
        
    avg_rmse_list.append(np.mean(rmse_list))
 
# best alpha value based on RMSE
best_alpha = alphas[np.argmin(avg_rmse_list)]
 
# Final model using the best alpha; max_iter=100 gave a better RMSE here than 500
final_model = Lasso(alpha=best_alpha, max_iter=100)
final_model.fit(X_train_poly_scaled, y_train_best)
 
# Pred. on the validation set
y_pred_final = final_model.predict(X_test_poly_scaled)
final_rmse = mean_squared_error(y_test_best, y_pred_final, squared=False)
final_r2 = r2_score(y_test_best, y_pred_final)
 
print(f"best_alpha = {best_alpha}\nRMSE with best alpha {best_alpha} and Lasso = {final_rmse}")
f"rmse = {final_rmse}", f"r2 = {final_r2}"
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:647: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 1.309e+11, tolerance: 6.762e+08
  model = cd_fast.enet_coordinate_descent(

best_alpha = 1291.5496650148827
RMSE with best alpha 1291.5496650148827 and Lasso = 38505.77372194258

('rmse = 38505.77372194258', 'r2 = 0.7845263982307846')
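The ConvergenceWarning above signals that max_iter was far too small for this many polynomial features, so the reported RMSE comes from an unconverged fit. A sketch (on synthetic stand-in data, not this notebook's) of how to make convergence explicit rather than relying on a silently truncated solve:

```python
import warnings

from sklearn.datasets import make_regression
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; scaling the features helps coordinate descent converge
X, y = make_regression(n_samples=300, n_features=50, noise=20, random_state=42)
X = StandardScaler().fit_transform(X)

with warnings.catch_warnings():
    # Escalate the warning to an error so an unconverged fit fails loudly
    warnings.simplefilter("error", ConvergenceWarning)
    model = Lasso(alpha=1.0, max_iter=10_000)
    model.fit(X, y)

print(model.n_iter_)  # iterations actually used, well under the cap
```

The alternative of capping max_iter low for a "better" score is fragile: the improvement comes from an arbitrary early stop, not from the model the objective actually defines.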

Trying polynomial features with Lasso CV

X = df2.select_dtypes(include=['int64', 'float64']).drop('SalePrice', axis=1)
y = df2['SalePrice']
 
X_train_best, X_test_best, y_train_best, y_test_best = train_test_split(X, y, test_size=0.3, random_state=42)
 
# Apply polynomial transformations and scale the features
poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train_best[sorted_features])
X_test_poly = poly.transform(X_test_best[sorted_features])
scaler_poly = StandardScaler()
X_train_poly_scaled = scaler_poly.fit_transform(X_train_poly)
X_test_poly_scaled = scaler_poly.transform(X_test_poly)
 
# Train LassoCV on the polynomial features; max_iter=100 gave a better RMSE here than the default
lasso_poly_cv = LassoCV(cv=5, random_state=42, max_iter=100)
lasso_poly_cv.fit(X_train_poly_scaled, y_train_best)
 
# Predict SalePrice on validation set using the trained LassoCV model on polynomial features
y_pred_lasso_poly_cv = lasso_poly_cv.predict(X_test_poly_scaled)
 
# Compute RMSE for the LassoCV model with polynomial features on the validation set
rmse_lasso_poly_cv = mean_squared_error(y_test_best, y_pred_lasso_poly_cv, squared=False)
f"only numeric RMSE polynomial features with Lasso CV: {rmse_lasso_poly_cv}"
final_r2 = r2_score(y_test_best, y_pred_lasso_poly_cv)
f"rmse = {rmse_lasso_poly_cv}", f"r2 = {final_r2}"
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 655092614.9396973, tolerance: 640110907.2100472
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 875569480.4025269, tolerance: 669957695.2476618
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 980378564.241333, tolerance: 669957695.2476618
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 747730231.0474854, tolerance: 669957695.2476618
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 701697828.9777832, tolerance: 669957695.2476618
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 980553271.4138184, tolerance: 669957695.2476618
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 1250303040.321045, tolerance: 669957695.2476618
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 1316587288.954712, tolerance: 669957695.2476618
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 1765858280.8981323, tolerance: 669957695.2476618
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 2081950878.5326538, tolerance: 669957695.2476618
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 2525127606.7941284, tolerance: 669957695.2476618
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 2557941007.447693, tolerance: 669957695.2476618
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 2457527984.4022217, tolerance: 669957695.2476618
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 1958787803.2093506, tolerance: 669957695.2476618
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 2126266877.369751, tolerance: 669957695.2476618
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 2571886103.375061, tolerance: 669957695.2476618
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 2945502590.069031, tolerance: 669957695.2476618
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 3053576328.2835693, tolerance: 669957695.2476618
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 2628359045.1015625, tolerance: 669957695.2476618
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 2838444010.659424, tolerance: 669957695.2476618
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 2852275871.284546, tolerance: 669957695.2476618
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 2746276515.2092285, tolerance: 669957695.2476618
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 2583521614.786865, tolerance: 669957695.2476618
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 2404014177.8653564, tolerance: 669957695.2476618
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 4503430728.748474, tolerance: 669957695.2476618
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 6131984713.784546, tolerance: 669957695.2476618
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 7826577632.119385, tolerance: 669957695.2476618
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 8929527773.01233, tolerance: 669957695.2476618
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 10743571386.32019, tolerance: 669957695.2476618
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 12433583167.359009, tolerance: 669957695.2476618
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 12267855313.926025, tolerance: 669957695.2476618
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 861678840.765625, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 968993796.1010742, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 983271665.5161133, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 912136914.9145508, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 716713511.6699219, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 798763413.4882812, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 748051381.5751953, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 718786908.8554688, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 700270196.9912109, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 698790503.4768066, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 739807812.352417, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 1106942107.9666748, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 1101723408.762024, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 788258668.0662231, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 842562747.3908081, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 1229305034.6760254, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 1288926380.9262695, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 1155549920.3555908, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 1611335841.7492065, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 1357983947.2506104, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 2023441760.6868896, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 2438209689.17749, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 2973415199.0687256, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 4112655078.585266, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 5047036833.321899, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 4563507309.416443, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 2897100104.317505, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 3858770029.3067627, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 3992887495.4349365, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 4434176080.818787, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 5034397042.474243, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 5393645956.638306, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 4867187288.870239, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 4268480195.2766724, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 3946406071.1400146, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 4944872283.399536, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 5501143663.0911255, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 3441263878.828247, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 3970951916.8688354, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 4068534020.84375, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 4718907275.976929, tolerance: 669621636.3395969
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 900207802.6901855, tolerance: 685686815.3466206
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 1176650573.7490234, tolerance: 685686815.3466206
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 1055648696.9013672, tolerance: 685686815.3466206
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 900108386.3588867, tolerance: 685686815.3466206
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 963175472.0842285, tolerance: 685686815.3466206
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 740267535.8983154, tolerance: 685686815.3466206
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 751036338.9658203, tolerance: 685686815.3466206
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 746275380.2701416, tolerance: 685686815.3466206
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 1159899453.4041748, tolerance: 685686815.3466206
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 1741993784.710205, tolerance: 685686815.3466206
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 688411233.3903198, tolerance: 685686815.3466206
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 965961029.9855957, tolerance: 685686815.3466206
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 1253194288.991333, tolerance: 685686815.3466206
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 1361762038.2955322, tolerance: 685686815.3466206
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 1702280571.423462, tolerance: 685686815.3466206
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 1865904693.5332031, tolerance: 685686815.3466206
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 2178261966.2056885, tolerance: 685686815.3466206
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 2438502525.02124, tolerance: 685686815.3466206
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 2246753078.5827637, tolerance: 685686815.3466206
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 2658166015.419739, tolerance: 685686815.3466206
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 3098744867.050415, tolerance: 685686815.3466206
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 3119277896.8673096, tolerance: 685686815.3466206
  model = cd_fast.enet_coordinate_descent_gram(
c:\Users\muham\anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:633: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 2715145974.438965, tolerance: 685686815.3466206
  model = cd_fast.enet_coordinate_descent_gram(
  model = cd_fast.enet_coordinate_descent_gram(
... (the same ConvergenceWarning repeats for the remaining alphas and CV folds)

('rmse = 38421.485445693455', 'r2 = 0.7854686995389903')
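The warnings above mean the coordinate-descent solver stopped before the duality gap fell below the tolerance; raising `max_iter` (or loosening `tol`) usually makes them go away. A minimal sketch on synthetic data (the sizes and alphas here are illustrative, not the notebook's actual features):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scaled polynomial feature matrix
X, y = make_regression(n_samples=200, n_features=50, noise=10.0, random_state=42)
X = StandardScaler().fit_transform(X)

# max_iter=100 can trigger ConvergenceWarning on hard problems;
# a larger iteration budget lets coordinate descent close the duality gap
lasso = LassoCV(cv=5, random_state=42, max_iter=10_000)
lasso.fit(X, y)
print(lasso.alpha_)
```

Note that a non-converged Lasso can still score reasonably, which is why the `max_iter=100` runs below produce usable RMSEs, but the coefficients are not at the true optimum.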

The same pipeline as above, this time on the combined numeric and dummied features:

X = numeric_and_dummied
y = df2['SalePrice']
 
X_train_best, X_test_best, y_train_best, y_test_best = train_test_split(X, y, test_size=0.3, random_state=42)
 
# Apply polynomial transformations and scale the features
poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train_best)
X_test_poly = poly.transform(X_test_best)
scaler_poly = StandardScaler()
X_train_poly_scaled = scaler_poly.fit_transform(X_train_poly)
X_test_poly_scaled = scaler_poly.transform(X_test_poly)
 
# Train LassoCV on the polynomial features (max_iter=100 gave a better RMSE here, at the cost of convergence warnings)
lasso_poly_cv = LassoCV(cv=5, random_state=42, max_iter=100)
lasso_poly_cv.fit(X_train_poly_scaled, y_train_best)
 
# Predict SalePrice on validation set using the trained LassoCV model on polynomial features
y_pred_lasso_poly_cv = lasso_poly_cv.predict(X_test_poly_scaled)
 
# Compute RMSE for the LassoCV model with polynomial features on the validation set
rmse_lasso_poly_cv = mean_squared_error(y_test_best, y_pred_lasso_poly_cv, squared=False)
f"Numeric and dummied RMSE, polynomial features with LassoCV: {rmse_lasso_poly_cv}"
final_r2 = r2_score(y_test_best, y_pred_lasso_poly_cv)
f"rmse = {rmse_lasso_poly_cv}", f"r2 = {final_r2}"
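`numeric_and_dummied` is built earlier in the notebook; a minimal sketch of how such a matrix can be constructed with `pd.get_dummies` (the column names and values here are illustrative):

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Gr Liv Area': [1500, 2000, 1200],
    'Neighborhood': ['NAmes', 'CollgCr', 'NAmes'],
})

# Keep numeric columns as-is and one-hot encode the categoricals,
# dropping the first level of each to avoid perfect collinearity
numeric = df_demo.select_dtypes(include='number')
dummied = pd.get_dummies(df_demo.select_dtypes(include='object'), drop_first=True)
numeric_and_dummied_demo = pd.concat([numeric, dummied], axis=1)
print(numeric_and_dummied_demo.columns.tolist())
```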

Experiment: all numeric features, with NaNs filled with column means

%%time 
# more features usually help: keep every numeric column rather than a hand-picked subset
 
numerical_df = df.select_dtypes(include=['int64', 'float64'])
numerical_df_filled = numerical_df.fillna(numerical_df.mean())
X = numerical_df_filled.drop('SalePrice', axis=1)
y = df['SalePrice']
 
X_train_best, X_test_best, y_train_best, y_test_best = train_test_split(X, y, test_size=0.3, random_state=42)
 
# Apply polynomial transformations and scale the features
poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train_best[sorted_features])
X_test_poly = poly.transform(X_test_best[sorted_features])
scaler_poly = StandardScaler()
X_train_poly_scaled = scaler_poly.fit_transform(X_train_poly)
X_test_poly_scaled = scaler_poly.transform(X_test_poly)
 
# Train LassoCV on the polynomial features (max_iter=100 gave a better RMSE here, at the cost of convergence warnings)
lasso_poly_cv = LassoCV(cv=5, random_state=42, max_iter=100)
lasso_poly_cv.fit(X_train_poly_scaled, y_train_best)
 
# Predict SalePrice on validation set using the trained LassoCV model on polynomial features
y_pred_lasso_poly_cv = lasso_poly_cv.predict(X_test_poly_scaled)
 
# Compute RMSE for the LassoCV model with polynomial features on the validation set
rmse_lasso_poly_cv = mean_squared_error(y_test_best, y_pred_lasso_poly_cv, squared=False)
f"All-numeric (NaNs mean-filled) RMSE, polynomial features with LassoCV: {rmse_lasso_poly_cv}"
final_r2 = r2_score(y_test_best, y_pred_lasso_poly_cv)
f"rmse = {rmse_lasso_poly_cv}", f"r2 = {final_r2}"
numerical_df.mean()
Id                     1474.0336
PID               713590006.0917
MS SubClass              57.0088
Lot Frontage             69.0552
Lot Area              10065.2082
Overall Qual              6.1121
Overall Cond              5.5622
Year Built             1971.7089
Year Remod/Add         1984.1902
Mas Vnr Area             99.6959
BsmtFin SF 1            442.3005
BsmtFin SF 2             47.9590
Bsmt Unf SF             567.7283
Total Bsmt SF          1057.9878
1st Flr SF             1164.4881
2nd Flr SF              329.3291
Low Qual Fin SF           5.5129
Gr Liv Area            1499.3301
Bsmt Full Bath            0.4275
Bsmt Half Bath            0.0634
Full Bath                 1.5773
Half Bath                 0.3710
Bedroom AbvGr             2.8435
Kitchen AbvGr             1.0429
TotRms AbvGrd             6.4359
Fireplaces                0.5909
Garage Yr Blt          1978.7078
Garage Cars               1.7766
Garage Area             473.6717
Wood Deck SF             93.8337
Open Porch SF            47.5568
Enclosed Porch           22.5719
3Ssn Porch                2.5914
Screen Porch             16.5115
Pool Area                 2.3979
Misc Val                 51.5744
Mo Sold                   6.2199
Yr Sold                2007.7757
SalePrice            181469.7016
dtype: float64
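One caveat with this experiment: the means are computed on the full dataset before the train/test split, so information from the validation rows leaks into training, which likely flatters the RMSE. A leakage-free sketch fits the imputer on the training fold only (the data here is illustrative):

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

X = pd.DataFrame({'Lot Frontage': [60.0, np.nan, 80.0, 70.0, np.nan, 65.0]})
y = pd.Series([100, 120, 150, 130, 110, 105])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, random_state=42)

# Fit the imputer on the training fold only, then apply to both folds
imp = SimpleImputer(strategy='mean')
X_tr_filled = imp.fit_transform(X_tr)
X_te_filled = imp.transform(X_te)
print(imp.statistics_)
```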

Summary of experiments with polynomial features

| Model Description (features converted to polynomial of degree 2) | RMSE | R² | Initial # Features |
|---|---|---|---|
| All numeric features, grid searching for best alpha | 38505.77372194258 | 0.7845263982307846 | 37 |
| LassoCV with numeric features only | 38421.485445693455 | 0.7854686995389903 | 37 |
| LassoCV with numeric and dummied features | 36777.60317703496 | 0.803433634559926 | 209 |
| All numeric features, NaN filled with mean | 23139.72146085574 | 0.9104218737234883 | 39 |
from dmba import stepwise_selection
from dmba import AIC_score, adjusted_r2_score, regressionSummary
 
X = numeric_and_dummied
y = df2['SalePrice']
 
def train_model(variables):
    if len(variables) == 0:
        return None
    #model = LinearRegression()
    model = RidgeCV(cv=5)
    model.fit(X[variables], y)
    return model
 
def score_model(model, variables):
    if len(variables) == 0:
        return AIC_score(y, [y.mean()] * len(y), model, df=1)
    return AIC_score(y, model.predict(X[variables]), model)
 
best_model, best_variables = stepwise_selection(X.columns, train_model, score_model, 
                                                verbose=True)
 
print()
print(f'Intercept: {best_model.intercept_:.3f}')
print('Coefficients:')
for name, coef in zip(best_variables, best_model.coef_):
    print(f' {name}: {coef}')
 
r2 = r2_score(y, best_model.predict(X[best_variables]))
print(f"r2 = {r2}")
X_train_best, X_test_best, y_train_best, y_test_best = train_test_split(X[best_variables], y, test_size=0.3, random_state=42)
r2 = r2_score(y_test_best, best_model.predict(X_test_best))
print(f"r2 = {r2}")
r2 = 0.9093469681096508
X_train_best, X_test_best, y_train_best, y_test_best = train_test_split(X[best_variables], y, test_size=0.3, random_state=42)
r2 = r2_score(y_test_best, best_model.predict(X_test_best))
print(f"r2 = {r2}")
r2 = 0.8547022546339753
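`AIC_score` from `dmba` scores each candidate model by trading goodness of fit against parameter count. The standard Gaussian-likelihood form it is based on can be sketched directly (this is the textbook formula, not dmba's exact implementation):

```python
import numpy as np

def aic_gaussian(y_true, y_pred, n_params):
    """AIC for Gaussian-error regression: n*ln(RSS/n) + 2*(k + 1)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = len(y_true)
    rss = np.sum((y_true - y_pred) ** 2)
    return n * np.log(rss / n) + 2 * (n_params + 1)

# A closer fit with the same parameter count gets a lower (better) AIC
y = [1.0, 2.0, 3.0, 4.0]
assert aic_gaussian(y, [1.1, 2.1, 2.9, 4.0], 2) < aic_gaussian(y, [2.0, 2.0, 2.0, 2.0], 2)
```

Stepwise selection adds or drops one variable at a time, keeping the change only if it lowers this score.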
numerical_df_test = df_test.select_dtypes(include=[np.number])
numerical_df_test = numerical_df_test.fillna(numerical_df_test.mean())  # take means of numeric columns only, avoiding the pandas nuisance-column FutureWarning
 
# Apply polynomial transformations to the test data
X_test_poly = poly.transform(numerical_df_test[sorted_features])
 
# Scale the transformed test features using the same scaler
X_test_poly_scaled = scaler_poly.transform(X_test_poly)
 
# Predict SalePrice using trained model
predicted_sale_price = lasso_poly_cv.predict(X_test_poly_scaled)
 
# Create DF for the predictions
predictions_df_test = pd.DataFrame({
    'Id': df_test['Id'],
    'SalePrice': predicted_sale_price
})
 
# Save to CSV file
output_path_test = "./datasets/pred_test_finish.csv"
predictions_df_test.to_csv(output_path_test, index=False)
 
output_path_test

'./datasets/pred_test_finish.csv'
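Before submitting, it is worth sanity-checking the prediction file: the expected columns, no missing values, and positive prices. A small sketch (the `Id` and `SalePrice` values here are illustrative):

```python
import pandas as pd

preds = pd.DataFrame({'Id': [2658, 2718], 'SalePrice': [180000.0, 215000.0]})

# A valid submission has exactly the two expected columns,
# no missing predictions, and strictly positive sale prices
assert list(preds.columns) == ['Id', 'SalePrice']
assert preds['SalePrice'].notna().all()
assert (preds['SalePrice'] > 0).all()
print('submission looks valid')
```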
© Muhammad Hassan