Jason W. A. Selby, Fraser P. Ruffell, Mark Giesbrecht and Michael W. Godfrey
David R. Cheriton School of Computer Science
University of Waterloo, Waterloo, Ontario, Canada N2L 3G1
E-mail: {j2selby,fruffell,mwg,migod}@uwaterloo.ca

Abstract
As the useful life expectancy of software continues to increase, the task of maintaining the source code has become the dominant phase of the software life-cycle. In order to improve the ability of software to age and successfully evolve over time, it is important to identify system design and programming practices which may result in increased difficulty for maintaining the code base.

This study attempts to correlate the use of global variables to the maintainability of several widely deployed, large scale software projects as they evolve over time. Two measures are proposed to quantify the maintenance effort of a project. The first measure compares the number of CVS revisions for all source files in a release to the number of revisions applied to files where the usage of global data is most prevalent. A second degree of change is characterized by contrasting the amount of source code that was changed overall to the changes made to those source files which contain the majority of the references to global data.

We observed a strong correlation between the number of revisions and global variable references, and between lines of code changed and global variable references. In all cases the correlation between the number of revisions and global variable references was stronger. This provides evidence that a strong relationship exists between the usage of global variables and both the number and scope of changes applied to files between product releases.
1. Introduction

The maintenance phase of the software life cycle has been identified as being the dominant phase in terms of both time and money [22]. Logically, one could point to the code size, structure, age, complexity, development language and the quality of the internal documentation as being the key indicators to the maintainability of a project [11]. However, few empirical studies have examined the degree to which these factors impact project maintainability. Evidence linking many of these measures to an approximation of the maintenance effort for a product has found that the correlation is weak at best for many of these factors [10, 3, 7, 1, 9]. In order to improve the ability of software to age and successfully evolve over time, it is important to identify system design and programming practices which may result in increased difficulty for maintaining the code base.

The decomposition of complex software systems into individual modules that group together related concepts and tasks improves program comprehension and maintenance. Ideally, modules are designed to exhibit a low degree of coupling with other modules and a high degree of cohesion within the same module. However, to enable inter-module communication, some form of coupling must exist. Common coupling is an undesirable form of coupling introduced when modules reference the same global data [23] (and worse, common coupling can be clandestine in the sense that it can be introduced without explicit changes to a module; see [15]). The many reasons why global variable usage is considered harmful and should be avoided are well documented in [8, 12, 16, 23]. Examples of the unanticipated side effects of global variable usage include hidden aliasing, namespace pollution, and even hampering code reuse across projects.

We are interested in an empirical evaluation of how the use of global variables may affect software evolution. In this paper, we investigate two hypotheses concerning the impact of global variables on maintainability:

H1 The use of global variables leads to code that requires more maintenance.
H2 The use of global variables leads to code that is difficult to comprehend.

We interpret these hypotheses more directly by aligning the files (modules) of the system source code base by the number of references to global variables. We then observe the maintenance effort for all files by tracking the number and size of CVS commits. In particular, we test the following two hypotheses:

H3 Files with a greater number of global variable references change more often than files with fewer references.
H4 Files with a greater number of global variable references entail larger change deltas than files with fewer references.

If we find support for H3, we take this as evidence that H1 is true, and similarly if we find support for H4, we take this as evidence that H2 is true. Using our approach, we analyzed binaries from many popular open-source projects including Emacs, GCC, GDB, Make, Vim, and PostgreSQL.

The remainder of the paper is organized in the following manner. Section 2 details how our study was performed and the projects that we examined. In Section 3 we report and discuss the results that were obtained. Section 5 discusses related research that has examined the usage of global data and tools which have been developed to extract software artifacts from CVS repositories. Finally, we conclude and point out promising future directions based on this work in Section 6.

∗ This research was supported in part by Natural Sciences and Engineering Research Council of Canada Strategic and Discovery grants.
2. Methodology

This section begins by describing our approach to tracking global variable usage and to measuring the maintainability effort throughout the lifetime of a project. We then give an overview of the systems that were examined in this case study.

2.1 Extracting Global Variable Usage Data - gv-finder
Our initial examination of the evolution of global data throughout the lifetime of many open source projects resulted in the creation of a linker-like tool capable of extracting global variable usage data from object files [14]. This tool, named gv-finder, intercepts relocatable Executable and Linking Format (ELF) object files (non-stripped) at the linking stage of the compilation process, analyzes the files and then passes the files on to the actual linker. This process of collecting global variable information fits seamlessly into the build process and enables the analysis of evolutionary trends over entire product lifetimes.

Examination of the symbol and relocation tables in the object files yields the names of all global variables, the module in which each is defined, as well as the names of all modules which reference each global. We classify each reference as either true, static or external. A true global variable contains a "global" entry in the symbol table and is what one typically thinks of as a global variable: user-defined variable storage which can be referenced in any module. Usage of a true global variable is considered the most dangerous due to the implicit coupling between any and all modules which reference the same global symbol [23]. Static globals are marked as "local" in the symbol table and therefore can be referenced anywhere inside the file in which they are defined (for example, a C file-scoped variable). The use of a static global does not introduce clandestine coupling; however, it does carry the other potential drawbacks of using global variables. An external global is denoted by an "undefined" symbol table entry and is a symbol imported from a library (for example, printf() or stdout from the C standard library). Differentiation of external references between function calls and variable references is performed by disassembling the instructions which contain a reference to global data. If the instruction is a jmp or call, then the usage is considered a function; otherwise, it is a variable. All of the data presented in this paper is restricted to references to true global data. Further details on gv-finder can be found in [14].
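The three-way classification can be illustrated with a small sketch. This is not the gv-finder implementation; the `Symbol` record and the `classify` function are hypothetical names introduced here, and the sketch assumes that the symbol table entries have already been decoded from the object file. The numeric constants are the standard ELF values for symbol binding and for the undefined section index.

```python
from dataclasses import dataclass

# Standard ELF symbol-table constants.
STB_LOCAL = 0    # binding: visible only inside the defining object file
STB_GLOBAL = 1   # binding: visible to all object files being linked
SHN_UNDEF = 0    # section index of a symbol defined elsewhere
STT_NOTYPE = 0   # symbol type: unspecified
STT_OBJECT = 1   # symbol type: data object (a variable)


@dataclass(frozen=True)
class Symbol:
    """Hypothetical, already-decoded symbol table entry."""
    name: str
    binding: int
    shndx: int
    sym_type: int


def classify(sym: Symbol) -> str:
    """Map a symbol to the categories used in the text."""
    # Undefined symbols also carry a global binding, so test for them first.
    if sym.shndx == SHN_UNDEF:
        return "external"
    if sym.binding == STB_LOCAL:
        return "static"
    if sym.binding == STB_GLOBAL:
        return "true"
    return "other"  # e.g. weak symbols, which the text does not discuss


symbols = [
    Symbol("line_count", STB_GLOBAL, 3, STT_OBJECT),
    Symbol("cache", STB_LOCAL, 4, STT_OBJECT),
    Symbol("stdout", STB_GLOBAL, SHN_UNDEF, STT_NOTYPE),
]
categories = {s.name: classify(s) for s in symbols}
```

An undefined entry does not by itself say whether the symbol is a function or a variable, which is consistent with the disassembly step described above for external references.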
The integration of gv-finder into the linkage stage enables us to bypass build environment issues and, more importantly, to base our results solely upon the actual modules included in the final result. Our analysis is restricted to the specific global variable references that are present in the final executable and not those present in the entire source code base. This eliminates the possibility of counting equivalent global variable references multiple times that are not present in the executable due to reasons such as conditional inclusion of object files for specific machine architectures and operating systems. The disadvantage of our link-time analysis is that gv-finder requires a successful compilation of the target executable. When analyzing older releases (for example, we studied versions of Emacs over ten years old), the build process often failed due to dependencies on deprecated APIs (either library or OS). Rather than omit releases which failed to build, we deployed four different machines, each recreating a specific and older build environment needed to satisfy various releases. The use of different systems introduced a minimal amount of error, since all of the machines are of the same architecture (x86, Linux), and therefore are equally impacted by external factors affecting the source code (such as conditional compilation).
2.2 Measuring Maintainability Effort

2.2.1 The Concurrent Versions System

We propose two measures to answer our postulates, both of which harness information extracted from the Concurrent Versions System (CVS), a popular source code management system that tracks the various changes made to files and enables concurrent development by many developers [2]. For example, mining information from a CVS repository can yield the number of revisions made to each file between each product release. This then enables the comparison of the number of CVS revisions for files in which the usage of global data is most prevalent to those which have fewer or no references. CVS is also able to report the number of lines changed between two revisions of a file. In an attempt to characterize the scope of the changes performed on a file, we extract this information from the repository and compare the total lines changed in files which have a large number of references to global variables to other files in the system.

CVS uniquely identifies each version of a file through the use of a revision number. The initial version of a file is assigned the revision number 1.1, after which, each time an update to the file is checked into the repository, a new number is assigned to the file (for example, 1.2). CVS revision numbers are internal to the system and have no relationship with software releases. Instead, symbolic names or tags are applied to the set of files which constitute a particular release of a system. Typically, all of the files in a repository are assigned a new tag at every release point, creating a CVS snapshot of the code which can be later referenced. Unfortunately, not all releases of the various projects that we examined were tagged. For releases which were tagged, identifying the revision number of each file was simple. However, if no release tag was present, we resorted to a brute force approach which compared the actual source code files contained in the release with each revision of the file in the repository in an attempt to find a match. In some cases (typically in early releases of a project when the development process was not formalized) we were unable to find a match for all of the files in a release and therefore limited our results to releases in which we were able to match at least 80% of the source files which constitute the binary executable examined.
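The brute force matching of untagged releases might look roughly like the following sketch. It rests on assumptions that the text does not confirm: a match is taken to mean byte-identical content, the text of every repository revision is assumed to be available in memory, and the function names are hypothetical. Only the 80% threshold comes from the description above.

```python
import hashlib
from typing import Dict, Optional


def digest(text: str) -> str:
    """Content fingerprint used to compare a release file with a revision."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def match_revision(release_text: str, revisions: Dict[str, str]) -> Optional[str]:
    """Return the first revision whose content equals the release file.

    `revisions` maps a revision number (for example "1.2") to the full
    text of the file at that revision.
    """
    target = digest(release_text)
    for revision, text in revisions.items():
        if digest(text) == target:
            return revision
    return None


def release_is_usable(matches: Dict[str, Optional[str]], threshold: float = 0.8) -> bool:
    """Keep a release only if enough of its files were matched.

    `matches` maps each file name in the release to its matched
    revision, or to None when no revision matched.
    """
    if not matches:
        return False
    matched = sum(1 for revision in matches.values() if revision is not None)
    return matched / len(matches) >= threshold
```

In practice an exact comparison could be defeated by details such as CVS keyword expansion, so a real tool would probably need to normalize the files first; the sketch ignores this.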
Since the tagging of the source files at specific points is managed by developers and not CVS, each project that was examined had a different process in place to record the merging of branches into the main line (if this was even recorded at all in the repository). This posed a problem in uniformly comparing the number of revisions made to a file between two releases in the presence of branching. To overcome this issue we recorded two different revision counts. The first is a conservative lower-bound approach which does not count revisions along a branch between two releases, thereby assuming that every branch is in fact a dead branch. Our second method is an optimistic upper-bound approach and counts every revision along a branch and possibly even follows other branches that exist between the two releases. For example, suppose that for some file the revisions 1.4.2.1, 1.4.2.2, 1.4.2.3, and 1.5 exist between two releases. If we identified that the first release included revision 1.4.2.1 and the later release 1.5, then the lower-bound approach would report that a single revision was made between releases, while our upper-bound approach would find that three revisions were applied (the lower- and upper-bound approaches are later referred to as no-branch and branch respectively, in the graphs presented in Section 3). Even though the lower- and upper-bound approaches may respectively under- or over-estimate the maintainability effort applied to a file, we found that in practice there was very little difference between the two approaches.
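The two counting schemes can be illustrated with a short sketch. It is not the tooling used in the study: the function names are hypothetical, the revisions are assumed to be supplied in check-in order, and the upper bound simply counts every revision in the interval rather than following branches through the revision tree. Trunk revisions are recognized by having exactly two numeric components.

```python
from typing import List, Tuple


def is_trunk(revision: str) -> bool:
    """A trunk revision such as 1.5 has two components; 1.4.2.1 is on a branch."""
    return len(revision.split(".")) == 2


def count_revisions(revisions: List[str], first: str, later: str) -> Tuple[int, int]:
    """Return (lower_bound, upper_bound) revision counts between two releases.

    `revisions` lists the revisions of one file in check-in order,
    `first` is the revision shipped in the earlier release, and
    `later` is the revision shipped in the later release.
    """
    start = revisions.index(first)
    end = revisions.index(later)
    applied = revisions[start + 1:end + 1]  # revisions checked in after `first`
    lower = sum(1 for revision in applied if is_trunk(revision))  # no-branch
    upper = len(applied)                                          # branch
    return lower, upper


history = ["1.4.2.1", "1.4.2.2", "1.4.2.3", "1.5"]
lower, upper = count_revisions(history, "1.4.2.1", "1.5")
```

For the example in the text, `lower` is 1 (only revision 1.5 lies on the trunk) and `upper` is 3 (revisions 1.4.2.2, 1.4.2.3 and 1.5).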
2.3 Case Study

Using our approach, we analyzed the primary binaries from many popular open-source projects, including Emacs, GCC, GDB, Make, Vim, and PostgreSQL over a significant span of their developments. Specifically, we examined the following binaries (the number of releases of each binary studied and the time span of the releases is also reported):

temacs  The C core of the GNU Emacs editor which contains a LISP interpreter and basic I/O handling (10 releases, 14 years) [17].

cc1  The GNU C compiler (gcc) not including the libraries which are linked with it (29 releases, 7 years) [20]. We included only the "hand-written" code and not the extensive amount of automatically generated code that is incorporated into cc1.

libbackend.a  A library linked with gcc which performs code analysis, optimization and generation (27 releases, 7 years) [20].

libgdb.so.a  A library which exports the functionality of GDB through an API (11 releases, 7 years) [19].

make  The GNU utility which automates the compilation process of source code (18 releases, 16 years) [18].

postgres  The back-end server of the PostgreSQL relational database management system (10 releases, 11 years).

vim  A popular open-source text editor modeled after VI (9 releases, 8 years).
3. Results

In order to visually compare the maintenance effort applied to the source files which contain many references to global variables to those that do not, we graphed the average number of revisions for all files along with the average revisions for the files with 50% of the references to global variables, and for the files with 100% of the references to global variables (the files composing 50% of the global variable references were selected by sorting the files by the number of references and choosing the first files which sum to 50% of the total number of global variable references). Similarly, we graphed the normalized average number of lines changed in each release. If the presence of global variables is in fact detrimental to the comprehension and modification of code, then we would expect that a greater number of changes would be required to maintain the files containing a large number of references to global data compared to those files which have fewer references (H3) (although previous research [10] has differentiated between the various forms of maintenance, we do not in this paper). Not only do we expect the presence of global variables to increase the number of modifications required between two releases of a product, but we would also expect that the usage of global variables would increase the scope of the modifications, thereby increasing the amount of source code that is changed (H4).

Table 1 reports the details of the initial and final releases examined for each project. In an attempt to limit the number of graphs presented we selected two representative projects and direct the interested reader to http://plg.uwaterloo.ca/~j2selby/wcre07-results.html for the omitted graphs. The results for cc1 (Figures 1 and 2) and postgres (Figures 3 and 4) illustrate our findings. It should be noted that no significant difference between the upper- and lower-bound approaches was found for temacs, libbackend, make, postgres and vim and therefore, to improve the clarity of the graphs, the upper-bound (branch) is omitted.
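The selection of the files that account for 50% (or 100%) of the references can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the function name and the sample counts are hypothetical, the file that crosses the threshold is assumed to be included, and ties are broken by file name.

```python
from typing import Dict, List


def files_covering(ref_counts: Dict[str, int], fraction: float) -> List[str]:
    """Pick the most reference-heavy files until `fraction` of all
    global variable references is covered.

    `ref_counts` maps a file name to its number of references to
    global variables. Files with zero references are never selected.
    """
    total = sum(ref_counts.values())
    # Sort by descending reference count, then by name for determinism.
    ranked = sorted(ref_counts.items(), key=lambda item: (-item[1], item[0]))
    selected: List[str] = []
    running = 0
    for name, count in ranked:
        if count == 0 or running >= fraction * total:
            break
        selected.append(name)
        running += count
    return selected


# Hypothetical reference counts for one release.
counts = {"main.c": 50, "parse.c": 30, "util.c": 20, "version.c": 0}
top_half = files_covering(counts, 0.5)   # files with 50% of the references
all_refs = files_covering(counts, 1.0)   # files with 100% of the references
```

With these counts, `top_half` contains only `main.c`, while `all_refs` contains every file with at least one reference. The average number of revisions can then be computed separately over each subset and over all files.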
Every effort was made to include all releases, both major and minor, of each project that we examined. However, some releases were either unanalyzable (due to either failed compilation or difficulties in extracting the CVS information) or omitted (a product release was issued but the files that constitute the target that we examined were unchanged). One special incident was encountered in the analysis of vim and libbackend. The results for these targets were skewed by the fact that both include a version.c source file which has a disproportionate number of revisions and lines changed in comparison to other files (for libbackend this file simply stores the version number of the release in a string and similarly for vim). We therefore omitted this file from our analysis; however, this was the only special circumstance.
As expected, examination of the graphs illustrates that at almost all points both the number of revisions and the total number of lines of code changed are higher for the subset of files which contain a greater number of references to global variables.
The only instances where the graphs deviate from this pattern when contrasting lines of code changed to global variables are for make and libbackend. In only one instance did comparing the number of file revisions to global variable usage not follow the trend which we envisioned, namely vim. Further examination of these outlying points provided some insight into why they were contrary to our hypotheses. We found that for six of the seventeen releases of make examined, the average number of file revisions for all of the files containing a global variable reference was higher than that of the files containing the top 50% of the global variable references. At each of these six points a small group of files (2 to 3) which are just outside of the 50% are heavily modified. Interestingly, it is always the same small set of files which requires substantial changes, possibly indicating their importance to the system or that they require complex modification. Investigation of the last three releases of vim discovered the existence of three files which contain zero references to a global variable; however, they were changed slightly above the average number of revisions applied to all files. We were unable to identify a single cause for the greater number of lines of code changed for the files containing at least one global variable reference at the four spikes in libbackend. We plan to examine this in greater depth in order to find the exact cause of this behaviour.
In an attempt to track the evolution of global variable usage throughout each of the projects we identified the top five files and functions which contain the greatest number of references to global variables in each release. Furthermore, we also examined the five globals that were the most heavily referenced in each product release. libgdb exhibited the least amount of fluctuation with the same four files, functions and variables remaining in the set of top five over all of the releases examined. temacs and vim were also found to be quite stable when considering files and variables. In both, only one file was displaced from the top five set while three variables remained heavily referenced in temacs and four in vim. Greater variation was displayed in the functions which contained the most global variable references. In temacs only one function remained in the top five, while two of five remained fixed in vim.

An interesting aspect of examining cc1 and libbackend from GCC is that most of the libbackend code was split off from cc1 in release 3.0 of GCC. In the creation of libbackend the five files containing the greatest number of references to global variables were extracted from cc1. After the split, the set of files which relied most heavily on global variables remained fairly fixed with three files remaining in the top five in libbackend, and four of the five in cc1. The specific global variables which were referenced most heavily in cc1 were also the highest used in libbackend and continued to be over all releases examined. There was greater variability exhibited in cc1 with only two of the top five global variables remaining in the set after the split.
Table 1. This table reports the initial and final releases examined for each binary as well as the number of thousands of lines of source code (KLOC), the total number of files examined, and the number of files that contain the greatest number of references to global variables that cumulatively account for 50% and 100% of all global variable references, respectively.

Binary: temacs, cc1, libbackend, libgdb, make, postgres, vim
Release: 19.2521.42.954.1.03.04.0.35.06.56.636.811.028.135.56.4
KLOC: 109198232102312331442171324142355126217
Total Files: 576767217915210420816242363583947
Total Files 50% refs: 10910311201215331826811
Total Files 100% refs: 5361591672133914515161772923544
Figure 1. A comparison of the number of CVS file revisions for cc1 from GCC. [Plot: average number of file revisions per release interval. Series: Ave. File Revisions for all files, for files with 50% of the references, and for files with 100% of the references, each shown with (Br.) and without (No Br.) branch revisions.]

Figure 2. A comparison of the normalized number of lines changed between releases of cc1 from GCC. [Plot: normalized average number of LOC changed per release interval. Series: Norm. Ave. LOC Changed for all files, for files with 50% of the references, and for files with 100% of the references, each shown with (Br.) and without (No Br.) branch revisions.]

Figure 3. A comparison of the number of CVS file revisions for postgres. [Plot: average number of file revisions per release interval (1.02-6.5, 6.5-7.0, 7.0-7.2, 7.2-7.4, 7.4-8.0.0, 8.0.0-8.0.1, 8.0.1-8.0.7, 8.0.7-8.0.8, 8.0.8-8.1.0, 8.1.0-8.1.3). Series: Ave. File Revisions (No Br.) for all files, for files with 50% of the references, and for files with 100% of the references.]

Figure 4. A comparison of the normalized number of lines changed between releases of postgres. [Plot: normalized average number of LOC changed per release interval (same intervals as Figure 3). Series: Norm. Ave. LOC Changed (No Br.) for all files, for files with 50% of the references, and for files with 100% of the references.]
The set of top five files and functions remained relatively constant in both make and postgres, with three remaining in the top five over the entire lifetime that we examined. However, the most heavily referenced global variables fluctuated greatly, with none of the top five in the initial release remaining in the top five set at the final release.
Although the graphs appear to substantiate the link between global variable usage and maintenance effort, further evidence of the connection is required. Therefore, we calculated the correlation coefficients (r values) of both measures. Calculation of an r value enables one to evaluate the degree of correlation between two variables (specifically, revisions to global variable references and total lines changed to global variable references). Table 2 lists the results of correlating the number of references to global variables in a file to the number of revisions checked into CVS (r(Rev, Ref)) and also for the total lines of code changed to the number of references to global variables (r(Lines, Ref)). The correlation coefficients in bold represent instances of close correlation between the two variables for an acceptable error rate of 5% (α = 0.05); however, almost all were within a 1% error rate. Strong correlation was found between both revisions to references and lines to references. However, in all cases the correlation between the number of revisions and global variable references was closer. Although this does not establish a cause and effect relationship, it does provide evidence that a strong relationship exists between the usage of global variables and both the number and scope of changes applied to files between product releases. Furthermore, this provides support for our hypotheses that files which contain a greater number of references to global variables require more changes (H3), and that these changes correspond to the modification of more lines of code (H4). Extrapolation from H3 and H4 provides evidence for the acceptance of our original hypotheses that global variable usage both increases maintenance (H1), and impairs comprehension (H2).
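For readers who wish to reproduce this kind of analysis, the following sketch computes a Pearson correlation coefficient over (references, revisions) pairs. The text does not state which significance test was applied, so the t statistic shown here is an assumption (the usual test of r against zero with n - 2 degrees of freedom), and both function names are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t value for testing r against zero with n - 2 degrees of freedom."""
    if n < 3 or abs(r) >= 1:
        raise ValueError("need n >= 3 and |r| < 1")
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical per-file data: global variable references and CVS revisions.
references = [0, 2, 5, 9, 14, 30]
revisions = [1, 1, 3, 2, 6, 9]
r_rev_ref = pearson_r(references, revisions)
```

Because the t statistic grows with the square root of the number of pairs, a modest r value can still pass a 5% significance threshold when N is in the hundreds or thousands, as it is for every binary in Table 2.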
Table 2. Results of correlating the number of revisions made to a file between releases with the number of references to global variables within the file (r(Rev, Ref)), and the total number of lines changed in a file with its number of references to global variables (r(Lines, Ref)). Correlation coefficients in bold identify instances of a close correlation. N is the number of pairs examined.

Binary       N      r(Rev, Ref)   r(Lines, Ref)
temacs       520    0.27          0.16
cc1          642    0.16          0.09
libbackend   2822   0.12          0.08
libgdb       1563   0.44          0.39
make         337    0.42          0.31
vim          336    0.33          0.27
postgres     3156   0.24          0.22

4. Threats to Validity

We should note the possible threats to the validity of our study. As stated earlier, gv-finder requires a successful compilation of the target executable in order to perform its analysis. In the worst case this required commenting out the offending lines of code (this, however, occurred fairly infrequently and only for small code segments). Additionally, since the build environment has changed over the course of the projects' lifetimes, we deployed four different machines, each recreating a specific and older build environment needed to satisfy various releases. The use of different systems introduced a minimal amount of error, since all of the machines are of the same architecture (x86, Linux), and therefore are equally impacted by external factors affecting the source code (such as conditional compilation).

Although this study examined a wide spectrum of software products, all of the projects are open-source (furthermore, almost all are developed by GNU) and therefore it is not clear that our findings are applicable to proprietary software.

In posing our hypotheses we equated the presence of global variables to increased maintenance costs in the form of both the number and the size of the changes performed. However, other explanations are also possible. For example, a file that changed frequently might be an architectural "hot spot" for the addition of new features; thus frequent changes may be a sign of successful growth rather than poor design. Similarly, large deltas might mean that the system's design was sufficiently robust to allow for the addition of new functionality. However, in the absence of a way of automatically categorizing the intent of the individual changes, we assume that most changes are due to "fixing" rather than adding new features.
Finally, when examining the extent of the modifications performed we normalized the delta values by the file size. However, we did not normalize the number of changes to the size of the file. In future work we plan on taking this into account and normalizing the number of changes by the number of references to global variables per line of source code.
5. Related Work

A tool similar to gv-finder is described in [21] which uses the output of objdump to gather global symbol information. We chose to extract the data ourselves since we already had an existing infrastructure for analyzing ELF object files and also to improve efficiency.
Schach et al. [15] and later Yu et al. [23] examined global variable usage in the Linux kernel. Their initial work in [15] discovered that slightly more than half of all modules examined suffered from some form of clandestine coupling. The latter work in [23] continued the examination of clandestine coupling between kernel and non-kernel modules in Linux. Applying definition-use analysis from compiler theory [13], they identified all modules which defined (wrote) a global variable and the others which referenced (read) each global. They found that a large number of global variables are defined in non-kernel modules and are referenced in a kernel module. Given the lack of control over non-kernel modules by kernel developers, [15, 23] raised concerns over the longevity of Linux, suggesting that maintainability issues might arise given the common coupling found to exist between kernel and non-kernel modules. However, the analysis based simply on the bulk number of definitions and uses might be misleading. A more conclusive examination could use definition-use chains [13]. Def-use chains connect uses of a variable with their exact point of definition. Using a code analysis tool to construct the def-use chains, we could then identify the chains which are formed from the definition of a variable in a non-kernel module and then later used in a kernel module.
The application of data mining to various artifacts of the software development process to discover and direct evolution patterns has recently received extensive treatment, most notably in [4, 6, 5, 24]. A common measure of software change throughout much of this research is based upon the number of CVS updates to a file (CVS release numbers) and the total lines of code changed between releases.
Epping et al. [3] examined the connection between vertical (specification) and horizontal (inter-module) design complexities and maintainability (change) effort during the acceptance and maintenance phases of two FORTRAN systems. Specifically, in regards to global variable usage they examined the number of globals defined, the actual number of globals referenced and maintainability, which is characterized by change effort. The change effort metric was further categorized as being isolation effort (identifying which modules require modification), implementation effort (develop, program and test the change) or locality (the number of modules also requiring modification). Additionally, the subset of all the tasks performed during the maintenance phase which were bug fixes was identified. Results for all changes (bug and enhancement) in the maintenance phase indicated a correlation between change isolation and both the number of global variables and the number of references to globals. However, no link was found to exist in implementation effort or locality. When focusing strictly upon maintenance phase bug fixes, both change isolation and implementation effort were found to correlate to the usage of global variables.
Harrison and Walton [7] applied a similar metric for maintainability as in this study to a large number of small legacy FORTRAN programs, mining three years of CVS data. The measures examined included lines of code and structural complexity (number of GOTO statements and cyclomatic complexity). Their findings indicated that lines of code offered only minor insight into future maintenance costs, while no correlation between any of the structural characteristics of the programs and maintenance costs was found to exist. In contrast to [3] and [7], our analysis is based upon a much larger data set encompassing many releases of seven large systems and uses different measures of maintainability effort; the semantics of global data also differ between FORTRAN and C.
Zimmermann et al. [24] applied data mining to CVS repositories in order to determine various source components (for example, files, functions and variables) which are consistently changed in unison. Integration of their tool into an IDE enabled them to suggest, with a reasonable degree of accuracy, other parts of the code which might need to be modified given a change to an element with which they have been changed together in the past. Similar work appeared in [6]; however, that work focused on the higher-level granularity of classes.
It is commonly believed that by employing automatic code generators and packaged libraries, the initial software development costs could be decreased and this reduction of effort would continue into the later maintenance phase of a project. Banker, Davis and Slaughter [1] examined how the use of these affected software complexity, which in turn increases the difficulty in performing maintenance tasks. This perception was confirmed for the use of packaged libraries for their sample (they examined the application of 29 perfective maintenance tasks to 23 COBOL programs). However, contrary to intuition, the use of automatic code generators actually led to an increase in the amount of time spent on maintenance tasks. This is an interesting result in consideration of the projects that were examined in this study. The data collected for cc1 was limited to the "hand-written" code rather than the extensive amount of automatically generated code. Comparison of the usage of global variables in the auto-generated code to that of hand-written code and isolation of which part of the code is modified could be another approach to investigating this contrary result.
6. Conclusions

In this paper we examined the link between the use of global variables and software maintenance effort. Harnessing information extracted from CVS repositories, we examined this link for seven large open source projects. We proposed two measures of software maintenance; specifically, the number of revisions made to a file and the total lines of code changed between two releases. Examination of the graphs illustrated that at almost all points both the number of revisions and the total number of lines of code changed were higher for the subset of files which contained a greater number of references to global variables. Further investigation using statistical analysis revealed a strong correlation both between the number of revisions and global variable references and between lines of code changed and global variable references. However, in all cases the correlation between the number of revisions and global variable references was stronger. Although this does not establish a cause and effect relationship, it does provide evidence that a strong relationship exists between the usage of global variables and both the number and scope of changes applied to files between product releases. Furthermore, the resulting correlations offer support for our hypotheses that global variable usage reduces maintainability and impairs comprehension. These results suggest that the use of global variables should be avoided when possible, thereby improving the ability of software to age and successfully evolve over time.
References

[1] R. D. Banker, G. B. Davis, and S. A. Slaughter. Software development practices, software complexity, and software maintenance performance: a field study. Manage. Sci., 44(4):433–450, 1998.
[2] P. Cederqvist. Version Management with CVS, 2005. Available at http://ximbiot.com/cvs/manual.
[3] A. Epping and C. Lott. Does software design complexity affect maintenance effort? In Proceedings of the NASA/GSFC 19th Annual Software Engineering Workshop. Software Engineering Laboratory: NASA Goddard Space Flight Center, 1994.
[4] M. Fischer and H. Gall. Visualizing feature evolution of large-scale software based on problem and modification report data. Journal of Software Maintenance and Evolution: Research and Practice, 16:385–403, November 2004.
[5] M. Fischer, J. Oberleitner, J. Ratzinger, and H. Gall. Mining evolution data of a product family. SIGSOFT Softw. Eng. Notes, 30(4):1–5, 2005.
[6] H. Gall, M. Jazayeri, and J. Krajewski. CVS release history data for detecting logical couplings. In IWPSE '03: Proceedings of the 6th International Workshop on Principles of Software Evolution, page 13, Washington, DC, USA, 2003. IEEE Computer Society.
[7] M. S. Harrison and G. H. Walton. Identifying high maintenance legacy software. Journal of Software Maintenance, 14(6):429–446, 2002.
[8] A. Hunt and D. Thomas. The pragmatic programmer: from journeyman to master. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1999.
[9] C. F. Kemerer and S. A. Slaughter. Determinants of software maintenance profiles: an empirical investigation. Journal of Software Maintenance, 9(4):235–251, 1997.
[10] B. P. Lientz and E. B. Swanson. Software Maintenance Management. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1980.
[11] J. Martin and C. L. McClure. Software Maintenance: The Problems and Its Solutions. Prentice Hall Professional Technical Reference, 1983.
[12] S. McConnell. Code complete: a practical handbook of software construction. Microsoft Press, Redmond, WA, USA, second edition, 2004.
[13] S. S. Muchnick. Advanced compiler design and implementation. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1997.
[14] F. P. Ruffell and J. W. A. Selby. The pervasiveness of global data in evolving software systems. In L. Baresi and R. Heckel, editors, FASE, volume 3922 of Lecture Notes in Computer Science, pages 396–410. Springer, 2006.
[15] S. R. Schach, B. Jin, D. R. Wright, G. Z. Heller, and J. Offutt. Quality impacts of clandestine common coupling. Software Quality Control, 11(3):211–218, 2003.
[16] S. R. Schach and A. J. Offutt. On the non-maintainability of open-source software position paper. 2nd Workshop on Open Source Software Engineering, May 2002.
[17] R. M. Stallman. GNU EMACS Manual. Free Software Foundation, 2000.
[18] R. M. Stallman, R. McGrath, and P. D. Smith. GNU Make: A Program for Directing Recompilation. Free Software Foundation, 2004.
[19] R. M. Stallman, R. Pesch, and S. Shebs. Debugging with GDB: The GNU Source-Level Debugger. Free Software Foundation, 2002.
[20] R. M. Stallman and the GCC Developer Community. Using GCC: The GNU Compiler Collection Reference Manual. Free Software Foundation, 2003.
[21] H. S. Teoh and D. B. Wortman. Tools for extracting software structure from compiled programs. In ICSM '04: Proceedings of the 20th IEEE International Conference on Software Maintenance, page 526, Washington, DC, USA, 2004. IEEE Computer Society.
[22] J. v. Vliet. Software Engineering: Principles and Practice. John Wiley & Sons, New York, New York, USA, 2nd edition, 2000.
[23] L. Yu and K. Chen. Categorization of common coupling and its application to the maintainability of the Linux kernel. IEEE Trans. Software Eng., 30(10):694–706, 2004.
[24] T. Zimmermann, P. Weisgerber, S. Diehl, and A. Zeller. Mining version histories to guide software changes. In ICSE '04: Proceedings of the 26th International Conference on Software Engineering, pages 563–572, Washington, DC, USA, 2004. IEEE Computer Society.