Recovering Maintainability Effort in the Presence of Global Data Usage∗

Appears in Proc. 14th Working Conference on Reverse Engineering (WCRE), pp. 60-69, 2007.

Jason W. A. Selby, Fraser P. Ruffell, Mark Giesbrecht and Michael W. Godfrey

David R. Cheriton School of Computer Science

University of Waterloo, Waterloo, Ontario, Canada N2L 3G1

E-mail: {j2selby,fruffell,mwg,migod}@uwaterloo.ca

Abstract

As the useful life expectancy of software continues to increase, the task of maintaining the source code has become the dominant phase of the software life-cycle. In order to improve the ability of software to age and successfully evolve over time, it is important to identify system design and programming practices which may result in increased difficulty for maintaining the code base.

This study attempts to correlate the use of global variables to the maintainability of several widely deployed, large scale software projects as they evolve over time. Two measures are proposed to quantify the maintenance effort of a project. The first measure compares the number of CVS revisions for all source files in a release to the number of revisions applied to files where the usage of global data is most prevalent. A second degree of change is characterized by contrasting the amount of source code that was changed overall to the changes made to those source files which contain the majority of the references to global data.

We observed a strong correlation between the number of revisions made to a file and its number of global variable references, and between the lines of code changed in a file and its number of global variable references. In all cases the correlation between the number of revisions and global variable references was stronger. This provides evidence that a strong relationship exists between the usage of global variables and both the number and scope of changes applied to files between product releases.

1. Introduction

The maintenance phase of the software life cycle has been identified as being the dominant phase in terms of both time and money [22]. Logically, one could point to the code size, structure, age, complexity, development language and the quality of the internal documentation as being the key indicators to the maintainability of a project [11]. However, few empirical studies have examined the degree to which these factors impact project maintainability. Evidence linking many of these measures to an approximation of the maintenance effort for a product has found that the correlation is weak at best for many of these factors [10, 3, 7, 1, 9]. In order to improve the ability of software to age and successfully evolve over time, it is important to identify system design and programming practices which may result in increased difficulty for maintaining the code base.

The decomposition of complex software systems into individual modules that group together related concepts and tasks improves program comprehension and maintenance. Ideally, modules are designed to exhibit a low degree of coupling with other modules and a high degree of cohesion within the same module. However, to enable inter-module communication, some form of coupling must exist. Common coupling is an undesirable form of coupling introduced when modules reference the same global data [23] (and worse, common coupling can be clandestine in the sense that it can be introduced without explicit changes to a module; see [15]). The many reasons why global variable usage is considered harmful and should be avoided are well documented in [8, 12, 16, 23]. Examples of the unanticipated side effects of global variable usage include hidden aliasing, namespace pollution, and even hampering code reuse across projects.

We are interested in an empirical evaluation of how the use of global variables may affect software evolution. In this paper, we investigate two hypotheses concerning the impact of global variables on maintainability:

H1 The use of global variables leads to code that requires more maintenance.

H2 The use of global variables leads to code that is difficult to comprehend.

We interpret these hypotheses more directly by aligning the files (modules) of the system source code base by the number of references to global variables. We then observe the maintenance effort for all files by tracking the number and size of CVS commits. In particular, we test the following two hypotheses:

H3 Files with a greater number of global variable references change more often than files with fewer references.

H4 Files with a greater number of global variable references entail larger change deltas than files with fewer references.

If we find support for H3, we take this as evidence that H1 is true, and similarly if we find support for H4, we take this as evidence that H2 is true. Using our approach, we analyzed binaries from many popular open-source projects including Emacs, GCC, GDB, Make, Vim, and PostgreSQL.

The remainder of the paper is organized in the following manner. Section 2 details how our study was performed and the projects that we examined. In Section 3 we report and discuss the results that were obtained. Section 5 discusses related research that has examined the usage of global data and tools which have been developed to extract software artifacts from CVS repositories. Finally, we conclude and point out promising future directions based on this work in Section 6.

∗ This research was supported in part by Natural Sciences and Engineering Research Council of Canada Strategic and Discovery grants.

2. Methodology

This section begins by describing our approach to tracking global variable usage and to measuring the maintainability effort throughout the lifetime of a project. We then give an overview of the systems that were examined in this case study.

2.1. Extracting Data - Global Variable Usage: gv-finder

Our initial examination of the evolution of global data throughout the lifetime of many open source projects resulted in the creation of a linker-like tool capable of extracting global variable usage data from object files [14]. This tool, named gv-finder, intercepts relocatable Executable and Linking Format (ELF) object files (non-stripped) at the linking stage of the compilation process, analyzes the files and then passes the files on to the actual linker. This process of collecting global variable information fits seamlessly into the build process and enables the analysis of evolutionary trends over entire product lifetimes.

Examination of the symbol and relocation tables in the object files yields the names of all global variables, the module in which each is defined, as well as the names of all modules which reference each global. We classify each reference as either true, static or external. A true global variable contains a “global” entry in the symbol table and is what one typically thinks of as a global variable: user-defined variable storage which can be referenced in any module. Usage of a true global variable is considered the most dangerous due to the implicit coupling between any and all modules which reference the same global symbol [23].

Static globals are marked as “local” in the symbol table and therefore can be referenced anywhere inside the file in which they are defined (for example, a C file-scoped variable). The use of a static global does not introduce clandestine coupling; however, it does carry the other potential drawbacks of using global variables. An external global is denoted by an “undefined” symbol table entry and is a symbol imported from a library (for example, printf() or stdout from the C standard library). Differentiation of external references between function calls and variable references is performed by disassembling the instructions which contain a reference to global data. If the instruction is a jmp or call, then the usage is considered a function; otherwise, it is a variable. All of the data presented in this paper is restricted to references to true global data. Further details on gv-finder can be found in [14].
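The classification rules above can be sketched in a few lines of Python. This is only an illustration of the three categories and of the jmp/call test, not gv-finder's implementation: the Symbol record, its field names, and the string values "GLOBAL", "LOCAL" and "UND" (the labels printed by tools such as readelf) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    """One symbol table entry (hypothetical record, not gv-finder's type)."""
    name: str
    binding: str  # assumed labels: "GLOBAL" or "LOCAL"
    section: str  # assumed label: "UND" when the symbol is undefined here


def classify_symbol(sym: Symbol) -> str:
    """Map a symbol table entry to the paper's true/static/external classes."""
    if sym.section == "UND":
        return "external"  # imported from a library, e.g. stdout
    if sym.binding == "LOCAL":
        return "static"    # visible only inside its defining file
    if sym.binding == "GLOBAL":
        return "true"      # can be referenced from any module
    return "other"         # bindings the text does not discuss


def external_reference_kind(mnemonic: str) -> str:
    """A reference made by a jmp or call instruction is treated as a function."""
    return "function" if mnemonic.lower() in {"jmp", "call"} else "variable"
```

For instance, an undefined entry such as stdout would be classified as external, and a reference to it from a mov instruction would count as a variable reference rather than a function call.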

The integration of gv-finder into the linkage stage enables us to bypass build environment issues and, more importantly, to base our results solely upon the actual modules included in the final result. Our analysis is restricted to the specific global variable references that are present in the final executable and not those present in the entire source code base. This eliminates the possibility of counting equivalent global variable references multiple times that are not present in the executable due to reasons such as conditional inclusion of object files for specific machine architectures and operating systems. The disadvantage of our link-time analysis is that gv-finder requires a successful compilation of the target executable. When analyzing older releases (for example, we studied versions of Emacs over ten years old), the build process often failed due to dependencies on deprecated APIs (either library or OS). Rather than omit releases which failed to build, we deployed four different machines each recreating a specific and older build environment needed to satisfy various releases. The use of different systems introduced a minimal amount of error, since all of the machines are of the same architecture (x86, Linux), and therefore are equally impacted by external factors affecting the source code (such as conditional compilation).

2.2. Measuring Maintainability Effort

2.2.1. The Concurrent Versions System

We propose two measures to answer our postulates, both of which harness information extracted from the Concurrent Versions System (CVS), a popular source code management system that tracks the various changes made to files and enables concurrent development by many developers [2]. For example, mining information from a CVS repository can yield the number of revisions made to each file between each product release. This then enables the comparison of the number of CVS revisions for files in which the usage of global data is most prevalent to those which have fewer or no references. CVS is also able to report the number of lines changed between two revisions of a file. In an attempt to characterize the scope of the changes performed on a file, we extract this information from the repository and compare the total lines changed in files which have a large number of references to global variables to other files in the system.

CVS uniquely identifies each version of a file through the use of a revision number. The initial version of a file is assigned the revision number 1.1, after which, each time an update to the file is checked into the repository, a new number is assigned to the file (for example, 1.2). CVS revision numbers are internal to the system and have no relationship with software releases. Instead, symbolic names or tags are applied to the set of files which constitute a particular release of a system. Typically, all of the files in a repository are assigned a new tag at every release point, creating a CVS snapshot of the code which can be later referenced.

Unfortunately, not all releases of the various projects that we examined were tagged. For releases which were tagged, identifying the revision number of each file was simple. However, if no release tag was present, we resorted to a brute force approach which compared the actual source code files contained in the release with each revision of the file in the repository in an attempt to find a match. In some cases (typically in early releases of a project when the development process was not formalized) we were unable to find a match for all of the files in a release and therefore limited our results to releases in which we were able to match at least 80% of the source files which constitute the binary executable examined.
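The brute force matching of untagged releases might look roughly like the following sketch. It assumes that each file's revisions have already been checked out into a dictionary mapping revision number to file content, and that a match means byte-for-byte equality; the function names and data layout are hypothetical and are not taken from the tooling used in the study.

```python
def match_revision(release_content, revisions):
    """Return the revision whose content equals the released file, or None.

    revisions: dict mapping a CVS revision number (str) to file content (bytes).
    """
    for revision, content in revisions.items():
        if content == release_content:
            return revision
    return None


def release_is_usable(release_files, repository, threshold=0.8):
    """Check whether enough files of a release could be matched to a revision.

    release_files: dict mapping file path to the content shipped in the release.
    repository:    dict mapping file path to that file's revisions dict.
    threshold:     minimum fraction of matched files (80% in the study).
    """
    if not release_files:
        return False
    matched = sum(
        1
        for path, content in release_files.items()
        if match_revision(content, repository.get(path, {})) is not None
    )
    return matched / len(release_files) >= threshold
```

A release whose files cannot be matched in sufficient number is simply excluded, which mirrors the 80% cut-off described above.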

Since the tagging of the source files at specific points is managed by developers and not CVS, each project that was examined had a different process in place to record the merging of branches into the mainline (if this was even recorded at all in the repository). This posed a problem in uniformly comparing the number of revisions made to a file between two releases in the presence of branching. To overcome this issue we recorded two different revision counts. The first is a conservative lower-bound approach which does not count revisions along a branch between two releases, thereby assuming that every branch is in fact a dead branch. Our second method is an optimistic upper-bound approach and counts every revision along a branch and possibly even follows other branches that exist between the two releases. For example, suppose that for some file the revisions 1.4.2.1, 1.4.2.2, 1.4.2.3, and 1.5 exist between two releases. If we identified that the first release included revision 1.4.2.1 and the later 1.5, then the lower-bound approach would report that a single revision was made between releases, while our upper-bound approach would find that three revisions were applied (the lower- and upper-bound approaches are later referred to as no-branch and branch respectively, in the graphs presented in Section 3). Even though the lower- and upper-bound approaches may respectively under- or over-estimate the maintainability effort applied to a file, we found that in practice there was very little difference between the two approaches.
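The two counting schemes can be illustrated with a short Python sketch. It assumes that the revisions of a file are available as a chronologically ordered list and that a branch revision is any revision number with more than two dot-separated components; both the representation and the function names are assumptions made for this example.

```python
def is_branch_revision(revision: str) -> bool:
    """Trunk revisions look like 1.5; branch revisions like 1.4.2.1."""
    return len(revision.split(".")) > 2


def count_revisions(history, first, last):
    """Count revisions applied after `first`, up to and including `last`.

    history: chronologically ordered revision numbers of one file.
    Returns (no_branch, branch): the lower-bound count, which skips branch
    revisions, and the upper-bound count, which includes them.
    """
    start = history.index(first)
    end = history.index(last)
    applied = history[start + 1 : end + 1]
    no_branch = sum(1 for rev in applied if not is_branch_revision(rev))
    branch = len(applied)
    return no_branch, branch


# The example from the text: the first release contains 1.4.2.1, the later 1.5.
print(count_revisions(["1.4.2.1", "1.4.2.2", "1.4.2.3", "1.5"], "1.4.2.1", "1.5"))
# prints (1, 3)
```

The lower bound treats 1.4.2.2 and 1.4.2.3 as belonging to a dead branch, so only 1.5 is counted, while the upper bound counts all three.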

2.3. Case Study

Using our approach, we analyzed the primary binaries from many popular open-source projects, including Emacs, GCC, GDB, Make, Vim, and PostgreSQL over a significant span of their developments. Specifically, we examined the following binaries (the number of releases of each binary studied and the time span of the releases is also reported):

temacs: The C core of the GNU Emacs editor which contains a LISP interpreter and basic I/O handling (10 releases, 14 years) [17].

cc1: The GNU C compiler (gcc) not including the libraries which are linked with it (29 releases, 7 years) [20]. We included only the “hand-written” code and not the extensive amount of automatically generated code that is incorporated into cc1.

libbackend.a: A library linked with gcc which performs code analysis, optimization and generation (27 releases, 7 years) [20].

libgdb.so.a: A library which exports the functionality of GDB through an API (11 releases, 7 years) [19].

make: The GNU utility which automates the compilation process of source code (18 releases, 16 years) [18].

postgres: The back-end server of the PostgreSQL relational database management system (10 releases, 11 years).

vim: A popular open-source text editor modeled after VI (9 releases, 8 years).

3. Results

In order to visually compare the maintenance effort applied to the source files which contain many references to global variables to those that do not, we graphed the average number of revisions for all files along with the average revisions for the files with 50% of the references to global variables, and for the files with 100% of the references to global variables (the files composing 50% of the global variable references were selected by sorting the files by the number of references and choosing the first files which sum to 50% of the total number of global variable references). Similarly, we graphed the normalized average number of lines changed in each release. If the presence of global variables is in fact detrimental to the comprehension and modification of code then we would expect a greater number of changes would be required to maintain the files containing a large number of references to global data compared to those files which have fewer references (H3) (although previous research [10] has differentiated between the various forms of maintenance, we do not in this paper). Not only do we expect the presence of global variables to increase the number of modifications required between two releases of a product, but we would also expect that the usage of global variables would increase the scope of the modifications, thereby increasing the amount of source code that is changed (H4).

Table 1 reports the details of the initial and final releases examined for each project. In an attempt to limit the number of graphs presented we selected two representative projects and direct the interested reader to http://plg.uwaterloo.ca/~j2selby/wcre07-results.html for the omitted graphs. The results for cc1 (Figures 1 and 2) and postgres (Figures 3 and 4) illustrate our findings. It should be noted that no significant difference between the upper- and lower-bound approaches was found for temacs, libbackend, make, postgres and vim and therefore, to improve the clarity of the graphs, the upper-bound (branch) is omitted.
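The selection of the files that account for 50% (or 100%) of the global variable references can be sketched as follows. The function names and the dictionary-based inputs are hypothetical, and the sketch assumes that the 100% subset consists of every file with at least one reference.

```python
def files_covering(ref_counts, fraction):
    """Pick the most heavily referencing files that cover `fraction` of all references.

    ref_counts: dict mapping file name to its number of global variable references.
    Files are sorted by reference count (descending) and taken until the
    cumulative count reaches fraction * total; files without references are
    never selected.
    """
    total = sum(ref_counts.values())
    ranked = sorted(ref_counts.items(), key=lambda item: item[1], reverse=True)
    selected, covered = [], 0
    for name, refs in ranked:
        if refs == 0 or covered >= fraction * total:
            break
        selected.append(name)
        covered += refs
    return selected


def average_revisions(revisions, files):
    """Average number of revisions over the given files (0.0 for an empty set)."""
    if not files:
        return 0.0
    return sum(revisions[name] for name in files) / len(files)
```

With reference counts of 50, 30, 20 and 0 for four files, the 50% subset contains only the first file and the 100% subset contains the first three; the three averages plotted for each release are then average_revisions over all files, over the 50% subset, and over the 100% subset.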

Every effort was made to include all releases, both major and minor, of each project that we examined. However, some releases were either unanalyzable (due to either failed compilation or difficulties in extracting the CVS information) or omitted (a product release was issued but the files that constitute the target that we examined were unchanged). One special incident was encountered in the analysis of vim and libbackend. The results for these targets were skewed by the fact that both include a version.c source file which has a disproportionate number of revisions and lines changed in comparison to other files (for libbackend this file simply stores the version number of the release in a string, and similarly for vim). We therefore omitted this file from our analysis; however, this was the only special circumstance.

As expected, examination of the graphs illustrates that at almost all points both the number of revisions and the total number of lines of code changed are higher for the subset of files which contain a greater number of references to global variables.

The only instances where the graphs deviate from this pattern when contrasting lines of code changed to global variables are for make and libbackend. In only one instance did comparing the number of file revisions to global variable usage not follow the trend which we envisioned, namely vim. Further examination of these outlying points provided some insight into why they were contrary to our hypotheses. We found that for six of the seventeen releases of make examined, the average number of file revisions for all of the files containing a global variable reference was higher than that of the files containing the top 50% of the global variable references. At each of these six points a small group of files (2–3) which are just outside of the 50% are heavily modified. Interestingly, it is always the same small set of files which requires substantial changes, possibly indicating their importance to the system or that they require complex modification. Investigation of the last three releases of vim discovered the existence of three files which contain zero references to a global variable; however, they were changed slightly above the average number of revisions applied to all files. We were unable to identify a single cause for the greater number of lines of code changed for the files containing at least one global variable reference at the four spikes in libbackend. We plan to examine this in greater depth in order to find the exact cause of this behaviour.

In an attempt to track the evolution of global variable usage throughout each of the projects we identified the top five files and functions which contain the greatest number of references to global variables in each release. Furthermore, we also examined the five globals that were the most heavily referenced in each product release. libgdb exhibited the least amount of fluctuation with the same four files, functions and variables remaining in the set of top five over all of the releases examined. temacs and vim were also found to be quite stable when considering files and variables. In both, only one file was displaced from the top five set while three variables remained heavily referenced in temacs and four in vim. Greater variation was displayed in the functions which contained the most global variable references. In temacs only one function remained in the top five, while two of five remained fixed in vim.

An interesting aspect of examining cc1 and libbackend from GCC is that most of the libbackend code was split off from cc1 in release 3.0 of GCC. In the creation of libbackend the five files containing the greatest number of references to global variables were extracted from cc1. After the split, the set of files which relied most heavily on global variables remained fairly fixed with three files remaining in the top five in libbackend, and four of the five in cc1. The specific global variables which were referenced most heavily in cc1 were also the highest used in libbackend and continued to be over all releases examined. There was greater variability exhibited in cc1 with only two of the top five global variables remaining in the set after the split.

Table 1. This table reports the initial and final releases examined for each binary as well as the number of thousands of lines of source code (KLOC), the total number of files examined, and the number of files that contain the greatest number of references to global variables that cumulatively account for 50% and 100% of all global variable references respectively.

Binary: temacs, cc1, libbackend, libgdb, make, postgres, vim

Release: 19.2521.42.954.1.03.04.0.35.06.56.636.811.028.135.56.4

KLOC: 109198232102312331442171324142355126217

Total Files: 576767217915210420816242363583947

Total Files, 50% refs: 10910311201215331826811

Total Files, 100% refs: 5361591672133914515161772923544

Figure 1. A comparison of the number of CVS file revisions for cc1 from GCC. (Y-axis: Average Number of File Revisions; x-axis: Releases; series: average file revisions for all files, for the files with 50% of the references, and for the files with 100% of the references, each without branches (No Br.) and with branches (Br.).)

Figure 2. A comparison of the normalized number of lines changed between releases of cc1 from GCC. (Y-axis: Normalized Average Number of LOC Changed; x-axis: Releases; series: normalized average LOC changed for all files, for the files with 50% of the references, and for the files with 100% of the references, each without branches (No Br.) and with branches (Br.).)

Figure 3. A comparison of the number of CVS file revisions for postgres. (Y-axis: Average Number of File Revisions; x-axis: Releases, from 1.02-6.5 to 8.1.0-8.1.3; series: average file revisions for all files, for the files with 50% of the references, and for the files with 100% of the references, all without branches (No Br.).)

Figure 4. A comparison of the normalized number of lines changed between releases of postgres. (Y-axis: Normalized Average Number of LOC Changed; x-axis: Releases, from 1.02-6.5 to 8.1.0-8.1.3; series: normalized average LOC changed for all files, for the files with 50% of the references, and for the files with 100% of the references, all without branches (No Br.).)

The set of top five files and functions remained relatively constant in both make and postgres, with three remaining in the top five over the entire lifetime that we examined. However, the most heavily referenced global variables fluctuated greatly, with none of the top five in the initial release remaining in the top five set at the final release.

Although the graphs appear to substantiate the link between global variable usage and maintenance effort, further evidence of the connection is required. Therefore, we calculated the correlation coefficients (r values) of both measures. Calculation of an r value enables one to evaluate the degree of correlation between two independent variables (specifically, revisions to global variable references and total lines changed to global variable references). Table 2 lists the results of correlating the number of references to global variables in a file to the number of revisions checked into CVS (r(Rev, Ref)) and also for the total lines of code changed to the number of references to global variables (r(Lines, Ref)). The correlation coefficients in bold represent instances of close correlation between the two variables for an acceptable error rate of 5% (α = 0.05); however, almost all were within a 1% error rate. Strong correlation was found between both revisions to references and lines to references. However, in all cases the correlation between the number of revisions and global variable references was closer. Although this does not establish a cause and effect relationship, it does provide evidence that a strong relationship exists between the usage of global variables and both the number and scope of changes applied to files between product releases. Furthermore, this provides support for our hypotheses that files which contain a greater number of references to global variables require more changes (H3), and that these changes correspond to the modification of more lines of code (H4). Extrapolation from H3 and H4 provides evidence for the acceptance of our original hypotheses that global variable usage both increases maintenance (H1), and impairs comprehension (H2).
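As an illustration of how such r values could be computed, the sketch below implements Pearson's product-moment correlation coefficient and the usual t statistic for testing whether r differs from zero. The text does not state which coefficient or significance test was used, so both choices are assumptions, as are the function names.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equally long sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r != 0 with n pairs (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r * r))
```

Here xs would hold the number of global variable references per file and ys the number of revisions (or lines changed) for the same file and release interval. For sample sizes in the hundreds or thousands, the two-sided critical value at α = 0.05 is close to 1.96, so even a modest r can be statistically significant when N is large.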

4. Threats to Validity

We should note the possible threats to the validity of our study. As stated earlier, gv-finder requires a successful compilation of the target executable in order to perform its analysis. In the worst case this required commenting out the offending lines of code (this, however, occurred fairly infrequently and only for small code segments). Additionally, since the build environment has changed over the course of the projects' lifetimes, we deployed four different machines, each recreating a specific and older build environment needed to satisfy various releases. The use of different systems introduced a minimal amount of error, since all of the machines are of the same architecture (x86, Linux), and therefore are equally impacted by external factors affecting the source code (such as conditional compilation).

Although this study examined a wide spectrum of software products, all of the projects are open-source (furthermore, almost all are developed by GNU) and therefore it is not clear that our findings are applicable to proprietary software.

In posing our hypotheses we equated the presence of global variables to increased maintenance costs in the form of both the number and the size of the changes performed. However, other explanations are also possible. For example, a file that changed frequently might be an architectural “hot spot” for the addition of new features; thus frequent changes may be a sign of successful growth rather than poor design. Similarly, large deltas might mean that the system's design was sufficiently robust to allow for the addition of new functionality. However, in the absence of a way of automatically categorizing the intent of the individual changes, we assume that most changes are due to “fixing” rather than adding new features.

Table 2. Results of correlating the number of revisions made to a file between releases with the amount of references to global variables within the file (r(Rev, Ref)), and for the total number of lines changed in a file to its number of references to global variables (r(Lines, Ref)). Correlation coefficients in bold identify instances of a close correlation. N is the number of pairs examined.

Binary        N      r(Rev, Ref)   r(Lines, Ref)
temacs        520    0.27          0.16
cc1           642    0.16          0.09
libbackend    2822   0.12          0.08
libgdb        1563   0.44          0.39
make          337    0.42          0.31
vim           336    0.33          0.27
postgres      3156   0.24          0.22

Finally, when examining the extent of the modifications performed we normalized the delta values by the file size. However, we did not normalize the number of changes to the size of the file. In future work we plan on taking this into account and normalizing the number of changes by the amount of references to global variables per line of source code.
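A minimal sketch of the two normalizations mentioned here, assuming that file size is measured in lines of source code (the unit is not stated in the text); the function names are hypothetical.

```python
def normalized_delta(lines_changed, file_size_lines):
    """Lines changed between two releases, relative to the size of the file."""
    if file_size_lines <= 0:
        raise ValueError("file size must be positive")
    return lines_changed / file_size_lines


def reference_density(global_refs, file_size_lines):
    """Global variable references per line of source code."""
    if file_size_lines <= 0:
        raise ValueError("file size must be positive")
    return global_refs / file_size_lines
```

Under these assumptions, a 200-line file with 50 changed lines has a normalized delta of 0.25, and the same file with 10 global variable references has a reference density of 0.05 references per line.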

5. Related Work

A tool similar to gv-finder is described in [21] which uses the output of objdump to gather global symbol information. We chose to extract the data ourselves since we already had an existing infrastructure for analyzing ELF object files and also to improve efficiency.

Schach et al. [15] and later Yu et al. [23] examined global variable usage in the Linux kernel. Their initial work in [15] discovered that slightly more than half of all modules examined suffered from some form of clandestine coupling. The latter work in [23] continued the examination of clandestine coupling between kernel and non-kernel modules in Linux. Applying definition-use analysis from compiler theory [13], they identified all modules which defined (wrote) a global variable and the others which referenced (read) each global. They found that a large number of global variables are defined in non-kernel modules and are referenced in a kernel module. Given the lack of control over non-kernel modules by kernel developers, [15, 23] raised concerns over the longevity of Linux, suggesting that maintainability issues might arise given the common coupling found to exist between kernel and non-kernel modules. However, the analysis based simply on the bulk number of definitions and uses might be misleading. A more conclusive examination could use definition-use chains [13]. Def-use chains connect uses of a variable with their exact point of definition. Using a code analysis tool to construct the def-use chains, we could then identify the chains which are formed from the definition of a variable in a non-kernel module and then later used in a kernel module.

The application of data mining to various artifacts of the software development process to discover and direct evolution patterns has recently received extensive treatment, most notably in [4, 6, 5, 24]. A common measure of software change throughout much of this research is based upon the number of CVS updates to a file (CVS release numbers) and the total lines of code changed between releases.

Epping et al. [3] examined the connection between vertical (specification) and horizontal (inter-module) design complexities and maintainability (change) effort during the acceptance and maintenance phases of two FORTRAN systems. Specifically, in regards to global variable usage they examined the number of globals defined, the actual number of globals referenced and maintainability, which is characterized by change effort. The change effort metric was further categorized as being isolation effort (identifying which modules require modification), implementation effort (develop, program and test the change) or locality (the number of modules also requiring modification). Additionally, the subset of all the tasks performed during the maintenance phase which were bug fixes was identified. Results for all changes (bug and enhancement) in the maintenance phase indicated a correlation between change isolation effort and both the number of global variables and the number of references to globals. However, no link was found to exist in implementation effort or locality. When focusing strictly upon maintenance phase bug fixes, both change isolation and implementation effort were found to correlate to the usage of global variables.

Harrison and Walton [7] applied a similar metric for maintainability as in this study to a large number of small legacy FORTRAN programs, mining three years of CVS data. The measures examined included lines of code and structural complexity (number of GOTO statements and cyclomatic complexity). Their findings indicated that lines of code offered only minor insight into future maintenance costs, while no correlation between any of the structural characteristics of the programs and maintenance costs was found to exist. In contrast to [3] and [7], our analysis is based upon a much larger data set encompassing many releases of seven large systems, uses different measures of maintainability effort, and deals with the semantics of global data in C, which differ from those in FORTRAN.

Zimmermann et al. [24] applied data mining to CVS repositories in order to determine various source components (for example, files, functions and variables) which are consistently changed in unison. Integration of their tool into an IDE enabled them to suggest, with a reasonable degree of accuracy, other parts of the code which might need to be modified given a change to an element with which they have been changed together in the past. Similar work appeared in [6]; however, that work focused on the higher-level granularity of classes.

It is commonly believed that by employing automatic code generators and packaged libraries, the initial software development costs could be decreased and this reduction of effort would continue into the later maintenance phase of a project. Banker, Davis and Slaughter [1] examined how the use of these affected software complexity, which in turn increases the difficulty in performing maintenance tasks. This perception was confirmed for the use of packaged libraries for their sample (they examined the application of 29 perfective maintenance tasks to 23 COBOL programs). However, contrary to intuition, the use of automatic code generators actually led to an increase in the amount of time spent on maintenance tasks. This is an interesting result in consideration of the projects that were examined in this study. The data collected for cc1 was limited to the “hand-written” code rather than the extensive amount of automatically generated code. Comparison of the usage of global variables in the auto-generated code to that of hand-written code and isolation of which part of the code is modified could be another approach to investigating this contrary result.

6. Conclusions

In this paper we examined the link between the use of global variables and software maintenance effort. Harnessing information extracted from CVS repositories, we examined this link for seven large open source projects. We proposed two measures of software maintenance; specifically, the number of revisions made to a file and the total lines of code changed between two releases. Examination of the graphs illustrated that at almost all points both the number of revisions and the total number of lines of code changed were higher for the subset of files which contained a greater number of references to global variables. Further investigation using statistical analysis revealed a strong correlation between both the number of revisions to global variable references and lines of code changed to global variable references. However, in all cases the correlation between the number of revisions and global variable references was stronger. Although this does not establish a cause and effect relationship, it does provide evidence that a strong relationship exists between the usage of global variables and both the number and scope of changes applied to files between product releases. Furthermore, the resulting correlations offer support for our hypotheses that global variable usage reduces maintainability and impairs comprehension. These results suggest that the use of global variables should be avoided when possible, thereby improving the ability of software to age and successfully evolve over time.

References

[1] R. D. Banker, G. B. Davis, and S. A. Slaughter. Software development practices, software complexity, and software maintenance performance: a field study. Manage. Sci., 44(4):433–450, 1998.

[2] P. Cederqvist. Version Management with CVS, 2005. Available at http://ximbiot.com/cvs/manual.

[3] A. Epping and C. Lott. Does software design complexity affect maintenance effort? In Proceedings of the NASA/GSFC 19th Annual Software Engineering Workshop. Software Engineering Laboratory: NASA Goddard Space Flight Center, 1994.

[4] M. Fischer and H. Gall. Visualizing feature evolution of large-scale software based on problem and modification report data. Journal of Software Maintenance and Evolution: Research and Practice, 16:385–403, November 2004.

[5] M. Fischer, J. Oberleitner, J. Ratzinger, and H. Gall. Mining evolution data of a product family. SIGSOFT Softw. Eng. Notes, 30(4):1–5, 2005.

[6] H. Gall, M. Jazayeri, and J. Krajewski. CVS release history data for detecting logical couplings. In IWPSE ’03: Proceedings of the 6th International Workshop on Principles of Software Evolution, page 13, Washington, DC, USA, 2003. IEEE Computer Society.

[7] M. S. Harrison and G. H. Walton. Identifying high maintenance legacy software. Journal of Software Maintenance, 14(6):429–446, 2002.

[8] A. Hunt and D. Thomas. The pragmatic programmer: from journeyman to master. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1999.

[9] C. F. Kemerer and S. A. Slaughter. Determinants of software maintenance profiles: an empirical investigation. Journal of Software Maintenance, 9(4):235–251, 1997.

[10] B. P. Lientz and E. B. Swanson. Software Maintenance Management. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1980.

[11] J. Martin and C. L. McClure. Software Maintenance: The Problems and Its Solutions. Prentice Hall Professional Technical Reference, 1983.

[12] S. McConnell. Code complete: a practical handbook of software construction. Microsoft Press, Redmond, WA, USA, second edition, 2004.

[13] S. S. Muchnick. Advanced compiler design and implementation. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1997.

[14] F. P. Ruffell and J. W. A. Selby. The pervasiveness of global data in evolving software systems. In L. Baresi and R. Heckel, editors, FASE, volume 3922 of Lecture Notes in Computer Science, pages 396–410. Springer, 2006.

[15] S. R. Schach, B. Jin, D. R. Wright, G. Z. Heller, and J. Offutt. Quality impacts of clandestine common coupling. Software Quality Control, 11(3):211–218, 2003.

[16] S. R. Schach and A. J. Offutt. On the non-maintainability of open-source software position paper. 2nd Workshop on Open Source Software Engineering, May 2002.

[17] R. M. Stallman. GNU EMACS Manual. Free Software Foundation, 2000.

[18] R. M. Stallman, R. McGrath, and P. D. Smith. GNU Make: A Program for Directing Recompilation. Free Software Foundation, 2004.

[19] R. M. Stallman, R. Pesch, and S. Shebs. Debugging with GDB: The GNU Source-Level Debugger. Free Software Foundation, 2002.

[20] R. M. Stallman and the GCC Developer Community. Using GCC: The GNU Compiler Collection Reference Manual. Free Software Foundation, 2003.

[21] H. S. Teoh and D. B. Wortman. Tools for extracting software structure from compiled programs. In ICSM ’04: Proceedings of the 20th IEEE International Conference on Software Maintenance, page 526, Washington, DC, USA, 2004. IEEE Computer Society.

[22] J. v. Vliet. Software Engineering – Principles and Practice. John Wiley & Sons, New York, New York, USA, 2nd edition, 2000.

[23] L. Yu and K. Chen. Categorization of common coupling and its application to the maintainability of the Linux kernel. IEEE Trans. Software Eng., 30(10):694–706, 2004.

[24] T. Zimmermann, P. Weisgerber, S. Diehl, and A. Zeller. Mining version histories to guide software changes. In ICSE ’04: Proceedings of the 26th International Conference on Software Engineering, pages 563–572, Washington, DC, USA, 2004. IEEE Computer Society.
