Blog post

Machine learning on source code

June 8, 2018
10:31 am
.data, AI, Conferences

Machine learning, AI… they have become an inseparable part of our daily lives, and even more so of our future. Continuum's Software Crafters recently attended dotAI. The dot conferences aim for TED-level quality across disciplines such as Java, Swift and, of course, Artificial Intelligence.
dot.ai featured highly experienced speakers (e.g. from Amazon and Google) in a superb setting in one of Europe's most beautiful cities: Paris. Naturally, we could not miss an event like this. Our consultants on site report on the various talks. In this post it is Dieter's turn.

Dieter's favourite

Vadim Markovtsev is a Google Developer Expert in machine learning and the lead machine learning engineer at source{d}.
This company builds open-source components for code analysis and machine learning on source code.
At dot.ai he came to tell us more about it.

The problem…

In recent years code editors have kept getting better. This is one of the main reasons developers choose one editor over another (e.g. Sublime Text, IntelliJ). These editors do a lot for us, including automatically completing words. This saves nearly every developer a huge amount of time, unless the editor gives poor hints. Although many developers like being helped, we still get annoyed when typos are remembered as suggestions, or when the editor fails to suggest 'get' as soon as we type the letter 'g', as in the following example.

Machine learning on source code as a solution?

Machine Learning on Source Code (#MLOnCode) can offer a solution for these poor predictions. Often no prediction is made at all, and when one is made, it is usually limited to suggestions based on the source code already written in the project. The big problem is the lack of training data.
Vadim Markovtsev has a very elegant solution for exactly this.
They use all available, relevant code on GitHub. For now this is still the largest source of open-source code (depending on how the market reacts to Microsoft's acquisition of GitHub).
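
To make the idea of learning suggestions from a large body of existing code a little more concrete, here is a minimal sketch. It is not source{d}'s method: it simply counts identifiers in a tiny, made-up corpus and ranks completions for a prefix by how often they occur. The function names `train` and `suggest` and the four sample snippets are assumptions chosen purely for illustration.

```python
import re
from collections import Counter

# Rough pattern for identifiers; real tooling would use a proper parser.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def train(corpus):
    """Count how often each identifier occurs across all source snippets."""
    counts = Counter()
    for snippet in corpus:
        counts.update(IDENTIFIER.findall(snippet))
    return counts


def suggest(counts, prefix, limit=3):
    """Return the most frequent identifiers that start with the prefix."""
    matches = [
        (name, n)
        for name, n in counts.items()
        if name.startswith(prefix) and name != prefix
    ]
    # Most frequent first; ties are broken alphabetically.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [name for name, _ in matches[:limit]]


# Hypothetical stand-in for "all relevant code on GitHub".
corpus = [
    "value = config.get('name')",
    "item = cache.get(key)",
    "result = session.get(url)",
    "group = items.groupby(key)",
]

model = train(corpus)
print(suggest(model, "g"))  # ['get', 'group', 'groupby']
```

Because 'get' appears most often in the sample corpus, it is ranked first as soon as 'g' is typed. A production system would need far more context than raw frequency, but the sketch shows why the amount of training data matters.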

Is that all?

It does not stop at predicting methods, classes or variables. Best practices can also be derived from all this GitHub data. If we are to believe this experienced speaker, nothing stands in the way of having computers perform code reviews in the future. After all, the same remarks (justified or not) tend to come up in reviews again and again.
It can therefore complement static code analysis with very human, constructive feedback and make developers' lives more pleasant. This fits well with a healthy vision of what Machine Learning can mean for each of us in the future. We should see it as a service that lets us delegate repetitive tasks to computers instead of people.

Ask a programmer to review 10 lines of code, he’ll find 10 issues.
Ask him to do 500 lines and he’ll say it looks good.

Do you share this vision, or would you rather debate it with us?
Or perhaps working at Continuum is something for you! Check out our vacancies!
