{"id":3091,"date":"2018-11-21T12:10:26","date_gmt":"2018-11-21T12:10:26","guid":{"rendered":"https:\/\/sandbox.weareadaptive.com\/?p=3091\/"},"modified":"2020-08-13T15:32:07","modified_gmt":"2020-08-13T14:32:07","slug":"using-antlr-parse-calculate-expressions-part","status":"publish","type":"post","link":"https:\/\/sandbox.weareadaptive.com\/fr\/2018\/11\/21\/using-antlr-parse-calculate-expressions-part\/","title":{"rendered":"Using ANTLR to parse and calculate expressions (Part I)"},"content":{"rendered":"<p>In an upcoming series of blog posts, we are going to talk about how we have developed and integrated a simple Domain Specific Language using ANTLR4 with some of our Visual Studio projects on .NET and C#. We will also show how we\u2019ve used the generated code to evaluate expressions at runtime for various mathematical calculations.<\/p>\n<p>In this first entry, we\u2019ll discuss the initial steps, such as creating a grammar, visualizing the parsed expression tree and moving on to code generation as well as its inclusion in the .NET projects. We\u2019ll take advantage of some of the features in the new C# project structure (csproj) that comes with Visual Studio 2017 to ensure that the latest version of our grammar is always parsed, the code generated and included in our project structure.<\/p>\n<p>In later posts, we\u2019ll see how we can stream updates to a client, whenever a component\u2019s value is updated. Using EventStore and Reactive Extensions we can make it a push based model. But more on that later.<\/p>\n<p>For the uninitiated ANTLR or \u201cANother Tool for Language Recognition\u201d can be used (among other things) to build languages. More info on the <a href=\"https:\/\/www.antlr.org\" target=\"_blank\" rel=\"noopener noreferrer\">ANTLR4 official website<\/a>.<\/p>\n<h2>The Grammar<\/h2>\n<p>The first step is understanding the problem and writing a simple grammar to solve it. 
We need a way to parse custom expressions, or \u2018formulas\u2019, and evaluate them at runtime once we have all the necessary information. Some of the variables in our grammar can be constant values and some are fed into our application by an external source; at \u2018evaluation\u2019 time, we substitute them and calculate a result.<br \/>\nAn example of this could be the following:<\/p>\n<blockquote><p><strong>FXRate('EURUSD') * UomConvert('ST','MT') * 100<\/strong><br \/>\n<em>In this formula, we need the EURUSD foreign exchange rate and the conversion factor between Short Tons and Metric Tons, and finally we multiply the result by 100.<\/em><\/p><\/blockquote>\n<p>It is a simple example, but it represents the basic operations that we need to support.<br \/>\nParsing can be done with Regex. However, the pattern needed to accommodate the requirements would be complex, difficult to understand and more error-prone. A better solution should be testable and extensible, as well as easy for a newcomer to understand. This is where ANTLR can be beneficial. Having a grammar that defines all our supported operations makes the code more readable and more maintainable than a very complex regular expression.<br \/>\nWe start with a grammar file that gets fed into the ANTLR binary, which generates the C# classes we need to work with the operations we want.<\/p>\n<p>The grammar file defines every single element of our language, starting from what are considered the building blocks (a digit, a number, alphanumeric characters, etc.) 
to the functions we want to support and finally the full set of allowed operations.<br \/>\nFor the purposes of this post, we will define a simple set of rules and some basic expressions that will allow our language to parse the expression above:<\/p>\n<pre>grammar MyGrammar;\r\n\/* * Parser Rules *\/\r\nnumber: INT | FLOAT;\r\nfromUomCode: NAME | IDENTIFIER;\r\ntoUomCode: NAME | IDENTIFIER;\r\nfxRateFunc: 'FXRate' '(' currencyPair ')';\r\ncurrencyPair: NAME;\r\nuomConvertFunc: 'UomConvert' '(' fromUomCode ',' toUomCode ')';\r\nexpr: expr op=(MUL | DIV) expr #mulDiv\r\n| expr op=(ADD | SUB) expr #addSub\r\n| number | '(' expr ')' #num\r\n| fxRateFunc #fxRate\r\n| uomConvertFunc #uomFactor\r\n;\r\n\/* * Lexer Rules *\/\r\nfragment DIGIT: [0-9];\r\nfragment LETTER: [a-zA-Z];\r\nINT: DIGIT+;\r\nFLOAT: DIGIT+ '.' DIGIT+;\r\nSTRING_LITERAL: '\\'' .*? '\\'';\r\nNAME: LETTER (LETTER | DIGIT)*;\r\nIDENTIFIER: [a-zA-Z0-9]+;\r\nMUL: '*';\r\nDIV: '\/';\r\nADD: '+';\r\nSUB: '-';\r\nWS: [ \\t\\r\\n]+ -&gt; skip;\r\n<\/pre>\n<p>Without needing to go into too much detail, we can see the basic components of our language. Lexer rules define what a digit, a letter, and an integer are. Other structures like Float, Name, String literal etc. are composed by combining these primitives, and the basic arithmetic operators are also specified. As we can see, lexer rules defining digits, letters and even string literals use a notation very similar to regular expressions. Meanwhile, ANTLR keeps us away from the more complex matching logic that happens under the hood.<\/p>\n<p>The more interesting part of the grammar comes from the parser rules. These define the structure of the operations that we want to support in our parser. 
In our case, we want to support the following:<\/p>\n<ul>\n<li>UomConvert(From, To): This expression is meant to receive two units of measure and will (given all the information it needs) convert the \u2018From\u2019 unit of measure to the \u2018To\u2019 unit of measure, by implementing the necessary code in C#. We\u2019ll go into more detail in the next post of this series.<\/li>\n<li>FXRate(\u2018currencyPair\u2019): This will perform a currency conversion, given the FX rates we need.<\/li>\n<li>IDENTIFIER: As it is possible to have a unit of measure like Bushels.56, the Identifier is used to define one of the possible parameters to the UomConvert function. Therefore, we have the fromUomCode defined as \u201cNAME | IDENTIFIER\u201d. Note that \u2018currencyPair\u2019 is just a name because, so far, we have no currencies that contain numbers.<\/li>\n<li>Number: This can be either an integer or a floating-point number, hence the \u201cINT | FLOAT\u201d definition in our parser rule.<\/li>\n<\/ul>\n<p>Finally, as we want to support not just the two expressions above but any valid mathematical operation involving them, we will create the \u2018expr\u2019 rule and recursively allow all the valid combinations by having:<\/p>\n<pre>expr: expr op=(MUL | DIV) expr #mulDiv\r\n| expr op=(ADD | SUB) expr #addSub\r\n| number | '(' expr ')' #num\r\n| fxRateFunc #fxRate\r\n| uomConvertFunc #uomFactor\r\n;\r\n<\/pre>\n<p>When we implement all these functions later in our C# code, we will have to specify what to do for each case as we\u2019re visiting the parse tree.<\/p>\n<p>The file is saved with a .g4 extension and we\u2019re ready to use it.<\/p>\n<p>Note: the # operator is used here to label the alternatives, supplying names for the functions we will use later. 
We will take a more in-depth look at this once we get to using the generated code in C#, in an upcoming blog post.<\/p>\n<h2>A look at the tree<\/h2>\n<p>Once we have our grammar set up, there are a few ways to visualize what is happening behind the scenes. I found the most convenient way is to set up the ANTLR plugin for Visual Studio Code, which can be installed from the Marketplace.<\/p>\n<p>After creating a launch configuration for Visual Studio Code, we\u2019ll have all we need to be able to parse, generate and visualize our grammar\u2019s parse tree.<\/p>\n<p>Here is a launch configuration we can use for VSCode:<\/p>\n<pre class=\"\">{\r\n  \"version\": \"2.0.0\",\r\n  \"configurations\": [\r\n    {\r\n      \"name\": \"antlr4-MyGrammar\",\r\n      \"type\": \"antlr-debug\",\r\n      \"request\": \"launch\",\r\n      \"input\": \"input.txt\",\r\n      \"grammar\": \"MyGrammar.g4\",\r\n      \"startRule\": \"expr\",\r\n      \"printParseTree\": true,\r\n      \"visualParseTree\": true\r\n    }\r\n  ]\r\n}<\/pre>\n<p>To test the above grammar file, we\u2019ll create a simple input text file with the following:<\/p>\n<ol>\n<li>UomConvert(MT, ST)<\/li>\n<\/ol>\n<p>The generated parse tree looks as follows:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"\" src=\"https:\/\/sandbox.weareadaptive.com\/wp-content\/uploads\/2018\/11\/001-1024x497.png\" alt=\"\" width=\"1024\" 
height=\"497\" \/><\/p>\n<p>Other more complex expressions can be parsed:<\/p>\n<ol>\n<li>2 \/ UomConvert(MT, ST) * FXRate(EURUSD)<\/li>\n<\/ol>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"\" src=\"https:\/\/sandbox.weareadaptive.com\/wp-content\/uploads\/2018\/11\/002-1024x506.png\" alt=\"\" width=\"1024\" height=\"506\" \/><\/p>\n<h2>Generating C# we can use<\/h2>\n<p>The latest version of the Visual Studio C# project file format has been massively simplified by Microsoft. Not only is the file easier to understand, but editing it is much quicker because there is no need to unload and reload the project. It all just works on the fly.<br \/>\nTo have a consistent working set of generated C# classes, and to avoid issues across development environments and the CI\/CD pipeline, we wanted to have the following steps as part of the build:<br \/>\nNOTE: Generated files go under the $(ProjectDir)Expressions\\Generated folder<\/p>\n<ol>\n<li>Remove all previously generated *.cs files in the Generated folder from compilation.<\/li>\n<li>Delete the Generated folder.<\/li>\n<li>Call the ANTLR4 binary using java -jar as a pre-build step, setting it to output all files to the Generated folder.<\/li>\n<li>Include *.cs under Generated\\<\/li>\n<li>Compile!<\/li>\n<li>Profit?<\/li>\n<\/ol>\n<p>To perform the steps above, we used the following project file (the comments explain what each part does):<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"default\" data-enlighter-title=\"\">&lt;Project Sdk=&quot;Microsoft.NET.Sdk&quot; ToolsVersion=&quot;15.0&quot;&gt;\r\n  &lt;PropertyGroup&gt;\r\n    &lt;OutputType&gt;Exe&lt;\/OutputType&gt;\r\n    &lt;TargetFramework&gt;net471&lt;\/TargetFramework&gt;\r\n  &lt;\/PropertyGroup&gt;\r\n\r\n  &lt;ItemGroup&gt;\r\n    &lt;!-- Include Antlr4.Runtime --&gt;\r\n    &lt;PackageReference Include=&quot;Antlr4.Runtime.Standard&quot; Version=&quot;4.7.1.1&quot; \/&gt;\r\n  &lt;\/ItemGroup&gt;\r\n\r\n  &lt;Target 
Name=&quot;PreBuild&quot; BeforeTargets=&quot;PreBuildEvent&quot;&gt;\r\n    &lt;ItemGroup&gt;\r\n      &lt;!-- Remove previously generated files from the Compile item list --&gt;\r\n      &lt;Compile Remove=&quot;Expressions\\Generated\\*.cs&quot; \/&gt;\r\n    &lt;\/ItemGroup&gt;\r\n    &lt;!-- Delete the Generated directory --&gt;\r\n    &lt;RemoveDir Directories=&quot;$(ProjectDir)Expressions\\Generated&quot; \/&gt;\r\n    &lt;!-- Run ANTLR on the grammar --&gt;\r\n    &lt;Exec Command=&quot;java -jar $(SolutionDir)\\tools\\antlr-4.7.1-complete.jar $(ProjectDir)Expressions\\Grammar\\MyGrammar.g4 -o $(ProjectDir)Expressions\\Generated -Dlanguage=CSharp -no-listener -visitor -package $(ProjectName).Expressions.Generated&quot; \/&gt;\r\n    &lt;ItemGroup&gt;\r\n      &lt;!-- Include generated C# files --&gt;\r\n      &lt;Compile Include=&quot;Expressions\\Generated\\*.cs&quot; \/&gt;\r\n    &lt;\/ItemGroup&gt;\r\n  &lt;\/Target&gt;\r\n&lt;\/Project&gt;<\/pre>\n<p>As a prerequisite of this step, we need the ANTLR4 binary. We\u2019ve decided to use the Java version of ANTLR4 even though there\u2019s a C# port which also works. Under the Solution root directory, we\u2019ve created a \u201ctools\u201d folder which contains the ANTLR4 Java binary. Java also needs to be installed on the build servers for this step to work in CI\/CD toolchains.<\/p>\n<p>Let\u2019s have a look at some of the command line arguments that we used for the ANTLR4 step.<\/p>\n<ul>\n<li>$(ProjectDir)Expressions\\Grammar\\MyGrammar.g4: this is the path to our grammar file.<\/li>\n<li>-o: specifies the output directory.<\/li>\n<li>-Dlanguage=CSharp: sets the target language. That\u2019s the beauty of ANTLR: it generates a C# visitor class structure (even generating interfaces and abstract classes that we can extend).<\/li>\n<li>-no-listener: don\u2019t generate the parse tree listener. We don\u2019t really need it for what we want to do, and it is enabled by default.<\/li>\n<li>-visitor: generates the tree visitor. 
This will allow us to implement our behavior later, performing the proper actions as the tree is visited in our code. We\u2019ll need to get our information from external sources to replace variables with actual values.<\/li>\n<li>-package: specifies a package\/namespace for our code.<\/li>\n<\/ul>\n<p>After the build succeeds we should have all the output files added to our solution automatically.<\/p>\n<h2>Summary<\/h2>\n<p>This was a learning process. We realized that regular expressions could get tricky: if there was ever a need to add more variations to the required inputs, the expressions could get extremely hard to follow. We preferred to investigate and spend some time learning how ANTLR works and what benefits we could get from it.<\/p>\n<p>It turns out that it\u2019s not very difficult to get it up and running. So far, we\u2019ve been using this solution (or one very similar to it) and the code has proven to be very extensible. It might occasionally need some tweaking here and there, but our grammar files have stayed the same since we first tried this approach. The grammar really is the main driver for all of this. Getting it right in the beginning can help avoid a lot of headaches, and it will rarely need to be changed (unless requirements change, of course).<br \/>\nA copy of the code for this blog post can be found on our GitHub repository.<\/p>\n<p>The second part of this series will focus more on the generated files and how we use them. 
The Visitor pattern is the main driver for the next part of the process.<\/p>\n<p>Some useful links:<\/p>\n<ul>\n<li><a href=\"https:\/\/github.com\/AdaptiveConsulting\/Blog\/tree\/master\/AntlrExpressions\" rel=\"noopener\">https:\/\/github.com\/AdaptiveConsulting\/Blog\/tree\/master\/AntlrExpressions<\/a><\/li>\n<li><a href=\"http:\/\/www.antlr.org\/\" rel=\"noopener\">http:\/\/www.antlr.org\/<\/a><\/li>\n<li><a href=\"https:\/\/marketplace.visualstudio.com\/items?itemName=mike-lischke.vscode-antlr4\" rel=\"noopener\">https:\/\/marketplace.visualstudio.com\/items?itemName=mike-lischke.vscode-antlr4<\/a><\/li>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Visitor_pattern\" rel=\"noopener\">https:\/\/en.wikipedia.org\/wiki\/Visitor_pattern<\/a><\/li>\n<\/ul>\n<h2>Keep reading:<\/h2>\n<p><span class=\"btn-flat\"><a href=\"https:\/\/sandbox.weareadaptive.com\/2019\/03\/29\/antlr4-expression-parsing-part-2\/\" target=\"_blank\" rel=\"noopener noreferrer\">Antlr4 and expression parsing (Part 2)<\/a><\/span><\/p>\n<h2>Author<\/h2>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-3163\" src=\"https:\/\/sandbox.weareadaptive.com\/wp-content\/uploads\/2019\/03\/Carlos.png\" alt=\"\" width=\"248\" height=\"244\" srcset=\"https:\/\/sandbox.weareadaptive.com\/wp-content\/uploads\/2019\/03\/Carlos.png 464w, https:\/\/sandbox.weareadaptive.com\/wp-content\/uploads\/2019\/03\/Carlos-300x296.png 300w\" sizes=\"(max-width: 248px) 100vw, 248px\" \/><\/p>\n<p><strong>Carlos Fernandez<\/strong><\/p>\n<p>Senior Software Engineer, Adaptive Barcelona<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In an upcoming series of blog posts, we are going to talk about how we have developed and integrated a 
&#8230;<\/p>\n","protected":false},"author":24,"featured_media":3065,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6],"tags":[],"class_list":["post-3091","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blog"],"_links":{"self":[{"href":"https:\/\/sandbox.weareadaptive.com\/fr\/wp-json\/wp\/v2\/posts\/3091"}],"collection":[{"href":"https:\/\/sandbox.weareadaptive.com\/fr\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sandbox.weareadaptive.com\/fr\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sandbox.weareadaptive.com\/fr\/wp-json\/wp\/v2\/users\/24"}],"replies":[{"embeddable":true,"href":"https:\/\/sandbox.weareadaptive.com\/fr\/wp-json\/wp\/v2\/comments?post=3091"}],"version-history":[{"count":0,"href":"https:\/\/sandbox.weareadaptive.com\/fr\/wp-json\/wp\/v2\/posts\/3091\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/sandbox.weareadaptive.com\/fr\/wp-json\/wp\/v2\/media\/3065"}],"wp:attachment":[{"href":"https:\/\/sandbox.weareadaptive.com\/fr\/wp-json\/wp\/v2\/media?parent=3091"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sandbox.weareadaptive.com\/fr\/wp-json\/wp\/v2\/categories?post=3091"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sandbox.weareadaptive.com\/fr\/wp-json\/wp\/v2\/tags?post=3091"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}