Web Scraping Tools

Puppeteer

  • Puppeteer is a Node.js library which provides a high-level API to control Chrome/Chromium.
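  • Puppeteer is often faster than Selenium, because it controls the browser directly through the Chrome DevTools Protocol instead of going through a separate WebDriver server.
  • Puppeteer does not use ChromeDriver: it downloads a compatible browser build (Chrome for Testing in current versions, Chromium in older ones) when it is installed. It is easy to use.
  • Puppeteer can take screenshots of a web page and save a page as PDF.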
  • Home page: https://pptr.dev

Delphi: Web Application Development

Sencha Ext JS is available under a commercial license or the GNU General Public License version 3 (GPLv3). The commercial license requires a fee for each Designated User (i.e. developer). If you choose not to pay and use the GPLv3 license instead, you must release the source code of any program you distribute that uses Ext JS. With a commercial license, you are not required to disclose your source code.

Sencha

  • Company behind Ext JS
  • Acquired by Idera (the parent company of Embarcadero) in 2017

UniGUI

  • Based on ExtJS

Kitto

  • Open Source
  • Kitto1 (Based on ExtJS 3)
  • Kitto2 (Based on ExtJS 6)
  • Kitto3 (Based on ExtJS 7)

RadPHP

  • Discontinued
  • Final Release: XE2 / 2011
  • Formerly: Delphi for PHP

RemObjects Elements

  • Oxygene (Object Pascal)

Delphi: HtmlToColor and ColorToHtml

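uses
  System.SysUtils, Vcl.Graphics;

// TColor is stored as $00BBGGRR; HTML expects #RRGGBB
function ColorToHtml(Color: TColor): string;
begin
  Result := IntToHex(ColorToRGB(Color), 6); // ColorToRGB resolves system colors such as clBtnFace
  Result := '#' + Copy(Result, 5, 2) + Copy(Result, 3, 2) + Copy(Result, 1, 2);
end;

// '#RRGGBB' -> '$BBGGRR' -> TColor
function HtmlToColor(Color: string): TColor;
begin
  Result := StringToColor('$' + Copy(Color, 6, 2) + Copy(Color, 4, 2) + Copy(Color, 2, 2));
end;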
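An alternative sketch that skips the hex-string slicing and extracts the color channels with bit masks instead. It relies on TColor's $00BBGGRR layout and on Format, StrToInt, ColorToRGB and the Windows RGB macro; the names ColorToHtmlRGB and HtmlToColorRGB are hypothetical, and the strict '#RRGGBB' input check is an added assumption.

```pascal
program ColorHtmlDemo;

{$APPTYPE CONSOLE}

uses
  Winapi.Windows, System.SysUtils, Vcl.Graphics;

// Hypothetical helper: TColor is $00BBGGRR, HTML wants #RRGGBB
function ColorToHtmlRGB(Color: TColor): string;
var
  C: Integer;
begin
  C := ColorToRGB(Color); // resolves system colors such as clBtnFace to real RGB
  Result := Format('#%.2x%.2x%.2x',
    [C and $FF, (C shr 8) and $FF, (C shr 16) and $FF]); // red, green, blue
end;

// Hypothetical helper: '#RRGGBB' -> TColor ($00BBGGRR)
function HtmlToColorRGB(const Html: string): TColor;
var
  V: Integer;
begin
  if (Length(Html) <> 7) or (Html[1] <> '#') then
    raise EConvertError.CreateFmt('Not a #RRGGBB color: %s', [Html]);
  V := StrToInt('$' + Copy(Html, 2, 6)); // integer value $RRGGBB
  // swap red and blue: red goes to the low byte, blue to the third byte
  Result := TColor(((V and $FF) shl 16) or (V and $FF00) or ((V shr 16) and $FF));
end;

begin
  Writeln(ColorToHtmlRGB(clRed));                              // #FF0000
  Writeln(ColorToHtmlRGB(TColor(RGB(18, 52, 86))));            // #123456
  Writeln(HtmlToColorRGB('#123456') = TColor(RGB(18, 52, 86))); // TRUE
end.
```

Using Format with a precision of 2 keeps leading zeros (for example, a green channel of 5 becomes "05"), which is the same job IntToHex(..., 6) does in the string-slicing version.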

Delphi: Long Path in Windows

By default, Windows does not allow access to files whose name (including path) is very long.

The default maximum length (MAX_PATH) is 260 characters, including the path and the terminating null character.

How to use very long filenames in Delphi

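uses
  System.SysUtils;

function FixLongPath(Filename: string): string;
begin
  // Skip paths that already start with '\\': network shares (\\server\share)
  // and paths that already carry the \\?\ prefix
  if not Filename.StartsWith('\\') then
    Filename := '\\?\' + Filename; // allows very long paths (absolute local paths only)
  Result := Filename;
end;

var
  Filename: string;
begin
  Filename := FixLongPath('C:\Looooooong\looooooong.txt');
  ...
end;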

Tested: Works in Windows XP to Windows 11

Known Limitations

  • Windows XP doesn’t support the \\?\ prefix in ShellExecute.
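FixLongPath above leaves network shares untouched. Windows also accepts an extended-length form for UNC paths, \\?\UNC\server\share\..., and extended-length paths can be up to roughly 32,767 characters. The sketch below is a hypothetical FixLongPathEx (not part of the original code, and not tested on Windows XP) that covers that case. Because Windows does not normalize \\?\ paths (no relative paths, no '/' conversion), it makes local paths absolute with ExpandFileName first.

```pascal
program LongPathDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical extension of FixLongPath that also handles UNC network shares
function FixLongPathEx(const Filename: string): string;
begin
  if Filename.StartsWith('\\?\') or Filename.StartsWith('\\.\') then
    Result := Filename                             // already extended-length, or a device path
  else if Filename.StartsWith('\\') then
    Result := '\\?\UNC\' + Filename.Substring(2)   // \\server\share\x -> \\?\UNC\server\share\x
  else
    // \\?\ turns off path normalization, so make the path absolute first
    Result := '\\?\' + ExpandFileName(Filename);
end;

begin
  Writeln(FixLongPathEx('C:\Looooooong\looooooong.txt')); // \\?\C:\Looooooong\looooooong.txt
  Writeln(FixLongPathEx('\\server\share\file.txt'));      // \\?\UNC\server\share\file.txt
end.
```

Note that TStringHelper.Substring is zero-based, so Substring(2) drops exactly the two leading backslashes.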

Delphi: Skia4Delphi Tips

  • TPicture supports many formats (PNG, JPG, BMP). Adding the Skia.Vcl unit to a uses clause also registers WebP and SVG decoders with TPicture.
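A minimal console sketch of the TPicture tip: it first writes a small WebP file with Skia (the file name sample.webp is hypothetical), then loads it through a plain TPicture, which only works because Skia.Vcl is in the uses clause. The unit names Skia and Skia.Vcl follow the text above; newer Skia4Delphi releases (the ones bundled with Delphi 12) renamed them, for example to System.Skia and Vcl.Skia.

```pascal
program WebPPictureDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.UITypes, Vcl.Graphics, Skia, Skia.Vcl;

var
  LSurface: ISkSurface;
  Picture: TPicture;
begin
  // Create a 32x16 red test image with Skia and save it as WebP
  LSurface := TSkSurface.MakeRaster(32, 16);
  LSurface.Canvas.Clear(TAlphaColors.Red);
  LSurface.MakeImageSnapshot.EncodeToFile('sample.webp', TSkEncodedImageFormat.WEBP);

  // Skia.Vcl registered a TGraphic class for .webp, so TPicture can load it
  Picture := TPicture.Create;
  try
    Picture.LoadFromFile('sample.webp');
    Writeln(Picture.Graphic.ClassName, ' ', Picture.Width, 'x', Picture.Height);
  finally
    Picture.Free;
  end;
end.
```

Without Skia.Vcl in the uses clause, the LoadFromFile call would fail because TPicture has no graphic class registered for the .webp extension.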

Resize Image

uses
  System.UITypes, Skia;

function GetResizedImage(const AImage: ISkImage; const ANewWidth, ANewHeight: Integer): ISkImage;
var
  LSurface: ISkSurface;
begin
  LSurface := TSkSurface.MakeRaster(ANewWidth, ANewHeight);
  LSurface.Canvas.Clear(TAlphaColors.Null);
  LSurface.Canvas.Scale(ANewWidth / AImage.Width, ANewHeight / AImage.Height);
  LSurface.Canvas.DrawImage(AImage, 0, 0, TSkSamplingOptions.High);
  Result := LSurface.MakeImageSnapshot;
end;

procedure TForm1.FormCreate(Sender: TObject);
var
  LImage: ISkImage;
begin
  LImage := TSkImage.MakeFromEncodedFile('a.png');
  LImage := GetResizedImage(LImage, 24, 24);
  LImage.EncodeToFile('a.png', 100);
end;

Convert PNG to Grayscale

uses
  System.UITypes, Skia;

procedure TForm1.FormCreate(Sender: TObject);
var
  LImage: ISkImage;
  LSurface: ISkSurface;
  LPaint: ISkPaint;
begin
  LImage := TSkImage.MakeFromEncodedFile('color.png');
  LPaint := TSkPaint.Create;
  LPaint.ColorFilter := TSkColorFilter.MakeHighContrast(TSkHighContrastConfig.Create(True, TSkContrastInvertStyle.NoInvert, 0));
  LSurface := TSkSurface.MakeRaster(LImage.Width, LImage.Height);
  LSurface.Canvas.Clear(TAlphaColors.Null);
  LSurface.Canvas.DrawImage(LImage, 0, 0, LPaint);
  LSurface.MakeImageSnapshot.EncodeToFile('grayscale.png');
end;

Delphi: SynEdit Tips

TurboPack SynEdit now paints text with DirectWrite, which handles Unicode much better than GDI. DirectWrite is not available on Windows XP. The last GDI-based version: https://github.com/TurboPack/SynEdit/releases/tag/LastGDIbased

Delphi: Assert

Assert is a procedure that should be used only in Debug builds!

A failed assertion displays the current .pas file name and line number.

How to use Assert in Delphi?

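var
  MyVar: Integer;
begin
  MyVar := 0;
  // raises EAssertionFailed (when assertions are enabled) because the condition is False
  Assert(MyVar > 0, 'ATTENTION! The variable is not greater than zero!');
end;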

How to disable assertions for the entire project?

Pass -$C- to the command-line compiler (dcc32/dcc64), or turn off the Assertions option in ‘Project->Options->Compiler’ in the IDE (which stores the setting in the .dproj file).

Assert directives in Delphi

Syntax:  {$C+} or {$C-}, {$ASSERTIONS ON} or {$ASSERTIONS OFF}
Default: {$C+}, {$ASSERTIONS ON}
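The $C switch is a local directive, so besides the project-wide setting it can also be toggled around individual routines. A minimal sketch (the procedure names are hypothetical): the first Assert raises EAssertionFailed, while the one compiled under {$C-} produces no code at all.

```pascal
program AssertDemo;

{$APPTYPE CONSOLE}
{$C+} // force assertions on for this file, whatever the project options say

uses
  System.SysUtils; // installs the handler that turns failed assertions into EAssertionFailed

procedure CheckPositive(MyVar: Integer);
begin
  Assert(MyVar > 0, 'MyVar must be greater than zero');
end;

{$C-} // assertions off: the compiler generates no code for the Assert below
procedure CheckPositiveUnchecked(MyVar: Integer);
begin
  Assert(MyVar > 0, 'never raised');
end;
{$C+} // assertions back on for the rest of the file

begin
  try
    CheckPositive(-1);
  except
    on E: EAssertionFailed do
      Writeln('Caught: ', E.Message); // the message also contains the source file name and line number
  end;
  CheckPositiveUnchecked(-1); // no exception: the Assert call was compiled out
  Writeln('Done');
end.
```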

Delphi: Simulating keyboard input

  • SendInput and keybd_event can be used to simulate keyboard input (both are declared in Winapi.Windows).
  • MSDN states that keybd_event has been superseded by SendInput.
  • This github project helps to use SendInput: https://github.com/WladiD/SendInputHelper

Example code:

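uses
  Winapi.Windows, // VK_TAB, VK_ESCAPE
  SendInputHelper;

var
  cc: Integer;
  SIH: TSendInputHelper;
begin
  SIH := TSendInputHelper.Create;
  try
    SIH.AddShift([ssWin], True, False);        // press and hold the Windows key
    for cc := 1 to 20 do
    begin
      SIH.AddVirtualKey(VK_TAB, True, False);  // press Tab (no release yet)
      SIH.AddDelay(100);                       // wait 100 ms between presses
    end;
    SIH.AddVirtualKey(VK_TAB, False, True);    // release Tab
    SIH.AddVirtualKey(VK_ESCAPE);              // press and release Escape
    SIH.AddShift([ssWin], False, True);        // release the Windows key
    SIH.Flush;                                 // send all queued inputs
  finally
    SIH.Free;
  end;
end;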
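For comparison, a minimal sketch that calls SendInput from Winapi.Windows directly, without SendInputHelper. It builds the key-down/key-up pair separately from sending it, so the input records can be inspected before anything is sent; BuildKeyPress and SendKeyPress are hypothetical names. Running the program sends a real Escape key press to the foreground window.

```pascal
program SendInputDemo;

{$APPTYPE CONSOLE}

uses
  Winapi.Windows, System.SysUtils;

// Hypothetical helper: one key-down and one key-up event for a virtual key
function BuildKeyPress(VirtualKey: Word): TArray<TInput>;
begin
  SetLength(Result, 2);
  FillChar(Result[0], Length(Result) * SizeOf(TInput), 0);
  Result[0].Itype := INPUT_KEYBOARD;
  Result[0].ki.wVk := VirtualKey;           // key down (dwFlags = 0)
  Result[1].Itype := INPUT_KEYBOARD;
  Result[1].ki.wVk := VirtualKey;
  Result[1].ki.dwFlags := KEYEVENTF_KEYUP;  // key up
end;

// Hypothetical helper: SendInput returns how many events it actually inserted
procedure SendKeyPress(VirtualKey: Word);
var
  Inputs: TArray<TInput>;
begin
  Inputs := BuildKeyPress(VirtualKey);
  if SendInput(Length(Inputs), Inputs[0], SizeOf(TInput)) <> UINT(Length(Inputs)) then
    RaiseLastOSError;
end;

begin
  SendKeyPress(VK_ESCAPE);
end.
```

Sending both events in a single SendInput call keeps them together in the input stream, so other input cannot be interleaved between the key-down and the key-up.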

Tesseract OCR

Tesseract 3 vs Tesseract 4 vs Tesseract 5

  • Tesseract 4’s accuracy is better than Tesseract 3’s.
  • Tesseract 4 uses a deep learning model: a Long Short-Term Memory (LSTM) neural network, which is a kind of Recurrent Neural Network (RNN).
  • Starting with Tesseract 4, recognition is done by neural networks (the LSTM engine), so the font name is not available. The last version that can report the font name is version 3, which uses the old engine.
  • Tesseract 5 is faster thanks to “fast floats”: the authors switched from double to float calculations, which is said to have the added bonus of needing less system memory.

Tesseract Tips

  • The latest official binary version for Windows is Tesseract 3.02. It includes the English training data.
  • Training data folder: tessdata
  • Increase OCR speed: tessdata_fast
  • Increase OCR accuracy: tessdata_best
  • UB-Mannheim Github page provides new binaries for Windows.
  • Tesseract promises to recognise more than 100 languages and supports a number of output formats, including plain text, HTML, and PDF.

Tesseract Alternatives