Inspiring Ingenuity

Alteryx, Bicycles and Teaching Kids Programming.

Alteryx: Open Source YXDB

A few years back, we mentioned an open source YXDB reader/writer on LinkedIn.  After that, a whole lot of nothing.  It turns out that Alteryx did release the open source YXDB code, but it was so stealthy that no one noticed.  This code is used inside of an R plugin, which had to be GPL’d because of R’s license.  But since it was never published as a way to read/write YXDBs, no one noticed.

The thread on LinkedIn was recently revived, so I decided it was time to expose it to a bigger audience.

1st, the code: https://github.com/AlteryxNed/Open_AlteryxYXDB

So, what is it, and what isn’t it?

  • This is open source code – warts and all.  Most of this code is the exact code that Alteryx uses internally.
  • It is written in C++ – there are no bindings for other languages.  That said, it is open source, so there is no reason it couldn’t be exposed to other languages.
  • The project is for Visual Studio 2012, but I have also tested it in VS 2015.
  • It has not been compiled on Linux, but it has been compiled with GCC as part of the R plugin on Windows.
  • YXDB files have an optional spatial index.  This code does not support that.  When reading, if the source file has a spatial index, it will skip over it and read properly, but it will not utilize the index.  Writing will not attempt to create one.
  • YXDBs support spatial objects, but this code doesn’t help you much with them.  Alteryx stores spatial objects internally as blobs in the SHP format.  If you know how to deal with that, you can get and set spatial objects.
  • This is a personal blog post by me as a person, not as CTO at Alteryx.  That means there is no official support from Alteryx the company.  I will of course do my best to answer questions if they come up, but there are no guarantees of how quickly I can respond.
  • The only documentation for now is what I wrote in the sample in the project.  See below for examples of reading and writing a file.
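
To make the spatial bullet above a little more concrete, here is a minimal sketch of decoding the simplest case, a point. It assumes the blob holds a standard ESRI shapefile record body (a 32-bit shape type, then the geometry), that you have already pulled the raw bytes out of the field, and that you are on a little-endian machine. ShpPoint and DecodeShpPoint are hypothetical names, not part of the Open_AlteryxYXDB code, and I have not checked this against every blob Alteryx can produce.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper type; not part of the Open_AlteryxYXDB project.
struct ShpPoint
{
    double x;
    double y;
};

// Decode a shapefile Point record body: int32 shape type (1 = Point),
// then X and Y as 8-byte doubles. ASSUMPTION: the blob uses this layout
// and the host is little-endian, like the values in the shapefile format.
// Returns false if the blob is too short or is not a Point.
bool DecodeShpPoint(const unsigned char *pBlob, std::size_t len, ShpPoint &out)
{
    const std::size_t kPointSize = sizeof(std::int32_t) + 2 * sizeof(double);
    if (pBlob == nullptr || len < kPointSize)
        return false;

    // memcpy avoids unaligned reads from the raw byte buffer
    std::int32_t shapeType = 0;
    std::memcpy(&shapeType, pBlob, sizeof(shapeType));
    if (shapeType != 1)   // anything other than Point is out of scope for this sketch
        return false;

    std::memcpy(&out.x, pBlob + 4, sizeof(double));
    std::memcpy(&out.y, pBlob + 12, sizeof(double));
    return true;
}
```

Polylines and polygons follow the same idea but carry a bounding box, part offsets and a point array, so they need a real shapefile parser rather than a few memcpy calls.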

Writing a File


#include <random>   // std::mt19937

void WriteSampleFile(const wchar_t *pFile)
{
    // the RecordInfo structure defines a record for the YXDB file.
    SRC::RecordInfo recordInfoOut;

    // Before you can create a YXDB, you need to tell the RecordInfo what fields it will expect
    // Use CreateFieldXml to properly format the XML that describes a field
    recordInfoOut.AddField(SRC::RecordInfo::CreateFieldXml(L"Number", SRC::E_FT_Double));
    recordInfoOut.AddField(SRC::RecordInfo::CreateFieldXml(L"English", SRC::E_FT_V_String, 256));

    // Now that we have defined the fields, we can create the file.
    Alteryx::OpenYXDB::Open_AlteryxYXDB fileOut;
    fileOut.Create(pFile, recordInfoOut.GetRecordXmlMetaData());

    // in order to add a record to the file, we need to create an empty record and then fill it in
    SRC::SmartPointerRefObj<SRC::Record> pRec = recordInfoOut.CreateRecord();

    // just creating some random #'s to have some data to put into the file.
    std::mt19937 r;
    for (unsigned x = 0; x < 100; ++x)
    {
        // this is very important. ALWAYS reset the record to clear out all the variable length data
        // if you skip this step, it will appear to work, but the record will get larger and larger,
        // generating an extremely inefficient YXDB and eventually running out of memory.
        // this will not actually clear out the values though.  You still need to either call:
        //            SetFromXXX(…)
        //     or
        //            SetNull(…)
        // for every field in the record.
        pRec->Reset();

        int v = r();

        // the recordInfo object contains an array of FieldBase objects.
        // although the fields are strongly typed, the field wrapper will convert when needed,
        // which is to say it will accept setting the field as a different type than what it is writing.
        // in this case we are setting the value as an integer, but the field is actually a double.
        // that is OK because the Field library will convert it for us.
        recordInfoOut[0]->SetFromInt32(pRec.Get(), v);
        recordInfoOut[1]->SetFromString(pRec.Get(), EnglishNumber(v));

        // now that we have filled out the record, we can add it to the file
        fileOut.AppendRecord(pRec->GetRecord());
    }

    // calling Close is actually optional, since the destructor will call it for you.
    // but if you don't explicitly call it, it will not be able to throw exceptions if the final
    // write fails for any reason
    fileOut.Close();
}
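
The writer calls EnglishNumber(v), which is not shown in this post. If you want to try the sample without digging it out of the project, here is a stand-in sketch. It assumes the helper takes an int and returns a std::wstring with the number spelled out; that signature is my guess for illustration, and the helper in the project may well differ.

```cpp
#include <string>

// Stand-in for the EnglishNumber helper the sample calls.
// ASSUMPTION: signature and output format are illustrative only.
namespace
{
    const wchar_t *const kOnes[] = {
        L"zero", L"one", L"two", L"three", L"four", L"five", L"six", L"seven",
        L"eight", L"nine", L"ten", L"eleven", L"twelve", L"thirteen", L"fourteen",
        L"fifteen", L"sixteen", L"seventeen", L"eighteen", L"nineteen" };
    const wchar_t *const kTens[] = {
        L"", L"", L"twenty", L"thirty", L"forty", L"fifty",
        L"sixty", L"seventy", L"eighty", L"ninety" };

    // Spell out 1..999 and append it to out.
    void AppendBelowThousand(unsigned n, std::wstring &out)
    {
        if (n >= 100)
        {
            out += kOnes[n / 100];
            out += L" hundred";
            n %= 100;
            if (n != 0)
                out += L" ";
        }
        if (n >= 20)
        {
            out += kTens[n / 10];
            if (n % 10 != 0)
            {
                out += L"-";
                out += kOnes[n % 10];
            }
        }
        else if (n != 0)
            out += kOnes[n];
    }
}

std::wstring EnglishNumber(int value)
{
    if (value == 0)
        return L"zero";

    // widen first so that negating the most negative int cannot overflow
    long long n = value;
    std::wstring out;
    if (n < 0)
    {
        out = L"negative ";
        n = -n;
    }

    const struct { long long size; const wchar_t *name; } scales[] = {
        { 1000000000LL, L" billion" }, { 1000000LL, L" million" },
        { 1000LL, L" thousand" }, { 1LL, L"" } };
    for (const auto &s : scales)
    {
        const unsigned chunk = static_cast<unsigned>(n / s.size);
        n %= s.size;
        if (chunk == 0)
            continue;
        if (!out.empty() && out.back() != L' ')
            out += L" ";
        AppendBelowThousand(chunk, out);
        out += s.name;
    }
    return out;
}
```

Since std::mt19937 produces 32-bit unsigned values and the sample stores them in an int, negative inputs do occur, which is why the sketch handles them.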

Reading a File


#include <iostream>   // std::cout

void ReadSampleFile(const wchar_t *pFile)
{
    Alteryx::OpenYXDB::Open_AlteryxYXDB file;
    file.Open(pFile);

    // you can ask how many fields are in the file, what their names and types are, etc.
    for (unsigned x = 0; x < file.m_recordInfo.NumFields(); ++x)
    {
        if (x != 0)
            std::cout << ",";
        // the FieldBase object has all kinds of information about the field
        // it will also help us (later) get a specific value from a record
        const SRC::FieldBase * pField = file.m_recordInfo[x];
        std::cout << SRC::ConvertToAString(pField->GetFieldName().c_str());
    }
    std::cout << "\n";

    // read 1 record at a time from the YXDB.  When the file has read past
    // the last record, ReadRecord will return nullptr
    // You could also have called file.GetNumRecords() to know the total ahead of time
    while (const SRC::RecordData *pRec = file.ReadRecord())
    {
        // we now have a record (pRec) but it is an opaque structure
        // we need to use the FieldBase objects to get actual values from it.
        for (unsigned x = 0; x < file.m_recordInfo.NumFields(); ++x)
        {
            // the recordInfo object acts like an array of FieldBase objects
            const SRC::FieldBase * pField = file.m_recordInfo[x];
            // binary fields are not implicitly convertible to strings
            if (!IsBinary(pField->m_ft))
            {
                if (x != 0)
                    std::cout << ",";
                // you could (and probably should) ask for GetAsWString to get the unicode value
                std::cout << pField->GetAsAString(pRec).value.pValue;
            }
        }
        std::cout << "\n";
    }
}
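
The reader above joins values with bare commas. That is fine for a quick dump, but the output stops being valid CSV as soon as a value contains a comma, a quote or a line break. Here is a small sketch of a quoting helper that follows the usual CSV convention (wrap in double quotes, double any embedded quotes). CsvQuote is a hypothetical name, not something in the project.

```cpp
#include <string>

// Hypothetical helper: quote a value for CSV output only when it needs it.
std::string CsvQuote(const std::string &value)
{
    // plain values pass through untouched
    if (value.find_first_of(",\"\r\n") == std::string::npos)
        return value;

    std::string quoted = "\"";
    for (char c : value)
    {
        if (c == '"')
            quoted += '"';   // embedded quotes are doubled
        quoted += c;
    }
    quoted += '"';
    return quoted;
}
```

In the read loop you would wrap the value before printing it, assuming the pointer you get back from GetAsAString is a valid null-terminated string for that record (check for nulls first if you are not sure).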