November 12, 2012

Playing with CLOB data in CUBRID

Hi.

One day, a colleague asked me how to update some CLOB data from the console (that being the CUBRID SQL command prompt)?
My answer was quick - "Why don't you use the CUBRID Manager client...?"
To my surprise, the next day my colleague came back: "Hey, I just can't figure this out - could you please gimme a hand?"...
OK - all said and done - time to work! :)

For the sake of simplicity, let's suppose we have a table (named "my_files"), with 2 columns:
- A "filename" (VARCHAR) column, which holds a file name
- A "content" column, which (obviously) stores the file content

Here is the SQL create table script:

CREATE TABLE "my_files"(
    "filename" character varying(256) NOT NULL,
    "content" clob,
    CONSTRAINT pk_my_files_filename PRIMARY KEY("filename")
); 


The problem is how to put the data (== the content of the file) into the "content" column?

OK, a quick search on www.cubrid.org will tell you that there is a very simple way - use the CLOB_FROM_FILE CUBRID function!
...and this is exactly what my colleague tried, but, here's the catch, it doesn't seem to work... :(





No matter what we tried, it just did not worked!! :(
(actually it looks to be a known bug - see http://www.cubrid.org/questions/359623)

Hmm, so what are the alternatives, if any...?

Well, the immediate solution is to use one of the many drivers CUBRID has (PHP, .NET, JDBC, Python etc.)!
...But what if you are not a programmer...?
...What if you don't wanna get caught up in compiling, debugging etc...?
...Is there no other way, really...?!

OK, so I become determined to find a solution using only plain SQL, no matter how complicated (and insane) it would be! :)
..at least, for the sake of the game! :)

The first thing to do was to figure out how exactly does CUBRID handle CLOB data?
The answer was simple - see http://www.cubrid.org/manual/90/en/Storing%20and%20Updating%20Columns.

Bottom line, LOB data (both BLOB and CLOB) is stored "outside" of the database in CUBRID, in external file(s).

So after all, it's just about some files and nothing else, right...?
Cool - now we have a plan! That is:
  1. Create some "fake"/start-up data in the table
  2. Find the "external" CUBRID file and update it manually with the new data
Let's do it!

...Let's put some "temp" data in the row:



...Let's get the file location used by CUBRID to store the data:



...Let's update manually the data in the file:



...Finally, let's confirm what we did:



Wow - that did not worked!! :(
...The new data is in the file, but CUBRID does not "see" it.... ...somehow, CUBRID seems to store the initial "length" and only shows you that much data in SELECTs...

BTW, I found a CUBRID bug - the CLOB_LENGTH function does now work ok from the CUBRID perspective, as the length reported is the "real" updated one, and not the length stored internally by CUBRID! ...So you end up with very confusing information:



I guess I will fill a Jira bug entry...

Ok, so what do we do now...? ...It seems pretty clear that we need to somehow handle the CLOB length in a way that CUBRID can "understand" it...
New plan:
  1. Create some "fake" data in the table (if not already in place)
  2. Get the length of the new data we want to put in the CLOB column! (take the length of the file containing the new data in bytes)
  3. Update the data in the CLOB column to some "fake" data which has exactly the desired length!
  4. Find the "external" CUBRID file and update it manually with the new data
Out of the the listed steps, I had only one concern - step 3.... ...how do you update the data in the column in an easy way, with the data having a desired length...?
In the end, it turned out to be quite simple - CUBRID has the REPEAT function! :) ...Step 3 become an UPDATE, using the REPLACE function to setup the desired new data length.

OK - Let's test all these:



And now everything works just great! :) ...We found a way to update CLOB data using just plain simple SQL commands! (and OS file editing)

Be aware! This is neither recommended neither supported! Don't do it at home work!!
What I showed you in this blog is only for fun!

As a side note, there are other things you might need to consider (unless you are a "mighty admin"):
  • Files editing permissions
  • Tables ownership/user rights

As always, let me know if there’s anything I could help with or anything unclear.
I hope you enjoyed the post – see you next time!

Bye-bye!
/Dee

P.S. I wonder what happens if the new data is shorter than the existing data... How does CUBRID handle that scenario...?
...What do you think...? Wanna try it...? ;)

November 10, 2012

Emulate long running SQL queries in CUBRID

Hi.

If you are here and reading this blog post, it’s most probably because you had (or have?) the need to get your hands on a long running query… (…and by “long-running query” I mean a SQL query which takes more than just a few seconds to execute – maybe minutes or more…)


Actually, any database programmer must deal with this, sooner or later, simply because you need to test how you application behaves when a database query execution takes like forever (== a long time)!
How will your application handle it…? It will gracefully wait for the query to complete…? It will crash your app…? It will hog your CPU…?


What’s the problem, after all?


So all it takes is to have a long running query and use it - that’s it, right…?
Well, this is much-much easier said than done! ;)  …Where is that “magical” long running universal soldier query…? How do you control, if possible, how much time its execution will take…? What does it take to make it work in CUBRID…?

OK, let’s start with a good old  Google search… …and another Google search
Hmm, not very promising… :(
Summarizing, it looks like we would have to look into:
-          Getting big data (millions of rows) in place
-          Doing heavy JOINs
-          Implement WHERE conditions which avoid indexes (SUBSTR etc.)
-          Use sub-queries
-          Use “slow” functions, if any available; see, for example, http://stackoverflow.com/questions/3892374/how-to-intentionally-create-a-long-running-mysql-query

Bottom line - not that plain simple at all…!

Searching for more, I found on SourceForge.net an interesting project named CUBRID Performance project, which actually does have some long running CUBRID queries:


 But it seems that I would have to go through some databases setups… …still not that as simple as I would like to…

OK, time to step back and clearly define what I want:
-          I want an easy way to get a long running query
-          I want an easy way to have a SQL (query) execution time controllable
-          I want NO CPU hogging
-          I want NO memory or any other resources hogging (so forget about big tables!)

How do we solve the problem?


As said before, it seems there are 2 main possible approaches:
-          Use query which runs on “big” data, and do “heavy” stuff in the database
-          Use procedures/functions which takes time to execute (some complex math stuff, for example)

Unfortunately, both of these approaches “proudly” fail my goals… :(
So I needed another way… …another solution…

…And suddenly, while thinking about it, the solution was right there waiting for me! :)
…Doesn’t CUBRID have support for Java stored procedures? Yes, it does!!
…Doesn’t CUBRID let you write Java code which you can execute from anywhere, via stored procedure/functions calls? Yes, it does!!
…Doesn’t this solve all my concerns…? Most probably, YES!!

Wow – That’s it! - Let’s do it!

Solution


I will not bother you with all the details of doing stored procedures in CUBRID – it’s all online
…And I will definitely not bother you with all my trying’s and failures – I’ll just show you my solution.

A few highlights:
-          I wanted to be able to run SQL queries on “top” of the timeouts, so I decided to go on with a function – timeout_str - which returns an empty string – and can be used in STRING/VARCHAR functions, like CONCAT etc. This way, I could do things like:

SELECT CONCAT(timeout_str(3000), code.s_name) FROM code;

-          For those times where an INT would be useful, I decided to implement a variant – timeout_int – which returns an Integer ( == timeout input parameter)

Here is the Java code – plain and simple:

/**
 * CUBRID utility stored procedures
 */
public class utils {
  /**
   * Timeout and return a string
   *
   * @param timeout Timeout values in ms.
   * @param ret     Returned string value
   * @return String
   */
  public static String timeout_str(Integer timeout, String ret) {
    try {
      Thread.sleep(timeout);
    } catch (Exception ex) {
      // do nothing
    }
    return ret;
  }
 
  /**
   * Timeout and return an integer value
   *
   * @param timeout Timeout values in ms.
   * @param ret     Returned integer value
   * @return Integer
   */
  public static Integer timeout_int(Integer timeout, Integer ret) {
    try {
      Thread.sleep(timeout);
    } catch (Exception ex) {
      // do nothing
    }
    return ret;
  }
}

Next, after compiling the Java code, I loaded the compiled class in CUBRID, using the loadjava tool:


Then, I created the 2 stored functions, using the CUBRID Manager GUI:
-          timeout_str
-          timeout_int



And finally, let’s have some fun – everything works just great! – see below:




Moreover, timeouts do add (as they are supposed to!):


All in all, pretty cool, right…?!

Of course, let me know if there’s anything else I could help with or anything unclear.
I hope you enjoyed the post – see you next time!

Bye-bye,
/Dee



P.S. You can download the latest CUBRID 9.0 beta release from here:

They say it’s 3x times faster…! …More reason to go on with my timeout solution – ha-ha-ha! :)

P.S.S. Let me know if you would be interested in a MySQL (similar) solution… …or maybe you already have one and you would like to share it…? :)

P.S.S. Let's register this on Technorati - 7CZ7ETN4TB89